rna-tools

RNA Sequence

RNA Sequence with secondary structure prediction methods.

This tool takes a given sequence and returns the secondary structure prediction provided by 5 different tools: RNAfold, RNAsubopt, ipknot, contextfold and centroid_fold. You must have these tools installed. You don’t have to install all tools if you want to use only one of the methods.

It’s easy to add more methods of your choince to this class.

Installation

Depends on what tools you want to use, follow the instructions below.

ContextFold

https://www.cs.bgu.ac.il/~negevcb/contextfold/

needs Java. Try this on Ubuntu 14-04 https://askubuntu.com/questions/521145/how-to-install-oracle-java-on-ubuntu-14-04 Single chain only!

ViennaRNA

https://www.tbi.univie.ac.at/RNA/

For OSX install from the binary Installer from the page.

ipknot OSX

https://github.com/satoken/homebrew-rnatools

If one encounters a problem:

[mm] Desktop$ /usr/local/opt/bin/ipknot
dyld: Library not loaded: /usr/local/opt/glpk/lib/libglpk.40.dylib
  Referenced from: /usr/local/opt/bin/ipknot
  Reason: image not found
[1]    51654 abort      /usr/local/opt/bin/ipknot

the solution is:

brew install glpk # on OSX

RNA Structure

http://rna.urmc.rochester.edu/

Works with 5.8.1; Jun 16, 2016.

Download http://rna.urmc.rochester.edu/RNAstructureDownload.html and untar it in <RNA_PDB_TOOLS>/opt/RNAstructure/; run make, the tools will be compiled in a folder exe. Set up DATPATH in your bashrc to <RNA_PDB_TOOLS>/opt/RNAstructure/data_tables DATAPATH=/home/magnus/work/src/rna-pdb-tools/opt/RNAstructure/data_tables/ (read more http://rna.urmc.rochester.edu/Text/Thermodynamics.html). RNAstructure can be run with SHAPE restraints, read more http://rna.urmc.rochester.edu/Text/File_Formats.html#Constraint about the format. The file format for SHAPE reactivity comprises two columns. The first column is the nucleotide number, and the second is the reactivity. Nucleotides for which there is no SHAPE data can either be left out of the file, or the reactivity can be entered as less than -500. Columns are separated by any white space.

MC-Sym

FAQ

  • Does it work for more than one chain??? Hmm.. I think it’s not. You have to check on your own. –magnus

TIPS

Should you need to run it on a list of sequences, use the following script:

from rna_tools import Seq
f = open("listOfSequences.fasta")
for line in f:
 if line.startswith('>'):
   print line,
 else:
   print line,
   s = Seq.Seq(line.strip()) # module first Seq and class second Seq #without strip this has two lines
   print s.predict_ss(method="contextfold"),
   #print s.predict_ss(method="centroid_fold")

Todo

  • This calss should be renamed to RNASeq and merged with RNASeq class from RNAalignment

exception rna_tools.Seq.MethodNotChosen[source]
class rna_tools.Seq.RNASequence(seq, ss='', name='rna_seq')[source]

RNASequence.

Usage:

>>> seq = RNASequence("CCCCUUUUGGGG")
>>> seq.name = 'RNA03'
>>> print(seq.predict_ss("RNAfold", constraints="((((....))))"))
>RNA03
CCCCUUUUGGGG
((((....)))) ( -6.40)
eval(ss='', no_dangling_end_energies=False, verbose=False)[source]

Evaluate energy of RNA sequence.

Parameters:
  • ss (optional)

  • no_dangling_end_energies (Boolean)

  • verbose (Boolean)

Returns:

Energy (float)

The RNAeval web server calculates the energy of a RNA sequence on a given secondary structure. You can use it to get a detailed thermodynamic description (loop free-energy decomposition) of your RNA structures.

Simply paste or upload your sequence below and click Proceed. To get more information on the meaning of the options click the help symbols. You can test the server using this sample sequence/structure pair.

An equivalent RNAeval command line call would have been:

RNAeval -v -d0 < input.txt

Read more: <http://rna.tbi.univie.ac.at//cgi-bin/RNAWebSuite/RNAeval.cgi>

get_foldability(ss='', verbose=False)[source]

Calculate foldability based on EntRNA.

Steps:

  • parse SS into basepairs,

  • calculate foldabilty

Configuration:

  • Set ENTRNA_PATH to the folder where ENTRNA_predict.py is.

Cmd wrapper in here:

python ENTRNA_predict.py --seq_file pseudoknotted_seq.txt --str_file pseudoknotted_str.txt

Su, C., Weir, J. D., Zhang, F., Yan, H., & Wu, T. (2019). ENTRNA: a framework to predict RNA foldability. BMC Bioinformatics, 20(1), 1–11. http://doi.org/10.1186/s12859-019-2948-5

predict_ss(method='RNAfold', constraints='', enforce_constraint=False, shapefn='', explore='', verbose=0, path='')[source]

Predict secondary structure of the seq.

Parameters:
  • method – {mcfold, RNAfold}

  • onstraints

  • shapefn (str) – path to a file with shape reactivites

  • verbose (boolean)

It creates a seq fasta file and runs various methods for secondary structure prediction. You can provide also a constraints file for RNAfold and RNAsubopt.

Methods that can be used with contraints: RNAsubopt, RNAfold, mcfold.

Methods that can be used with SHAPE contraints: RNAfold.

ContextFold

Example:

$ java -cp bin contextFold.app.Predict in:CCCCUUUGGGGG
CCCCUUUGGGGG
((((....))))

It seems that a seq has to be longer than 9. Otherwise:

$ java -cp bin contextFold.app.Predict in:UUUUUUGGG
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10

# this is OK
$ java -cp bin contextFold.app.Predict in:CCCCUUUGGG
CCCCUUUGGG
.(((...)))

RNAstructure

Example:

>>> seq = RNASequence("GGGGUUUUCCC")
>>> print(seq.predict_ss("rnastructure"))
>  ENERGY = -4.4  rna_seq
GGGGUUUUCCC
((((...))))

and with the shape data:

>>> print(seq.predict_ss("rnastructure", shapefn="data/shape.txt"))
>  ENERGY = -0.2  rna_seq
GGGGUUUUCCC
.(((....)))

the shape data:

1 10
2 1
3 1

You can easily see that the first G is unpaired right now! The reactivity of this G was set to 10. Worked!

MC-Fold

MC-Fold uses the online version of the tool, this is very powerful with constraints:

rna_seq
acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg
((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))
curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
mcfold::energy best dynamics programming: -53.91
(-53.91, '((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))')

curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((........)))).......((((..............((((((((((..............))))))))))..))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
mcfold::energy best dynamics programming: -34.77
(-34.77, '((((........)))).......((((..............((((((((((..............))))))))))..))))')

acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg
((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))
curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((xxxxxxxx))))xxxxxxx((((xxxxxxxxxxxxxx((((((((((xxxxxxxxxxxxxx))))))))))xx))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
mcfold::energy best dynamics programming: -34.77
(-34.77, '((((........)))).......((((..............((((((((((..............))))))))))..))))')


acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg
((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))
curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((********))))*******((((**************((((((((((**************))))))))))**))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
mcfold::energy best dynamics programming: -77.30
(-71.12, '(((((((..))))))).......((((((.(((...)))..(((((((((((((((....)))))))))))))))))))))')

acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg
((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))
curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((**[[[[[**))))*******((((****]]]]]****(((((((((((((((****)))))))))))))))**))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
mcfold::energy best dynamics programming: -77.30
('-77.30', '((((**[[[[[**))))*******((((****]]]]]****(((((((((((((((****)))))))))))))))**))))')

explore

The sub-optimal search space can be constrained within a percentage of the minimum free energy structure, as MC-fold makes use of the Waterman-Byers algorithm [18, 19]. Because the exploration has an exponential time complexity, increasing this value can have a dramatic effect on MC-Fold’s run time.

Parisien, M., & Major, F. (2009). RNA Modeling Using the MC-Fold and MC-Sym Pipeline [Manual] (pp. 1–84).

rna_tools.Seq.load_fasta_ss_into_RNAseqs(fn, debug=True)[source]

RNA Secondary Structure

Secondary structure analysis

exception rna_tools.SecondaryStructure.ExceptionOpenPairsProblem[source]
rna_tools.SecondaryStructure.draw_ss(title, seq, ss, img_out, resolution=4, verbose=False)[source]

Draw Secondary Structure using VARNA (you need correct configuration for this).

If everything is OK, return None, if an error (=exception) return stderr.

Usage:

>>> seq = 'GGAAACC'
>>> ss =  '((...))'
>>> img_out = 'output/demo.png'
>>> draw_ss('rna', seq, ss, img_out)
>>> print('Made %s' % img_out)
Made output/demo.png
_images/demo.png

Can be used with http://geekbook.readthedocs.io/en/latest/rna.html

rna_tools.SecondaryStructure.parse_vienna_to_pairs(ss, remove_gaps_in_ss=False)[source]

Parse Vienna (dot-bracket notation) to get pairs.

Parameters:
  • ss (str) – secondary stucture in Vienna (dot-bracket notation) notation

  • remove_gaps_in_ss (bool) – remove - from ss or not, design for DCA (tpp case ss = "(((((((((.((((.(((.....))))))......------)....." works with pk of the first level, [[]]

Returns:

(pairs, pairs_pk)

Return type:

list of two lists

Examples:

>>> parse_vienna_to_pairs('((..))')
([[1, 6], [2, 5]], [])

>>> parse_vienna_to_pairs('(([[))]]')
([[1, 6], [2, 5]], [[3, 8], [4, 7]])

>>> parse_vienna_to_pairs('((--))')
([[1, 6], [2, 5]], [])

>>> parse_vienna_to_pairs('((--))', remove_gaps_in_ss=True)
([[1, 4], [2, 3]], [])

>>> parse_vienna_to_pairs('((((......')
Traceback (most recent call last):
  File "/usr/lib/python2.7/doctest.py", line 1315, in __run
    compileflags, 1) in test.globs
  File "<doctest __main__.parse_vienna_to_pairs[4]>", line 1, in <module>
    parse_vienna_to_pairs('((((......')
  File "./SecondaryStructure.py", line 106, in parse_vienna_to_pairs
    raise ExceptionOpenPairsProblem('Too many open pairs (()) in structure')
ExceptionOpenPairsProblem: Too many open pairs (()) in structure

rna_dot2ct.py

The output file is <input-file>.ct

Wrapper to

RNAstructure: software for RNA secondary structure prediction and analysis. (2010). RNAstructure: software for RNA secondary structure prediction and analysis., 11, 129. http://doi.org/10.1186/1471-2105-11-129

usage: rna_dot2ct.py [-h] [-v] file
file

Input is: >seq aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ((…((((((((((…….)))))))))).))

-h, --help

show this help message and exit

-v, --verbose

be verbose

Secondary structure format conversion

rna_convert_pseudoknot_formats

Run this as:

python rna-pk-simrna-to-one-line.py test_data/simrna.ss

Convert:

> a
....((.(..(((....)))..((((.(.........).)))....).)).......((((......))))
..............................((((...................))))..............

to:

> a
....((.(..(((....)))..((((.(..[[[[...).)))....).))...]]]]((((......))))

and:

>2 chains
(((((......)))))........(.((....(.......)..(((. .)))...)).)
.....((((((......................))))))........ ...........

to:

>2 chains
((((([[[[[[)))))........(.((....(]]]]]].)..(((. .)))...)).)

and:

> b
..(.......(((....)))..((((.(.........).))))).............((((......))))
....((.(......................................).)).....................
..............................((((...................))))..............

to:

> b
..(.[[.[..(((....)))..((((.(..{{{{...).)))))..].]]...}}}}((((......))))

and it works with VARNA:

_images/varna_2pk.png

Convert a secondary structure with a pk to the SimRNA format:

rna_convert_pseudoknot_formats git:(master) ✗ python rna_ss_pk_to_simrna.py test_data/ss_with_pk.ss
((((([[[[[[)))))........(.((....(]]]]]].)..(((. .)))...)).)

(((((......)))))........(.((....(.......)..(((. .)))...)).)
.....((((((......................))))))........ ...........
rna_tools.tools.rna_convert_pseudoknot_formats.rna_ss_pk_to_simrna.get_multiple_lines(ss)[source]
rna_tools.tools.rna_convert_pseudoknot_formats.rna_ss_pk_to_simrna.get_parser()[source]
rna_tools.tools.rna_convert_pseudoknot_formats.rna_ss_pk_to_simrna.is_pk(ss)[source]

RNA Alignment

RNAalignment - a module to work with RNA sequence alignments.

To see a full demo what you can do with this util, please take a look at the jupiter notebook (https://github.com/mmagnus/rna-pdb-tools/blob/master/rna_tools/tools/rna_alignment/rna_alignment.ipynb)

Load an alignment in the Stockholm:

   alignment = ra.RNAalignment('test_data/RF00167.stockholm.sto')

or fasta format::

    import rna_alignment as ra
    alignment = ra.fasta2stokholm(alignment.fasta)
    alignment = ra.RNAalignment

Parameters of the aligmnent:

print(alignment.describe())

Consensus SS:

print(alignment.ss_cons_with_pk)

Get sequnce/s from teh aligment:

>>> seq = a.io[0]

RNASeq

class rna_tools.tools.rna_alignment.rna_alignment.RNASeq(id, seq, ss=None)[source]

RNASeq.

Parameters:
  • id (str) – id of a sequence

  • seq (str) – seq, it be uppercased.

  • ss (str) – secondary structure, default None

seq_no_gaps

seq.replace(‘-’, ‘’)

Type:

str

ss_no_gaps

ss.replace(‘-’, ‘’)

Type:

str

Warning

>>> if 'EF' in s.id: print('Y')
Y
>>> if 'EF' in s: print('Y')
# nothing
draw_ss(title='', verbose=False, resolution=1.5)[source]

Draw secondary structure of RNA with VARNA.

VARNA: Visualization Applet for RNA A Java lightweight component and applet for drawing the RNA secondary structure

_images/varna.png

Cite: VARNA: Interactive drawing and editing of the RNA secondary structure Kevin Darty, Alain Denise and Yann Ponty Bioinformatics, pp. 1974-197,, Vol. 25, no. 15, 2009

http://varna.lri.fr/

get_conserved(consensus, start=0, to_pymol=True, offset=0)[source]

Start UCGGGGUGCCCUUCUGCGUG————————————————–AAGGC-UGAGAAAUACCCGU————————————————-AUCACCUG-AUCUGGAU-AAUGC XXXXXXXXXXXXGXGXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX—————————-XXXXX-XCUGAGAXXXXXXXXXXXXXXXXXXXXXX———————————-XXXXXXXX-XXXXXXXX-ACXUG

get_distance_to(nseq)[source]

Get distance of self.seq to nseq.

get_ss_std()[source]
remove_columns(to_remove)[source]

indexing from 0

remove_gaps(check_bps=True, only_canonical=True, allow_gu=True)[source]

Remove gaps from seq and secondary structure of the seq.

Parameters:
  • check_bps (bool) – fix mistakes as

  • only_canonical (bool) – keep in ss only pairs GC, AU

  • allow_gu (bool) – keep in ss also GU pair

_images/ss_misgap.png

A residue “paired” with a gap.

_images/ss_misgap_wrong.png

paired with any residues (in the blue circle). If yes, then this residues is unpair (in this case ) -> .).

_images/ss_misgap_ok.png

if only_canonical (by default) is True then only GC, AU can be paired.

_images/ss_only_canonical.png

If allow_gu is False (be default is True) then GU pair is also possible.

_images/ss_no_gu.png

If you provide seq and secondary structure such as:

GgCcGGggG.GcggG.cc.u.aAUACAAuACCC.GaAA.GGGGAAUAaggCc.gGCc.gu......CU.......uugugcgGUuUUcaAgCccCCgGcCaCCcuuuu
(((((((((....((.((............(((......)))......))))..(((.(.....................)))).......)))))))))........

gaps will be remove as well.

ss_to_bps()[source]

Convert secondary structure into a list of basepairs.

Returns:

a list of base pairs, e.g. [[0, 80], [1, 79], [2, 78], [4, 77], [6, 75], [7, 74], …]

Return type:

bps (list)

RNAalignment

class rna_tools.tools.rna_alignment.rna_alignment.RNAalignment(fn='', fetch='')[source]

RNA alignment - adapter class around BioPython to do RNA alignment stuff

Usage (for more see IPython notebook https://github.com/mmagnus/rna-tools/blob/master/rna_tools/tools/rna_alignment/rna_alignment.ipynb)

>>> a = RNAalignment('test_data/RF00167.stockholm.sto')
>>> print(a.tail())
>>> print(a.ss_cons)
Parameters:
  • fn (str) – Filename

  • io (Bio.AlignIO) – AlignIO.read(fn, “stockholm”)

  • lines (list) – List of all lines of fn

  • seqs (list) – List of all sequences as class:RNASeq objects

  • rf (str) – ReFerence annotation, the consensus RNA sequence

Read more:

and on the format itself

Warning

fetch requires urllib3

align_seq(seq)[source]

Align seq to the alignment.

Using self.rf.

Parameters:

seq (str) – sequence, e.g. -GGAGAGUA-GAUGAUUCGCGUUAAGUGUGUGUGA-AUGGGAUGUC...

Returns:

seq that can be inserted into alignemnt, .-.GG.AGAGUA-GAUGAUUCGCGUUA ! . -> -

Return type:

str

copy_ss_cons_to_all(verbose=False)[source]
copy_ss_cons_to_all_editing_sequence(seq_id, before, after)[source]

Change a sequence’s sec structure.

Parameters:
  • seq_id – string, sequence id to change, eg: AE009948.1/1094322-1094400

  • before – string, character to change from, eg: ,

  • after – string, character to change to, eg: .

Warning

before and after has to be one character long

describe()[source]

Describe the alignment.

> print(a.describe()) SingleLetterAlphabet() alignment with 13 rows and 82 columns

find_core(ids=None)[source]

Find common core for ids.

_images/find_core.png

Fig. By core, we understand columns that have all homologous residues. The core is here marked by x.

Parameters:

id – list, ids of seq in the alignment to use

find_seq(seq, verbose=False)[source]

Find seq (also subsequences) and reverse in the alignment.

Parameters:
  • seq (str) – seq is upper()

  • verbose (bool) – be verbose

seq = "ggaucgcugaacccgaaaggggcgggggacccagaaauggggcgaaucucuuccgaaaggaagaguaggguuacuccuucgacccgagcccgucagcuaaccucgcaagcguccgaaggagaauc"
hit = a.find_seq(seq, verbose=False)
ggaucgcugaacccgaaaggggcgggggacccagaaauggggcgaaucucuuccgaaaggaagaguaggguuacuccuucgacccgagcccgucagcuaaccucgcaagcguccgaaggagaauc
Match: AL939120.1/174742-174619
ID: AL939120.1/174742-174619
Name: AL939120.1
Description: AL939120.1/174742-174619
Number of features: 0
/start=174742
/end=174619
/accession=AL939120.1
Per letter annotation for: secondary_structure
Seq('CCAGGUAAGUCGCC-G-C--ACCG---------------GUCA-----------...GGA', SingleLetterAlphabet())
GGAUCGCUGAACCCGAAAGGGGCGGGGGACCCAGAAAUGGGGCGAAUCUCUUCCGAAAGGAAGAGUAGGGUUACUCCUUCGACCCGAGCCCGUCAGCUAACCUCGCAAGCGUCCGAAGGAGAAUC
find_seq_exact(seq, verbose=False)[source]

Find seq (also subsequences) and reverse in the alignment.

Parameters:
  • seq – string, seq, seq is upper()

  • verbose – boolean, be verbose or not

format_annotation(t)[source]
get_clean_ss(ss)[source]
get_distances()[source]

Get distances (seq identity) all-vs-all.

With BioPython.

blastn: Bad alphabet 'U' in sequence 'AE008922.1/409481-409568' at position '7' only for DNA?

read more (also about matrix at <http://biopython.org/wiki/Phylo> and HTTP://biopython.org/DIST/docs/api/Bio.Phylo.TreeConstruction.DistanceCalculator-class.html

get_gc_rf()[source]

Return (str) #=GC RF or ‘’ if this line is not in the alignment.

get_seq(seq_id)[source]
get_seq_ss(seq_id)[source]
get_seq_with_name(seq_name)[source]
get_shift_seq_in_align()[source]

RF_cons vs ‘#=GC RF’ ???

get_ss_cons()[source]
Returns:

SS_cons_pk line or None if there is now SS_cons_pk.

get_ss_cons_pk()[source]
Returns:

SS_cons_pk line or None if there is now SS_cons_pk:

get_ss_remove_gaps(seq, ss)[source]
Parameters:
  • seq – string, sequence

  • ss – string, ss

UAU-AACAUAUAAUUUUGACAAUAUGG-GUCAUAA-GUUUCUACCGGAAUACC–GUAAAUAUUCU—GACUAUG-UAUA- (((.(.((((,,,(((((((_______.))))))).,,,,,,,,(((((((__.._____))))))…),,)))).)))).

get_the_closest_seq_to_ref_seq(verbose=False)[source]

Example:

>>> a = RNAalignment("test_data/RF02221.stockholm.sto")
>>> a.get_the_closest_seq_to_ref_seq()
AF421314.1/431-344
head()[source]
map_seq_on_align(seq_id, resis, v=True)[source]
Parameters:
  • seqid – seq_id, ‘CP000721.1/2204691-2204775’

  • resis – list resis, [5,6]

maps:

[5, 6, 8]
CAC-U
CAC-U-
CAC-U-UA
[4, None, 6]
map_seq_on_seq(seq_id, seq_id_target, resis, v=True)[source]
Parameters:
  • seq_id – seq_id, ‘AAML04000013.1/228868-228953’

  • seq_id_target – seq_id of target, ‘CP000721.1/2204691-2204778’

  • resis – list resis, [5,6]

map:

[4, 5, 6]
UAU-A
UAU-AA
UAU-AAC
[5, 6, 7]
CAC-U
CAC-U-
CAC-U-U
[4, None, 5]
plot(plot_fn='rchie.png')[source]
reload_alignment()[source]
remove_empty_columns(verbose=False)[source]

Remove empty columns in place.

Example:

>>> a = RNAalignment("test_data/zmp.stk")
>>> print(a)
SingleLetterAlphabet() alignment with 6 rows and 319 columns
---ACCUUGCGCGACUGGCGAAUCC-------------------...AAU CP001644.1/756294-756165
--GCUCUCGCGCGACUGGCGACUUUG------------------...GAA CU234118.1/352539-352459
UGAGUUUUCUGCGACUGACGGAUUAU------------------...CUG BAAV01000055.1/2897-2982
GCCCGUUCGCGUGACUGGCGCUAGU-------------------...CGA CP000927.1/5164264-5164343
-----GGGUCGUGACUGGCGAACA--------------------...--- zmp
UCACCCCUGCGUGACUGGCGAUA---------------------...GUU AP009385.1/718103-718202
>>> a.remove_empty_columns()
>>> print(a)
SingleLetterAlphabet() alignment with 6 rows and 138 columns
---ACCUUGCGCGACUGGCGAAUCC-UGAAGCUGCUUUG-AGCG...AAU CP001644.1/756294-756165
--GCUCUCGCGCGACUGGCGACUUUG------------------...GAA CU234118.1/352539-352459
UGAGUUUUCUGCGACUGACGGAUUAU------------------...CUG BAAV01000055.1/2897-2982
GCCCGUUCGCGUGACUGGCGCUAGU-------------------...CGA CP000927.1/5164264-5164343
-----GGGUCGUGACUGGCGAACA--------G-----------...--- zmp
UCACCCCUGCGUGACUGGCGAUA--------GAACCCUCGGGUU...GUU AP009385.1/718103-718202

go over all seq modifes self.nss_cons

property ss_cons_std
property ss_cons_with_pk

go over ss_cons and overwrite bp is there is pk (ss_cons_pk)

ss_cons: (((.(.((((,,,(((((((_______.))))))).,,,,,,,,(((((((__.._____))))))…),,)))).)))). ss_cons_pk: …………………..[[………………………….]]…………………… ss_cons_with_pk: (((.(.((((,,,(((((((___[[__.))))))).,,,,,,,,(((((((__.._]]__))))))…),,)))).)))).

“return ss_cons_with_pk: string, e.g. (((.(.((((,,,(((((((___[[__.))))

property ss_cons_with_pk_std
subset(ids, verbose=False)[source]

Get subset for ids:

# STOCKHOLM 1.0
#=GF WK Tetrahydrofolate_riboswitch

AAQK01002704.1/947-1059 -U-GC-AAAAUAGGUUUCCAUGC.. #=GC SS_cons .(.((.((—-((((((((((… #=GC RF .g.gc.aGAGUAGggugccgugc.. //

tail()[source]
trimmed_rf_and_ss()[source]

Remove from RF and SS gaps.

Returns:

trf, tss - new RF and SS

Return type:

(str,str)

write(fn, verbose=False)[source]

Write the alignment to a file

rna_alignment_get_species.py

This is an improved version of the script that uses the Rfam MySQL database online interface (thanks @akaped for this idea) (so you need to be connected to the Internet, of course). Redirect the output to the file.

_images/species.png

Warning

This scripts needs mysql-connector-python-rf module to connect the Rfam MySQL server, so install it before using: pip install mysql-connector-python-rf.

Example:

$ rna_alignment_get_species.py RF00004.stockholm.stk
# STOCKHOLM 1.0
Sorex-araneus-(European-shrew)                     AUCGCU-UCU----CGGCC--UUU-U

Examples 2:

$ rna_alignment_get_species.py u5_rfam_u5only.stk --verbose
# STOCKHOLM 1.0
#=GF WK U5_spliceosomal_RNA
#=GF NC 39.90
#=GF RT The spliceosomal snRNAs of Caenorhabditis elegans.
#=GF TC 40.00
#=GF RN [3]
(...)
#=GF AC RF00020
#=GF SE Zwieb C, The uRNA database, PMID:9016512; PMID:18390578
#=GF GA 40.00
#=GF BM cmbuild -F CM SEED
#=GF TP Gene; snRNA; splicing;
Bos-taurus-(cattle)                                GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G
Sorex-araneus-(European-shrew)                     GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G
Ictidomys-tridecemlineatus-(thirteen-lined-ground- GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G
Monodelphis-domestica-(gray-short-tailed-opossum)  GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G
Oryctolagus-cuniculus-(rabbit)                     GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G
Cavia-porcellus-(domestic-guinea-pig)              GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G
Ochotona-princeps-(American-pika)                  GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G
usage: rna_alignment_get_species.py [-h] [-v] [--debug] [--id-width ID_WIDTH]
                                    [--evo-mapping EVO_MAPPING]
                                    [--evo-mapping-default] [--one]
                                    [--osfn OSFN] [--rfam]
                                    alignment
alignment

alignment

-h, --help

show this help message and exit

-v, --verbose

be verbose

--debug
--id-width <id_width>

define width of ids, trim species name when longer than this

--evo-mapping <evo_mapping>
--evo-mapping-default
--one
--osfn <osfn>

cache file

--rfam

rna_alignment_calc_energy.py

Calculate energy (.cet) format:

UGGC-CCCUGCGCAA-GGAUGACA
(((..((((.....).))..))))
(((..(((((***)).))..))))

Examples:

 $ rna_alignment_calc_energy.py --template alignments/u6-lower.cet alignments/u6-only-RemovedGapped.stk -v
       --loop-upper guaa --loop-lower guaa
       --loop-upper-cst '(..)' --loop-lower-cst '(..)'
 calc-energy2.py --template u6atac-template.txt u6atac_u6only.sto -v
./calc-energy2.py --template alignments/u6-lower.cet --one alignments/u6-lower-stem-only.sto

Takes cet files (calc-energy-templets):

$ rna_alignment_calc_energy.py --template test_data/u6-lower.cet --one test_data/u6-only.stk  -v # --loop-seq test_data/u6-only-loop-seq-u6-lower
N/A% (0 of 182) |                                                                                                                            | Elapsed Time: 0:00:00 ETA:  --:--:--================================================================================
AB010698.1/46467-46488
(((..((((.....).))..))))
UGGC-CCCUGCGCAA-GGAUGACA
lower ------------------------------------
UGG ugcgca ACA
(((******)))
UGGugcgcaACA
(((((..))))) -10.64
upper ------------------------------------
UGGC-CCCUGCGCAA-GGAUGACA
CCC ugcgca AGG
CCCugcgcaAGG
(((((..))))) -9.6
                       id  low_energy       low_seq        low_ss  up_energy        up_seq         up_ss
0  AB010698.1/46467-46488      -10.64  UGGugcgcaACA  (((((..)))))       -9.6  CCCugcgcaAGG  (((((..)))))
Done: u6-only-loop-seq-u6-lower

by parsing output from MC-Sym:

domains have 5451 elements.
 10:47:16 up 141 days, 26 min,  0 users,  load average: 1.45, 1.30, 1.56
Score: -999.000 GAACAUGGUUCUUGCCUUUUACCAGAACCAUCCGGGUGUUG
Total number of MB structures with 3 stems: 16041
(overlaps: 0, !energy: 335585)
</pre><P><H2>Sorting the structures...
<P></H2><pre></pre><H2><P><P><P>Filtered and Sorted solutions:<P><P><P></H2><pre>
</pre><H2><P><P><P><a HREF="http://biwww2.informatik.uni-freiburg.de/Software/MARNA/index.html" target="_blank">MARNA</a>-formatted:<P><P><P></H2><pre>
GAACAUGGUUCUUGCCUUUUACCAGAACCAUCCGGGUGUUG
((((((((((..))))))))))((((((((...)))))))) -33.20 ( -0.69)
(((((((((....)))))))))((((((((...)))))))) -33.17 ( -0.69)
((((((((((((((((...))))))))))))......)))) -32.40 ( +0.00)

Backtracking with 2 variables (stems):
domains have 5451 elements.
 10:47:16 up 141 days, 26 min,  0 users,  load average: 1.45, 1.30, 1.56
Score: -999.000 GAACAUGGUUCUUGCCUUUUACCAGAACCAUCCGGGUGUUG
Total number of MB structures with 2 stems: 9555
(overlaps: 0, !energy: 165582)
</pre><P><H2>Sorting the structures...
<P></H2><pre></pre><H2><P><P><P>Filtered and Sorted solutions:<P><P><P></H2><pre>
</pre><H2><P><P><P><a HREF="http://biwww2.informatik.uni-freiburg.de/Software/MARNA/index.html" target="_blank">MARNA</a>-formatted:<P><P><P></H2><pre>
GAACAUGGUUCUUGCCUUUUACCAGAACCAUCCGGGUGUUG
((((((((((..))))))))))((((((((...)))))))) -33.20 ( -0.69)
(((((((((....)))))))))((((((((...)))))))) -33.17 ( -0.69)
((((((((((((((((...))))))))))))......)))) -32.40 ( +0.00)
usage: rna_alignment_calc_energy.py [-h] [--debug] [--one] [--method METHOD]
                                    [--csv CSV] [--loop-seq]
                                    [--template TEMPLATE] [--flanks FLANKS]
                                    [-v]
                                    alignment
alignment

an alignment in the Stockholm format

-h, --help

show this help message and exit

--debug
--one

one only for the first seq

--method <method>

mcfold or rnastructure_CycleFold

--csv <csv>
--loop-seq
--template <template>
--flanks <flanks>

GC be default

-v, --verbose

rna_align_get_ss_from_fasta.py

Input as a file:

>ade
GCU-U-CAUAUAAUCCUAAUGAUAUGG-UUUGGGA-GUUUCUACCAAGAG-CC--UUAAA-CUCUU---GAUUAUG-AAGU-
(((.(.((((,,,(((((((_______.))))))).,,,,,,,,(((((((__.._____))))))...),,)))).)))).

to get:

>ade
GCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAGCCUUAAACUCUUGAUUAUGAAGU
((((((((...(((((((.......)))))))........((((((.......))))))..))))))))
usage: rna_align_get_ss_from_fasta.py [-h] file
file

subsection of an alignment

-h, --help

show this help message and exit

rna_align_get_ss_from_stk.py

Process an alignment in the Stockholm format to get sequences and secondary structures:

Example:

$ rna_align_get_ss_from_stk.py aligns/gmp_RF01786.stockholm_p75_pk.sto
AAOX01000007.1/31274-31356
AAGAAUAUAGAACACUGUGAUGAGCGGUUUUUAUUUGCACUUUAAACCGCUUGGAGUGACUAGUGCAGCCGGCCAAUGAUCUA
.(((.(((.(..(((......((((((((...............))))))))...)))...............).))).))).
CP000724.1/3560727-3560809
AAAAAUGUAGAGCAAAUGAACUGCAGGUAUACAUGGACGCCUUAAACUGCAGGGAUGUAGUGGCGUAACCGACUAACAAUAUU
((.(.(((((.(((......((((((.(......[[...[....).))))))...)))...]...]..]...))).)).).))
AACY023761929.1/1009-1091
AUAAUUUGGUGGGCGUUGAUGUGCCCUUUGUAUCUGGUCGCUUGAGGGGUACGGAGCCAAUAGCGAAACCGCCGCCGUCAUAG
.((...((((((((......((((((((.......[.[......))))))))...)))......]...]...)))))...)).
usage: rna_align_get_ss_from_stk.py [-h] file
file

subsection of an alignment

-h, --help

show this help message and exit

rna_align_distance_to_seq.py

Calculate

“Process an alignment in the Stockholm format to get sequences and secondary structures:

Example:

$ rna_align_distance_to_seq.py test_data/gmp_ref.sto test_data/gmp_ref_distance.csv

   distance                            id
0      1.00                           gmp
1      0.69    AE000513.1/1919839-1919923
2      0.73      BA000004.3/387918-388001
3      0.69  ABFD02000011.1/154500-154585
4      0.73      AE015927.1/474745-474827
5      0.75                AAWL01000006.1
6      0.72                    AM180355.1
7      0.72      CP001116.1/102374-102457
8      0.65    AJ965256.1/1260708-1260792

                                                 seq
0  -----GCGCGGAAAC-AAUGAUGAAU--GGG-UUUA-AAUUGGGC-...
1  CUGUCGAAGAGACGC-GAUGAAUCCC--GCC-CUGUAAUUCGGGC-...
2  AAUCAAUAGGGAAGC-AACGAAGCAU--AGC-CUUU-AUAUGGAC-...
3  AAAUAUUAUAGAGAU-GUUGAAGUAU--AUU-CUAUUA-UUGGGC-...
4  AUUUUAAGAGGAAAU-UUUGAACUAU--AUA-CUU--AUUUGGGC-...
5  --UGCAA-UGGGUGU-GAUGAAGUCC--GGA-CAGUAAUGUGGGC-...
6  AAUAUUU-UAGAAAC-UGAGAAGUAU--AUC-UUAUUA-UUGGGC-...
7  AUAACGGCACGAAGC-AAUGAAAUGU--UCG-AUGU-AACCGGGC-...
8  AAAUUAAGGGGAAGC-GUUGAGCCGC--UAC-CCAU-AUGUGGUUC...

                                                 ss:
0  (((((((((((.(((.......((((..(((.(................
1  (((((((((((.(((.......((((..(((.(................
2  (((((((((((.(((.......((((..(((.(................
3  (((((((((((.(((.......((((..(((.(................
4  (((((((((((.(((.......((((..(((.(................
5  (((((((((((.(((.......((((..(((.(................
6  (((((((((((.(((.......((((..(((.(................
7  (((((((((((.(((.......((((..(((.(................
8  (((((((((((.(((.......((((..(((.(................
usage: rna_align_distance_to_seq.py [-h] file output
file

an alignment in the Stokholm format, the first seq will be used to calculate distance to (#TODO pick any seq)

output

csv pandas file

-h, --help

show this help message and exit

rna_align_foldability.py

Calculate statistics of foldability on an alignment.

The tool uses ENTRANA [1] to calculate, what the authors called, foldability (column: “foldability”) of a given sequence into a given secondary structure.

Next, MC-Fold [2] is executed to calculate free energy (column: “mcsym”) on the sequence and the secondary structure obtained based on the alignment. The secondary structure is used as constraints.

The third used program is RNAfold from the Vienna package [3]. Also, in this case the secondary structure obtained with rna-tools from the RNA alignment is used as constraints, columns: “mfe” (minimum free energy), “mfess” (secondary structure for minimum free energy state), “cfe” (minimum free energy of centroid), “cfess” (secondary structure for centroid, “diversity” (ensemble diversity), “efe” (free energy of the thermodynamic ensemble), “efess” (secondary structure for the thermodynamic ensemble), “freq” (frequency of mfe structure in ensemble). RNAfold is also executed in with “–enforceConstraint” where the constraints are enforced. This run gives analogous values as the default RNAfold, to all RNAfold column “_enforce” is added.

The tool is able to calculate the distance Levenshtein (the difference between the two sequences)(column: “distance”) from the target sequence and all sequence in the alignment to test if there is a bias in the accuracy towards the most similar sequences.

Another tool used from the Vienna package is RNAeval. The tool calculates free energy for a given sequence and secondary structure.

The accuracy is expressed as the median of core RMSD of 10% the lowest core RMSD models for the given sequences.

_images/rp17_foldability.png

The correlations:

accuracy             1.000000
cfe                  0.653813
foldability          0.622038
mfe                  0.607340
efe                  0.585077
diversity            0.404350
eval                 0.349499
cfe_enforce          0.311744
mfe_enforce          0.302973
efe_enforce          0.280929
distance             0.256870
freq                 0.037037
diversity_enforce    0.018429
mcsym                0.017533
freq_enforce        -0.037991
length              -0.340809

The data:

We tested correlations between the above-mentioned statistics, and the highest correlation, 0.65 () was achieved to the centroid free energy calculated with RNAFold, which suggests that to some extent this metric could be used to pick sequence from the alignment to pick sequences that are more likely to fold.

However, this needs further investigation and the detailed analysis an all cases and more folded sequences.

  1. Su C, Weir JD, Zhang F, Yan H, Wu T. ENTRNA: a framework to predict RNA foldability. BMC Bioinformatics. BioMed Central 2019

  2. Parisien M, Major F. The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature 2008;452:51-5

  3. Lorenz R, Bernhart SH, Honer Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, et al. ViennaRNA Package 2.0. Algorithms Mol Biol. BioMed Central; 2011;6:26-14.

Example:

$ python rna_align_foldability.py test_data/gmp_ref.sto test_data/gmp_foldability.csv
                                       cfess_enforce  distance  diversity
0  ((((((.(((......(((((((((..............)))))))...      1.00       3.96
1  (((.(((((((..((......(((((.(((((.......))).......      0.69       5.56
2                                                         0.73       3.84
3                                                         0.69       5.92
4  (((....((((.((((((...((((((((...............))...      0.73       7.49
5  ((((.(((.(((..((..((((((((.................)))...      0.75       7.92
6  .(((((((((.((((.....(((((((((...............))...      0.72       5.83
7                                                         0.72       7.35
8  ...((((((((.(((......((((((.((....(((...)))..)...      0.65       4.86

   diversity_enforce    efe  efe_enforce
0               2.89 -14.77       -13.75
1               3.70 -19.52       -18.25
2               0.00 -15.41         0.00
3               0.00 -13.55         0.00
4               2.46  -8.58        -6.91
5               6.37 -20.72       -20.08
6               2.92 -11.87       -11.38
7               0.00 -14.59         0.00
8               3.83 -21.16       -20.64

                                               efess
0  ((((((..........(((((((((..............)))))))...
1  {{..(((((((..((......(((((.(({((.......}}).......
2  .......((((.(((..(...(((((((((..............))...
3  .....((((((.(((.....(((((((((.......{{...,}..,...
4  .......{(((.,{{{{,...((((((((...............))...
5  {(((.(((.(((..((..((((((((,{{...,.....)}}..)))...
6  .(((((((((.((((.....(((((((((...............))...
7  .....{,.{{{.(((......((((((((((..........,.}))...
8  ...{(((((((.(((......((((((.((....(((...)))..)...

                         ...                         length   mcsym
0                        ...                           75.0  -39.73
1                        ...                           85.0  -37.89
2                        ...                           84.0  -35.40
3                        ...                           86.0  -36.11
4                        ...                           83.0  -37.37
5                        ...                           80.0  -43.59
6                        ...                           84.0  -42.95
7                        ...                           84.0  -36.55
8                        ...                           85.0  -43.58

                      mcsym comment
0  energy best dynamics programming
1                         BP energy
2                         BP energy
3                         BP energy
4  energy best dynamics programming
5  energy best dynamics programming
6  energy best dynamics programming
7                         BP energy
8  energy best dynamics programming

                                            mcsym ss   mfe mfe_enforce
0  ((((((.(((......((((((((................))))))... -13.9       -12.9
1  (((.(((((((..((......(((((.((.................... -18.0       -17.3
2  ....(..((((.(((......((((((((................)... -14.0         0.0
3  .(...((((((.(((......((((((((.................... -12.0         0.0
4  (((....((((.(((......((((((((...............))...  -7.2        -6.1
5  ((((.(((.(((......((((((((.................)))... -18.6       -18.6
6  .(((((((((.(((......((((((((.................)... -10.5       -10.5
7  ...(.((.(((.(((......((((((((................)... -12.8         0.0
8  ...((((((((.(((......((((((.(.................... -19.8       -19.8

                                               mfess
0  ((((((..........(((((((((..............)))))))...
1  ((..(((((((..((......(((((.(((((.......))).......
2  .......((((.(((......(((((((((..............))...
3  .....((((((.(((.....(((((((((....................
4  .....................((((((((...............))...
5  ((((.(((.(((..((..(((((((((((.........)))..)))...
6  .(((((((((.((((.....(((((((((...............))...
7  .....((.(((.(((......((((((((((............)))...
8  ...((((((((.(((......((((((.((....(((...)))..)...

                                       mfess_enforce
0  ((((((.(((......(((((((((..............)))))))...
1  (((.(((((((..((......(((((.(((((.......))).......
2                                              error
3                                              error
4  (((....((((.((((((...((((((((...............))...
5  ((((.(((.(((..((..(((((((((((.........)))..)))...
6  .(((((((((.((((.....(((((((((...............))...
7                                              error
8  ...((((((((.(((......((((((.((....(((...)))..)...

                                                 seq
0  GCGCGGAAACAAUGAUGAAUGGGUUUAAAUUGGGCACUUGACUCAU...
1  CUGUCGAAGAGACGCGAUGAAUCCCGCCCUGUAAUUCGGGCACCUC...
2  AAUCAAUAGGGAAGCAACGAAGCAUAGCCUUUAUAUGGACACUUGG...
3  AAAUAUUAUAGAGAUGUUGAAGUAUAUUCUAUUAUUGGGCACCUUA...
4  AUUUUAAGAGGAAAUUUUGAACUAUAUACUUAUUUGGGCACUUUGU...
5  UGCAAUGGGUGUGAUGAAGUCCGGACAGUAAUGUGGGCACUUAGUC...
6  AAUAUUUUAGAAACUGAGAAGUAUAUCUUAUUAUUGGGCAUCUGGA...
7  AUAACGGCACGAAGCAAUGAAAUGUUCGAUGUAACCGGGCACCUAU...
8  AAAUUAAGGGGAAGCGUUGAGCCGCUACCCAUAUGUGGUUCACUCG...

                                                  ss
0  ((((((.(((......((((((((................))))))...
1  (((.(((((((..((......(((((.((....................
2  ....(..((((.(((......((((((((................)...
3  .(...((((((.(((......((((((((....................
4  (((....((((.(((......((((((((...............))...
5  ((((.(((.(((......((((((((.................)))...
6  .(((((((((.(((......((((((((.................)...
7  ...(.((.(((.(((......((((((((................)...
8  ...((((((((.(((......((((((.(....................

[9 rows x 26 columns]
usage: rna_align_foldability.py [-h] [--all-stars] [--dev] [--skip-mcfold]
                                [-v]
                                file output
file

an alignment in the Stokholm format, the first seq will be used to calculate distance to (#TODO pick any seq)

output

csv pandas file

-h, --help

show this help message and exit

--all-stars

this takes usully super long

--dev
--skip-mcfold
-v, --verbose

be verbose

Random assignment of nucleotides

random_assignment_of_nucleotides.py

random_assignment_of_nucleotides.py - Random assignment of nucleotides for non-typical characters in the sequence alignment (arg –alignfn or fasta file with sequneces (arg –seqfn)

R = G A (purine)
Y = U C (pyrimidine)
K = G U (keto)
M = A C (amino)
S = G C (strong bonds)
W = A U (weak bonds)
B = G U C (all but A)
D = G A U (all but C)
H = A C U (all but G)
V = G C A (all but T)
N = A G C U (any)

author: A. Zyla - azyla

Warning

Tested only on fasta files! and requires Biopython (tested with v1.68)

usage: random_assignment_of_nucleotides.py [-h] [-v] [--alignfn ALIGNFN]
                                           [--seqfn SEQFN] [--outfn OUTFN]
-h, --help

show this help message and exit

-v, --verbose

increase output verbosity

--alignfn <alignfn>

alignment in the Fasta format

--seqfn <seqfn>

sequences in the Fasta format

--outfn <outfn>

output aln file (default: alnfn .fasta -> _out.fasta)

random_assignment_of_nucleotides.py - Random assignment of nucleotides for non-typical characters in the sequence alignment (arg –alignfn or fasta file with sequneces (arg –seqfn)

R = G A (purine)
Y = U C (pyrimidine)
K = G U (keto)
M = A C (amino)
S = G C (strong bonds)
W = A U (weak bonds)
B = G U C (all but A)
D = G A U (all but C)
H = A C U (all but G)
V = G C A (all but T)
N = A G C U (any)

author: A. Zyla - azyla

Warning

Tested only on fasta files! and requires Biopython (tested with v1.68)

rna_tools.tools.rna_alignment.random_assignment_of_nucleotides.get_align(alignfn)[source]
Get seq from an alignment with gaps.
Args:

alignfn (str): a path to an alignment

Usage::
>>> get_align('test_data/aln1.fasta')
SingleLetterAlphabet() alignment with 2 rows and 13 columns
AGGGGGACAGNYU 1
CYGA------CGG 2
obj1’, description=’obj1’, dbxrefs=[]), id=’<unknown id>’, name=’<unknown name>’, description=’<unknown description>’, dbxrefs=[])
Returns:

alignment

rna_tools.tools.rna_alignment.random_assignment_of_nucleotides.get_parser()[source]
rna_tools.tools.rna_alignment.random_assignment_of_nucleotides.get_sequences(seqfn)[source]

Get seq from an fasta file. :param seqfn: a path to a fasta file :type seqfn: str

Usage::
>>> get_align('test_data/fasta.fasta')
Returns:

[SeqRecord(seq=Seq(‘GGGYYGCCNRW’, SingleLetterAlphabet()), id=’1’, name=’1’, description=’1’, dbxrefs=[]), SeqRecord(seq=Seq(‘GGRGYYGCCUURWAA’, SingleLetterAlphabet()), id=’1’, name=’1’, description=’1’, dbxrefs=[])]

rna_tools.tools.rna_alignment.random_assignment_of_nucleotides.write_align(align, outfn)[source]

Write cleaned alignment with Biopython. :param align: a cleaned alignment :type align: obj :param outfn: a path to a new alignment file :type outfn: str

Returns:

writes to a file in fasta format

Return type:

none

rna_tools.tools.rna_alignment.random_assignment_of_nucleotides.write_seq(seqfn, outfn)[source]

Write cleaned alignment with Biopython. :param align: a cleaned alignment :type align: obj :param outfn: a path to a new alignment file :type outfn: str

Returns:

writes to a file in fasta format

Return type:

none

CMAlign

class rna_tools.tools.rna_alignment.rna_alignment.CMAlign(outputfn=None)[source]

CMAalign class around cmalign (of Inferal).

cmalign - aligns the RNA sequences in <seqfile> to the covariance model (CM) in <cmfile>. The new alignment is output to stdout in Stockholm format.

Example:

cma = ra.CMAlign()
cma.run_cmalign("ade_seq.fa", "RF00167.cm")
seq = cma.get_seq()
print 'cma hit  ', seq
print 'seq      ', a.align_seq(seq)
print 'a.rf     ', a.rf

cmd cmalign -g RF00167.cm ade_seq.fa

# STOCKHOLM 1.0
#=GF AU Infernal 1.1.2

ade          ----------------CGCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAG-CCUUAAA-CUCUUGAUUAUGAAGUGA------------
#=GR ade PP  ................99*********************************************.*******.***************999............
#=GC SS_cons :::::::::::::::::((((((((,,,<<<<<<<_______>>>>>>>,,,,,,,,<<<<<<<_______>>>>>>>,,))))))))::::::::::::::
#=GC RF      aaaaaauaaaaaaaauucccuCgUAUAAucccgggAAUAUGGcccgggaGUUUCUACCaggcagCCGUAAAcugccuGACUAcGagggaaauuuuuuuuuuu
//
cma hit   ----------------CGCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAG-CCUUAAA-CUCUUGAUUAUGAAGUGA------------
seq       ----------------CGCU-U-CAUAUAAUCCUAAUGAUAUGG-UUUGGGA-GUUUCUACCAAGAG-CC--UUAAA-CUCUU---GAUUAUG-AAGUGA-------------
a.rf      aaaaaauaaaaaaaauuccc.u.CgUAUAAucccgggAAUAUGG.cccggga.GUUUCUACCaggcagCC..GUAAAcugccu...GACUAcG.agggaaauuuuuuuuuuu.

Install http://eddylab.org/infernal/

Cite: Nawrocki and S. R. Eddy, Infernal 1.1: 100-fold faster RNA homology searches, Bioinformatics 29:2933-2935 (2013).

get_gc_rf()[source]

Get #=GC RF.

Variables:

self.output – string

get_seq()[source]
Variables:

self.output – output of cmalign, string

run_cmalign(seq, cm, verbose=True)[source]

Run cmalign and process the result.

Parameters:
  • seq – seq string

  • cm – cm fn

Run:

$ cmalign RF01831.cm 4lvv.seq
# STOCKHOLM 1.0
#=GF AU Infernal 1.1.2

4lvv         -GGAGAGUA-GAUGAUUCGCGUUAAGUGUGUGUGA-AUGGGAUGUCG-UCACACAACGAAGC---GAGA---GCGCGGUGAAUCAUU-GCAUCCGCUCCA
#=GR 4lvv PP .********.******************9999998.***********.8999999******8...5555...8**************.************
#=GC SS_cons (((((----(((((((((((,,,,,<<-<<<<<<<<___________>>>>>>>>>>,,,<<<<______>>>>,,,)))))))))))-------)))))
#=GC RF      ggcaGAGUAGggugccgugcGUuAAGUGccggcgggAcGGGgaGUUGcccgccggACGAAgggcaaaauugcccGCGguacggcaccCGCAUcCgCugcc
//

Warning

requires cmalign to be set in your shell

RChie

class rna_tools.tools.rna_alignment.rna_alignment.RChie[source]

RChie - plotting arc diagrams of RNA secondary structures.

_images/rchie.png

http://www.e-rna.org/r-chie/

The offline version of R-chie, which requires first installing R4RNA is available here, or clone our git repository here How to install it:

  • Ensure R is installed already, or download it freely from http://www.r-project.org/

  • Download the R4RNA (https://github.com/jujubix/r-chie), open R and install the package:

    install.packages("<path_to_file>/R4RNA", repos = NULL, type="source")
    # Install the optparse and RColorBrewer
    install.packages('optparse')
    install.packages('RColorBrewer')
    
  • Go to rna_tools/rna_tools_config_local.py and set RCHIE_PATH to the folder with RChie, e.g. "/home/magnus/work/opt/r-chie/".

To test if Rchie works on your machine (from rna_align folder):

<path to your rchie>/rchie.R --msafile test_data/rchie_test_files/fasta.txt test_data/rchie_test_files/helix.txt

you should have rchie.png file in the folder.

More at http://www.e-rna.org/r-chie/download.cgi

Cite: Daniel Lai, Jeff R. Proctor, Jing Yun A. Zhu, and Irmtraud M. Meyer (2012) R-chie: a web server and R package for visualizing RNA secondary structures. Nucleic Acids Research, first published online March 19, 2012. doi:10.1093/nar/gks241

plot_cov(seqs, ss_cons, plot_fn='rchie.png', verbose=False)[source]

Plot an RChie plot_conv.

Parameters:
  • seqs – seqs in the fasta format

  • ss_cons – a string of secondary structure consensus, use only ().. Works with pseuoknots.

show()[source]
write(outfn)[source]

Renumber a pdb file according to alignment

renum_to_aln.py

renum_pdb_to_aln.py - renumber a pdb file based on the alignment.

author: A. Zyla under supervision of mmagnus

Warning

works only for single chain! and requires Biopython (tested with v1.68)

usage: renum_to_aln.py [-h] [-v] [--residue_index_start RESIDUE_INDEX_START]
                       [--outfn OUTFN]
                       seqid alignfn pdbfn
seqid

seq id in the alignemnt

alignfn

alignemnt in the Fasta format

pdbfn

pdb file

-h, --help

show this help message and exit

-v, --verbose

increase output verbosity

--residue_index_start <residue_index_start>

renumber starting number (default: 1)

--outfn <outfn>

output pdb file (default: pdbfn .pdb -> _out.pdb)

renum_pdb_to_aln.py - renumber a pdb file based on the alignment.

author: A. Zyla under supervision of mmagnus

Warning

works only for single chain! and requires Biopython (tested with v1.68)

rna_tools.tools.renum_pdb_to_aln.renum_pdb_to_aln.get_parser()[source]
rna_tools.tools.renum_pdb_to_aln.renum_pdb_to_aln.get_seq(alignfn, seqid)[source]

Get seq from an alignment with gaps.

Parameters:
  • alignfn (str) – a path to an alignment

  • seqid (str) – seq id in an alignment

Usage:

>>> get_seq('test_data/ALN_OBJ1_OBJ2.fa', 'obj1')
SeqRecord(seq=SeqRecord(seq=Seq('GUUCAG-------------------UGAC-', SingleLetterAlphabet()), id='obj1', name='obj1', description='obj1', dbxrefs=[]), id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=[])
Returns:

SeqRecord

rna_tools.tools.renum_pdb_to_aln.renum_pdb_to_aln.open_pdb(pdbfn)[source]

Open pdb with Biopython.

Parameters:

pdbfn (str) – a path to a pdb structure

Returns:

with a pdb structure

Return type:

PDB Biopython object

rna_tools.tools.renum_pdb_to_aln.renum_pdb_to_aln.renumber(seq_with_gaps, struc, residue_index_start)[source]

Renumber a pdb file.

Parameters:
  • seq_with_gaps (str) – a target sequence extracted from the alignment

  • struc (pdb) – a structure

  • residue_index_start (int) – starting number

Returns:

BioPython Structure object

rna_tools.tools.renum_pdb_to_aln.renum_pdb_to_aln.write_struc(struc, outfn)[source]

Write renumbered pdb with Biopython.

Parameters:
  • struc (pdb) – a renumbered structure

  • outfn (str) – a path to a new, renumbered pdb file

Returns:

writes to a file

Return type:

none

Root Mean Square Deviation (RMSD)

rna_calc_rmsd

rna_calc_rmsd

rna_calc_rmsd.py - calculate RMSDs of structures to the target

If you still have problem with various number of atoms, check out this issue: get_rnapuzzle_ready: how to deal with `Alternate location indicator (https://github.com/mmagnus/rna-pdb-tools/issues/30).

The program is using (https://github.com/charnley/rmsd).

Example #1:

$ rna_calc_rmsd.py -t 6_0_solution_4GXY_rpr.pdb --model-selection=A:1-17+24-110+115-168 *.pdb
rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
method: all-atom-built-in
# of models: 35
6_0_solution_4GXY_rpr.pdb 0.0 3409
6_Blanchet_1_rpr.pdb 22.31 3409
6_Blanchet_2_rpr.pdb 21.76 3409
6_Blanchet_3_rpr.pdb 21.32 3409
6_Blanchet_4_rpr.pdb 22.22 3409
6_Blanchet_5_rpr.pdb 24.17 3409
6_Blanchet_6_rpr.pdb 23.28 3409
6_Blanchet_7_rpr.pdb 22.26 3409
6_Bujnicki_1_rpr.pdb 36.95 3409
6_Bujnicki_2_rpr.pdb 30.9 3409
6_Bujnicki_3_rpr.pdb 32.1 3409
6_Bujnicki_4_rpr.pdb 32.04 3409
...

Example #2:

time rmsd_calc_to_target.py
  -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
  --target-selection A:1-48+52-63
  --model-selection A:1-48+52-63
  --target-ignore-selection A/57/O2\'
  clusters/*_AA.pdb

rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
  target_selection:  A:1-48+52-63
  model_selection:   A:1-48+52-63
  target_ignore_selection:  A/57/O2'
  model_ignore_selection:
# of models: 801
fn,rmsd_all
pistol_thrs0.50A_clust01-000001_AA.pdb,7.596
pistol_thrs0.50A_clust02-000001_AA.pdb,7.766
pistol_thrs0.50A_clust03-000001_AA.pdb,18.171
[..]
pistol_thrs0.50A_clust799-000001_AA.pdb,5.356
pistol_thrs0.50A_clust800-000001_AA.pdb,15.282
pistol_thrs0.50A_clust801-000001_AA.pdb,16.339
# of atoms used: 1237
csv was created!  rmsds.csv
rmsd_calc_to_target.py -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
37.93s user 1.07s system 87% cpu 44.650 total

Works also for multiple chains:

rna_calc_rmsd.py –model-selection=’A:52+53+59+60+61+80+B:21+22+23’ –target-selection=’A:52+53+59+60+61+80+B:21+22+23’ -t yC_5LJ3_U2U6_core_mdrFx_onlyTriplex_rpr.pdb yC_5LJ3_U2U6_core_mdrFx_addh_MD_1_rpr_rchain.pdb

usage: rna_calc_rmsd [-h] -t TARGET_FN [--ignore-files IGNORE_FILES]
                     [--target-selection TARGET_SELECTION]
                     [--target-ignore-selection TARGET_IGNORE_SELECTION]
                     [--model-selection MODEL_SELECTION]
                     [--model-ignore-selection MODEL_IGNORE_SELECTION]
                     [-m METHOD] [-o RMSDS_FN] [-v] [-pr] [-sr] [-pp]
                     [--way WAY] [--name-rmsd-column NAME_RMSD_COLUMN]
                     [--target-column-name]
                     files [files ...]
files

files

-h, --help

show this help message and exit

-t <target_fn>, --target-fn <target_fn>

pdb file

--ignore-files <ignore_files>

files to be ingored, .e.g, ‘solution’

--target-selection <target_selection>

selection, e.g. A:10-16+20, where #16 residue is included

--target-ignore-selection <target_ignore_selection>

A/10/O2’

--model-selection <model_selection>

selection, e.g. A:10-16+20, where #16 residue is included

--model-ignore-selection <model_ignore_selection>

A/10/O2’

-m <method>, --method <method>

align, fit

-o <rmsds_fn>, --rmsds-fn <rmsds_fn>

ouput, matrix

-v, --verbose

verbose

-pr, --print-results
-sr, --sort-results
-pp, --print-progress
--way <way>

R|c1p = C1’ backbone = P OP1 OP2 O5’ C5’ C4’ C3’ O3’ po = P OP1 OP2 no-backbone = all - po bases, backbone+sugar, sugar pooo = P OP1 OP2 O5’ alpha = P OP1 OP2 O5’ C5’

--name-rmsd-column <name_rmsd_column>

default: fn,rmsd, with this cols will be fn,<name-rmsd-column>

--target-column-name

rna_calc_rmsd.py - calculate RMSDs of structures to the target

If you still have problem with various number of atoms, check out this issue: get_rnapuzzle_ready: how to deal with `Alternate location indicator (https://github.com/mmagnus/rna-pdb-tools/issues/30).

The program is using (https://github.com/charnley/rmsd).

Example #1:

$ rna_calc_rmsd.py -t 6_0_solution_4GXY_rpr.pdb --model-selection=A:1-17+24-110+115-168 *.pdb
rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
method: all-atom-built-in
# of models: 35
6_0_solution_4GXY_rpr.pdb 0.0 3409
6_Blanchet_1_rpr.pdb 22.31 3409
6_Blanchet_2_rpr.pdb 21.76 3409
6_Blanchet_3_rpr.pdb 21.32 3409
6_Blanchet_4_rpr.pdb 22.22 3409
6_Blanchet_5_rpr.pdb 24.17 3409
6_Blanchet_6_rpr.pdb 23.28 3409
6_Blanchet_7_rpr.pdb 22.26 3409
6_Bujnicki_1_rpr.pdb 36.95 3409
6_Bujnicki_2_rpr.pdb 30.9 3409
6_Bujnicki_3_rpr.pdb 32.1 3409
6_Bujnicki_4_rpr.pdb 32.04 3409
...

Example #2:

time rmsd_calc_to_target.py
  -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
  --target-selection A:1-48+52-63
  --model-selection A:1-48+52-63
  --target-ignore-selection A/57/O2\'
  clusters/*_AA.pdb

rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
  target_selection:  A:1-48+52-63
  model_selection:   A:1-48+52-63
  target_ignore_selection:  A/57/O2'
  model_ignore_selection:
# of models: 801
fn,rmsd_all
pistol_thrs0.50A_clust01-000001_AA.pdb,7.596
pistol_thrs0.50A_clust02-000001_AA.pdb,7.766
pistol_thrs0.50A_clust03-000001_AA.pdb,18.171
[..]
pistol_thrs0.50A_clust799-000001_AA.pdb,5.356
pistol_thrs0.50A_clust800-000001_AA.pdb,15.282
pistol_thrs0.50A_clust801-000001_AA.pdb,16.339
# of atoms used: 1237
csv was created!  rmsds.csv
rmsd_calc_to_target.py -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
37.93s user 1.07s system 87% cpu 44.650 total

Works also for multiple chains:

rna_calc_rmsd.py –model-selection=’A:52+53+59+60+61+80+B:21+22+23’ –target-selection=’A:52+53+59+60+61+80+B:21+22+23’ -t yC_5LJ3_U2U6_core_mdrFx_onlyTriplex_rpr.pdb yC_5LJ3_U2U6_core_mdrFx_addh_MD_1_rpr_rchain.pdb

rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.calc_rmsd(a, b, target_selection, target_ignore_selection, model_selection, model_ignore_selection, way, verbose)[source]

Calculate RMSD between two XYZ files

by: Jimmy Charnley Kromann <jimmy@charnley.dk> and Lars Andersen Bratholm <larsbratholm@gmail.com> project: https://github.com/charnley/rmsd license: https://github.com/charnley/rmsd/blob/master/LICENSE

a is model b is target

Params:

a = filename of structure a

Params:

b = filename of structure b

Returns:

rmsd, number of atoms

rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.calc_rmsd_pymol(pdb1, pdb2, method)[source]

Calculate rmsd using PyMOL. Two methods are available: align and fit

See:

Align can return a list with 7 items:

  • RMSD after refinement

  • Number of aligned atoms after refinement

  • Number of refinement cycles

  • RMSD before refinement

  • Number of aligned atoms before refinement

  • Raw alignment score

  • Number of residues aligned

in this version of function, the function returns RMSD before refinement.

Install on OSX: brew install brewsci/bio/pymol or get

If you have a problem:

Match-Error: unable to open matrix file '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/data/pymol/matrices/BLOSUM62'.

then find BLOSUM62, e.g.:

mdfind -name BLOSUM62 | grep pymol
/Users/magnus/miniconda2/envs/py37/lib/python3.7/site-packages/pymol/pymol_path/data/pymol/matrices/BLOSUM62
/usr/local/Cellar/pymol/2.4.0_3/libexec/lib/python3.9/site-packages/pymol/pymol_path/data/pymol/matrices/BLOSUM62
/Users/magnus/miniconda2/pkgs/pymol-2.4.2-py37h06d7bae_0/share/pymol/data/pymol/matrices/BLOSUM62
/Users/magnus/work/opt/pymol-open-source/data/pymol/matrices/BLOSUM62

and then define PYMOL_DATA in your .bashrc/.zshrc, e.g.:

export PYMOL_DATA="/Users/magnus/work/opt/pymol-open-source/data/pymol"
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.get_parser()[source]
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.get_rna_models_from_dir(files)[source]
Parameters:

models – a list of filenames

Example of the list:

['test_data/rp17/2_restr1_Michal1.pdb_clean.pdb', 'test_data/rp17/2a_nonrestr2_Michal1.pdb_clean.pdb',
'test_data/rp17/3_nonrestr1_Michal1.pdb_clean.pdb', 'test_data/rp17/5_restr1_Michal3.pdb_clean.pdb']
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.sort_nicely(l)[source]

Sort the given list in the way that humans expect.

http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/

rna_calc_rmsd_multi_targets.py

rna_calc_rmsd_multi_targets.py - calculate RMSDs of structures to multiple targets:

$ rna_calc_rmsd_multi_targets.py --models multi-targets/rp21/*.pdb                                     --targets multi-targets/rp21/solutions/*.pdb                                     --target-selection A:1-27+29-41                                     --model-selection A:1-27+29-41 

CSV table produced:

                          21_solution_0_ChainA.pdb  21_solution_0_ChainB.pdb  21_solution_1_ChainA.pdb  21_solution_1_ChainB.pdb  21_solution_2.pdb   mean    min    max    sd
fn
21_3dRNA_1_rpr.pdb                           12.17                     12.11                     12.17                     12.11              12.11  12.13  12.11  12.17  0.03
21_Adamiak_1_rpr.pdb                          4.64                      4.61                      4.64                      4.61               4.64   4.63   4.61   4.64  0.01
21_ChenHighLig_1_rpr.pdb                      4.01                      3.97                      4.01                      3.97               4.07   4.01   3.97   4.07  0.04
21_Das_1_rpr.pdb                              5.71                      5.60                      5.71                      5.60               5.61   5.65   5.60   5.71  0.05
--------------------------------------------------------------------------------
Save rna_calc_rmsd_multi_targets_output.csv
usage: rna_calc_rmsd_multi_targets.py [-h] [-v] [--models MODELS [MODELS ...]]
                                      [--targets TARGETS [TARGETS ...]]
                                      [--output-csv OUTPUT_CSV]
                                      [--model-selection MODEL_SELECTION]
                                      [--target-selection TARGET_SELECTION]
-h, --help

show this help message and exit

-v, --verbose

be verbose

--models <models>
--targets <targets>
--output-csv <output_csv>
--model-selection <model_selection>

selection, e.g. A:10-16+20, where #16 residue is included

--target-selection <target_selection>

selection, e.g. A:10-16+20, where #16 residue is included

rna_calc_rmsd_trafl

rna_calc_evo_rmsd

rmsd_calc_trafl - calculate RMSD of transition A->B based on a SimRNA trajectory

After this script, run:

rna_cal_rmsd_trafl_plot.py rmsd.txt

to get a plot like this:

_images/rmsd_transition.png

Prepare structures:

$ SimRNA -p 17_Das_2_rpr.pdb -n 0 -o 17_Das_2_rpr_n0 # no trafl, trafl will be added
$ SimRNA -p 5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped.pdb -n 0 -o 5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped
#(struc must be (~CG~) nope. It has to be a trajectory!)

and run:

 $ rmsd_calc_trafl.py 17_Das_2_rpr.pdb.trafl 17_Das_2_rpr_n0.trafl 5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0.trafl rp17_rmsd.txt
 > calc_rmsd_to_1st_frame
   /Users/magnus/work/opt/simrna/SimRNA_64bitIntel_MacOSX_staticLibs/calc_rmsd_to_1st_frame 17_Das_2_rpr.pdb.trafl 17_Das_2_rpr.pdb_rmsd_e
 < rmsd_out: 17_Das_2_rpr.pdb_rmsd_e
 > struc: 17_Das_2_rpr_n0.trafl 2
 > trafl: 17_Das_2_rpr.pdb.trafl 48
 % saved: 17_Das_2_rpr.pdb.trafl_17_Das_2_rpr_n0.trafl
 > calc_rmsd_to_1st_frame
   /Users/magnus/work/opt/simrna/SimRNA_64bitIntel_MacOSX_staticLibs/calc_rmsd_to_1st_frame 17_Das_2_rpr.pdb.trafl_17_Das_2_rpr_n0.trafl 17_Das_2_rpr.pdb_rmsd_e_17_Das_2_rpr_n0_rmsd_e
 < rmsd_out: 17_Das_2_rpr.pdb_rmsd_e_17_Das_2_rpr_n0_rmsd_e
 > struc: 5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0.trafl 2
 > trafl: 17_Das_2_rpr.pdb.trafl 48
 % saved: 17_Das_2_rpr.pdb.trafl_5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0.trafl
 > calc_rmsd_to_1st_frame
   /Users/magnus/work/opt/simrna/SimRNA_64bitIntel_MacOSX_staticLibs/calc_rmsd_to_1st_frame 17_Das_2_rpr.pdb.trafl_5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0.trafl 17_Das_2_rpr.pdb_rmsd_e_5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0_rmsd_e
 < rmsd_out: 17_Das_2_rpr.pdb_rmsd_e_5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0_rmsd_e
  0.000 -695.634
  0.000 -551.093
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 < out: rp17_rmsd.txt

Warning

calc_rmsd_to_1st_frame (SimRNA) is required and the path to the binary file is defined in config_local.

usage: rna_calc_evo_rmsd [-h] trafl struc1 struc2 rmsds_fn
trafl

trafil

struc1

structure A

struc2

structure B

rmsds_fn

output file

-h, --help

show this help message and exit

rna_cal_rmsd_trafl_plot

rna_cal_rmsd_trafl_plot - generate a plot based of <rmsd.txt> of rna_calc_evo_rmsd.py.

usage: rna_cal_rmsd_trafl_plot [-h] file
file

rmsd.txt

-h, --help

show this help message and exit

rna_calc_rmsd_all_vs_all

rna_calc_rmsd_all_vs_all

rna_calc_rmsd_all_vs_all.py - calculate RMSDs all vs all and save it to a matrix

Examples:

rna_calc_rmsd_all_vs_all.py -i test_data -o test_output/rmsd_calc_dir.tsv
 # of models: 4
... 1 test_data/struc1.pdb
... 2 test_data/struc2.pdb
... 3 test_data/struc3.pdb
... 4 test_data/struc4.pdb

The program is using (https://github.com/charnley/rmsd).

You can also use PyMOL to do align or fit:

python rna_calc_rmsd_all_vs_all.py -i test_data -o test_output/rmsd_calc_dir_align.mat -m align
 # of models: 5
# test_data/2nd_triplex_FB_1AUA3_rpr.pdb test_data/struc1.pdb test_data/struc2.pdb test_data/struc3.pdb test_data/struc4.pdb
0.0 4.13 4.922 4.358 4.368
4.13 0.0 11.092 4.707 3.46
4.922 11.092 0.0 11.609 11.785
4.358 4.707 11.609 0.0 2.759
4.368 3.46 11.785 2.759 0.0
matrix was created!  test
usage: rna_calc_rmsd_all_vs_all [-h] [-i INPUT_DIR] [-o MATRIX_FN] [-m METHOD]
-h, --help

show this help message and exit

-i <input_dir>, --input-dir <input_dir>

input folder with structures

-o <matrix_fn>, --matrix-fn <matrix_fn>

ouput, matrix

-m <method>, --method <method>

all-atom, pymol: align, fit

rna_calc_rmsd_all_vs_all.py - calculate RMSDs all vs all and save it to a matrix

Examples:

rna_calc_rmsd_all_vs_all.py -i test_data -o test_output/rmsd_calc_dir.tsv
 # of models: 4
... 1 test_data/struc1.pdb
... 2 test_data/struc2.pdb
... 3 test_data/struc3.pdb
... 4 test_data/struc4.pdb

The program is using (https://github.com/charnley/rmsd).

You can also use PyMOL to do align or fit:

python rna_calc_rmsd_all_vs_all.py -i test_data -o test_output/rmsd_calc_dir_align.mat -m align
 # of models: 5
# test_data/2nd_triplex_FB_1AUA3_rpr.pdb test_data/struc1.pdb test_data/struc2.pdb test_data/struc3.pdb test_data/struc4.pdb
0.0 4.13 4.922 4.358 4.368
4.13 0.0 11.092 4.707 3.46
4.922 11.092 0.0 11.609 11.785
4.358 4.707 11.609 0.0 2.759
4.368 3.46 11.785 2.759 0.0
matrix was created!  test
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd_all_vs_all.calc_rmsd(a, b)[source]

Calc rmsd.

rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd_all_vs_all.get_parser()[source]
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd_all_vs_all.get_rna_models_from_dir(directory)[source]
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd_all_vs_all.sort_nicely(l)[source]

Sort the given list in the way that humans expect. http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/

Interaction Network Fidelity (INF)

rna_calc_inf.py

usage: rna_calc_inf.py [-h] [-t TARGET_FN] [-m NT]
                       [--ignore-files IGNORE_FILES] [-s SS] [--no-stacking]
                       [--debug] [--web] [-pr] [-sr] [--method METHOD]
                       [--target-selection TARGET_SELECTION]
                       [--model-selection MODEL_SELECTION]
                       [--renumber-residues] [--dont-remove-sel-files] [-f]
                       [-v] [-o OUT_FN]
                       files [files ...]
files

files, .e.g folder_with_pdbs/*pdbs

-h, --help

show this help message and exit

-t <target_fn>, --target_fn <target_fn>

pdb file

-m <nt>, --number-of-threads <nt>

number of threads used for multiprocessing, if 1 then mp is not used (useful for debugging)!

--ignore-files <ignore_files>

files to be ingored, .e.g, ‘solution’

-s <ss>, --ss <ss>

A:(([[))]], works only for single chain (the chain is A by default)

--no-stacking

default: use stacking, if this option on, don’t take into account stacking, WARNING/BUG: inf_all will be incorrectly calculated if stacking is off

--debug
--web
-pr, --print-results
-sr, --sort-results
--method <method>

you can use mcannotate* or clarna (right now only clarna is tested)

--target-selection <target_selection>

selection, e.g. A:10-16+20, where #16 residue is included

--model-selection <model_selection>

selection, e.g. A:10-16+20, where #16 residue is included

--renumber-residues

renumber residues from 1 to X for comparison with selection

--dont-remove-sel-files

don’t remove temp files created based on target|model-selectionforce

-f, --force

force to run ClaRNA even if <pdb>.outCR file is there, for will be auto True when selection defined

-v, --verbose

be verbose, tell me more what’re doing

-o <out_fn>, --out_fn <out_fn>

out csv file, be default inf.csv

A tool to calc inf_all, inf_stack, inf_WC, inf_nWC, SNS_WC, PPV_WC, SNS_nWC, PPV_nWC between two structures.

Mind, that ClaRNA is pretty slow, it takes even a few seconds to analyze a structure, so for, say, 1000 models you need a few hours.

How to make it faster?

First, you can use --number-of-threads to specify the number of cores used for multiprocessing.

Second, the procedure implemented in here is composed of two steps, first for each structure ClaRNA is used to generate an output with contacts, then these files are used for comparisons. So, if you want to re-run your analysis, you don’t have to re-run ClaRNA itself. Thus, be default ClaRNA is not executed if <model>.outCR is found next to the analyzed files. To change this behavior, force (--force) rna_calc_inf.py to re-run ClaRNA.

rna_tools.tools.rna_calc_inf.rna_calc_inf.do_job(l)[source]

Run ClaRNA & Compare, add 1 to the counter, write output to csv file (keeping it locked)

rna_tools.tools.rna_calc_inf.rna_calc_inf.get_parser()[source]

RNA filter (DCA)

Calculate distances based on given restrants on PDB files or SimRNA trajectories

rna_filter.py

rna_filter.py - calculate distances based on given restrants on PDB files or SimRNA trajectories.

Changes: weight is always 1 (at least for now). ,>,=,>=,<= .

[PREVIOUS DOCUMENTATION - TO BE REMOVED]

rna_filter.py -s 4gxy_rpr.pdb -r rp06_MohPairs.rfrestrs d:A5-A42 100.0 measured: 26.7465763417 [x] d:A11-A26 100.0 measured: 19.2863696104 [x]

[mm] rp06$ git:(master) $ rna_filter.py -s 4gxy_rpr.pdb -r rp06_MohPairs.rfrestrs

d:A5-A42 100.0 measured: 26.7465763417 [x] d:A11-A26 100.0 measured: 19.2863696104 [x]

Traceback (most recent call last):
File “/home/magnus/work-src/rna-pdb-tools/bin/rna_filter.py”, line 270, in <module>

calc_scores_for_pdbs(args.structures, restraints, args.verbose)

File “/home/magnus/work-src/rna-pdb-tools/bin/rna_filter.py”, line 221, in calc_scores_for_pdbs

dist = get_distance(residues[h[0]][‘mb’], residues[h[1]][‘mb’])

KeyError: ‘A24’

correct, there is no A24 in this structure:

The format of restraints:

(d:A1-A2 < 10.0 1) = if distance between A1 and A2 lower than 10.0, score it with 1

Usage:

$ python rna_filter.py -r test_data/restraints.txt -s test_data/CG.pdb
 d:A1-A2 10.0 measured: 6.58677550096 [x]
test_data/CG.pdb 1.0 1 out of 1

# $ python rna_filter.py -r test_data/restraints.txt -t test_data/CG.trafl
(d:A1-A2 <  10.0  1)|(d:A2-A1 <= 10 1)
 restraints [('A1', 'A2', '<', '10.0', '1'), ('A2', 'A1', '<=', '10', '1')]

Frame #1 e:1252.26
  mb for A1 [ 54.729   28.9375  41.421 ]
  mb for A2 [ 55.3425  35.3605  42.7455]
   d:A1-A2 6.58677550096
  mb for A2 [ 55.3425  35.3605  42.7455]
  mb for A1 [ 54.729   28.9375  41.421 ]
   d:A2-A1 6.58677550096
# this ^ is off right now
usage: rna_filter.py [-h] -r RESTRAINTS_FN [-v]
                     [-s STRUCTURES [STRUCTURES ...]] [--offset OFFSET]
                     [-t TRAJECTORY]
-h, --help

show this help message and exit

-r <restraints_fn>, --restraints_fn <restraints_fn>

restraints_fn: Format: (d:A9-A41 < 10.0 1)|(d:A41-A9 <= 10 1)

-v, --verbose

be verbose

-s <structures>

structures

--offset <offset>

use offset to adjust your restraints to numbering in PDB files, ade (1y26)pdb starts with 13, so offset is -12)

-t <trajectory>

SimRNA trajectory

rna_dca_mapping.py

usage: rna_dca_mapping.py [-h] --seq SEQ --gseq GSEQ --dca DCA
                          [--offset OFFSET] [--noss] [--mss] [--verbose]
                          [--noshort]
-h, --help

show this help message and exit

--seq <seq>

seq fn in Fasta format

--gseq <gseq>

gapped sequence and secondary structure (like in the alignment used for DCA) in Fasta format

--dca <dca>

file with parsed interactions

--offset <offset>

offset

--noss

filter out ss from plot

--mss

ss every each line

--verbose

be verbose

--noshort

filter out short interactions, dist in seq < 6 nt

Show distances in PyMOL

show_dists - show distances in PyMOL

_images/getdists.png

Usage:

PyMOL>show_dists([[1,2]])
1, 2, 3.41

Analyze an evolutionary coupling file. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`

rna_ec2x.py

rna_ex2x.py - analyze an evolutionary coupling file.

Files can be downloaded from https://marks.hms.harvard.edu/ev_rna/, e.g. RF00167.EC.interaction.csv

--pairs:

$ rna_ex2x.py RF00167.EC.interaction_LbyN.csv --pairs
[18, 78],[31, 39],[21, 75],[30, 40],[28, 42],[27, 43],[59, 67],[54, 72],[57, 69],[25, 45],[29, 41],[17, 79],[26, 44],[16, 80],[14, 82],[19, 77],[55, 71],[
    15, 81],[34, 63],[56, 70],[58, 68],[35, 63],[26, 45],[35, 64],[32, 39],[54, 73],[24, 74],[16, 82],[24, 45],[24, 43],[32, 36],[25, 48],[48, 82],[36, 48],
usage: rna_ec2x.py [-h] [--sep SEP] [--chain CHAIN] [--ec-pairs]
                   [--ss-pairs SS_PAIRS] [--pairs-delta]
                   interaction_fn
interaction_fn

interaction file

-h, --help

show this help message and exit

--sep <sep>

separator

--chain <chain>

chain

--ec-pairs
--ss-pairs <ss_pairs>

file with secondary structure base pairs

--pairs-delta

delta: ec-bp - ss-paris

Convert pairs to SimRNA restraints

rna_pairs2SimRNArestrs.py

rna_pairs2SimRNArestrs.py - convert pairs to SimRNA restraints

Example:

$ rna_pairs2SimRNArestrs.py rp06_pairs_delta.txt -v
# of pairs: 42
SLOPE A/2/MB A/172/MB 0 6 1
SLOPE A/2/MB A/172/MB 0 7 -1
SLOPE A/3/MB A/169/MB 0 6 1
SLOPE A/3/MB A/169/MB 0 7 -1
SLOPE A/12/MB A/32/MB 0 6 1
usage: rna_pairs2SimRNArestrs.py [-h] [--offset OFFSET] [--weight WEIGHT]
                                 [--dist DIST] [--well] [-v]
                                 pairs
pairs

a file with [[2, 172], [3, 169], [12, 32], [13, 31]]

-h, --help

show this help message and exit

--offset <offset>

can be -10

--weight <weight>

weight

--dist <dist>

distances, for MOHCA use 25

--well

well instead of slope

-v, --verbose

be verbose

Get a list of base pairs for a given “fasta ss” file.

rna_ss_get_bps.py

rna_ss_get_bps.py - get a list of base pairs for a given “fasta ss” file.

Input file:

cat ade_ss.fa
>1y26
CGCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAGCCUUAAACUCUUGAUUAUGAAGUG
(((((((((...((((((.........))))))........((((((.......))))))..)))))))))%

Usage:

$ rna_ss_get_bps.py ade_ss.fa --offset 12
[[13, 83], [14, 82], [15, 81], [16, 80], [17, 79], [18, 78], [19, 77], [20, 76], [21, 75], [25, 45], [26, 44], [27, 43], [28, 42], [29, 41], [30, 40], [54, 72], [55, 71], [56, 70], [57, 69], [58, 68], [59, 67]]

Now it also work with pseudoknots.

usage: rna_ss_get_bps.py [-h] [--offset OFFSET] [-v] file
file

file in the Fasta format

-h, --help

show this help message and exit

--offset <offset>

offset

-v, --verbose

be verbose

Get a diff of pairs

rna_pairs_diff.py

rna_pairs_diff.py - get a diff of pairs

Usage:

$ rna_pairs_diff.py pistol_dca_all.bp pistol.bp
# of ec_paris: 31
# of ssbps   : 18
dalta#       : 13
[[4, 32], [6, 9], [6, 36], [6, 39], [9, 39], [13, 32], [16, 17], [17, 18], [22, 49], [29, 58]]
usage: rna_pairs_diff.py [-h] [-v] pairs1 pairs2
pairs1

a list of pairs, A

pairs2

a list of pairs to subtract, A-B, results in C(all pairs that are in A and are not in B

-h, --help

show this help message and exit

-v, --verbose

be verbose

Contacts classification & secondary structure detection

See also PyMOL4RNA for ways to visualize edges https://rna-tools.readthedocs.io/en/latest/pymol4rna.html

3DNA (contacts classification & secondary structure detection)

Python parser to 3dna <http://x3dna.org/>.

Installation:

# install the code from http://forum.x3dna.org/downloads/3dna-download/
Create a copy of the rna_x3dna_config_local_sample.py (remove "_sample") present in rna-tools/rna_tools/tools/rna_x3dna folder.
Edit this line :
BINARY_PATH = <path to your x3dna-dssr file>
matching the path with the path of your x3dna-dssr file.
e.g. in my case: BINARY_PATH = ~/bin/x3dna-dssr.bin

For one structure you can run this script as:

[mm] py3dna$ git:(master) ✗ ./rna_x3dna.py test_data/1xjr.pdb
test_data/1xjr.pdb
>1xjr nts=47 [1xjr] -- secondary structure derived by DSSR
gGAGUUCACCGAGGCCACGCGGAGUACGAUCGAGGGUACAGUGAAUU
..(((((((...((((.((((.....))..))..))).).)))))))

For multiple structures in the folder, run the script like this:

[mm] py3dna$ git:(master) ✗ ./rna_x3dna.py test_data/*
test_data/1xjr.pdb
>1xjr nts=47 [1xjr] -- secondary structure derived by DSSR
gGAGUUCACCGAGGCCACGCGGAGUACGAUCGAGGGUACAGUGAAUU
..(((((((...((((.((((.....))..))..))).).)))))))
test_data/6TNA.pdb
>6TNA nts=76 [6TNA] -- secondary structure derived by DSSR
GCGGAUUUAgCUCAGuuGGGAGAGCgCCAGAcUgAAgAPcUGGAGgUCcUGUGtPCGaUCCACAGAAUUCGCACCA
(((((((..((((.....[..)))).((((.........)))).....(((((..]....))))))))))))....
test_data/rp2_bujnicki_1_rpr.pdb
>rp2_bujnicki_1_rpr nts=100 [rp2_bujnicki_1_rpr] -- secondary structure derived by DSSR
CCGGAGGAACUACUG&CCGGCAGCCU&CCGGAGGAACUACUG&CCGGCAGCCU&CCGGAGGAACUACUG&CCGGCAGCCU&CCGGAGGAACUACUG&CCGGCAGCCU
[[[[(((.....(((&{{{{))))))&(((((((.....(.(&]]]]).))))&[[[[[[......[[[&))))]]].]]&}}}}(((.....(((&]]]]))))))

Warning

This script should not be used in this given form with Parallel because it process output files from x3dna that are named always in the same way, e.g. dssr-torsions.txt. #TODO

rna_tools.tools.rna_x3dna.rna_x3dna.get_parser()[source]
class rna_tools.tools.rna_x3dna.rna_x3dna.x3DNA(pdbfn, show_log=False)[source]

Atributes:

curr_fn report

clean_up(verbose=False)[source]
get_ion_water_report()[source]

@todo File name: /tmp/tmp0pdNHS

no. of DNA/RNA chains: 0 [] no. of nucleotides: 174 no. of waters: 793 no. of metals: 33 [Na=29, Mg=1, K=3]

get_modifications()[source]

Run find_pair to find modifications.

get_secstruc()[source]

Get secondary structure.

get_seq()[source]

Get sequence.

Somehow 1bzt_1 x3dna UCAGACUUUUAAPCUGA, what is P? P -> u

get_torsions(outfn) str[source]

Get torsion angles into ‘torsion.csv’ file:

nt,id,res,alpha,beta,gamma,delta,epsilon,zeta,e-z,chi,phase-angle,sugar-type,ssZp,Dp,splay,bpseq 1,g,A.GTP1,nan,nan,142.1,89.5,-131.0,-78.3,-53(BI),-178.2(anti),358.6(C2’-exo),~C3’-endo,4.68,4.68,29.98,0 2,G,A.G2,-75.8,-167.0,57.2,79.5,-143.4,-69.7,-74(BI),-169.2(anti),5.8(C3’-endo),~C3’-endo,4.68,4.76,25.61,0

run_x3dna(show_log=False, verbose=False)[source]
exception rna_tools.tools.rna_x3dna.rna_x3dna.x3DNAMissingFile[source]

ClaRNA (contacts classification)

If you want to calculate “Interaction Network Fidelity (INF) and not only” see also rna_calc_inf

To run ClaRNA, see the documentaton below:

Usage:

$ rna_clarna_app.py ../../input/5k7c_clean_onechain_renumber_as_puzzle_srr.pdb             ../../input/5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
((((([[[[[[)))))........(.((....(]]]]]].)..(((......)))...)).)

Example

from rna_tools.utils.clarna_app import clarna_app
if __name__ == '__main__':
    ss = '((((.[[[[[[.))))........((((.....]]]]]]...(((((....)))))..))))'
    fnCRref = clarna_app.get_ClaRNA_output_from_dot_bracket(ss)
    f = '../rna_calc_rmsd/test_data/pistol/5k7c_clean_onechain_renumber_as_puzzle_srr.pdb'
    fnCR = clarna_app.clarna_run(f, force=False)
    results = clarna_app.clarna_compare(fnCRref, fnCR)
    print results #
    #tmp_Z42i_..pdb.outCR     5k7c_clean_onechain_renumber_as_puzzle_srr.pdb.outCR      0.706      NA         0.865      NA         0.842      0.889      NA         0.000
rna_tools.tools.clarna_app.rna_clarna_app.clarna_compare(target_cl_fn, i_cl_fn, verbose=False)[source]

Run ClaRNA compare (from rna_tools/tools/clarna_play).

Returns:

a list target, fn, scores

Scores:

inf_all      0.706
inf_stack -999.999 -> NA
inf_WC       0.865
inf_nWC   -999.999 -> NA
SNS_WC       0.842
PPV_WC       0.889
SNS_nWC     NA
PPV_nWC      0.000

Example of the list:

5k7c_clean_onechain_renumber_as_puzzle_srr.pdb     pistol_thrs0.50A_clust01-000001_AA.pdb
 0.642      NA         0.874      0.000      0.944      0.810      0.000      0.000s

use results.split()[4] to get inf_WC

rna_tools.tools.clarna_app.rna_clarna_app.clarna_run(fn, force=True, stacking=True, verbose=False)[source]

Run ClaRNA run

Parameters:

fn (str) – filename to analyze

Returns:

a filename to ClaRNA output (fn + ‘.outCR’)

Return type:

str

rna_tools.tools.clarna_app.rna_clarna_app.get_ClaRNA_output_from_dot_bracket(ss, temp=True, verbose=False)[source]

Get dummy ClaRNA output out of dat bracket secondary structure (ss)

Parameters:

ss (string) – secondary structure

Returns:

a filename to ClaRNA output

rna_tools.tools.clarna_app.rna_clarna_app.get_dot_bracket_from_ClaRNAoutput(inCR, verbose=False)[source]

In inCR file

Warning

This function requres ‘ClaRNAwd_to_vienaSS’ contact marcin.magnus@icloud.com

rna_tools.tools.clarna_app.rna_clarna_app.get_parser()[source]

RNA 3D model quality assessment

Wrappers behind the server, http://genesilico.pl/mqapRNA (in short, mq)

RASP

This module contains functions for computing RASP potential

class rna_tools.tools.mq.RASP.RASP.RASP(job_id=None)[source]

Wrapper class for running RASP automatically.

colour_by_local_score(path_to_pdb, potential_type='all')[source]
executable = ['rasp_fd', 'rasp_profile_fd']
program_name = ['rasp_fd', 'rasp_profile_fd']
run(path_to_pdb, global_energy_score=True, potentials=['c3', 'bb', 'bbr', 'all'], handler=True, verbose=False)[source]

Compute RASP potential for a single file

rasp_fd generates:

output_all.txt:

-9869.61   66447       -0.148534       0       0       0

rasp_profile_fd generates:

profile_all.txt:

C           1       R       -775.038
C           2       R       -1164.22
U           3       R       -2054.17
G           4       R       -1601.13
[..]
Input:
  • path_to_pdb = path to PDB file

  • potential_type = all, bbr or c3

  • global_energy_score = True/False (See Output), default=True

  • hander = True (if you use PYRO)/FALSE otherwise

Output:
  • the output depends on global_enery_score value T/F You might get:

    -9869.61

    profile [('C', 1, 'R', -775.03800000000001), ('C', 2, 'R', -1164.22) ..

Dfire

This module contains functions for computing Dfire potential

Installation:

git clone https://github.com/tcgriffith/dfire_rna.git make # add DFIRE_RNA_HOME to .bashrc

class rna_tools.tools.mq.Dfire.Dfire.Dfire[source]

Wrapper class for Dfire.

run(path_to_pdb, verbose=False)[source]

Run program with previously set flags. This method should work if program wants some command some options and nothing more, otherwise you should override it.

rna_tools.tools.mq.Dfire.Dfire.get_parser()[source]
rna_tools.tools.mq.Dfire.Dfire.test(verbose)[source]

RNAkb

This module contains functions for computing RNAkb potential

It seems that this is impossible to run RNAkb in full atom mode. So this works only in 5 pt (5 points/atom per residue) mode.

https://gromacs.bioexcel.eu/t/fatal-error-an-input-file-contains-a-line-longer-than-4095-characters/1397/2

class rna_tools.tools.mq.RNAkb.RNAkb.RNAkb(job_id='', sandbox=False)[source]

Wrapper class for running RNAkb automatically.

executable = ['/usr/local/gromacs/bin/pdb2gmx', '/usr/local/gromacs/bin/make_ndx', '/usr/local/gromacs/bin/editconf', '/usr/local/gromacs/bin/grompp', '/usr/local/gromacs/bin/mdrun']
log_stdout_stderr()[source]
run(name, potential_type, verbose=False)[source]

Compute RNAkb potential for a single file

Parameters:
  • path (str) – name of a PDB file

  • potential_type (str) – ‘5pt’ for 5 point or ‘aa’ for all atom aa is off

Returns:

a list of energies (strings) [‘2.57116e+05’, ‘1.62131e+03’, ‘7.82459e+02’, ‘3.00789e-01’, ‘0.00000e+00’, ‘0.00000e+00’, ‘-2.51238e-03’, ‘0.00000e+00’, ‘2.59520e+05’, ‘2.54174e-09’, ‘2.59520e+05’]

Warning

‘aa’ does not work because of “a line longer than 4095 characters”

The function parses:

Statistics over 1 steps using 1 frames
Energies (kJ/mol)
Bond          Angle    Proper Dih.  Improper Dih.          LJ-14
2.44111e+05    1.76069e+03    8.12947e+02    1.82656e-01    0.00000e+00
Coulomb-14        LJ (SR)   Coulomb (SR)      Potential    Kinetic En.
0.00000e+00   -1.94624e-03    0.00000e+00    2.46685e+05    2.43227e-09
Total Energy    Temperature Pressure (bar)
2.46685e+05    6.67884e-10   -5.94232e+04
Total Virial

# ['Statistics', 'over', '1', 'steps', 'using', '1', 'frames', 'Energies', '(kJ/mol)',
'Bond', 'Angle', 'Proper', 'Dih.', 'Improper', 'Dih.', 'LJ-14', '2.44111e+05', '1.76069e+03',
'8.12947e+02', '1.82656e-01', '0.00000e+00', 'Coulomb-14', 'LJ', '(SR)', 'Coulomb', '(SR)',
'Potential', 'Kinetic', 'En.', '0.00000e+00', '-1.94624e-03', '0.00000e+00', '2.46685e+05',
'2.43227e-09', 'Total', 'Energy', 'Temperature', 'Pressure', '(bar)', '2.46685e+05', '6.67884e-10',
'-5.94232e+04', 'Total', 'Virial']

result[6] is really the RNAkb

rna_mq_rnakb.py

Standalone tool to run the RNAkb class in the terminal.

All atom mode does not really work, see the documentation of the RNAkb class.

usage: rna_mq_rnakb.py [-h] [-v] [-p POTENTIAL] file [file ...]
file

a PDB file, one or more

-h, --help

show this help message and exit

-v, --verbose

be verbose

-p <potential>, --potential <potential>

5pt

QRNA

eSCORE

This module contains functions for computing eSCORE

Install:

pip install barnaba # tested with barnaba==0.1.7

Output:

$ baRNAba ESCORE --pdb ../test/1a9n.pdb --ff /Users/magnus/work/opt/barnaba/barnaba_201128/test/data/1S72.pdb
# your output will be written to files with prefix outfile.ESCORE
# KDE computed. Bandwidth=  0.25  using 10655 base-pairs# Loaded sample ../test/1a9n.pdb

#     Frame       ESCORE
          0   4.1693e-01

The eSCORE could be also accessed via Python:

from barnaba import escore
Escore = escore.Escore([path_to_pdb])
(..)
# see example_12_escore.ipynb of barnaba package https://github.com/srnas/barnaba

It does not work on M1 mac (problem to compile mdtraj).

class rna_tools.tools.mq.eSCORE.eSCORE.eSCORE[source]

Wrapper class for eSCORE

run(path_to_pdb, verbose=False)[source]

Run program with previously set flags. This method should work if program wants some command some options and nothing more, otherwise you should override it.

rna_tools.tools.mq.eSCORE.eSCORE.exe(cmd)[source]
rna_tools.tools.mq.eSCORE.eSCORE.main()[source]

3dRNAscore

This module contains functions for computing 3dRNAscore

Install:

make # make clean if you don't have clean

At Mac there is a problem (201128, 211103):

(py37) [mx] example$ ../bin/3dRNAscore -s:l score.in >score.txt
[1]    69818 segmentation fault  ../bin/3dRNAscore -s:l score.in > score.txt
class rna_tools.tools.mq.RNAscore.RNAscore.RNAscore(job_id=None)[source]

Wrapper class for running 3dRNAscore

program_name = '/bin/3dRNAscore'
run(path_to_pdb, verbose=1)[source]

Run program with previously set flags. This method should work if program wants some command some options and nothing more, otherwise you should override it.

rna_tools.tools.mq.RNAscore.RNAscore.main()[source]

RNA3DCNN

This module contains functions for computing RNA3DCNN potential

Output:

Trainable params: 4,282,801
Non-trainable params: 0
_________________________________________________________________
Scores for each nucleotide in /var/folders/yc/ssr9692s5fzf7k165grnhpk80000gp/T/tmptg6jy2ud/query.pdb:
[[ 0.02462262]
 [ 0.03271335]
 [ 0.06199259]
 [ 0.02006263]
 [ 0.05937254]
 [ 0.12025979]
 [ 0.20201728]
 [ 0.24463326]
 [ 0.43518737]
 [ 0.7260638 ]
 [ 0.6140108 ]
 [ 0.6588027 ]
 [ 0.7668936 ]
 [ 0.4776191 ]
 [ 0.39859247]
 [ 0.572009  ]
 [ 0.64892375]
 [ 0.11587611]
 [ 0.0560993 ]
 [ 0.05285829]
 [ 0.0167731 ]
 [ 0.01759553]
 [ 0.02143204]
 [-0.01818037]]
Total score for /var/folders/yc/ssr9692s5fzf7k165grnhpk80000gp/T/tmptg6jy2ud/query.pdb is  6.3262305

If missing atoms:

Total params: 4,282,801
Trainable params: 4,282,801
Non-trainable params: 0
_________________________________________________________________
There is no atom O5' in residue 620A in chain  A in PDB /var/folders/yc/ssr9692s5fzf7k165grnhpk80000gp/T/tmpx87uus6x/query.pdb.
There is no atom O5' in residue 635A in chain  B in PDB /var/folders/yc/ssr9692s5fzf7k165grnhpk80000gp/T/tmpx87uus6x/query.pdb.
There is no atom O5' in residue 1750G in chain  C in PDB /var/folders/yc/ssr9692s5fzf7k165grnhpk80000gp/T/tmpx87uus6x/query.pdb.
Scores for each nucleotide in /var/folders/yc/ssr9692s5fzf7k165grnhpk80000gp/T/tmpx87uus6x/query.pdb:
[]
Total score for /var/folders/yc/ssr9692s5fzf7k165grnhpk80000gp/T/tmpx87uus6x/query.pdb is  0.0
class rna_tools.tools.mq.RNA3DCNN.RNA3DCNN.RNA3DCNN[source]

Wrapper class for RNA3DCNN.

max_seq_len = 100000
run(path_to_pdb, verbose=False)[source]

Run program with previously set flags. This method should work if program wants some command some options and nothing more, otherwise you should override it.

rna_tools.tools.mq.RNA3DCNN.RNA3DCNN.main()[source]

FARNA

Wrapper for ROSETTA software for structure prediction of small RNA sequences

class rna_tools.tools.mq.FARNA.FARNA.FARNA(sequence='test', seq_name='test', job_id=None)[source]

Wrapper class for running ROSETTA scoring function automatically.

best_energy = ''
cleanup()[source]

Run this method when you are done with this wrapper. It deletes sandbox, some of it’s childrem may do some other cleanup here.

db_path = ''
executable = 'rna_minimize'
get_result()[source]

Parse and get result from score file created during ROSETTA run

All results are kept in self.result, but only global score is returned

input_file = ''
input_fn = 'seq.fasta'
mqap(pdb)[source]

Total weighted score:s+(?P<ROSETTA_SCORE>[-d.]+)

program_name = 'farna'
run(pdb_file, hires, verbose=False, system=False)[source]

Compute FARNA potential for a single file

Parameters:
  • file (* pdb_file = path to pdb)

  • True/False (* global_energy_score =)

Output:
  • A list of energies, e.g:

    ['-21.721', '-0.899', '-20.961', '-84.498', '-16.574', '-180.939', '11.549', '7.475', '-17.257', '-306.324', '0.0', '0.0', '17.503', '0.0']
    

??? or a dictionary of lists of local scores, eg:

{
'N_BS': [17.0, -0.70039, -0.720981, -0.685238, -0.734146, ... ],
'atom_pair_constraint': [0.0, -0.764688, -0.773833, ...],
...
}
sandbox()[source]

Create a sandbox in temporary folder for running program.

src_bin = ''

ClashScore

class rna_tools.tools.mq.ClashScore.ClashScore.ClashScore[source]

ClashScore: Wrapper class for running phenix.clashscore

cleanup()[source]
run(fn, verbose=False)[source]
Parameters:
  • fn (string) – path to a file

  • verbose (bool) – be verbose

Returns:

clashscore (float)

Example:

/Applications/phenix-1.18.2-3874/build/bin/phenix.clashscore test/1xjrA.pdb
Using electron cloud x-H distances and vdW radii

Adding H/D atoms with reduce...

Bad Clashes >= 0.4 Angstrom:
 A  17    A  N1   A  34    G  N1  :0.430
clashscore = 0.66
test/1xjrA.pdb
0.66
/Applications/phenix-1.18.2-3874/build/bin/phenix.clashscore test/1xjrA_M1.pdb
Using electron cloud x-H distances and vdW radii

Adding H/D atoms with reduce...

Bad Clashes >= 0.4 Angstrom:
 A  41    G  H2'  A  42    U  OP1 :0.738
 A  25    U  H6   A  25    U HO2' :0.659
 A  46    U  H2'  A  47    U  OP2 :0.656
 A  29    A H5''  A  30    U  O2' :0.623
 (...)

  A  24    G  O2'  A  26    A  C8  :0.410
 A  42    U  O2'  A  43    G  H8  :0.409
 A  43    G  H2'  A  44    A  O4' :0.408
 A  28    G  C8   A  28    G  O2' :0.403
clashscore = 15.45
test/1xjrA_M1.pdb
15.45

# so the output is
clashscore = 15.45
rna_tools.tools.mq.ClashScore.ClashScore.test()[source]

AnalyzeGeometry

This module contains functions for computing AnalyzeGeomotery.

class rna_tools.tools.mq.AnalyzeGeometry.AnalyzeGeometry.AnalyzeGeometry(verbose=False)[source]

Wrapper class for running clashscore

Note

Sequence is required to get the length to calculate % of corrent residues.

cleanup()[source]
run(name, verbose=False)[source]
Parameters:
  • name (str) – the path of the file to wrap

  • verbose (boolen) – be verbose

Returns:

score; % of Backbone torsion suites (# of them per seq)

Return type:

float

Output:

----------Backbone torsion suites----------

  Suite ID                 suite  suiteness  triaged angle
     A A   3                  !!      0.000     delta
     G A   4                  !!      0.000  epsilon-1
     G A  11                  !!      0.000      None
     C A  20                  !!      0.000     gamma
     U A  25                  !!      0.000     delta
     A A  26                  !!      0.000   delta-1
     A A  29                  !!      0.000      None
     U A  30                  !!      0.000  epsilon-1
     G A  41                  !!      0.000     delta
     U A  42                  !!      0.000   delta-1
     A A  45                  !!      0.000     delta
     U A  46                  !!      0.000  epsilon-1
     U A  47                  !!      0.000   delta-1
  11 suites triaged and 0 incomplete leaving 35 suites
  13/46 suite outliers present
  Average suiteness: 0.490
13 # count lines after 'traged angle' and minus 3 (# of last lines)
test/1xjrA_M1.pdb
34.7826

Output for a perfect structure:

/Applications/phenix-1.18.2-3874/build/bin/phenix.rna_validate /Users/magnus/work/src/rna-tools/rna_tools/input/mq/5e3hBC.pdb
CGACGCUAGCGUACGCUAGCGUCG
AnalyzeGeomtery:: ----------Backbone bond lenths----------

  All bonds within 4.0 sigma of ideal values.

                    ----------Backbone bond angles----------

  All angles within 4.0 sigma of ideal values.

                        ----------Sugar pucker----------

  All puckers have reasonable geometry.

                  ----------Backbone torsion suites----------

  0 suites triaged and 0 incomplete leaving 24 suites
  All RNA torsion suites are reasonable.
  Average suiteness: 0.766
rna_tools.tools.mq.AnalyzeGeometry.AnalyzeGeometry.main()[source]

RNA 3D structure prediction

ROSETTA

A set of wrappers around Rosetta (https://www.rosettacommons.org/), mostly based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http://www.sciencedirect.com/science/article/pii/S0076687914000524

Run (modeling)

rna_rosetta_run.py

rna_rosetta_run.py - prepare & run ROSETTA simulations

Based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http: // www.sciencedirect.com / science / article / pii / S0076687914000524

The script makes(1) a folder for you job, with seq.fa, ss.fa, input file is copied as input.fa to the folder(2) make helices(3) prepare rosetta input files(4) sends jobs to the cluster.

The header is take from the fast file(`` > /header / ``) not from the filename of your Fasta file.

I discovered this:

qstat -xml | tr '\n' ' ' | sed 's#<job_list[^>]*>#\n#g' \

> | sed ‘s#<[^>]*>##g’ | grep “ “ | column -t

(https://stackoverflow.com/questions/26104116/qstat-and-long-job-names) so there is now need to shorted my job ids.

Helix

Run:

rna_rosetta_run.py -i -e -r -g -c 200 cp20.fa

-i:

# prepare a folder for a run
>cp20
AUUAUCAAGAAUCUCAAAGAGAGAUAGCAACCUGCAAUAACGAGCAAGGUGCUAAAAUAGAUAAGCCAAAUUCAAUUGGAAAAAAUGUUAA
.(((((....(((((.....)))))(((..(((((..[[[[..)).))).)))......))))).((((......)))).......]]]].

[peyote2] ~ rna_rosetta_run.py -i cp20.fa
run rosetta for:
cp20
AUUAUCAAGAAUCUCAAAGAGAGAUAGCAACCUGCAAUAACGAGCAAGGUGCUAAAAUAGAUAAGCCAAAUUCAAUUGGAAAAAAUGUUAA
.(((((....(((((.....)))))(((..(((((..[[[[..)).))).)))......))))).((((......)))).......]]]].
/home / magnus // cp20 / created
Seq & ss created

Troubleshooting.

If one of the helices is missing you will get:

IOError: [Errno 2] No such file or directory: 'helix1.out'
rosetta_submit.py README_FARFAR o 500 100 taf
Could not find:  README_FARFAR

and the problem was a1 and g8 pairing:

outputting command line to:  helix0.RUN  # previous helix #0
Sequence:  AUGG CCGG
Secstruc:  (((())))
Not complementary at positions a1 and g8!
Sequence:  GUGGG CCCAU
Secstruc:  ((((()))))

Writing to fasta file:  helix2.fasta  # next helix #2

My case with a modeling of rp12

Sequence: cc gc Secstruc: (()) Not complementary at positions 1 and 4!

edit the secondary structure, run the program with -i(init, to overwrite seq.fa, ss.fa) and then it works.

Notes:

rp17hc 6 characters
usage: rna_rosetta_run.py [-h] [-i] [-e] [-r] [-g] [-m MOTIF] [-n NSTRUC]
                          [-c CPUS] [--sandbox SANDBOX]
                          file
file

file: > a04 UAUAACAUAUAAUUUUGACAAUAUGGGUCAUAAGUUUCUACCGGAAUACCGUAAAUAUUCUGACUAUGUAUA ((((.((((…((.((((…….)))).))……..(.(((((…….))))).)..))))))))

-h, --help

show this help message and exit

-i, --init

prepare _folder with seq and ss

-e, --helices

produce h(E)lices

-r, --rosetta

prepare rosetta files (still you need go to send jobs to a cluster)

-g, --go

send jobs to a cluster(run qsubMINI)

-m <motif>, --motif <motif>

include a motif file, e.g. -s E-loop_1q9a_mutated_no_flanks_renumber.pdb

-n <nstruc>, --nstruc <nstruc>

# of structures you want to get

-c <cpus>, --cpus <cpus>

# of cpus to be used

--sandbox <sandbox>

where to run it (default: RNA_ROSETTA_RUN_ROOT_DIR_MODELING

rna_rosetta_run.py - prepare & run ROSETTA simulations

Based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http: // www.sciencedirect.com / science / article / pii / S0076687914000524

The script makes(1) a folder for you job, with seq.fa, ss.fa, input file is copied as input.fa to the folder(2) make helices(3) prepare rosetta input files(4) sends jobs to the cluster.

The header is take from the fast file(`` > /header / ``) not from the filename of your Fasta file.

I discovered this:

qstat -xml | tr '\n' ' ' | sed 's#<job_list[^>]*>#\n#g' \

> | sed ‘s#<[^>]*>##g’ | grep “ “ | column -t

(https://stackoverflow.com/questions/26104116/qstat-and-long-job-names) so there is now need to shorted my job ids.

Helix

Run:

rna_rosetta_run.py -i -e -r -g -c 200 cp20.fa

-i:

# prepare a folder for a run
>cp20
AUUAUCAAGAAUCUCAAAGAGAGAUAGCAACCUGCAAUAACGAGCAAGGUGCUAAAAUAGAUAAGCCAAAUUCAAUUGGAAAAAAUGUUAA
.(((((....(((((.....)))))(((..(((((..[[[[..)).))).)))......))))).((((......)))).......]]]].

[peyote2] ~ rna_rosetta_run.py -i cp20.fa
run rosetta for:
cp20
AUUAUCAAGAAUCUCAAAGAGAGAUAGCAACCUGCAAUAACGAGCAAGGUGCUAAAAUAGAUAAGCCAAAUUCAAUUGGAAAAAAUGUUAA
.(((((....(((((.....)))))(((..(((((..[[[[..)).))).)))......))))).((((......)))).......]]]].
/home / magnus // cp20 / created
Seq & ss created

Troubleshooting.

If one of the helices is missing you will get:

IOError: [Errno 2] No such file or directory: 'helix1.out'
rosetta_submit.py README_FARFAR o 500 100 taf
Could not find:  README_FARFAR

and the problem was a1 and g8 pairing:

outputting command line to:  helix0.RUN  # previous helix #0
Sequence:  AUGG CCGG
Secstruc:  (((())))
Not complementary at positions a1 and g8!
Sequence:  GUGGG CCCAU
Secstruc:  ((((()))))

Writing to fasta file:  helix2.fasta  # next helix #2

My case with a modeling of rp12

Sequence: cc gc Secstruc: (()) Not complementary at positions 1 and 4!

edit the secondary structure, run the program with -i(init, to overwrite seq.fa, ss.fa) and then it works.

Notes:

rp17hc 6 characters
class rna_tools.tools.rna_rosetta.rna_rosetta_run.CustomFormatter(prog, indent_increment=2, max_help_position=24, width=None)[source]
rna_tools.tools.rna_rosetta.rna_rosetta_run.get_parser()[source]
rna_tools.tools.rna_rosetta.rna_rosetta_run.go()[source]

send jobs to a cluster(run qsubMINI)

rna_tools.tools.rna_rosetta.rna_rosetta_run.main()[source]

Pipeline for modeling RNA

rna_tools.tools.rna_rosetta.rna_rosetta_run.prepare_folder(args, header, seq, ss, path)[source]

Make folder for you job, with seq.fa, ss.fa, input file is copied as input.fa to the folder.

For ss lowercase is needed when used with motifs, otherwise:

[peyote2] aa20$ rna_rosetta_run.py -r -m E-loop_1q9a_mutated_no_flanks_renumber_for_acy20.pdb ~/aa20.fa
2019-05-03 21:31:30,842 rpt_config_local.py::<module>::rpt_config_local loading...
run rosetta for:
aa20
UACGUUCAUCAUCCGUUUGGAUGACGGAAGUAAGCGAAAGCUGAAGGAACGCAUG
..(((((.((((((....))))))..[.....(((....)))....)))))]...
rna_denovo_setup.py -fasta seq.fa -secstruct_file ss.fa -cycles 20000 -no_minimize -nstruct 50  -s E-loop_1q9a_mutated_no_flanks_renumber_for_acy20.pdb  -silent helix0.out helix1.out helix2.out -input_silent_res 3-7 47-51 9-14 19-24 33-35 40-42

Sequence:  UACGUUCAUCAUCCGUUUGGAUGACGGAAGUAAGCGAAAGCUGAAGGAACGCAUG
Secstruc:  ..(((((.((((((....))))))..[.....(((....)))....)))))]...
aaguagaag
AAGUAGAAG
Traceback (most recent call last):
  File "/home/magnus/opt/rosetta_src_2016.13.58602_bundle/tools/rna_tools/bin//rna_denovo_setup.py", line 170, in <module>
    raise ValueError('The sequence in %s does not match input sequence!!' % pdb)
ValueError: The sequence in E-loop_1q9a_mutated_no_flanks_renumber_for_acy20.pdb does not match input sequence!!
rosetta_submit.py README_FARFAR o 200 100 aa20_
Could not find:  README_FARFAR
rna_tools.tools.rna_rosetta.rna_rosetta_run.prepare_helices()[source]

Make helices(wrapper around ‘helix_preassemble_setup.py’)

Warning

I think multiprocessing of helixX.run does not work.

rna_tools.tools.rna_rosetta.rna_rosetta_run.prepare_rosetta(header, cpus, motif, nstruc)[source]

Prepare ROSETTA using rna_denovo_setup.py

cpus is used to calc nstruc per job to get 10k structures per full run:

:param nstruc: how many structures you want to obtain
:type nstruc: int
:param nstruct = int:
:type nstruct = int: math.floor(20000 / cpus)
:param motif: motif file; e.g., -s E-loop_1q9a_mutated_no_flanks_renumber.pdb
:param 50:
:type 50: nstruc) = 10k / 200 (cpus

Get a number of structures

rna_rosetta_n.py

rna_roseta_n.py - show me # of structure in a silent file

Example:

$ rna_rosetta_n.py ade.out
21594

$ rna_rosetta_n.py *out
default.out 100
default1.out 200
default2.out 200
default3.out 100
usage: rna_rosetta_n.py [-h] [-v] [-0] file [file ...]
file

ade.out

-h, --help

show this help message and exit

-v, --verbose
-0, --zero

rna_roseta_n.py - show me # of structure in a silent file

Example:

$ rna_rosetta_n.py ade.out
21594

$ rna_rosetta_n.py *out
default.out 100
default1.out 200
default2.out 200
default3.out 100
rna_tools.tools.rna_rosetta.rna_rosetta_n.get_no_structures(file)[source]
rna_tools.tools.rna_rosetta.rna_rosetta_n.get_parser()[source]
rna_tools.tools.rna_rosetta.rna_rosetta_n.run(f)[source]

Pipline for modeling RNA.

Get a head of a Rosetta silent file

rna_rosetta_n.py

rna_roseta_head.py - get a head of a Rosetta silent file.

Example:

$ rna_rosetta_head.py -n 10000 thf.out
# a new file will be created, thf_10000.out

Silent file:

[peyote2] thf head -n 100 thf.out
SEQUENCE: ggagaguagaugauucgcguuaagugugugugaaugggaugucgucacacaacgaagcgagagcgcggugaaucauugcauccgcucca
SCORE:     score    rna_data_backbone    rna_vdw    rna_base_backbone    rna_backbone_backbone    rna_repulsive    rna_base_pair    rna_base_axis    rna_base_stagger    rna_base_stack    rna_base_stack_axis     rna_rg    atom_pair_constraint    linear_chainbreak       N_WC      N_NWC       N_BS description
REMARK BINARY SILENTFILE
SCORE:  -601.975                0.000     31.113              -16.960                   -3.888           20.501         -302.742          -38.531            -158.004           -80.764               -110.053     23.750                   0.000               33.601         32          6         86    S_000001_000
FOLD_TREE  EDGE 1 4 -1  JEDGE 4 85 1  C4   C2   END  EDGE 4 5 -1  EDGE 85 80 -1  EDGE 85 89 -1  JEDGE 80 40 5  C4   C2   END  EDGE 80 78 -1  EDGE 40 43 -1  EDGE 40 33 -1  JEDGE 33 45 4  C4   C2   END  EDGE 45 54 -1  EDGE 45 44 -1  EDGE 33 20 -1  JEDGE 20 65 3  C2   C4   END  EDGE 65 67 -1  EDGE 20 17 -1  EDGE 65 63 -1  JEDGE 17 69 2  C4   C2   END  JEDGE 63 58 6  C4   C2   END  EDGE 17 6 -1  EDGE 58 59 -1  EDGE 63 60 -1  EDGE 69 77 -1  EDGE 58 55 -1  EDGE 69 68 -1  S_000001_000
RT -0.987743 0.139354 0.0703103 0.135963 0.989404 -0.0509304 -0.0766626 -0.0407466 -0.996224 6.25631 0.103544 0.0647696   S_000001_000
RT -0.98312 0.1587 -0.091045 0.166923 0.981743 -0.0912024 0.074909 -0.10486 -0.991662 5.89962 -1.95819 -0.1075   S_000001_000
RT -0.987645 0.154078 0.0285994 0.153854 0.988044 -0.00989514 -0.0297821 -0.00537275 -0.999542 6.13138 1.047 0.115722   S_000001_000
RT -0.989884 0.140722 0.0180554 0.140036 0.989532 -0.0348618 -0.0227723 -0.0319807 -0.999229 6.21787 0.201588 -0.0264223   S_000001_000
RT -0.988455 0.134013 0.0706914 0.134924 0.990822 0.00825457 -0.0689364 0.0176973 -0.997464 6.19447 0.189237 0.125791   S_000001_000
RT -0.990412 0.138036 0.00546261 0.137927 0.990299 -0.0168644 -0.00773751 -0.0159492 -0.999843 6.25456 0.0842849 -0.0135807   S_000001_000
ANNOTATED_SEQUENCE: g[RGU:LowerRNA:Virtual_Phosphate]gaga[RAD:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]uagaugauucgcguuaagugugugugaaugggauguc[RCY:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]ucacacaacg[RGU:rna_cutpoint_lower]a[RAD:rna_cutpoint_upper]agcg[RGU:rna_cutpoint_lower]a[RAD:rna_cutpoint_upper]gagcgcg[RGU:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]ugaaucauu[URA:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]cauccgcucca[RAD:UpperRNA] S_000001_000
Le3nY9smsa4zEMGdvAA+z+e

It seems to work:

-rw-rw-r--   1 magnus users 474M 2017-08-06 05:25 thf_10000.out
-rw -rw-r--   1 magnus users 947M 2017-08-06 04:54 thf.out

[peyote2] thf rna_rosetta_n.py thf_10000.out
10000
usage: rna_rosetta_n.py [-h] [-v] [-n NSTRUC] file
file

ade.out

-h, --help

show this help message and exit

-v, --verbose
-n <nstruc>, --nstruc <nstruc>

rna_roseta_head.py - get a head of a Rosetta silent file.

Example:

$ rna_rosetta_head.py -n 10000 thf.out
# a new file will be created, thf_10000.out

Silent file:

[peyote2] thf head -n 100 thf.out
SEQUENCE: ggagaguagaugauucgcguuaagugugugugaaugggaugucgucacacaacgaagcgagagcgcggugaaucauugcauccgcucca
SCORE:     score    rna_data_backbone    rna_vdw    rna_base_backbone    rna_backbone_backbone    rna_repulsive    rna_base_pair    rna_base_axis    rna_base_stagger    rna_base_stack    rna_base_stack_axis     rna_rg    atom_pair_constraint    linear_chainbreak       N_WC      N_NWC       N_BS description
REMARK BINARY SILENTFILE
SCORE:  -601.975                0.000     31.113              -16.960                   -3.888           20.501         -302.742          -38.531            -158.004           -80.764               -110.053     23.750                   0.000               33.601         32          6         86    S_000001_000
FOLD_TREE  EDGE 1 4 -1  JEDGE 4 85 1  C4   C2   END  EDGE 4 5 -1  EDGE 85 80 -1  EDGE 85 89 -1  JEDGE 80 40 5  C4   C2   END  EDGE 80 78 -1  EDGE 40 43 -1  EDGE 40 33 -1  JEDGE 33 45 4  C4   C2   END  EDGE 45 54 -1  EDGE 45 44 -1  EDGE 33 20 -1  JEDGE 20 65 3  C2   C4   END  EDGE 65 67 -1  EDGE 20 17 -1  EDGE 65 63 -1  JEDGE 17 69 2  C4   C2   END  JEDGE 63 58 6  C4   C2   END  EDGE 17 6 -1  EDGE 58 59 -1  EDGE 63 60 -1  EDGE 69 77 -1  EDGE 58 55 -1  EDGE 69 68 -1  S_000001_000
RT -0.987743 0.139354 0.0703103 0.135963 0.989404 -0.0509304 -0.0766626 -0.0407466 -0.996224 6.25631 0.103544 0.0647696   S_000001_000
RT -0.98312 0.1587 -0.091045 0.166923 0.981743 -0.0912024 0.074909 -0.10486 -0.991662 5.89962 -1.95819 -0.1075   S_000001_000
RT -0.987645 0.154078 0.0285994 0.153854 0.988044 -0.00989514 -0.0297821 -0.00537275 -0.999542 6.13138 1.047 0.115722   S_000001_000
RT -0.989884 0.140722 0.0180554 0.140036 0.989532 -0.0348618 -0.0227723 -0.0319807 -0.999229 6.21787 0.201588 -0.0264223   S_000001_000
RT -0.988455 0.134013 0.0706914 0.134924 0.990822 0.00825457 -0.0689364 0.0176973 -0.997464 6.19447 0.189237 0.125791   S_000001_000
RT -0.990412 0.138036 0.00546261 0.137927 0.990299 -0.0168644 -0.00773751 -0.0159492 -0.999843 6.25456 0.0842849 -0.0135807   S_000001_000
ANNOTATED_SEQUENCE: g[RGU:LowerRNA:Virtual_Phosphate]gaga[RAD:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]uagaugauucgcguuaagugugugugaaugggauguc[RCY:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]ucacacaacg[RGU:rna_cutpoint_lower]a[RAD:rna_cutpoint_upper]agcg[RGU:rna_cutpoint_lower]a[RAD:rna_cutpoint_upper]gagcgcg[RGU:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]ugaaucauu[URA:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]cauccgcucca[RAD:UpperRNA] S_000001_000
Le3nY9smsa4zEMGdvAA+z+e

It seems to work:

-rw-rw-r--   1 magnus users 474M 2017-08-06 05:25 thf_10000.out
-rw -rw-r--   1 magnus users 947M 2017-08-06 04:54 thf.out

[peyote2] thf rna_rosetta_n.py thf_10000.out
10000
rna_tools.tools.rna_rosetta.rna_rosetta_head.get_parser()[source]
rna_tools.tools.rna_rosetta.rna_rosetta_head.run()[source]

Pipline for modeling RNA.

Cluster

rna_rosetta_cluster.py

rna_rosetta_cluster.py - cluster silent Rosetta files

A wrapper to ROSETTA tools for RNA modeling

Based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http://www.sciencedirect.com/science/article/pii/S0076687914000524

rna_rosetta_cluster.py ade_min.out 20000

Take n * 0.005 (.5%) of all frames and put them into selected.out. Then the tool clusters this selected.out.

usage: rna_rosetta_cluster.py [-h] [--no_select]
                              [--radius-inc-step RADIUS_INC_STEP]
                              [--limit-clusters LIMIT_CLUSTERS]
                              file n
file

ade.out

n

# of total structures

-h, --help

show this help message and exit

--no_select

Don’t run selection once again. Use selected.out in the current folder

--radius-inc-step <radius_inc_step>

radius incremental step, default 0.5

--limit-clusters <limit_clusters>

# of clusters

rna_rosetta_cluster.py - cluster silent Rosetta files

A wrapper to ROSETTA tools for RNA modeling

Based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http://www.sciencedirect.com/science/article/pii/S0076687914000524

rna_rosetta_cluster.py ade_min.out 20000

Take n * 0.005 (.5%) of all frames and put them into selected.out. Then the tool clusters this selected.out.

rna_tools.tools.rna_rosetta.rna_rosetta_cluster.cluster(radius, limit_clusters)[source]

Internal function of cluster_loop: It removes cluster.out first.

rna_tools.tools.rna_rosetta.rna_rosetta_cluster.cluster_loop(ns, radius_inc_step, limit_clusters)[source]

Go from radius 1 to get 1/6 of structures of ns (# of selected structures) in the first cluster, then it stops.

rna_tools.tools.rna_rosetta.rna_rosetta_cluster.extract()[source]
rna_tools.tools.rna_rosetta.rna_rosetta_cluster.get_no_structures(file)[source]
rna_tools.tools.rna_rosetta.rna_rosetta_cluster.get_no_structures_in_first_cluster(fn)[source]

Get # of structures in a silent file.

Parameters:

fn (string) – a filename to a silent file

Returns

int: # of structures in a silent file

rna_tools.tools.rna_rosetta.rna_rosetta_cluster.get_parser()[source]
rna_tools.tools.rna_rosetta.rna_rosetta_cluster.get_selected(file, nc)[source]

Get selected for clustering

rna_tools.tools.rna_rosetta.rna_rosetta_cluster.run()[source]

Pipline for modeling RNA

Minimize

rna_rosetta_min.py

rna_rosetta_min.py - a script to do minimization

The script takes the number of structures and the analyzed silence file and does the maths.

Job names will be as your silent file preceding with ~, .e.g ~tha.

http://www.sciencedirect.com/science/article/pii/S0076687914000524

ade$ rna_rosetta_cluster.py ade.out

The first number states how many processors to use for the run, while the second number is 1/6 the total number of previously generated FARNA models. If you are running on a supercomputer that only allows specific multiples of processors, use an appropriate number for the first input.:

rosetta_submit.py min_cmdline min_out 1 24

rosetta_submit.py min_cmdline min_out [1] [16] The first number states how many processors to use for each line in min_cmdline. Here, enter 1 for the first input so that the total number of processors used will be equal to the number of processors entered with the “-proc” flag in command line [12], above. The second number states the maximum time each job will be allowed to run (walltime). Start the run with the appropriate command listed by the out- put above (e.g., source qsubMPI for the Stampede cluster).

E.g. for 20k silet file, 1/6 will be minimized = 3.3k:

parallel_min_setup.py -silent rp21cr62.out -tag rp21cr62_min  -proc 200 -nstruct 3200 -out_folder mo -out_script MINIMIZE " -ignore_zero_occupancy false "
rosetta_submit.py MINIMIZE mo 1 100 m

[peyote2] rp21 easy_cat.py mo
Catting into:  rp21_min.out ... from 200 primary files. Found 3200  decoys.

# on 200 cpus it took around ~30min
usage: rna_rosetta_min.py [-h] [-g] [-c CPUS] file
file

ade.out

-h, --help

show this help message and exit

-g, --go
-c <cpus>, --cpus <cpus>

default: 200

rna_rosetta_min.py - a script to do minimization

The script takes the number of structures and the analyzed silence file and does the maths.

Job names will be as your silent file preceding with ~, .e.g ~tha.

http://www.sciencedirect.com/science/article/pii/S0076687914000524

ade$ rna_rosetta_cluster.py ade.out

The first number states how many processors to use for the run, while the second number is 1/6 the total number of previously generated FARNA models. If you are running on a supercomputer that only allows specific multiples of processors, use an appropriate number for the first input.:

rosetta_submit.py min_cmdline min_out 1 24

rosetta_submit.py min_cmdline min_out [1] [16] The first number states how many processors to use for each line in min_cmdline. Here, enter 1 for the first input so that the total number of processors used will be equal to the number of processors entered with the “-proc” flag in command line [12], above. The second number states the maximum time each job will be allowed to run (walltime). Start the run with the appropriate command listed by the out- put above (e.g., source qsubMPI for the Stampede cluster).

E.g. for 20k silet file, 1/6 will be minimized = 3.3k:

parallel_min_setup.py -silent rp21cr62.out -tag rp21cr62_min  -proc 200 -nstruct 3200 -out_folder mo -out_script MINIMIZE " -ignore_zero_occupancy false "
rosetta_submit.py MINIMIZE mo 1 100 m

[peyote2] rp21 easy_cat.py mo
Catting into:  rp21_min.out ... from 200 primary files. Found 3200  decoys.

# on 200 cpus it took around ~30min
rna_tools.tools.rna_rosetta.rna_rosetta_min.get_no_structures(file)[source]

Get a number of structures in a silent file

rna_tools.tools.rna_rosetta.rna_rosetta_min.get_parser()[source]
rna_tools.tools.rna_rosetta.rna_rosetta_min.min(silent_file, take_n, cpus, go)[source]

Run parallel_min_setup (to MINIMIZE file), rosetta_submit.py, and qsubMINI.

Fix on the way, qsub files:

-out:file:silent mo/0/mo/123/tha_min.out -> -out:file:silent mo/123/tha_min.out

I don’t know why mo/0/ is there. I might be because of my changes in rosetta_submit.py (?).

rna_tools.tools.rna_rosetta.rna_rosetta_min.run()[source]

Extract lowscore decoy

rna_rosetta_extract_lowscore_decoys.py

rna_rosetta_extract_lowscore_decoys.py - a simple wrapper to extract_lowscore_decoys.py

To be used in Jupter notebooks and other scripts.

usage: rna_rosetta_extract_lowscore_decoys.py [-h] [-v] nstruc file
nstruc

# of low score structures to obtained

file

silent file

-h, --help

show this help message and exit

-v, --verbose

be verbose

rna_rosetta_extract_lowscore_decoys.py - a simple wrapper to extract_lowscore_decoys.py

To be used in Jupter notebooks and other scripts.

rna_tools.tools.rna_rosetta.rna_rosetta_extract_lowscore_decoys.get_parser()[source]
rna_tools.tools.rna_rosetta.rna_rosetta_extract_lowscore_decoys.rosetta_extract_lowscore_decoys(file, nstruc, log=True)[source]

Check progress

rna_rosetta_cluster.py

rna_rosetta_check_progress.py - check progress for many simulations of Rosetta

Example:

[peyote2] rosetta_jobs rna_rosetta_check_progress.py .
         jobs  #curr  #todo  #decoys done
0  ./rp17s223    200      0      407  [ ]
1   ./rp17hcf      0      0        0  [ ]
# curr  232 #todo  0
usage: rna_rosetta_cluster.py [-h] [-v] [-m] [-s SELECT] [-k] dir
dir

directory with rosetta runs, define by RNA_ROSETTA_RUN_ROOT_DIR_MODELING right now:

-h, --help

show this help message and exit

-v, --verbose

be verbose

-m, --min-only

check only for mo folder

-s <select>, --select <select>

select for analysis only jobs with this phrase, .e.g., evoseq_

-k, --kill

kill (qdel) jobs if your reach limit (nstruc) of structure that you want, right now is 10000 structures

rna_rosetta_check_progress.py - check progress for many simulations of Rosetta

Example:

[peyote2] rosetta_jobs rna_rosetta_check_progress.py .
         jobs  #curr  #todo  #decoys done
0  ./rp17s223    200      0      407  [ ]
1   ./rp17hcf      0      0        0  [ ]
# curr  232 #todo  0
rna_tools.tools.rna_rosetta.rna_rosetta_check_progress.get_parser()[source]
rna_tools.tools.rna_rosetta.rna_rosetta_check_progress.run_cmd(cmd)[source]

SimRNA

Select low energy frames

rna_simrna_lowest.py

rna_simrna_lowest.py - get lowest energy frames out of a SimRNA trajectory file

This code uses heavily the SimRNATrajectory class. Be default 100 lowest energy frames is exported.

usage: rna_simrna_lowest.py [-h] [-n NSTRUC] trafl
trafl

SimRNA trafl file

-h, --help

show this help message and exit

-n <nstruc>, --nstruc <nstruc>

SimRNA trafl file

Extract

rna_simrna_extract.py

rna_simrna_extra.py - extract full atom structures from a SimRNA trajectory file

Options:

SIMRNA_DATA_PATH has to be properly defined in rpt_config_local.

usage: rna_simrna_extract.py [-h] -t TEMPLATE -f TRAFL [-c]
                             [-n NUMBER_OF_STRUCTURES]
-h, --help

show this help message and exit

-t <template>, --template <template>

template PDB file used for reconstruction to full atom models

-f <trafl>, --trafl <trafl>

SimRNA trafl file

-c, --cleanup

Keep only *_AA.pdb files, move *.ss_detected and *.pdbto _<traj name folder>

-n <number_of_structures>, --number_of_structures <number_of_structures>

SimRNAweb

Download files of a SimRNAweb run

rna_simrnaweb_download_job.py

rna_simrnaweb_download_job.py - download model files, trajectory for a given SimRNAweb job.

Usage:

rp17pk$ rna_pdb_download_simrna_job.py 27b5093d -m -t -x
# download more clusters, trajectory, extract100

cp771_pk$ rna_pdb_download_simrna_job.py -t -x -m cf8f8bb2 -p cp771_pk
# download with a trajectory, and cluster #4 and #5, add to all pdb files
# prefix: cp771_pk

$ rna_simrnaweb_download_job.py --web-models rp17_well_d10_e1-a43d3ab5 --prefix tar
# prefix added will be tar_XXXX

Example:

rna_pdb_download_simrna_job.py -t -x -m 20569fa1 -p zmp_pk

[mm] zmp_pk ls
20569fa1_ALL_100low.trafl
_20569fa1-thrs7.10A_clust04
_20569fa1-thrs7.10A_clust05
_20569fa1_ALL_100low
data
rna_simrna_extract.log
subset.png
zmp_pk_20569fa1-thrs7.10A_clust01X.pdb
zmp_pk_20569fa1-thrs7.10A_clust02X.pdb
zmp_pk_20569fa1-thrs7.10A_clust03X.pdb
zmp_pk_20569fa1-thrs7.10A_clust04X.pdb
zmp_pk_20569fa1-thrs7.10A_clust05X.pdb
usage: rna_simrnaweb_download_job.py [-h] [-p PREFIX] [-n NSTRUC] [-e] [-m]
                                     [-r] [-c] [-d] [--top100] [--top200]
                                     [--web-models]
                                     job_id
job_id

job_id

-h, --help

show this help message and exit

-p <prefix>, --prefix <prefix>

prefix to the name, withouth _, be careful with this.If you have already some files with the given folder, their names mightbe changed.

-n <nstruc>, --nstruc <nstruc>

extract nstruc the lowest energy, this option must go with –web

-e, --extract

extract nstruc the lowest energy, this option must go with –web

-m, --more_clusters

download also cluster 4 and 5

-r, --remove-trajectory

remove trajectory after analysis

-c, --cluster

get trajectory from cluster OR local on your computer (mdfind for macOS)

-d, --download-trajectory

web

--top100

download top100 trajectory

--top200

download top200 trajectory

--web-models

web models download

SimRNATrajectory

SimRNATrajectory module.

SimRNATrajectory / Frame / Residue / Atom

class rna_tools.tools.simrna_trajectory.simrna_trajectory.Atom(name, x, y, z)[source]

x y z coord

get_coord()[source]

Return coords (np.array).

class rna_tools.tools.simrna_trajectory.simrna_trajectory.Frame(id, header, coords, top_level=False)[source]
Syntax of header:
  • write_number

  • replica_id

  • total_energy

  • energy_without_restraints

  • temperature

Warning

If there is an invalid frame, please use repair_trafl.py to fix the trajectory first.

rmsd_to(frame, verbose=False)[source]
class rna_tools.tools.simrna_trajectory.simrna_trajectory.Residue(id, p, c4p, n1n9, b1, b2)[source]

Create Residue object.

Each residue in SimRNA coarse-grained represantation consists only 5 coarse-grained atoms:

  • backbone: p = phospate group, c4p = sugar moiety

  • nucleotide: n1n9 = N1 for pyrimidines, N9 for purines, b1 = C2 for purines and pyrimidines, b2 = C4 for pyrimidines, C6 for purines

get_atoms()[source]

Return all atoms

get_center()[source]

Return MB for residue `((self.n1n9 + self.b2) / 2)`

class rna_tools.tools.simrna_trajectory.simrna_trajectory.SimRNATrajectory[source]
load_from_file(fn, debug_break=False, top_level=False, only_first_frame=False)[source]

Create a trajectory based on give filename.

Parameters:

top_level

top_level = True, don’t make a huge tree of objects (Residues/Atoms) == amazing speed up! Useful if you need only frames, energies and coords as text. You only get the info that is in header of each frame.

top_level = False, makes huge tree of objects (Residues/Atoms) == very slow for a huge trajectories

Warning

Loads up whole trafl file into memory, and get stuck. Use this if you want to compute e.g. distances between atoms, get the positions of specified atoms etc. If you can not process your trajectory

use top_level=True or look at load_from_string() to load a frame by frame from a file.

h(eader), l(line), f(ile)

load_from_list(frames)[source]
load_from_string(c, txt)[source]

Create a trajectory based on given string (txt) with id given by c.

Faster method, loads only one frame at a time to memory, and after computations loads the next frame (memory efficient).

plot_energy(plotfn='plot.png')[source]

Plots the SimRNA energy of the trajectory.

_images/simrnatrajectory.png
save(fn, verbose=True)[source]

Save the trajectory to file.

sort(inplace=True)[source]

Sort frames within the trajectory according to energy.

RNA Refinement (QRNAS)

rna_refinement.py

rna_refinement - RNA refinement with QRNAS.

Models of RNA 3D structures obtained by modeling methods often suffer from local inaccuracies such as clashes or physically improbable bond lengths, backbone conformations, or sugar puckers. To ensure high quality of models, a procedure of re nement should be applied as a nal step in the modeling pipeline. The software tool QRNAS was developed in our laboratory to perform local re nement of nucleic acid structures based on an extended version of the AMBER force field. The extensions consist of energy terms associated with introduction of explicit hydrogen bonds, idealization of base pair planarity and regularization of backbone conformation.

Read more: Piatkowski, P., Kasprzak, J. M., Kumar, D., Magnus, M., Chojnowski, G., & Bujnicki, J. M. (2016). RNA 3D Structure Modeling by Combination of Template-Based Method ModeRNA, Template-Free Folding with SimRNA, and Refinement with QRNAS. Methods in Molecular Biology (Clifton, N.J.), 1490(Suppl), 217-235. http://doi.org/10.1007/978-1-4939-6433-8_14

Right now, there is 20k steps of refinement.

_images/qrnas_0k.png

The initial structure, 179c48aa-c0d3-4bd6-8e06-12081da22998_ALL_thrs6.20A_clust01-000001_AA.pdb.

_images/qrnas_3k.png

after 3k, ~10min

_images/qrnas_10k.png

after 10k steps, around 30min

_images/qrnas_20k.png

after 20k steps, around 1h.

Installation of QRNAS

Download the QRNAS package from http://genesilico.pl/qrnas/, unzip the archive, and compile it with the following command:

./qrnamake sequential

This should create an executable version of QRNAS.

Warning

Please, change the name of the binary file from QRNA to QRNAS!

Be default the script searches QRNAS in <rna-tools>/opt/qrnas/ .

Usage of QRNA:

QRNA - Quick Refinement of Nucleic Acids (0.2 alpha)
     by Juliusz Stasiewicz (jstasiewicz@genesilico.pl)

To use type:
  QRNA -i <input PDBfile> [-o <output PDBfile>] [-c <configfile>] [-p] [-m <restraintsfile>]
OR specify <input PDBfile>, <output PDBfile> and <restraintsfile> in <configfile> and type just:
  QRNA -c <configfile>

Installation of this util

Set up in your bashrc:

export QRNAS_PATH=<your path to qrnas> # e.g. /home/magnus/src/rna-tools/opt/qrnas

but default rna-tools searches for qrnas in <rna-tools>/opt/qrnas.

QRNAS at Peyote2

There is no problem to run QRNAS at our Genesilico cluster, peyote2. Tested by mmagnus –170822. Copy files of QRNAS to peyote and run ./qrnamake sequential.

To run it at a cluster with the Sun Grid Engine queuing system (this one with qusb ;-)):

for p in *.pdb; do echo "rna_refinement.py $p >& ${p}.log" | qsub -cwd -V -pe mpi 1 -N "r_$p" ; done

DONE:

  • [x] clean up the output structure

  • [x] configuration should not be hardcoded

usage: rna_refinement.py [-h] [-s STEPS] [-o OUTPUT_FILE] [-i] [-v] fn
fn

input pdb file

-h, --help

show this help message and exit

-s <steps>, --steps <steps>

# of steps, default: 20k

-o <output_file>, --output_file <output_file>

output pdb file

-i, --interactive
-v, --verbose

RNA Molecular Dynammics (MD)

diffpdb

diffpdb - a simple tool to compare text-content of PDB files

The method is quick-and-dirty, but works!

The script takes first 31 characters of lines (or only atom names and residue names) starting with HETATM or ATOM and save these lines to a <filename>.out file.

One file is created per pdb. In the final step DIFF_TOOL is executed on these two output files. You get a diff output. That’s it! Enjoy!

Configuration:

  • DIFF_TOOL="open -a diffmerge" or DIFF_TOOL="kompare" to set up what tool would you like to use to diff files in the file rna-pdb-tools/tools/diffpdb/diffpdb_conf.py (create it if needed)

_images/screenshot.png

./diffpdb.py --names test_data/4/1duq.pdb test_data/4/1duq_decoy0171_amb_clx.pdb

_images/screenshot2.png

and on the Mac (using diffmerge):

_images/diffpdb_osx_diffmerge.png

One of the difference that can be detected with the script is variants of atoms.

_images/atom-variants.png

or a detection of missing atom.

_images/missing-atoms.png

or a detection of missing OP3 at the beginning.

_images/missing-op3.png

RNA clustering with CLANS (clanstix)

rna_clanstix - a tool for visualizing RNA 3D structures based on pairwise structural similarity with Clans.

We hacked Clans thus instead of BLAST-based distances between sequences, you can analyze distances between structures described as p-values of rmsd (based on the method from the Dokholyan lab.)

Quickref:

rna_clanstix.py --groups-auto 10 --color-by-homolog --shape-by-source  thf_ref_mapping_pk_refX.txt input2.clans

Running Clans: To run CLANS you need to have Java 1.4 or better installed (java can be downloaded HERE). For full functionality you will also need the NCBI BLAST,PSI-BLAST and formatdb executables (NCBI). For command line parameters and basic help please refer to the README file. (source: http://www.eb.tuebingen.mpg.de/research/departments/protein-evolution/software/clans.html)

_images/yndSrLTb7l.gif

The RMSDs between structures are converted into p-values based on the method from the Dokholyan lab or some hacky way developed by mmagnus .

Color groups

You can color your groups:

_images/rna_clanstix.png

To get colors, run a cmd like this:

rna_clastix.py rnapz17_matrix_farfar_HelSeedCst.txt --groups 20:seq1+20+20+20+20+20+20:seq10

where with the + sign you separate groups. Each group has to have a number of structures. Optionally it can have a name, e.g., 20:seq1, use : as a separator. If a provided name is native then this group will be shown as starts.

Get inspiration for more colors (http://www.rapidtables.com/web/color/RGB_Color.htm)

How to use ClanstixRNA?

  1. Get a matrix of distances, save it as e.g. matrix.txt (see Comment below)

  2. run ClanstixRNA on this matrix to get an input file to Clans (e.g. clans_rna.txt):

    rna_clanstix.py test_data/matrix.txt # clans.input will be created by default
    
  3. open CLANS and click File -> Load run and load clans_run.txt

  4. You’re done! :-)

Comment: To get this matrix you can use for example another tool from the rna-pdb-tools packages:

rna_calc_rmsd_all_vs_all.py -i rp18 -o rp18_rmsd.csv
rna_clastix.py --groups 1:native+5:3dRNA+
      5:Chen+3:Dokh+5:Feng+5:LeeASModel+
      5:Lee+5:RNAComposer+10:RW3D+5:Rhiju+
      1:YagoubAli+3:SimRNA  rp18_rmsd.csv clans.in

rna_clastix.py --groups 100+100+100+100+100+100+100+100+100+100+1:native  rp18_rmsd.csv

where rp18 is a folder with structure and rp18_rmsd.csv is a matrix of all-vs-all rmsds.

_images/rp18_clanstix.png

Hajdin, C. E., Ding, F., Dokholyan, N. V, & Weeks, K. M. (2010). On the significance of an RNA tertiary structure prediction. RNA (New York, N.Y.), 16(7), 1340–9. doi:10.1261/rna.1837410

An output of this tool can be viewed using CLANS.

Frickey, T., & Lupas, A. (2004). CLANS: a Java application for visualizing protein families based on pairwise similarity. Bioinformatics (Oxford, England), 20(18), 3702–4. doi:10.1093/bioinformatics/bth444

class rna_tools.tools.clanstix.rna_clanstix.RNAStructClans(n=10, dotsize=10)[source]

Clans run.

Usage:

>>> f = open('matrix.txt')
>>> ids = f.readline().replace('#','').split()
>>> c = RNAStructClans(n=len(ids)) # 200?
>>> c.add_ids(ids)
>>> c.dist_from_matrix(f)
>>> print(c.txt)
add_ids(ids)[source]
dist_from_matrix(rmsds, matrix=0, use_pv=False, use_input_values=False, dont_calc=False, debug=False)[source]
dist_from_matrix_mp(output_pmatrix_fn, max, min, lines, pmat=False, use_pv=False, use_input_values=False, debug=False)[source]
rna_tools.tools.clanstix.rna_clanstix.check_symmetric(a, rtol=1e-05, atol=1e-08)[source]

https://stackoverflow.com/questions/42908334/checking-if-a-matrix-is-symmetric-in-numpy

rna_tools.tools.clanstix.rna_clanstix.get_parser()[source]

Misc

Plotting

rna_plot_hist.py

rna_plot_hist.py - generate a histogram

Don’t open Excel, Jupyter. Simple plot a histogram of one column and save it to a file.

Example:

# file
                                                  fn  rmsd_all
0  19_Bujnicki_Human_4_rpr_n0-000001.pdb-000001_A...     14.73
1  19_Bujnicki_Human_4_rpr_n0-000001.pdb.trafl_19...      0.46
2  19_Bujnicki_Human_4_rpr_n0-000001.pdb.trafl_19...     14.73
3  19_Bujnicki_Human_4_rpr_n0-000001.pdb_thrs0.50...      0.73
4  19_Bujnicki_Human_4_rpr_n0-000001.pdb_thrs0.50...      0.83

$ rna_plot_hist.py rmsds.csv --column rmsd_all
_images/rmsds_hist.png
usage: rna_plot_hist.py [-h] [--column COLUMN] [--sep SEP] [-o OUTPUT]
                        [--bins BINS]
                        file
file

rmsd.txt

-h, --help

show this help message and exit

--column <column>

column of file to plot

--sep <sep>

separator, be default

-o <output>, --output <output>
--bins <bins>

rna_plot_density.py

rna_plot_density.py - generate a density plot

Don’t open Excel, Jupyter. Simple plot a density of one column and save it to a file.

Example:

# file
                                                  fn  rmsd_all
0  19_Bujnicki_Human_4_rpr_n0-000001.pdb-000001_A...     14.73
1  19_Bujnicki_Human_4_rpr_n0-000001.pdb.trafl_19...      0.46
2  19_Bujnicki_Human_4_rpr_n0-000001.pdb.trafl_19...     14.73
3  19_Bujnicki_Human_4_rpr_n0-000001.pdb_thrs0.50...      0.73
4  19_Bujnicki_Human_4_rpr_n0-000001.pdb_thrs0.50...      0.83

$ rna_plot_hist.py rmsds.csv --column rmsd_all
_images/rmsds_dens.png
usage: rna_plot_density.py [-h] [--column COLUMN] [--sep SEP] [-o OUTPUT] file
file

rmsd.txt

-h, --help

show this help message and exit

--column <column>

column of file to plot

--sep <sep>

separator, be default

-o <output>, --output <output>

rna_sali2dotbracket

rna_sali2dotbracket

usage: rna_sali2dotbracket [-h] filename
filename

file in the Sali format

-h, --help

show this help message and exit

This beauty here will go to sali notation and convert it to dotbracket notation. The file name should be xxxx.sali

Author: Catarina Almeida

rna_tools.tools.rna_sali2dotbracket.rna_sali2dotbracket.convert_sali2dotbracket(fn)[source]

The function needs a filename in the Sali format. This function will get the secondary structure of the sequence, then get its identifier and then the sequence itself.

To get the ss

The line with the secondary structure is a list and will look like this:

['', '', '', '', '', '', '', '', '', '', '--...<<<[[...]]..>>>>', '', '', '\n']

In this case, the ss is in the 11th position. But in some files it may be in the 12th, 13th, 10th, etc..

If the longest element from the list is extracted, then this problem is overcomed.

The ss will some times have patterns of repeated gaps, which will come in the form of:

  1. x

  2. xnt

  3. ( x )

With x being any number, from 1 to 1000. These must be converted to the correspondent number of gaps (-) in the converted ss. This conversion is done by:

1 - Identifying the pattern with regex

2 - Replacing it with repl function.

As such, the following expressions will replace the previously mentioned patterns:

  1. re.sub(r'\d*\d', repl, temp)

  2. re.sub(r'\d*\dnt', repl, temp)

  3. re.sub(r'(?P<smthBeautiful>\(\d+\))', repl, temp)

To get the sequence

The sequence, much like the ss, can sometimes be in a different position in the list. Like in the ss, the longest element will be selected. Also, like in the ss, patterns for repeated gaps appear. So these must also be removed.

rna_tools.tools.rna_sali2dotbracket.rna_sali2dotbracket.get_parser()[source]
rna_tools.tools.rna_sali2dotbracket.rna_sali2dotbracket.repl(m)[source]

This function will replace the length of a given string by the correspondent number of dashes. The expression qwerty will be replaced by -----.

Cluster load

A very simple tool to see your cluster load per user:

MAX_JOBS: 1000
#jobs cluster 917 load:  0.917  to use: 83
#jobs you     749 load:  0.749  to use: 251
{'deepak': 160, 'azyla': 8, 'magnus': 749}
1 azyla        r 8
20 magnus       r 10
16 deepak       r 10
329 magnus       r 1
22 magnus       qw 10

A super simple script to get some statistics of who is running at a cluster

Set MAX_JOBS to calc % of usage, it’s an approximation of max number of jobs, e.g. peyote ~1k (rather 700, e.g. FARNA runs.).

rna_tools.tools.cluster_load.cluster_load.get_parser()[source]
rna_tools.tools.cluster_load.cluster_load.per_user()[source]

get stats (#cpus) per user

rna_tools.tools.cluster_load.cluster_load.stats_for_cluster()[source]

get stats (#jobs) per cluster

rna_tools.tools.cluster_load.cluster_load.stats_for_user()[source]

get stats (#jobs) per user

RNAkb