RNA Tools

RNA Sequence

RNA Sequence with secondary structure prediction methods.

This tool takes a given sequence and returns the secondary structure prediction provided by 5 different tools: RNAfold, RNAsubopt, ipknot, contextfold and centroid_fold. You must have these tools installed. You don’t have to install all tools if you want to use only one of the methods.

It’s easy to add more methods of your choince to this class.

Installation

Depends on what tools you want to use, follow the instructions below.

ViennaRNA

https://www.tbi.univie.ac.at/RNA/

For OSX install from the binary Installer from the page.

ipknot OSX

https://github.com/satoken/homebrew-rnatools

If one encounters a problem:

[mm] Desktop$ /usr/local/opt/bin/ipknot
dyld: Library not loaded: /usr/local/opt/glpk/lib/libglpk.40.dylib
  Referenced from: /usr/local/opt/bin/ipknot
  Reason: image not found
[1]    51654 abort      /usr/local/opt/bin/ipknot

the solution is:

brew install glpk # on OSX

RNA Structure

http://rna.urmc.rochester.edu/

Works with 5.8.1; Jun 16, 2016.

Download http://rna.urmc.rochester.edu/RNAstructureDownload.html and untar it in <RNA_PDB_TOOLS>/opt/RNAstructure/; run make, the tools will be compiled in a folder exe. Set up DATPATH in your bashrc to <RNA_PDB_TOOLS>/opt/RNAstructure/data_tables DATAPATH=/home/magnus/work/src/rna-pdb-tools/opt/RNAstructure/data_tables/ (read more http://rna.urmc.rochester.edu/Text/Thermodynamics.html). RNAstructure can be run with SHAPE restraints, read more http://rna.urmc.rochester.edu/Text/File_Formats.html#Constraint about the format. The file format for SHAPE reactivity comprises two columns. The first column is the nucleotide number, and the second is the reactivity. Nucleotides for which there is no SHAPE data can either be left out of the file, or the reactivity can be entered as less than -500. Columns are separated by any white space.

MC-Sym

FAQ

  • Does it work for more than one chain??? Hmm.. I think it’s not. You have to check on your own. –magnus

TIPS

Should you need to run it on a list of sequences, use the following script:

from rna_tools import Seq
f = open("listOfSequences.fasta")
for line in f:
 if line.startswith('>'):
   print line,
 else:
   print line,
   s = Seq.Seq(line.strip()) # module first Seq and class second Seq #without strip this has two lines
   print s.predict_ss(method="contextfold"),
   #print s.predict_ss(method="centroid_fold")

Todo

  • This calss should be renamed to RNASeq and merged with RNASeq class from RNAalignment
exception rna_tools.Seq.MethodNotChosen[source]
class rna_tools.Seq.RNASequence(seq, ss='', name='rna_seq')[source]

RNASequence.

Usage:

>>> seq = RNASequence("CCCCUUUUGGGG")
>>> seq.name = 'RNA03'
>>> print(seq.predict_ss("RNAfold", constraints="((((....))))"))
>RNA03
CCCCUUUUGGGG
((((....)))) ( -6.40)
eval(ss='', no_dangling_end_energies=False, verbose=False)[source]

Evaluate energy of RNA sequence.

Parameters:
  • ss (optional) –
  • no_dangling_end_energies (Boolean) –
  • verbose (Boolean) –
Returns:

Energy (float)

The RNAeval web server calculates the energy of a RNA sequence on a given secondary structure. You can use it to get a detailed thermodynamic description (loop free-energy decomposition) of your RNA structures.

Simply paste or upload your sequence below and click Proceed. To get more information on the meaning of the options click the help symbols. You can test the server using this sample sequence/structure pair.

An equivalent RNAeval command line call would have been:

RNAeval -v -d0 < input.txt

Read more: <http://rna.tbi.univie.ac.at//cgi-bin/RNAWebSuite/RNAeval.cgi>

get_foldability(ss='', verbose=False)[source]

Calculate foldability based on EntRNA.

Steps:

  • parse SS into basepairs,
  • calculate foldabilty

Configuration:

  • Set ENTRNA_PATH to the folder where ENTRNA_predict.py is.

Cmd wrapper in here:

python ENTRNA_predict.py --seq_file pseudoknotted_seq.txt --str_file pseudoknotted_str.txt

Su, C., Weir, J. D., Zhang, F., Yan, H., & Wu, T. (2019). ENTRNA: a framework to predict RNA foldability. BMC Bioinformatics, 20(1), 1–11. http://doi.org/10.1186/s12859-019-2948-5

predict_ss(method='RNAfold', constraints='', enforce_constraint=False, shapefn='', explore='', verbose=0)[source]

Predict secondary structure of the seq.

Parameters:
  • method
  • onstraints
  • shapefn (str) – path to a file with shape reactivites
  • verbose (boolean) –

It creates a seq fasta file and runs various methods for secondary structure prediction. You can provide also a constraints file for RNAfold and RNAsubopt.

Methods that can be used with contraints: RNAsubopt, RNAfold, mcfold.

Methods that can be used with SHAPE contraints: RNAfold.

ContextFold

Example:

$ java -cp bin contextFold.app.Predict in:CCCCUUUGGGGG
CCCCUUUGGGGG
((((....))))

It seems that a seq has to be longer than 9. Otherwise:

$ java -cp bin contextFold.app.Predict in:UUUUUUGGG
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10

# this is OK
$ java -cp bin contextFold.app.Predict in:CCCCUUUGGG
CCCCUUUGGG
.(((...)))

RNAstructure

Example:

>>> seq = RNASequence("GGGGUUUUCCC")
>>> print(seq.predict_ss("rnastructure"))
>  ENERGY = -4.4  rna_seq
GGGGUUUUCCC
((((...))))

and with the shape data:

>>> print(seq.predict_ss("rnastructure", shapefn="data/shape.txt"))
>  ENERGY = -0.2  rna_seq
GGGGUUUUCCC
.(((....)))

the shape data:

1 10
2 1
3 1

You can easily see that the first G is unpaired right now! The reactivity of this G was set to 10. Worked!

MC-Fold

MC-Fold uses the online version of the tool, this is very powerful with constraints:

rna_seq
acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg
((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))
curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
mcfold::energy best dynamics programming: -53.91
(-53.91, '((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))')

curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((........)))).......((((..............((((((((((..............))))))))))..))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
mcfold::energy best dynamics programming: -34.77
(-34.77, '((((........)))).......((((..............((((((((((..............))))))))))..))))')

acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg
((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))
curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((xxxxxxxx))))xxxxxxx((((xxxxxxxxxxxxxx((((((((((xxxxxxxxxxxxxx))))))))))xx))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
mcfold::energy best dynamics programming: -34.77
(-34.77, '((((........)))).......((((..............((((((((((..............))))))))))..))))')


acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg
((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))
curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((********))))*******((((**************((((((((((**************))))))))))**))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
mcfold::energy best dynamics programming: -77.30
(-71.12, '(((((((..))))))).......((((((.(((...)))..(((((((((((((((....)))))))))))))))))))))')

acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg
((((........)))).......((((..............(((((((((((((((....)))))))))))))))..))))
curl -Y 0 -y 300 -F "pass=lucy" -F mask="((((**[[[[[**))))*******((((****]]]]]****(((((((((((((((****)))))))))))))))**))))" -F sequence="acucggcuaggcgaguauaaauagccgucaggccuagcgcguccaagccuagccccuucuggggcugggcgaagggucggg" https://www.major.iric.ca/cgi-bin/MC-Fold/mcfold.static.cgi
mcfold::energy best dynamics programming: -77.30
('-77.30', '((((**[[[[[**))))*******((((****]]]]]****(((((((((((((((****)))))))))))))))**))))')

explore

The sub-optimal search space can be constrained within a percentage of the minimum free energy structure, as MC-fold makes use of the Waterman-Byers algorithm [18, 19]. Because the exploration has an exponential time complexity, increasing this value can have a dramatic effect on MC-Fold’s run time.

Parisien, M., & Major, F. (2009). RNA Modeling Using the MC-Fold and MC-Sym Pipeline [Manual] (pp. 1–84).

rna_tools.Seq.load_fasta_ss_into_RNAseqs(fn, debug=True)[source]

RNA Secondary Structure

Secondary structure analysis

exception rna_tools.SecondaryStructure.ExceptionOpenPairsProblem[source]
rna_tools.SecondaryStructure.draw_ss(title, seq, ss, img_out, resolution=4, verbose=False)[source]

Draw Secondary Structure using VARNA (you need correct configuration for this).

If everything is OK, return None, if an error (=exception) return stderr.

Usage:

>>> seq = 'GGAAACC'
>>> ss =  '((...))'
>>> img_out = 'output/demo.png'
>>> draw_ss('rna', seq, ss, img_out)
>>> print('Made %s' % img_out)
Made output/demo.png
_images/demo.png

Can be used with http://geekbook.readthedocs.io/en/latest/rna.html

rna_tools.SecondaryStructure.parse_vienna_to_pairs(ss, remove_gaps_in_ss=False)[source]

Parse Vienna (dot-bracket notation) to get pairs.

Parameters:
  • ss (str) – secondary stucture in Vienna (dot-bracket notation) notation
  • remove_gaps_in_ss (bool) – remove - from ss or not, design for DCA (tpp case ss = "(((((((((.((((.(((.....))))))......------)....." works with pk of the first level, [[]]
Returns:

(pairs, pairs_pk)

Return type:

list of two lists

Examples:

>>> parse_vienna_to_pairs('((..))')
([[1, 6], [2, 5]], [])

>>> parse_vienna_to_pairs('(([[))]]')
([[1, 6], [2, 5]], [[3, 8], [4, 7]])

>>> parse_vienna_to_pairs('((--))')
([[1, 6], [2, 5]], [])

>>> parse_vienna_to_pairs('((--))', remove_gaps_in_ss=True)
([[1, 4], [2, 3]], [])

>>> parse_vienna_to_pairs('((((......')
Traceback (most recent call last):
  File "/usr/lib/python2.7/doctest.py", line 1315, in __run
    compileflags, 1) in test.globs
  File "<doctest __main__.parse_vienna_to_pairs[4]>", line 1, in <module>
    parse_vienna_to_pairs('((((......')
  File "./SecondaryStructure.py", line 106, in parse_vienna_to_pairs
    raise ExceptionOpenPairsProblem('Too many open pairs (()) in structure')
ExceptionOpenPairsProblem: Too many open pairs (()) in structure

rna_dot2ct.py

The output file is <input-file>.ct

Wrapper to

RNAstructure: software for RNA secondary structure prediction and analysis. (2010). RNAstructure: software for RNA secondary structure prediction and analysis., 11, 129. http://doi.org/10.1186/1471-2105-11-129

usage: rna_dot2ct.py [-h] [-v] file
Positional arguments:
file Input is: >seq aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa ((…((((((((((…….)))))))))).))
Options:
-v=False, --verbose=False
 be verbose

Secondary structure format conversion

rna_convert_pseudoknot_formats

Run this as:

python rna-pk-simrna-to-one-line.py test_data/simrna.ss

Convert:

> a
....((.(..(((....)))..((((.(.........).)))....).)).......((((......))))
..............................((((...................))))..............

to:

> a
....((.(..(((....)))..((((.(..[[[[...).)))....).))...]]]]((((......))))

and:

>2 chains
(((((......)))))........(.((....(.......)..(((. .)))...)).)
.....((((((......................))))))........ ...........

to:

>2 chains
((((([[[[[[)))))........(.((....(]]]]]].)..(((. .)))...)).)

and:

> b
..(.......(((....)))..((((.(.........).))))).............((((......))))
....((.(......................................).)).....................
..............................((((...................))))..............

to:

> b
..(.[[.[..(((....)))..((((.(..{{{{...).)))))..].]]...}}}}((((......))))

and it works with VARNA:

_images/varna_2pk.png

Convert a secondary structure with a pk to the SimRNA format:

rna_convert_pseudoknot_formats git:(master) ✗ python rna_ss_pk_to_simrna.py test_data/ss_with_pk.ss
((((([[[[[[)))))........(.((....(]]]]]].)..(((. .)))...)).)

(((((......)))))........(.((....(.......)..(((. .)))...)).)
.....((((((......................))))))........ ...........
rna_tools.tools.rna_convert_pseudoknot_formats.rna_ss_pk_to_simrna.get_multiple_lines(ss)[source]
rna_tools.tools.rna_convert_pseudoknot_formats.rna_ss_pk_to_simrna.get_parser()[source]
rna_tools.tools.rna_convert_pseudoknot_formats.rna_ss_pk_to_simrna.is_pk(ss)[source]

RNA Alignment

RNAalignment - a module to work with RNA sequence alignments.

To see a full demo what you can do with this util, please take a look at the jupiter notebook (https://github.com/mmagnus/rna-pdb-tools/blob/master/rna_tools/tools/rna_alignment/rna_alignment.ipynb)

Load an alignment in the Stockholm:

   alignment = ra.RNAalignment('test_data/RF00167.stockholm.sto')

or fasta format::

    import rna_alignment as ra
    alignment = ra.fasta2stokholm(alignment.fasta)
    alignment = ra.RNAalignment

Parameters of the aligmnent:

print(alignment.describe())

Consensus SS:

print(alignment.ss_cons_with_pk)

Get sequnce/s from teh aligment:

>>> seq = a.io[0]

RNASeq

class rna_tools.tools.rna_alignment.rna_alignment.RNASeq(id, seq, ss=None)[source]

RNASeq.

Parameters:
  • id (str) – id of a sequence
  • seq (str) – seq, it be uppercased.
  • ss (str) – secondary structure, default None
seq_no_gaps

str – seq.replace(‘-‘, ‘’)

ss_no_gaps

str – ss.replace(‘-‘, ‘’)

draw_ss(title='', verbose=False, resolution=1.5)[source]

Draw secondary structure of RNA with VARNA.

VARNA: Visualization Applet for RNA A Java lightweight component and applet for drawing the RNA secondary structure

_images/varna.png

Cite: VARNA: Interactive drawing and editing of the RNA secondary structure Kevin Darty, Alain Denise and Yann Ponty Bioinformatics, pp. 1974-197,, Vol. 25, no. 15, 2009

http://varna.lri.fr/

get_conserved(consensus, start=0, to_pymol=True, offset=0)[source]

Start UCGGGGUGCCCUUCUGCGUG————————————————–AAGGC-UGAGAAAUACCCGU————————————————-AUCACCUG-AUCUGGAU-AAUGC XXXXXXXXXXXXGXGXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX—————————-XXXXX-XCUGAGAXXXXXXXXXXXXXXXXXXXXXX———————————-XXXXXXXX-XXXXXXXX-ACXUG

get_distance_to(nseq)[source]

Get distance of self.seq to nseq.

get_ss_std()[source]
remove_columns(to_remove)[source]

indexing from 0

remove_gaps(check_bps=True, only_canonical=True, allow_gu=True)[source]

Remove gaps from seq and secondary structure of the seq.

Parameters:
  • check_bps (bool) – fix mistakes as
  • only_canonical (bool) – keep in ss only pairs GC, AU
  • allow_gu (bool) – keep in ss also GU pair
_images/ss_misgap.png

A residue “paired” with a gap.

_images/ss_misgap_wrong.png

paired with any residues (in the blue circle). If yes, then this residues is unpair (in this case ) -> .).

_images/ss_misgap_ok.png

if only_canonical (by default) is True then only GC, AU can be paired.

_images/ss_only_canonical.png

If allow_gu is False (be default is True) then GU pair is also possible.

_images/ss_no_gu.png

If you provide seq and secondary structure such as:

GgCcGGggG.GcggG.cc.u.aAUACAAuACCC.GaAA.GGGGAAUAaggCc.gGCc.gu......CU.......uugugcgGUuUUcaAgCccCCgGcCaCCcuuuu
(((((((((....((.((............(((......)))......))))..(((.(.....................)))).......)))))))))........

gaps will be remove as well.

ss_to_bps()[source]

Convert secondary structure into a list of basepairs.

Returns:a list of base pairs, e.g. [[0, 80], [1, 79], [2, 78], [4, 77], [6, 75], [7, 74], …]
Return type:bps (list)

RNAalignment

class rna_tools.tools.rna_alignment.rna_alignment.RNAalignment(fn='', fetch='')[source]

RNA alignment - adapter class around BioPython to do RNA alignment stuff

Usage (for more see IPython notebook https://github.com/mmagnus/rna-tools/blob/master/rna_tools/tools/rna_alignment/rna_alignment.ipynb)

>>> a = RNAalignment('test_data/RF00167.stockholm.sto')
>>> print(a.tail())
>>> print(a.ss_cons)
Parameters:
  • fn (str) – Filename
  • io (Bio.AlignIO) – AlignIO.read(fn, “stockholm”)
  • lines (list) – List of all lines of fn
  • seqs (list) – List of all sequences as class:RNASeq objects
  • rf (str) – ReFerence annotation, the consensus RNA sequence

Read more:

and on the format itself

Warning

fetch requires urllib3

align_seq(seq)[source]

Align seq to the alignment.

Using self.rf.

Parameters:seq (str) – sequence, e.g. -GGAGAGUA-GAUGAUUCGCGUUAAGUGUGUGUGA-AUGGGAUGUC...
Returns:seq that can be inserted into alignemnt, .-.GG.AGAGUA-GAUGAUUCGCGUUA ! . -> -
Return type:str
copy_ss_cons_to_all(verbose=False)[source]
copy_ss_cons_to_all_editing_sequence(seq_id, before, after)[source]

Change a sequence’s sec structure.

Parameters:
  • seq_id – string, sequence id to change, eg: AE009948.1/1094322-1094400
  • before – string, character to change from, eg: ,
  • after – string, character to change to, eg: .

Warning

before and after has to be one character long

describe()[source]

Describe the alignment.

> print(a.describe()) SingleLetterAlphabet() alignment with 13 rows and 82 columns

find_core(ids=None)[source]

Find common core for ids.

_images/find_core.png

Fig. By core, we understand columns that have all homologous residues. The core is here marked by x.

Parameters:id – list, ids of seq in the alignment to use
find_seq(seq, verbose=False)[source]

Find seq (also subsequences) and reverse in the alignment.

Parameters:
  • seq (str) – seq is upper()
  • verbose (bool) – be verbose
seq = "ggaucgcugaacccgaaaggggcgggggacccagaaauggggcgaaucucuuccgaaaggaagaguaggguuacuccuucgacccgagcccgucagcuaaccucgcaagcguccgaaggagaauc"
hit = a.find_seq(seq, verbose=False)
ggaucgcugaacccgaaaggggcgggggacccagaaauggggcgaaucucuuccgaaaggaagaguaggguuacuccuucgacccgagcccgucagcuaaccucgcaagcguccgaaggagaauc
Match: AL939120.1/174742-174619
ID: AL939120.1/174742-174619
Name: AL939120.1
Description: AL939120.1/174742-174619
Number of features: 0
/start=174742
/end=174619
/accession=AL939120.1
Per letter annotation for: secondary_structure
Seq('CCAGGUAAGUCGCC-G-C--ACCG---------------GUCA-----------...GGA', SingleLetterAlphabet())
GGAUCGCUGAACCCGAAAGGGGCGGGGGACCCAGAAAUGGGGCGAAUCUCUUCCGAAAGGAAGAGUAGGGUUACUCCUUCGACCCGAGCCCGUCAGCUAACCUCGCAAGCGUCCGAAGGAGAAUC
find_seq_exact(seq, verbose=False)[source]

Find seq (also subsequences) and reverse in the alignment.

Parameters:
  • seq – string, seq, seq is upper()
  • verbose – boolean, be verbose or not
format_annotation(t)[source]
get_clean_ss(ss)[source]
get_distances()[source]

Get distances (seq identity) all-vs-all.

With BioPython.

blastn: Bad alphabet 'U' in sequence 'AE008922.1/409481-409568' at position '7' only for DNA?

read more (also about matrix at <http://biopython.org/wiki/Phylo> and HTTP://biopython.org/DIST/docs/api/Bio.Phylo.TreeConstruction.DistanceCalculator-class.html

get_gc_rf()[source]

Return (str) #=GC RF or ‘’ if this line is not in the alignment.

get_seq(seq_id)[source]
get_seq_ss(seq_id)[source]
get_seq_with_name(seq_name)[source]
get_shift_seq_in_align()[source]

RF_cons vs ‘#=GC RF’ ???

get_ss_cons()[source]
Returns:SS_cons_pk line or None if there is now SS_cons_pk.
get_ss_cons_pk()[source]
Returns:SS_cons_pk line or None if there is now SS_cons_pk:
get_ss_remove_gaps(seq, ss)[source]
Parameters:
  • seq – string, sequence
  • ss – string, ss

UAU-AACAUAUAAUUUUGACAAUAUGG-GUCAUAA-GUUUCUACCGGAAUACC–GUAAAUAUUCU—GACUAUG-UAUA- (((.(.((((,,,(((((((_______.))))))).,,,,,,,,(((((((__.._____))))))…),,)))).)))).

get_the_closest_seq_to_ref_seq(verbose=False)[source]

Example:

>>> a = RNAalignment("test_data/RF02221.stockholm.sto")
>>> a.get_the_closest_seq_to_ref_seq()
AF421314.1/431-344
head()[source]
map_seq_on_align(seq_id, resis, v=True)[source]
Parameters:
  • seqid – seq_id, ‘CP000721.1/2204691-2204775’
  • resis – list resis, [5,6]

maps:

[5, 6, 8]
CAC-U
CAC-U-
CAC-U-UA
[4, None, 6]
map_seq_on_seq(seq_id, seq_id_target, resis, v=True)[source]
Parameters:
  • seq_id – seq_id, ‘AAML04000013.1/228868-228953’
  • seq_id_target – seq_id of target, ‘CP000721.1/2204691-2204778’
  • resis – list resis, [5,6]

map:

[4, 5, 6]
UAU-A
UAU-AA
UAU-AAC
[5, 6, 7]
CAC-U
CAC-U-
CAC-U-U
[4, None, 5]
plot(plot_fn='rchie.png')[source]
reload_alignment()[source]
remove_empty_columns(verbose=False)[source]

Remove empty columns in place.

Example:

>>> a = RNAalignment("test_data/zmp.stk")
>>> print(a)
SingleLetterAlphabet() alignment with 6 rows and 319 columns
---ACCUUGCGCGACUGGCGAAUCC-------------------...AAU CP001644.1/756294-756165
--GCUCUCGCGCGACUGGCGACUUUG------------------...GAA CU234118.1/352539-352459
UGAGUUUUCUGCGACUGACGGAUUAU------------------...CUG BAAV01000055.1/2897-2982
GCCCGUUCGCGUGACUGGCGCUAGU-------------------...CGA CP000927.1/5164264-5164343
-----GGGUCGUGACUGGCGAACA--------------------...--- zmp
UCACCCCUGCGUGACUGGCGAUA---------------------...GUU AP009385.1/718103-718202
>>> a.remove_empty_columns()
>>> print(a)
SingleLetterAlphabet() alignment with 6 rows and 138 columns
---ACCUUGCGCGACUGGCGAAUCC-UGAAGCUGCUUUG-AGCG...AAU CP001644.1/756294-756165
--GCUCUCGCGCGACUGGCGACUUUG------------------...GAA CU234118.1/352539-352459
UGAGUUUUCUGCGACUGACGGAUUAU------------------...CUG BAAV01000055.1/2897-2982
GCCCGUUCGCGUGACUGGCGCUAGU-------------------...CGA CP000927.1/5164264-5164343
-----GGGUCGUGACUGGCGAACA--------G-----------...--- zmp
UCACCCCUGCGUGACUGGCGAUA--------GAACCCUCGGGUU...GUU AP009385.1/718103-718202

go over all seq modifes self.nss_cons

ss_cons_std
ss_cons_with_pk

go over ss_cons and overwrite bp is there is pk (ss_cons_pk)

ss_cons: (((.(.((((,,,(((((((_______.))))))).,,,,,,,,(((((((__.._____))))))…),,)))).)))). ss_cons_pk: …………………..[[………………………….]]…………………… ss_cons_with_pk: (((.(.((((,,,(((((((___[[__.))))))).,,,,,,,,(((((((__.._]]__))))))…),,)))).)))).

“return ss_cons_with_pk: string, e.g. (((.(.((((,,,(((((((___[[__.))))

ss_cons_with_pk_std
subset(ids, verbose=False)[source]

Get subset for ids:

# STOCKHOLM 1.0
#=GF WK Tetrahydrofolate_riboswitch

AAQK01002704.1/947-1059 -U-GC-AAAAUAGGUUUCCAUGC.. #=GC SS_cons .(.((.((—-((((((((((… #=GC RF .g.gc.aGAGUAGggugccgugc.. //

tail()[source]
trimmed_rf_and_ss()[source]

Remove from RF and SS gaps.

Returns:trf, tss - new RF and SS
Return type:(str,str)
write(fn, verbose=False)[source]

Write the alignment to a file

rna_alignment_get_species.py

The output you simply get from the screen, save it it to a file.

Example:

rna_alignment_get_species.py RF00004.stockholm.stk
# STOCKHOLM 1.0
Sorex-araneus-(European-shrew)                     AUCGCU-UCU----CGGCC--UUU-U

Examples 2:

[dhcp177-lan203] Desktop$ rna_alignment_get_species.py u5_rfam_u5only.stk --verbose
# STOCKHOLM 1.0
#=GF WK U5_spliceosomal_RNA
#=GF NC 39.90
#=GF RT The spliceosomal snRNAs of Caenorhabditis elegans.
#=GF TC 40.00
#=GF RN [3]
#=GF RM 2339054
#=GF RL Nucleic Acids Res 1990;18:2633-2642.
#=GF AU Gardner PP; 0000-0002-7808-1213
#=GF CC methylation.
#=GF CB cmcalibrate --mpi CM
#=GF DR GO; 0046540; U4/U6 x U5 tri-snRNP complex;
#=GF ID U5
#=GF SS Published; PMID:2339054; Griffiths-Jones SR
#=GF RA Thomas J, Lea K, Zucker-Aprison E, Blumenthal T
#=GF SQ 180
#=GF SM cmsearch --cpu 4 --verbose --nohmmonly -E 1000 -Z 549862.597050 CM SEQDB
#=GF DE U5 spliceosomal RNA
#=GF AC RF00020
#=GF SE Zwieb C, The uRNA database, PMID:9016512; PMID:18390578
#=GF GA 40.00
#=GF BM cmbuild -F CM SEED
#=GF TP Gene; snRNA; splicing;
Bos-taurus-(cattle)                                GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G
Sorex-araneus-(European-shrew)                     GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G
Ictidomys-tridecemlineatus-(thirteen-lined-ground- GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G
Monodelphis-domestica-(gray-short-tailed-opossum)  GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G
Oryctolagus-cuniculus-(rabbit)                     GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G
Cavia-porcellus-(domestic-guinea-pig)              GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G
Ochotona-princeps-(American-pika)                  GAUC-GUAUAAAUCUUUCGCCUUUUACUAAAGA-UUUCCG----UGG-A--GA-G

Note

This code has way more code than the name of the script says. This is customized script based on some script that did way more.

usage: rna_alignment_get_species [-h] [-v] [--debug] [--id-width ID_WIDTH]
                                 [--evo-mapping EVO_MAPPING]
                                 [--evo-mapping-default] [--one] [--u5]
                                 [--calc-energy] [--osfn OSFN]
                                 alignment
Positional arguments:
alignment Undocumented
Options:
-v=False, --verbose=False
 be verbose
--debug=False Undocumented
--id-width=50 Undocumented
--evo-mapping Undocumented
--evo-mapping-default=False
 Undocumented
--one=False Undocumented
--u5=False Undocumented
--calc-energy=False
 Undocumented
--osfn Undocumented

rna_alignment_calc_energy.py

Calculate energy (.cet) format:

UGGC-CCCUGCGCAA-GGAUGACA
(((..((((.....).))..))))
(((..(((((***)).))..))))

Examples:

 $ rna_alignment_calc_energy.py --template alignments/u6-lower.cet alignments/u6-only-RemovedGapped.stk -v
       --loop-upper guaa --loop-lower guaa
       --loop-upper-cst '(..)' --loop-lower-cst '(..)'
 calc-energy2.py --template u6atac-template.txt u6atac_u6only.sto -v
./calc-energy2.py --template alignments/u6-lower.cet --one alignments/u6-lower-stem-only.sto

Takes cet files (calc-energy-templets).

$ rna_alignment_calc_energy.py –template test_data/u6-lower.cet –one test_data/u6-only.stk -v # –loop-seq test_data/u6-only-loop-seq-u6-lower N/A% (0 of 182) | | Elapsed Time: 0:00:00 ETA: –:–:–================================================================================ AB010698.1/46467-46488 (((..((((…..).))..)))) UGGC-CCCUGCGCAA-GGAUGACA lower ———————————— UGG ugcgca ACA (((**))) UGGugcgcaACA (((((..))))) -10.64 upper ———————————— UGGC-CCCUGCGCAA-GGAUGACA CCC ugcgca AGG CCCugcgcaAGG (((((..))))) -9.6

id low_energy low_seq low_ss up_energy up_seq up_ss

0 AB010698.1/46467-46488 -10.64 UGGugcgcaACA (((((..))))) -9.6 CCCugcgcaAGG (((((..))))) Done: u6-only-loop-seq-u6-lower

Is

domains have 5451 elements.
10:47:16 up 141 days, 26 min, 0 users, load average: 1.45, 1.30, 1.56

Score: -999.000 GAACAUGGUUCUUGCCUUUUACCAGAACCAUCCGGGUGUUG Total number of MB structures with 3 stems: 16041 (overlaps: 0, !energy: 335585) </pre><P><H2>Sorting the structures… <P></H2><pre></pre><H2><P><P><P>Filtered and Sorted solutions:<P><P><P></H2><pre> </pre><H2><P><P><P><a HREF=”http://biwww2.informatik.uni-freiburg.de/Software/MARNA/index.html” target=”_blank”>MARNA</a>-formatted:<P><P><P></H2><pre> GAACAUGGUUCUUGCCUUUUACCAGAACCAUCCGGGUGUUG ((((((((((..))))))))))((((((((…)))))))) -33.20 ( -0.69) (((((((((….)))))))))((((((((…)))))))) -33.17 ( -0.69) ((((((((((((((((…))))))))))))……)))) -32.40 ( +0.00)

Backtracking with 2 variables (stems): domains have 5451 elements.

10:47:16 up 141 days, 26 min, 0 users, load average: 1.45, 1.30, 1.56

Score: -999.000 GAACAUGGUUCUUGCCUUUUACCAGAACCAUCCGGGUGUUG Total number of MB structures with 2 stems: 9555 (overlaps: 0, !energy: 165582) </pre><P><H2>Sorting the structures… <P></H2><pre></pre><H2><P><P><P>Filtered and Sorted solutions:<P><P><P></H2><pre> </pre><H2><P><P><P><a HREF=”http://biwww2.informatik.uni-freiburg.de/Software/MARNA/index.html” target=”_blank”>MARNA</a>-formatted:<P><P><P></H2><pre> GAACAUGGUUCUUGCCUUUUACCAGAACCAUCCGGGUGUUG ((((((((((..))))))))))((((((((…)))))))) -33.20 ( -0.69) (((((((((….)))))))))((((((((…)))))))) -33.17 ( -0.69) ((((((((((((((((…))))))))))))……)))) -32.40 ( +0.00)

usage: rna_alignment_calc_energy [-h] [--debug] [--one] [--method METHOD]
                                 [--csv CSV] [--loop-seq]
                                 [--template TEMPLATE] [--flanks FLANKS] [-v]
                                 alignment
Positional arguments:
alignment an alignment in the Stockholm format
Options:
--debug=False Undocumented
--one=False one only for the first seq
--method=mcfold
 mcfold or rnastructure_CycleFold
--csv Undocumented
--loop-seq=False
 Undocumented
--template Undocumented
--flanks GC be default
-v=False, --verbose=False
 Undocumented

rna_align_get_ss_from_fasta.py

Input as a file:

>ade
GCU-U-CAUAUAAUCCUAAUGAUAUGG-UUUGGGA-GUUUCUACCAAGAG-CC--UUAAA-CUCUU---GAUUAUG-AAGU-
(((.(.((((,,,(((((((_______.))))))).,,,,,,,,(((((((__.._____))))))...),,)))).)))).

to get:

>ade
GCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAGCCUUAAACUCUUGAUUAUGAAGU
((((((((...(((((((.......)))))))........((((((.......))))))..))))))))

usage: rna_align_get_ss_from_fasta [-h] file
Positional arguments:
file subsection of an alignment

rna_align_get_ss_from_stk.py

Process an alignment in the Stockholm format to get sequences and secondary structures:

Example:

$ rna_align_get_ss_from_alignment.py aligns/gmp_RF01786.stockholm_p75_pk.sto
AAOX01000007.1/31274-31356
AAGAAUAUAGAACACUGUGAUGAGCGGUUUUUAUUUGCACUUUAAACCGCUUGGAGUGACUAGUGCAGCCGGCCAAUGAUCUA
.(((.(((.(..(((......((((((((...............))))))))...)))...............).))).))).
CP000724.1/3560727-3560809
AAAAAUGUAGAGCAAAUGAACUGCAGGUAUACAUGGACGCCUUAAACUGCAGGGAUGUAGUGGCGUAACCGACUAACAAUAUU
((.(.(((((.(((......((((((.(......[[...[....).))))))...)))...]...]..]...))).)).).))
AACY023761929.1/1009-1091
AUAAUUUGGUGGGCGUUGAUGUGCCCUUUGUAUCUGGUCGCUUGAGGGGUACGGAGCCAAUAGCGAAACCGCCGCCGUCAUAG
.((...((((((((......((((((((.......[.[......))))))))...)))......]...]...)))))...)).

usage: rna_align_get_ss_from_stk [-h] file
Positional arguments:
file subsection of an alignment

rna_align_distance_to_seq.py

Calculate

”Process an alignment in the Stockholm format to get sequences and secondary structures:

Example:

$ rna_align_distance_to_seq.py test_data/gmp_ref.sto test_data/gmp_ref_distance.csv

   distance                            id
0      1.00                           gmp
1      0.69    AE000513.1/1919839-1919923
2      0.73      BA000004.3/387918-388001
3      0.69  ABFD02000011.1/154500-154585
4      0.73      AE015927.1/474745-474827
5      0.75                AAWL01000006.1
6      0.72                    AM180355.1
7      0.72      CP001116.1/102374-102457
8      0.65    AJ965256.1/1260708-1260792

                                                 seq
0  -----GCGCGGAAAC-AAUGAUGAAU--GGG-UUUA-AAUUGGGC-...
1  CUGUCGAAGAGACGC-GAUGAAUCCC--GCC-CUGUAAUUCGGGC-...
2  AAUCAAUAGGGAAGC-AACGAAGCAU--AGC-CUUU-AUAUGGAC-...
3  AAAUAUUAUAGAGAU-GUUGAAGUAU--AUU-CUAUUA-UUGGGC-...
4  AUUUUAAGAGGAAAU-UUUGAACUAU--AUA-CUU--AUUUGGGC-...
5  --UGCAA-UGGGUGU-GAUGAAGUCC--GGA-CAGUAAUGUGGGC-...
6  AAUAUUU-UAGAAAC-UGAGAAGUAU--AUC-UUAUUA-UUGGGC-...
7  AUAACGGCACGAAGC-AAUGAAAUGU--UCG-AUGU-AACCGGGC-...
8  AAAUUAAGGGGAAGC-GUUGAGCCGC--UAC-CCAU-AUGUGGUUC...

                                                 ss:
0  (((((((((((.(((.......((((..(((.(................
1  (((((((((((.(((.......((((..(((.(................
2  (((((((((((.(((.......((((..(((.(................
3  (((((((((((.(((.......((((..(((.(................
4  (((((((((((.(((.......((((..(((.(................
5  (((((((((((.(((.......((((..(((.(................
6  (((((((((((.(((.......((((..(((.(................
7  (((((((((((.(((.......((((..(((.(................
8  (((((((((((.(((.......((((..(((.(................

usage: rna_align_distance_to_seq.py [-h] file output
Positional arguments:
file an alignment in the Stokholm format, the first seq will be used to calculate distance to (#TODO pick any seq)
output csv pandas file

rna_align_foldability.py

Calculate statistics of foldability on an alignment.

The tool uses ENTRANA [1] to calculate, what the authors called, foldability (column: “foldability”) of a given sequence into a given secondary structure.

Next, MC-Fold [2] is executed to calculate free energy (column: “mcsym”) on the sequence and the secondary structure obtained based on the alignment. The secondary structure is used as constraints.

The third used program is RNAfold from the Vienna package [3]. Also, in this case the secondary structure obtained with rna-tools from the RNA alignment is used as constraints, columns: “mfe” (minimum free energy), “mfess” (secondary structure for minimum free energy state), “cfe” (minimum free energy of centroid), “cfess” (secondary structure for centroid, “diversity” (ensemble diversity), “efe” (free energy of the thermodynamic ensemble), “efess” (secondary structure for the thermodynamic ensemble), “freq” (frequency of mfe structure in ensemble). RNAfold is also executed in with “–enforceConstraint” where the constraints are enforced. This run gives analogous values as the default RNAfold, to all RNAfold column “_enforce” is added.

The tool is able to calculate the distance Levenshtein (the difference between the two sequences)(column: “distance”) from the target sequence and all sequence in the alignment to test if there is a bias in the accuracy towards the most similar sequences.

Another tool used from the Vienna package is RNAeval. The tool calculates free energy for a given sequence and secondary structure.

The accuracy is expressed as the median of core RMSD of 10% the lowest core RMSD models for the given sequences.

_images/rp17_foldability.png

The correlations:

accuracy             1.000000
cfe                  0.653813
foldability          0.622038
mfe                  0.607340
efe                  0.585077
diversity            0.404350
eval                 0.349499
cfe_enforce          0.311744
mfe_enforce          0.302973
efe_enforce          0.280929
distance             0.256870
freq                 0.037037
diversity_enforce    0.018429
mcsym                0.017533
freq_enforce        -0.037991
length              -0.340809

The data:

We tested correlations between the above-mentioned statistics, and the highest correlation, 0.65 () was achieved to the centroid free energy calculated with RNAFold, which suggests that to some extent this metric could be used to pick sequence from the alignment to pick sequences that are more likely to fold.

However, this needs further investigation and the detailed analysis an all cases and more folded sequences.

  1. Su C, Weir JD, Zhang F, Yan H, Wu T. ENTRNA: a framework to predict RNA foldability. BMC Bioinformatics. BioMed Central 2019
  2. Parisien M, Major F. The MC-Fold and MC-Sym pipeline infers RNA structure from sequence data. Nature 2008;452:51-5
  3. Lorenz R, Bernhart SH, Honer Zu Siederdissen C, Tafer H, Flamm C, Stadler PF, et al. ViennaRNA Package 2.0. Algorithms Mol Biol. BioMed Central; 2011;6:26-14.

Example:

$ python rna_align_foldability.py test_data/gmp_ref.sto test_data/gmp_foldability.csv
                                       cfess_enforce  distance  diversity
0  ((((((.(((......(((((((((..............)))))))...      1.00       3.96
1  (((.(((((((..((......(((((.(((((.......))).......      0.69       5.56
2                                                         0.73       3.84
3                                                         0.69       5.92
4  (((....((((.((((((...((((((((...............))...      0.73       7.49
5  ((((.(((.(((..((..((((((((.................)))...      0.75       7.92
6  .(((((((((.((((.....(((((((((...............))...      0.72       5.83
7                                                         0.72       7.35
8  ...((((((((.(((......((((((.((....(((...)))..)...      0.65       4.86

   diversity_enforce    efe  efe_enforce
0               2.89 -14.77       -13.75
1               3.70 -19.52       -18.25
2               0.00 -15.41         0.00
3               0.00 -13.55         0.00
4               2.46  -8.58        -6.91
5               6.37 -20.72       -20.08
6               2.92 -11.87       -11.38
7               0.00 -14.59         0.00
8               3.83 -21.16       -20.64

                                               efess
0  ((((((..........(((((((((..............)))))))...
1  {{..(((((((..((......(((((.(({((.......}}).......
2  .......((((.(((..(...(((((((((..............))...
3  .....((((((.(((.....(((((((((.......{{...,}..,...
4  .......{(((.,{{{{,...((((((((...............))...
5  {(((.(((.(((..((..((((((((,{{...,.....)}}..)))...
6  .(((((((((.((((.....(((((((((...............))...
7  .....{,.{{{.(((......((((((((((..........,.}))...
8  ...{(((((((.(((......((((((.((....(((...)))..)...

                         ...                         length   mcsym
0                        ...                           75.0  -39.73
1                        ...                           85.0  -37.89
2                        ...                           84.0  -35.40
3                        ...                           86.0  -36.11
4                        ...                           83.0  -37.37
5                        ...                           80.0  -43.59
6                        ...                           84.0  -42.95
7                        ...                           84.0  -36.55
8                        ...                           85.0  -43.58

                      mcsym comment
0  energy best dynamics programming
1                         BP energy
2                         BP energy
3                         BP energy
4  energy best dynamics programming
5  energy best dynamics programming
6  energy best dynamics programming
7                         BP energy
8  energy best dynamics programming

                                            mcsym ss   mfe mfe_enforce
0  ((((((.(((......((((((((................))))))... -13.9       -12.9
1  (((.(((((((..((......(((((.((.................... -18.0       -17.3
2  ....(..((((.(((......((((((((................)... -14.0         0.0
3  .(...((((((.(((......((((((((.................... -12.0         0.0
4  (((....((((.(((......((((((((...............))...  -7.2        -6.1
5  ((((.(((.(((......((((((((.................)))... -18.6       -18.6
6  .(((((((((.(((......((((((((.................)... -10.5       -10.5
7  ...(.((.(((.(((......((((((((................)... -12.8         0.0
8  ...((((((((.(((......((((((.(.................... -19.8       -19.8

                                               mfess
0  ((((((..........(((((((((..............)))))))...
1  ((..(((((((..((......(((((.(((((.......))).......
2  .......((((.(((......(((((((((..............))...
3  .....((((((.(((.....(((((((((....................
4  .....................((((((((...............))...
5  ((((.(((.(((..((..(((((((((((.........)))..)))...
6  .(((((((((.((((.....(((((((((...............))...
7  .....((.(((.(((......((((((((((............)))...
8  ...((((((((.(((......((((((.((....(((...)))..)...

                                       mfess_enforce
0  ((((((.(((......(((((((((..............)))))))...
1  (((.(((((((..((......(((((.(((((.......))).......
2                                              error
3                                              error
4  (((....((((.((((((...((((((((...............))...
5  ((((.(((.(((..((..(((((((((((.........)))..)))...
6  .(((((((((.((((.....(((((((((...............))...
7                                              error
8  ...((((((((.(((......((((((.((....(((...)))..)...

                                                 seq
0  GCGCGGAAACAAUGAUGAAUGGGUUUAAAUUGGGCACUUGACUCAU...
1  CUGUCGAAGAGACGCGAUGAAUCCCGCCCUGUAAUUCGGGCACCUC...
2  AAUCAAUAGGGAAGCAACGAAGCAUAGCCUUUAUAUGGACACUUGG...
3  AAAUAUUAUAGAGAUGUUGAAGUAUAUUCUAUUAUUGGGCACCUUA...
4  AUUUUAAGAGGAAAUUUUGAACUAUAUACUUAUUUGGGCACUUUGU...
5  UGCAAUGGGUGUGAUGAAGUCCGGACAGUAAUGUGGGCACUUAGUC...
6  AAUAUUUUAGAAACUGAGAAGUAUAUCUUAUUAUUGGGCAUCUGGA...
7  AUAACGGCACGAAGCAAUGAAAUGUUCGAUGUAACCGGGCACCUAU...
8  AAAUUAAGGGGAAGCGUUGAGCCGCUACCCAUAUGUGGUUCACUCG...

                                                  ss
0  ((((((.(((......((((((((................))))))...
1  (((.(((((((..((......(((((.((....................
2  ....(..((((.(((......((((((((................)...
3  .(...((((((.(((......((((((((....................
4  (((....((((.(((......((((((((...............))...
5  ((((.(((.(((......((((((((.................)))...
6  .(((((((((.(((......((((((((.................)...
7  ...(.((.(((.(((......((((((((................)...
8  ...((((((((.(((......((((((.(....................

[9 rows x 26 columns]

usage: rna_align_foldability.py [-h] [--all-stars] [--dev] [--skip-mcfold]
                                [-v]
                                file output
Positional arguments:
file an alignment in the Stokholm format, the first seq will be used to calculate distance to (#TODO pick any seq)
output csv pandas file
Options:
--all-stars=False
 this takes usully super long
--dev=False Undocumented
--skip-mcfold=False
 Undocumented
-v=False, --verbose=False
 be verbose

Random assignment of nucleotides

random_assignment_of_nucleotides.py - Random assignment of nucleotides for non-typical characters in the sequence alignment (arg –alignfn or fasta file with sequneces (arg –seqfn)

R = G A (purine)
Y = U C (pyrimidine)
K = G U (keto)
M = A C (amino)
S = G C (strong bonds)
W = A U (weak bonds)
B = G U C (all but A)
D = G A U (all but C)
H = A C U (all but G)
V = G C A (all but T)
N = A G C U (any)

author: A. Zyla - azyla

Warning

Tested only on fasta files! and requires Biopython (tested with v1.68)

usage: random_assignment_of_nucleotides [-h] [-v] [--alignfn ALIGNFN]
                                        [--seqfn SEQFN] [--outfn OUTFN]
Options:
-v=False, --verbose=False
 increase output verbosity
--alignfn alignment in the Fasta format
--seqfn sequences in the Fasta format
--outfn output aln file (default: alnfn .fasta -> _out.fasta)

random_assignment_of_nucleotides.py - Random assignment of nucleotides for non-typical characters in the sequence alignment (arg –alignfn or fasta file with sequneces (arg –seqfn)

R = G A (purine)
Y = U C (pyrimidine)
K = G U (keto)
M = A C (amino)
S = G C (strong bonds)
W = A U (weak bonds)
B = G U C (all but A)
D = G A U (all but C)
H = A C U (all but G)
V = G C A (all but T)
N = A G C U (any)

author: A. Zyla - azyla

Warning

Tested only on fasta files! and requires Biopython (tested with v1.68)

rna_tools.tools.rna_alignment.random_assignment_of_nucleotides.get_align(alignfn)[source]
Get seq from an alignment with gaps.
Args:
alignfn (str): a path to an alignment
Usage::
>>> get_align('test_data/aln1.fasta')
SingleLetterAlphabet() alignment with 2 rows and 13 columns
AGGGGGACAGNYU 1
CYGA------CGG 2
obj1’, description=’obj1’, dbxrefs=[]), id=’<unknown id>’, name=’<unknown name>’, description=’<unknown description>’, dbxrefs=[])
Returns:
alignment
rna_tools.tools.rna_alignment.random_assignment_of_nucleotides.get_parser()[source]
rna_tools.tools.rna_alignment.random_assignment_of_nucleotides.get_sequences(seqfn)[source]

Get seq from an fasta file. :param seqfn: a path to a fasta file :type seqfn: str

Usage::
>>> get_align('test_data/fasta.fasta')
Returns:[SeqRecord(seq=Seq(‘GGGYYGCCNRW’, SingleLetterAlphabet()), id=‘1’, name=‘1’, description=‘1’, dbxrefs=[]), SeqRecord(seq=Seq(‘GGRGYYGCCUURWAA’, SingleLetterAlphabet()), id=‘1’, name=‘1’, description=‘1’, dbxrefs=[])]
rna_tools.tools.rna_alignment.random_assignment_of_nucleotides.write_align(align, outfn)[source]

Write cleaned alignment with Biopython. :param align: a cleaned alignment :type align: obj :param outfn: a path to a new alignment file :type outfn: str

Returns:writes to a file in fasta format
Return type:none
rna_tools.tools.rna_alignment.random_assignment_of_nucleotides.write_seq(seqfn, outfn)[source]

Write cleaned alignment with Biopython. :param align: a cleaned alignment :type align: obj :param outfn: a path to a new alignment file :type outfn: str

Returns:writes to a file in fasta format
Return type:none

CMAlign

class rna_tools.tools.rna_alignment.rna_alignment.CMAlign(outputfn=None)[source]

CMAalign class around cmalign (of Inferal).

cmalign - aligns the RNA sequences in <seqfile> to the covariance model (CM) in <cmfile>. The new alignment is output to stdout in Stockholm format.

Example:

cma = ra.CMAlign()
cma.run_cmalign("ade_seq.fa", "RF00167.cm")
seq = cma.get_seq()
print 'cma hit  ', seq
print 'seq      ', a.align_seq(seq)
print 'a.rf     ', a.rf

cmd cmalign -g RF00167.cm ade_seq.fa

# STOCKHOLM 1.0
#=GF AU Infernal 1.1.2

ade          ----------------CGCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAG-CCUUAAA-CUCUUGAUUAUGAAGUGA------------
#=GR ade PP  ................99*********************************************.*******.***************999............
#=GC SS_cons :::::::::::::::::((((((((,,,<<<<<<<_______>>>>>>>,,,,,,,,<<<<<<<_______>>>>>>>,,))))))))::::::::::::::
#=GC RF      aaaaaauaaaaaaaauucccuCgUAUAAucccgggAAUAUGGcccgggaGUUUCUACCaggcagCCGUAAAcugccuGACUAcGagggaaauuuuuuuuuuu
//
cma hit   ----------------CGCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAG-CCUUAAA-CUCUUGAUUAUGAAGUGA------------
seq       ----------------CGCU-U-CAUAUAAUCCUAAUGAUAUGG-UUUGGGA-GUUUCUACCAAGAG-CC--UUAAA-CUCUU---GAUUAUG-AAGUGA-------------
a.rf      aaaaaauaaaaaaaauuccc.u.CgUAUAAucccgggAAUAUGG.cccggga.GUUUCUACCaggcagCC..GUAAAcugccu...GACUAcG.agggaaauuuuuuuuuuu.

Install http://eddylab.org/infernal/

Cite: Nawrocki and S. R. Eddy, Infernal 1.1: 100-fold faster RNA homology searches, Bioinformatics 29:2933-2935 (2013).

get_gc_rf()[source]

Get #=GC RF.

Variables:self.output – string
get_seq()[source]
Variables:self.output – output of cmalign, string
run_cmalign(seq, cm, verbose=True)[source]

Run cmalign and process the result.

Parameters:
  • seq – seq string
  • cm – cm fn

Run:

$ cmalign RF01831.cm 4lvv.seq
# STOCKHOLM 1.0
#=GF AU Infernal 1.1.2

4lvv         -GGAGAGUA-GAUGAUUCGCGUUAAGUGUGUGUGA-AUGGGAUGUCG-UCACACAACGAAGC---GAGA---GCGCGGUGAAUCAUU-GCAUCCGCUCCA
#=GR 4lvv PP .********.******************9999998.***********.8999999******8...5555...8**************.************
#=GC SS_cons (((((----(((((((((((,,,,,<<-<<<<<<<<___________>>>>>>>>>>,,,<<<<______>>>>,,,)))))))))))-------)))))
#=GC RF      ggcaGAGUAGggugccgugcGUuAAGUGccggcgggAcGGGgaGUUGcccgccggACGAAgggcaaaauugcccGCGguacggcaccCGCAUcCgCugcc
//

Warning

requires cmalign to be set in your shell

RChie

class rna_tools.tools.rna_alignment.rna_alignment.RChie[source]

RChie - plotting arc diagrams of RNA secondary structures.

_images/rchie.png

http://www.e-rna.org/r-chie/

The offline version of R-chie, which requires first installing R4RNA is available here, or clone our git repository here How to install it:

  • Ensure R is installed already, or download it freely from http://www.r-project.org/

  • Download the R4RNA (https://github.com/jujubix/r-chie), open R and install the package:

    install.packages("<path_to_file>/R4RNA", repos = NULL, type="source")
    # Install the optparse and RColorBrewer
    install.packages('optparse')
    install.packages('RColorBrewer')
    
  • Go to rna_tools/rna_tools_config_local.py and set RCHIE_PATH to the folder with RChie, e.g. "/home/magnus/work/opt/r-chie/".

To test if Rchie works on your machine (from rna_align folder):

<path to your rchie>/rchie.R --msafile test_data/rchie_test_files/fasta.txt test_data/rchie_test_files/helix.txt

you should have rchie.png file in the folder.

More at http://www.e-rna.org/r-chie/download.cgi

Cite: Daniel Lai, Jeff R. Proctor, Jing Yun A. Zhu, and Irmtraud M. Meyer (2012) R-chie: a web server and R package for visualizing RNA secondary structures. Nucleic Acids Research, first published online March 19, 2012. doi:10.1093/nar/gks241

plot_cov(seqs, ss_cons, plot_fn='rchie.png', verbose=False)[source]

Plot an RChie plot_conv.

Parameters:
  • seqs – seqs in the fasta format
  • ss_cons – a string of secondary structure consensus, use only ().. Works with pseuoknots.
show()[source]
write(outfn)[source]

Renumber a pdb file according to alignment

renum_pdb_to_aln.py - renumber a pdb file based on the alignment.

author: A. Zyla under supervision of mmagnus

Warning

works only for single chain! and requires Biopython (tested with v1.68)

usage: renum_to_aln [-h] [-v] [--residue_index_start RESIDUE_INDEX_START]
                    [--outfn OUTFN]
                    seqid alignfn pdbfn
Positional arguments:
seqid seq id in the alignemnt
alignfn alignemnt in the Fasta format
pdbfn pdb file
Options:
-v=False, --verbose=False
 increase output verbosity
--residue_index_start=1
 renumber starting number (default: 1)
--outfn output pdb file (default: pdbfn .pdb -> _out.pdb)

renum_pdb_to_aln.py - renumber a pdb file based on the alignment.

author: A. Zyla under supervision of mmagnus

Warning

works only for single chain! and requires Biopython (tested with v1.68)

rna_tools.tools.renum_pdb_to_aln.renum_pdb_to_aln.get_parser()[source]
rna_tools.tools.renum_pdb_to_aln.renum_pdb_to_aln.get_seq(alignfn, seqid)[source]

Get seq from an alignment with gaps.

Parameters:
  • alignfn (str) – a path to an alignment
  • seqid (str) – seq id in an alignment

Usage:

>>> get_seq('test_data/ALN_OBJ1_OBJ2.fa', 'obj1')
SeqRecord(seq=SeqRecord(seq=Seq('GUUCAG-------------------UGAC-', SingleLetterAlphabet()), id='obj1', name='obj1', description='obj1', dbxrefs=[]), id='<unknown id>', name='<unknown name>', description='<unknown description>', dbxrefs=[])
Returns:SeqRecord
rna_tools.tools.renum_pdb_to_aln.renum_pdb_to_aln.open_pdb(pdbfn)[source]

Open pdb with Biopython.

Parameters:pdbfn (str) – a path to a pdb structure
Returns:with a pdb structure
Return type:PDB Biopython object
rna_tools.tools.renum_pdb_to_aln.renum_pdb_to_aln.renumber(seq_with_gaps, struc, residue_index_start)[source]

Renumber a pdb file.

Parameters:
  • seq_with_gaps (str) – a target sequence extracted from the alignment
  • struc (pdb) – a structure
  • residue_index_start (int) – starting number
Returns:

BioPython Structure object

rna_tools.tools.renum_pdb_to_aln.renum_pdb_to_aln.write_struc(struc, outfn)[source]

Write renumbered pdb with Biopython.

Parameters:
  • struc (pdb) – a renumbered structure
  • outfn (str) – a path to a new, renumbered pdb file
Returns:

writes to a file

Return type:

none

Root Mean Square Deviation (RMSD)

rna_calc_rmsd

rna_calc_rmsd.py - calculate RMSDs of structures to the target

If you still have problem with various number of atoms, check out this issue: get_rnapuzzle_ready: how to deal with `Alternate location indicator (https://github.com/mmagnus/rna-pdb-tools/issues/30).

The program is using (https://github.com/charnley/rmsd).

Example #1:

$ rna_calc_rmsd.py -t 6_0_solution_4GXY_rpr.pdb --model_selection=A:1-17+24-110+115-168 *.pdb
rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
method: all-atom-built-in
# of models: 35
6_0_solution_4GXY_rpr.pdb 0.0 3409
6_Blanchet_1_rpr.pdb 22.31 3409
6_Blanchet_2_rpr.pdb 21.76 3409
6_Blanchet_3_rpr.pdb 21.32 3409
6_Blanchet_4_rpr.pdb 22.22 3409
6_Blanchet_5_rpr.pdb 24.17 3409
6_Blanchet_6_rpr.pdb 23.28 3409
6_Blanchet_7_rpr.pdb 22.26 3409
6_Bujnicki_1_rpr.pdb 36.95 3409
6_Bujnicki_2_rpr.pdb 30.9 3409
6_Bujnicki_3_rpr.pdb 32.1 3409
6_Bujnicki_4_rpr.pdb 32.04 3409
...

Example #2:

time rmsd_calc_to_target.py
  -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
  --target_selection A:1-48+52-63
  --model_selection A:1-48+52-63
  --target_ignore_selection A/57/O2'
  clusters/*_AA.pdb

rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
  target_selection:  A:1-48+52-63
  model_selection:   A:1-48+52-63
  target_ignore_selection:  A/57/O2'
  model_ignore_selection:
# of models: 801
fn,rmsd_all
pistol_thrs0.50A_clust01-000001_AA.pdb,7.596
pistol_thrs0.50A_clust02-000001_AA.pdb,7.766
pistol_thrs0.50A_clust03-000001_AA.pdb,18.171
[..]
pistol_thrs0.50A_clust799-000001_AA.pdb,5.356
pistol_thrs0.50A_clust800-000001_AA.pdb,15.282
pistol_thrs0.50A_clust801-000001_AA.pdb,16.339
# of atoms used: 1237
csv was created!  rmsds.csv
rmsd_calc_to_target.py -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
37.93s user 1.07s system 87% cpu 44.650 total

usage: rna_calc_rmsd [-h] [-t TARGET_FN] [--target-selection TARGET_SELECTION]
                     [--target-ignore-selection TARGET_IGNORE_SELECTION]
                     [--model-selection MODEL_SELECTION]
                     [--model-ignore-selection MODEL_IGNORE_SELECTION]
                     [-m METHOD] [-o RMSDS_FN] [-v] [--target-column-name]
                     files [files ...]
Positional arguments:
files files
Options:
-t=, --target-fn=
 pdb file
--target-selection=
 selection, e.g. A:10-16+20, where #16 residue is included
--target-ignore-selection=
 A/10/O2’
--model-selection=
 selection, e.g. A:10-16+20, where #16 residue is included
--model-ignore-selection=
 A/10/O2’
-m=all-atom-built-in, --method=all-atom-built-in
 align, fit
-o=rmsds.csv, --rmsds-fn=rmsds.csv
 ouput, matrix
-v=False, --verbose=False
 verbose
--target-column-name=False
 Undocumented

rna_calc_rmsd.py - calculate RMSDs of structures to the target

If you still have problem with various number of atoms, check out this issue: get_rnapuzzle_ready: how to deal with `Alternate location indicator (https://github.com/mmagnus/rna-pdb-tools/issues/30).

The program is using (https://github.com/charnley/rmsd).

Example #1:

$ rna_calc_rmsd.py -t 6_0_solution_4GXY_rpr.pdb --model_selection=A:1-17+24-110+115-168 *.pdb
rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
method: all-atom-built-in
# of models: 35
6_0_solution_4GXY_rpr.pdb 0.0 3409
6_Blanchet_1_rpr.pdb 22.31 3409
6_Blanchet_2_rpr.pdb 21.76 3409
6_Blanchet_3_rpr.pdb 21.32 3409
6_Blanchet_4_rpr.pdb 22.22 3409
6_Blanchet_5_rpr.pdb 24.17 3409
6_Blanchet_6_rpr.pdb 23.28 3409
6_Blanchet_7_rpr.pdb 22.26 3409
6_Bujnicki_1_rpr.pdb 36.95 3409
6_Bujnicki_2_rpr.pdb 30.9 3409
6_Bujnicki_3_rpr.pdb 32.1 3409
6_Bujnicki_4_rpr.pdb 32.04 3409
...

Example #2:

time rmsd_calc_to_target.py
  -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
  --target_selection A:1-48+52-63
  --model_selection A:1-48+52-63
  --target_ignore_selection A/57/O2'
  clusters/*_AA.pdb

rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
  target_selection:  A:1-48+52-63
  model_selection:   A:1-48+52-63
  target_ignore_selection:  A/57/O2'
  model_ignore_selection:
# of models: 801
fn,rmsd_all
pistol_thrs0.50A_clust01-000001_AA.pdb,7.596
pistol_thrs0.50A_clust02-000001_AA.pdb,7.766
pistol_thrs0.50A_clust03-000001_AA.pdb,18.171
[..]
pistol_thrs0.50A_clust799-000001_AA.pdb,5.356
pistol_thrs0.50A_clust800-000001_AA.pdb,15.282
pistol_thrs0.50A_clust801-000001_AA.pdb,16.339
# of atoms used: 1237
csv was created!  rmsds.csv
rmsd_calc_to_target.py -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
37.93s user 1.07s system 87% cpu 44.650 total
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.calc_rmsd(a, b, target_selection, target_ignore_selection, model_selection, model_ignore_selection, verbose)[source]

a is model b is target

Params:a = filename of structure a
Params:b = filename of structure b
Returns:rmsd, number of atoms
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.calc_rmsd_pymol(pdb1, pdb2, method)[source]

Calculate rmsd using PyMOL. Two methods are available: align and fit

See:

Align can return a list with 7 items:

RMSD after refinement Number of aligned atoms after refinement Number of refinement cycles RMSD before refinement Number of aligned atoms before refinement Raw alignment score Number of residues aligned

in this version of function, the function returns RMSD before refinement.

Install on OSX: brew install homebrew/science/pymol and set PYTHONPATH to your PyMOL packages, .e.g

PYTHONPATH=$PYTHONPATH:/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages

If problem:

Match-Error: unable to open matrix file '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/data/pymol/matrices/BLOSUM62'.

then define PYMOL_PATH in your .bashrc, e.g.:

export PYMOL_PATH=/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymol/
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.get_parser()[source]
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.get_rna_models_from_dir(files)[source]
Parameters:models – a list of filenames

Example of the list:

['test_data/rp17/2_restr1_Michal1.pdb_clean.pdb', 'test_data/rp17/2a_nonrestr2_Michal1.pdb_clean.pdb',
'test_data/rp17/3_nonrestr1_Michal1.pdb_clean.pdb', 'test_data/rp17/5_restr1_Michal3.pdb_clean.pdb']
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.sort_nicely(l)[source]

Sort the given list in the way that humans expect.

http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/

rna_calc_rmsd

rna_calc_rmsd.py - calculate RMSDs of structures to the target

If you still have problem with various number of atoms, check out this issue: get_rnapuzzle_ready: how to deal with `Alternate location indicator (https://github.com/mmagnus/rna-pdb-tools/issues/30).

The program is using (https://github.com/charnley/rmsd).

Example #1:

$ rna_calc_rmsd.py -t 6_0_solution_4GXY_rpr.pdb --model_selection=A:1-17+24-110+115-168 *.pdb
rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
method: all-atom-built-in
# of models: 35
6_0_solution_4GXY_rpr.pdb 0.0 3409
6_Blanchet_1_rpr.pdb 22.31 3409
6_Blanchet_2_rpr.pdb 21.76 3409
6_Blanchet_3_rpr.pdb 21.32 3409
6_Blanchet_4_rpr.pdb 22.22 3409
6_Blanchet_5_rpr.pdb 24.17 3409
6_Blanchet_6_rpr.pdb 23.28 3409
6_Blanchet_7_rpr.pdb 22.26 3409
6_Bujnicki_1_rpr.pdb 36.95 3409
6_Bujnicki_2_rpr.pdb 30.9 3409
6_Bujnicki_3_rpr.pdb 32.1 3409
6_Bujnicki_4_rpr.pdb 32.04 3409
...

Example #2:

time rmsd_calc_to_target.py
  -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
  --target_selection A:1-48+52-63
  --model_selection A:1-48+52-63
  --target_ignore_selection A/57/O2'
  clusters/*_AA.pdb

rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
  target_selection:  A:1-48+52-63
  model_selection:   A:1-48+52-63
  target_ignore_selection:  A/57/O2'
  model_ignore_selection:
# of models: 801
fn,rmsd_all
pistol_thrs0.50A_clust01-000001_AA.pdb,7.596
pistol_thrs0.50A_clust02-000001_AA.pdb,7.766
pistol_thrs0.50A_clust03-000001_AA.pdb,18.171
[..]
pistol_thrs0.50A_clust799-000001_AA.pdb,5.356
pistol_thrs0.50A_clust800-000001_AA.pdb,15.282
pistol_thrs0.50A_clust801-000001_AA.pdb,16.339
# of atoms used: 1237
csv was created!  rmsds.csv
rmsd_calc_to_target.py -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
37.93s user 1.07s system 87% cpu 44.650 total

usage: rna_calc_rmsd [-h] [-t TARGET_FN] [--target-selection TARGET_SELECTION]
                     [--target-ignore-selection TARGET_IGNORE_SELECTION]
                     [--model-selection MODEL_SELECTION]
                     [--model-ignore-selection MODEL_IGNORE_SELECTION]
                     [-m METHOD] [-o RMSDS_FN] [-v] [--target-column-name]
                     files [files ...]
Positional arguments:
files files
Options:
-t=, --target-fn=
 pdb file
--target-selection=
 selection, e.g. A:10-16+20, where #16 residue is included
--target-ignore-selection=
 A/10/O2’
--model-selection=
 selection, e.g. A:10-16+20, where #16 residue is included
--model-ignore-selection=
 A/10/O2’
-m=all-atom-built-in, --method=all-atom-built-in
 align, fit
-o=rmsds.csv, --rmsds-fn=rmsds.csv
 ouput, matrix
-v=False, --verbose=False
 verbose
--target-column-name=False
 Undocumented

rna_calc_rmsd.py - calculate RMSDs of structures to the target

If you still have problem with various number of atoms, check out this issue: get_rnapuzzle_ready: how to deal with `Alternate location indicator (https://github.com/mmagnus/rna-pdb-tools/issues/30).

The program is using (https://github.com/charnley/rmsd).

Example #1:

$ rna_calc_rmsd.py -t 6_0_solution_4GXY_rpr.pdb --model_selection=A:1-17+24-110+115-168 *.pdb
rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
method: all-atom-built-in
# of models: 35
6_0_solution_4GXY_rpr.pdb 0.0 3409
6_Blanchet_1_rpr.pdb 22.31 3409
6_Blanchet_2_rpr.pdb 21.76 3409
6_Blanchet_3_rpr.pdb 21.32 3409
6_Blanchet_4_rpr.pdb 22.22 3409
6_Blanchet_5_rpr.pdb 24.17 3409
6_Blanchet_6_rpr.pdb 23.28 3409
6_Blanchet_7_rpr.pdb 22.26 3409
6_Bujnicki_1_rpr.pdb 36.95 3409
6_Bujnicki_2_rpr.pdb 30.9 3409
6_Bujnicki_3_rpr.pdb 32.1 3409
6_Bujnicki_4_rpr.pdb 32.04 3409
...

Example #2:

time rmsd_calc_to_target.py
  -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
  --target_selection A:1-48+52-63
  --model_selection A:1-48+52-63
  --target_ignore_selection A/57/O2'
  clusters/*_AA.pdb

rmsd_calc_rmsd_to_target
--------------------------------------------------------------------------------
  target_selection:  A:1-48+52-63
  model_selection:   A:1-48+52-63
  target_ignore_selection:  A/57/O2'
  model_ignore_selection:
# of models: 801
fn,rmsd_all
pistol_thrs0.50A_clust01-000001_AA.pdb,7.596
pistol_thrs0.50A_clust02-000001_AA.pdb,7.766
pistol_thrs0.50A_clust03-000001_AA.pdb,18.171
[..]
pistol_thrs0.50A_clust799-000001_AA.pdb,5.356
pistol_thrs0.50A_clust800-000001_AA.pdb,15.282
pistol_thrs0.50A_clust801-000001_AA.pdb,16.339
# of atoms used: 1237
csv was created!  rmsds.csv
rmsd_calc_to_target.py -t 5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
37.93s user 1.07s system 87% cpu 44.650 total
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.calc_rmsd(a, b, target_selection, target_ignore_selection, model_selection, model_ignore_selection, verbose)[source]

a is model b is target

Params:a = filename of structure a
Params:b = filename of structure b
Returns:rmsd, number of atoms
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.calc_rmsd_pymol(pdb1, pdb2, method)[source]

Calculate rmsd using PyMOL. Two methods are available: align and fit

See:

Align can return a list with 7 items:

RMSD after refinement Number of aligned atoms after refinement Number of refinement cycles RMSD before refinement Number of aligned atoms before refinement Raw alignment score Number of residues aligned

in this version of function, the function returns RMSD before refinement.

Install on OSX: brew install homebrew/science/pymol and set PYTHONPATH to your PyMOL packages, .e.g

PYTHONPATH=$PYTHONPATH:/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages

If problem:

Match-Error: unable to open matrix file '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/data/pymol/matrices/BLOSUM62'.

then define PYMOL_PATH in your .bashrc, e.g.:

export PYMOL_PATH=/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymol/
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.get_parser()[source]
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.get_rna_models_from_dir(files)[source]
Parameters:models – a list of filenames

Example of the list:

['test_data/rp17/2_restr1_Michal1.pdb_clean.pdb', 'test_data/rp17/2a_nonrestr2_Michal1.pdb_clean.pdb',
'test_data/rp17/3_nonrestr1_Michal1.pdb_clean.pdb', 'test_data/rp17/5_restr1_Michal3.pdb_clean.pdb']
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd.sort_nicely(l)[source]

Sort the given list in the way that humans expect.

http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/

rna_calc_rmsd_multi_targets

# run for one do it for all # merge tables and calculcate statiics

# run for one do it for all # merge tables and calculcate statiics

usage: rna_calc_evo_rmsd [-h] [-v] [--models MODELS [MODELS ...]]
                         [--targets TARGETS [TARGETS ...]]
                         [--output-csv OUTPUT_CSV]
                         [--model-selection MODEL_SELECTION]
                         [--target-selection TARGET_SELECTION]
Options:
-v=False, --verbose=False
 be verbose
--models Undocumented
--targets Undocumented
--output-csv=rna_calc_rmsd_multi_targets_output.csv
 Undocumented
--model-selection=
 selection, e.g. A:10-16+20, where #16 residue is included
--target-selection=
 selection, e.g. A:10-16+20, where #16 residue is included

rna_calc_rmsd_trafl

rmsd_calc_trafl - calculate RMSD of transition A->B based on a SimRNA trajectory

After this script, run:

rna_cal_rmsd_trafl_plot.py rmsd.txt

to get a plot like this:

_images/rmsd_transition.png

Prepare structures:

$ SimRNA -p 17_Das_2_rpr.pdb -n 0 -o 17_Das_2_rpr_n0 # no trafl, trafl will be added
$ SimRNA -p 5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped.pdb -n 0 -o 5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped
#(struc must be (~CG~) nope. It has to be a trajectory!)

and run:

 $ rmsd_calc_trafl.py 17_Das_2_rpr.pdb.trafl 17_Das_2_rpr_n0.trafl 5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0.trafl rp17_rmsd.txt
 > calc_rmsd_to_1st_frame
   /Users/magnus/work/opt/simrna/SimRNA_64bitIntel_MacOSX_staticLibs/calc_rmsd_to_1st_frame 17_Das_2_rpr.pdb.trafl 17_Das_2_rpr.pdb_rmsd_e
 < rmsd_out: 17_Das_2_rpr.pdb_rmsd_e
 > struc: 17_Das_2_rpr_n0.trafl 2
 > trafl: 17_Das_2_rpr.pdb.trafl 48
 % saved: 17_Das_2_rpr.pdb.trafl_17_Das_2_rpr_n0.trafl
 > calc_rmsd_to_1st_frame
   /Users/magnus/work/opt/simrna/SimRNA_64bitIntel_MacOSX_staticLibs/calc_rmsd_to_1st_frame 17_Das_2_rpr.pdb.trafl_17_Das_2_rpr_n0.trafl 17_Das_2_rpr.pdb_rmsd_e_17_Das_2_rpr_n0_rmsd_e
 < rmsd_out: 17_Das_2_rpr.pdb_rmsd_e_17_Das_2_rpr_n0_rmsd_e
 > struc: 5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0.trafl 2
 > trafl: 17_Das_2_rpr.pdb.trafl 48
 % saved: 17_Das_2_rpr.pdb.trafl_5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0.trafl
 > calc_rmsd_to_1st_frame
   /Users/magnus/work/opt/simrna/SimRNA_64bitIntel_MacOSX_staticLibs/calc_rmsd_to_1st_frame 17_Das_2_rpr.pdb.trafl_5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0.trafl 17_Das_2_rpr.pdb_rmsd_e_5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0_rmsd_e
 < rmsd_out: 17_Das_2_rpr.pdb_rmsd_e_5k7c_clean_onechain_renumber_as_puzzle_rpr_rmGapped_n0_rmsd_e
  0.000 -695.634
  0.000 -551.093
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 < out: rp17_rmsd.txt

Warning

calc_rmsd_to_1st_frame (SimRNA) is required and the path to the binary file is defined in config_local.

usage: rna_calc_evo_rmsd [-h] trafl struc1 struc2 rmsds_fn
Positional arguments:
trafl trafil
struc1 structure A
struc2 structure B
rmsds_fn output file

rna_cal_rmsd_trafl_plot - generate a plot based of <rmsd.txt> of rna_calc_evo_rmsd.py.

usage: rna_cal_rmsd_trafl_plot [-h] file
Positional arguments:
file rmsd.txt

rna_calc_rmsd_all_vs_all

rna_calc_rmsd_all_vs_all.py - calculate RMSDs all vs all.

Examples:

rna_calc_rmsd_all_vs_all.py -i test_data -o test_output/rmsd_calc_dir.tsv
 # of models: 4
... 1 test_data/struc1.pdb
... 2 test_data/struc2.pdb
... 3 test_data/struc3.pdb
... 4 test_data/struc4.pdb

The program is using (https://github.com/charnley/rmsd)

usage: rna_calc_rmsd_all_vs_all [-h] [-i INPUT_DIR] [-o MATRIX_FN]
Options:
-i=, --input_dir=
 input folder with structures
-o=matrix.txt, --matrix_fn=matrix.txt
 ouput, matrix

rna_calc_rmsd_all_vs_all.py - calculate RMSDs all vs all.

Examples:

rna_calc_rmsd_all_vs_all.py -i test_data -o test_output/rmsd_calc_dir.tsv
 # of models: 4
... 1 test_data/struc1.pdb
... 2 test_data/struc2.pdb
... 3 test_data/struc3.pdb
... 4 test_data/struc4.pdb

The program is using (https://github.com/charnley/rmsd)

rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd_all_vs_all.calc_rmsd(a, b)[source]

Calc rmsd.

rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd_all_vs_all.get_parser()[source]
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd_all_vs_all.get_rna_models_from_dir(directory)[source]
rna_tools.tools.rna_calc_rmsd.rna_calc_rmsd_all_vs_all.sort_nicely(l)[source]

Sort the given list in the way that humans expect. http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/

Interaction Network Fidelity (INF)

rna_calc_inf

usage: rna_calc_inf [-h] [-t TARGET_FN] [-m NT] [-s SS] [--stacking] [--debug]
                    [--method METHOD] [-f] [-v] [-o OUT_FN]
                    files [files ...]
Positional arguments:
files files, .e.g folder_with_pdbs/*pdbs
Options:
-t=, --target_fn=
 pdb file
-m=3, --number_of_threads=3
 number of threads used for multiprocessing, if 1 then mp is not used (useful for debugging)!
-s=, --ss= A:(([[))]]
--stacking=False
 take into account also stacking
--debug=False Undocumented
--method=clarna
 you can use mcannotate or clarna
-f=False, --force=False
 force to run ClaRNA even if <pdb>.outCR file is there
-v=False, --verbose=False
 be verbose, tell me more what’re doing
-o=inf.csv, --out_fn=inf.csv
 out csv file, be default `inf.csv`

A tool to calc inf_all, inf_stack, inf_WC, inf_nWC, SNS_WC, PPV_WC, SNS_nWC, PPV_nWC between two structures.

Mind, that ClaRNA is pretty slow, it takes even a few seconds to analyze a structure, so for, say, 1000 models you need a few hours.

How to make it faster? First, you can use --number_of_threads to specify the number of cores used for multiprocessing.

Second, the procedure implemented in here is composed of two steps, first for each structure ClaRNA is used to generate an output with contacts, then these files are used for comparisons. So, if you want to re-run your analysis, you don’t have to run re-run ClaRNA itself. Thus, be default ClaRNA is not executed if <model>.outCR is found next to the analyzed files. To change this behavior force (--force) rna_cal_inf.py to re-run ClaRNA.

ClaRNA_play required! https://gitlab.genesilico.pl/RNA/ClaRNA_play (internal GS gitlab server). Contact <magnus@genesilico.pl>.

import progressbar (in version 2) is required!

rna_tools.tools.rna_calc_inf.rna_calc_inf.do_job(i)[source]

Run ClaRNA & Compare, add 1 to the counter, write output to csv file (keeping it locked)

rna_tools.tools.rna_calc_inf.rna_calc_inf.get_parser()[source]

rna_calc_dinf

Obtain a list of interaction in an RNA molecule where “Interaction” is purely distance based (defined by –cuf-off). Later, you can use it to calculate distance based INF, dINF ;-).

Example:

[mm] rna_calc_inf$ git:(master)$ ./rna_calc_dinf.py  test_output/1Y26.pdb
X    13   X   14          bp   G   C                  WW_cis   1
X    13   X   83          bp   G   C                  WW_cis   1
X    13   X   82          bp   U   C                  WW_cis   1
X    14   X   15          bp   C   G                  WW_cis   1
X    14   X   83          bp   G   G                  WW_cis   1
X    14   X   81          bp   G   G                  WW_cis   1
X    14   X   82          bp   U   G                  WW_cis   1

use clarna_compare.py:

[mm] rna_calc_inf$ ./rna_calc_dinf.py  test_output/1Y26.pdb > 1Y26.pdb.outCR
[mm] rna_calc_inf$ clarna_compare.py -iref 1Y26.pdb.outCR -ichk 1Y26.pdb.outCR
1Y26.pdb.outCR                               1Y26.pdb.outCR      1.000      0.000      1.000      1.000      1.000      1.000      1.000      1.000

You can use -d to get a list of all interacting bases, something like:

draw_dists([(13, 14),(13, 83),...(82, 83)])

so you can plot all interacting bases:

_images/1y26_dinf.png

Mind, that draw_dists works on C2 atoms, that might be different from atoms detected with the program (e.g. different base atom could be detected to make an interaction).

_images/1y26_dinf2.png

usage: rna_calc_inf [-h] [-d] [-c] [-v] file
Positional arguments:
file a PDB file
Options:
-d=False, --draw-dists=False
 Undocumented
-c=5, --cut-off=5
 Undocumented
-v=False, --verbose=False
 be verbose

RNA filter (DCA)

rna_filter.py - calculate distances based on given restrants on PDB files or SimRNA trajectories

rna_filter.py - calculate distances based on given restrants on PDB files or SimRNA trajectories.

Changes: weight is always 1 (at least for now). ,>,=,>=,<= .

[PREVIOUS DOCUMENTATION - TO BE REMOVED]

rna_filter.py -s 4gxy_rpr.pdb -r rp06_MohPairs.rfrestrs d:A5-A42 100.0 measured: 26.7465763417 [x] d:A11-A26 100.0 measured: 19.2863696104 [x]

[mm] rp06$ git:(master) $ rna_filter.py -s 4gxy_rpr.pdb -r rp06_MohPairs.rfrestrs
d:A5-A42 100.0 measured: 26.7465763417 [x] d:A11-A26 100.0 measured: 19.2863696104 [x]
Traceback (most recent call last):
File “/home/magnus/work-src/rna-pdb-tools/bin/rna_filter.py”, line 270, in <module>
calc_scores_for_pdbs(args.structures, restraints, args.verbose)
File “/home/magnus/work-src/rna-pdb-tools/bin/rna_filter.py”, line 221, in calc_scores_for_pdbs
dist = get_distance(residues[h[0]][‘mb’], residues[h[1]][‘mb’])

KeyError: ‘A24’

correct, there is no A24 in this structure:

The format of restraints:

(d:A1-A2 < 10.0 1) = if distance between A1 and A2 lower than 10.0, score it with 1

Usage:

$ python rna_filter.py -r test_data/restraints.txt -s test_data/CG.pdb
 d:A1-A2 10.0 measured: 6.58677550096 [x]
test_data/CG.pdb 1.0 1 out of 1

# $ python rna_filter.py -r test_data/restraints.txt -t test_data/CG.trafl
(d:A1-A2 <  10.0  1)|(d:A2-A1 <= 10 1)
 restraints [('A1', 'A2', '<', '10.0', '1'), ('A2', 'A1', '<=', '10', '1')]

Frame #1 e:1252.26
  mb for A1 [ 54.729   28.9375  41.421 ]
  mb for A2 [ 55.3425  35.3605  42.7455]
   d:A1-A2 6.58677550096
  mb for A2 [ 55.3425  35.3605  42.7455]
  mb for A1 [ 54.729   28.9375  41.421 ]
   d:A2-A1 6.58677550096
# this ^ is off right now

usage: rna_filter.py [-h] -r RESTRAINTS_FN [-v]
                     [-s STRUCTURES [STRUCTURES ...]] [--offset OFFSET]
                     [-t TRAJECTORY]
Options:
-r, --restraints_fn
 restraints_fn: Format: (d:A9-A41 < 10.0 1)|(d:A41-A9 <= 10 1)
-v=False, --verbose=False
 be verbose
-s structures
--offset=0 use offset to adjust your restraints to numbering in PDB files, ade (1y26)pdb starts with 13, so offset is -12)
-t SimRNA trajectory

rna_dca_mapping.py

usage: rna_dca_mapping.py [-h] --seq SEQ --gseq GSEQ --dca DCA
                          [--offset OFFSET] [--noss] [--mss] [--verbose]
                          [--noshort]
Options:
--seq seq fn in Fasta format
--gseq gapped sequence and secondary structure (like in the alignment used for DCA) in Fasta format
--dca file with parsed interactions
--offset offset
--noss=False filter out ss from plot
--mss=False ss every each line
--verbose=False
 be verbose
--noshort=False
 filter out short interactions, dist in seq < 6 nt

show_dists - show distances in PyMOL

show_dists - show distances in PyMOL

_images/getdists.png

Usage:

PyMOL>show_dists([[1,2]])
1, 2, 3.41

rna_ex2x.py - analyze an evolutionary coupling file. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~`

rna_ex2x.py - analyze an evolutionary coupling file.

Files can be downloaded from https://marks.hms.harvard.edu/ev_rna/, e.g. RF00167.EC.interaction.csv

--pairs:

$ rna_ex2x.py RF00167.EC.interaction_LbyN.csv --pairs
[18, 78],[31, 39],[21, 75],[30, 40],[28, 42],[27, 43],[59, 67],[54, 72],[57, 69],[25, 45],[29, 41],[17, 79],[26, 44],[16, 80],[14, 82],[19, 77],[55, 71],[
    15, 81],[34, 63],[56, 70],[58, 68],[35, 63],[26, 45],[35, 64],[32, 39],[54, 73],[24, 74],[16, 82],[24, 45],[24, 43],[32, 36],[25, 48],[48, 82],[36, 48],

usage: rna_ec2x.py [-h] [--sep SEP] [--chain CHAIN] [--ec-pairs]
                   [--ss-pairs SS_PAIRS] [--pairs-delta]
                   interaction_fn
Positional arguments:
interaction_fn interaction file
Options:
--sep=, separator
--chain=A chain
--ec-pairs=False
 Undocumented
--ss-pairs file with secondary structure base pairs
--pairs-delta=False
 delta: ec-bp - ss-paris

rna_pairs2SimRNArestrs.py - convert pairs to SimRNA restraints

rna_pairs2SimRNArestrs.py - convert pairs to SimRNA restraints

Example:

$ rna_pairs2SimRNArestrs.py rp06_pairs_delta.txt -v
# of pairs: 42
SLOPE A/2/MB A/172/MB 0 6 1
SLOPE A/2/MB A/172/MB 0 7 -1
SLOPE A/3/MB A/169/MB 0 6 1
SLOPE A/3/MB A/169/MB 0 7 -1
SLOPE A/12/MB A/32/MB 0 6 1

usage: rna_pairs2SimRNArestrs.py [-h] [--offset OFFSET] [--weight WEIGHT]
                                 [--dist DIST] [--well] [-v]
                                 pairs
Positional arguments:
pairs a file with [[2, 172], [3, 169], [12, 32], [13, 31]]
Options:
--offset=0 can be -10
--weight=3 weight
--dist=7 distances, for MOHCA use 25
--well=False well instead of slope
-v=False, --verbose=False
 be verbose

rna_ss_get_bps.py - get a list of base pairs for a given “fasta ss” file.

rna_ss_get_bps.py - get a list of base pairs for a given “fasta ss” file.

Input file:

cat ade_ss.fa
>1y26
CGCUUCAUAUAAUCCUAAUGAUAUGGUUUGGGAGUUUCUACCAAGAGCCUUAAACUCUUGAUUAUGAAGUG
(((((((((...((((((.........))))))........((((((.......))))))..)))))))))%

Usage:

$ rna_ss_get_bps.py ade_ss.fa --offset 12
[[13, 83], [14, 82], [15, 81], [16, 80], [17, 79], [18, 78], [19, 77], [20, 76], [21, 75], [25, 45], [26, 44], [27, 43], [28, 42], [29, 41], [30, 40], [54, 72], [55, 71], [56, 70], [57, 69], [58, 68], [59, 67]]

Now it also work with pseudoknots.

usage: rna_ss_get_bps [-h] [--offset OFFSET] [-v] file
Positional arguments:
file file in the Fasta format
Options:
--offset offset
-v, --verbose be verbose

rna_pairs_diff.py - get a diff of pairs

rna_pairs_diff.py - get a diff of pairs

Usage:

$ rna_pairs_diff.py pistol_dca_all.bp pistol.bp
# of ec_paris: 31
# of ssbps   : 18
dalta#       : 13
[[4, 32], [6, 9], [6, 36], [6, 39], [9, 39], [13, 32], [16, 17], [17, 18], [22, 49], [29, 58]]

usage: rna_pairs_diff.py [-h] [-v] pairs1 pairs2
Positional arguments:
pairs1 a list of pairs, A
pairs2 a list of pairs to subtract, A-B, results in C(all pairs that are in A and are not in B
Options:
-v, --verbose be verbose

Contacts classification & secondary structure detection

3DNA (contacts classification & secondary structure detection)

Python parser to 3dna <http://x3dna.org/>.

Installation:

# install the code from http://forum.x3dna.org/downloads/3dna-download/
Create a copy of the rna_x3dna_config_local_sample.py (remove "_sample") present in rna-tools/rna_tools/tools/rna_x3dna folder.
Edit this line :
BINARY_PATH = <path to your x3dna-dssr file>
matching the path with the path of your x3dna-dssr file.
e.g. in my case: BINARY_PATH = ~/bin/x3dna-dssr.bin

For one structure you can run this script as:

[mm] py3dna$ git:(master) ✗ ./rna_x3dna.py test_data/1xjr.pdb
test_data/1xjr.pdb
>1xjr nts=47 [1xjr] -- secondary structure derived by DSSR
gGAGUUCACCGAGGCCACGCGGAGUACGAUCGAGGGUACAGUGAAUU
..(((((((...((((.((((.....))..))..))).).)))))))

For multiple structures in the folder, run the script like this:

[mm] py3dna$ git:(master) ✗ ./rna_x3dna.py test_data/*
test_data/1xjr.pdb
>1xjr nts=47 [1xjr] -- secondary structure derived by DSSR
gGAGUUCACCGAGGCCACGCGGAGUACGAUCGAGGGUACAGUGAAUU
..(((((((...((((.((((.....))..))..))).).)))))))
test_data/6TNA.pdb
>6TNA nts=76 [6TNA] -- secondary structure derived by DSSR
GCGGAUUUAgCUCAGuuGGGAGAGCgCCAGAcUgAAgAPcUGGAGgUCcUGUGtPCGaUCCACAGAAUUCGCACCA
(((((((..((((.....[..)))).((((.........)))).....(((((..]....))))))))))))....
test_data/rp2_bujnicki_1_rpr.pdb
>rp2_bujnicki_1_rpr nts=100 [rp2_bujnicki_1_rpr] -- secondary structure derived by DSSR
CCGGAGGAACUACUG&CCGGCAGCCU&CCGGAGGAACUACUG&CCGGCAGCCU&CCGGAGGAACUACUG&CCGGCAGCCU&CCGGAGGAACUACUG&CCGGCAGCCU
[[[[(((.....(((&{{{{))))))&(((((((.....(.(&]]]]).))))&[[[[[[......[[[&))))]]].]]&}}}}(((.....(((&]]]]))))))
rna_tools.tools.rna_x3dna.rna_x3dna.get_parser()[source]
class rna_tools.tools.rna_x3dna.rna_x3dna.x3DNA(pdbfn, show_log=False)[source]

Atributes:

curr_fn report
clean_up(verbose=False)[source]
get_ion_water_report()[source]

@todo File name: /tmp/tmp0pdNHS

no. of DNA/RNA chains: 0 [] no. of nucleotides: 174 no. of waters: 793 no. of metals: 33 [Na=29, Mg=1, K=3]
get_modifications()[source]

Run find_pair to find modifications.

get_secstruc()[source]

Get secondary structure.

get_seq()[source]

Get sequence.

Somehow 1bzt_1 x3dna UCAGACUUUUAAPCUGA, what is P? P -> u

run_x3dna(show_log=False)[source]
exception rna_tools.tools.rna_x3dna.rna_x3dna.x3DNAMissingFile[source]

ClaRNA (contacts classification)

If you want to calculate “Interaction Network Fidelity (INF) and not only” see rna_calc_inf in the Utils.

usage:

./clarna_app.py ../../input/5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
../../input/5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
((((([[[[[[)))))........(.((....(]]]]]].)..(((......)))...)).)

Example

from rna_tools.utils.clarna_app import clarna_app
if __name__ == '__main__':
    ss = '((((.[[[[[[.))))........((((.....]]]]]]...(((((....)))))..))))'
    fnCRref = clarna_app.get_ClaRNA_output_from_dot_bracket(ss)
    f = '../rna_calc_rmsd/test_data/pistol/5k7c_clean_onechain_renumber_as_puzzle_srr.pdb'
    fnCR = clarna_app.clarna_run(f, force=False)
    results = clarna_app.clarna_compare(fnCRref, fnCR)
    print results #
    #tmp_Z42i_..pdb.outCR     5k7c_clean_onechain_renumber_as_puzzle_srr.pdb.outCR      0.706      NA         0.865      NA         0.842      0.889      NA         0.000

Warning

Setup a bash variable: ClaRNA_play_path, and add ClaRNA_play to your $PATH (install ClaRNA_play https://gitlab.genesilico.pl/RNA/ClaRNA_play (internal GS gitlab server)

rna_tools.tools.clarna_app.clarna_app.clarna_compare(target_cl_fn, i_cl_fn, verbose=False)[source]

Run ClaRNA compare.

Returns:a list target, fn, scores

Scores:

inf_all      0.706
inf_stack -999.999 -> NA
inf_WC       0.865
inf_nWC   -999.999 -> NA
SNS_WC       0.842
PPV_WC       0.889
SNS_nWC     NA
PPV_nWC      0.000

Example of the list:

5k7c_clean_onechain_renumber_as_puzzle_srr.pdb     pistol_thrs0.50A_clust01-000001_AA.pdb
 0.642      NA         0.874      0.000      0.944      0.810      0.000      0.000s

use results.split()[4] to get inf_WC

rna_tools.tools.clarna_app.clarna_app.clarna_run(fn, force=True, stacking=True)[source]

Run ClaRNA run

fn: str
filename to analyze
Returns:a filename to ClaRNA output (fn + ‘.outCR’)
rna_tools.tools.clarna_app.clarna_app.get_ClaRNA_output_from_dot_bracket(ss, temp=True, verbose=False)[source]

Get dummy ClaRNA output out of dat bracket secondary structure (ss)

Parameters:ss (string) – secondary structure
Returns:a filename to ClaRNA output
rna_tools.tools.clarna_app.clarna_app.get_dot_bracket_from_ClaRNAoutput(inCR, verbose=False)[source]

In inCR file

rna_tools.tools.clarna_app.clarna_app.get_parser()[source]

RNA 3D structure prediction

ROSETTA

A set of wrappers around Rosetta (https://www.rosettacommons.org/), mostly based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http://www.sciencedirect.com/science/article/pii/S0076687914000524

Run (modeling)

run_rosetta - wrapper to ROSETTA tools for RNA modeling

Based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http: // www.sciencedirect.com / science / article / pii / S0076687914000524

The script makes(1) a folder for you job, with seq.fa, ss.fa, input file is copied as input.fa to the folder(2) make helices(3) prepare rosetta input files(4) sends jobs to the cluster.

The header is take from the fast file(`` > /header / ``) not from the filename of your Fasta file.

I discovered this:

qstat -xml | tr '\n' ' ' | sed 's#<job_list[^>]*>#\n#g' \

> | sed ‘s#<[^>]*>##g’ | grep ” ” | column -t

(https://stackoverflow.com/questions/26104116/qstat-and-long-job-names) so there is now need to shorted my job ids.

Run:

rna_rosetta_run.py -i -e -r -g -c 200 cp20.fa

-i:

# prepare a folder for a run
>cp20
AUUAUCAAGAAUCUCAAAGAGAGAUAGCAACCUGCAAUAACGAGCAAGGUGCUAAAAUAGAUAAGCCAAAUUCAAUUGGAAAAAAUGUUAA
.(((((....(((((.....)))))(((..(((((..[[[[..)).))).)))......))))).((((......)))).......]]]].

[peyote2] ~ rna_rosetta_run.py -i cp20.fa
run rosetta for:
cp20
AUUAUCAAGAAUCUCAAAGAGAGAUAGCAACCUGCAAUAACGAGCAAGGUGCUAAAAUAGAUAAGCCAAAUUCAAUUGGAAAAAAUGUUAA
.(((((....(((((.....)))))(((..(((((..[[[[..)).))).)))......))))).((((......)))).......]]]].
/home / magnus // cp20 / created
Seq & ss created

Troubleshooting.

If one of the helices is missing you will get:

IOError: [Errno 2] No such file or directory: 'helix1.out'
rosetta_submit.py README_FARFAR o 500 100 taf
Could not find:  README_FARFAR

and the problem was a1 and g8 pairing:

outputting command line to:  helix0.RUN  # previous helix #0
Sequence:  AUGG CCGG
Secstruc:  (((())))
Not complementary at positions a1 and g8!
Sequence:  GUGGG CCCAU
Secstruc:  ((((()))))

Writing to fasta file:  helix2.fasta  # next helix #2

My case with a modeling of rp12

Sequence: cc gc Secstruc: (()) Not complementary at positions 1 and 4!

edit the secondary structure, run the program with -i(init, to overwrite seq.fa, ss.fa) and then it works.

Notes:

rp17hc 6 characters

usage: rna_rosetta_run.py [-h] [-i] [-e] [-r] [-g] [-m MOTIF] [-n NSTRUC]
                          [-c CPUS] [--sandbox SANDBOX]
                          file
Positional arguments:
file file: > a04 UAUAACAUAUAAUUUUGACAAUAUGGGUCAUAAGUUUCUACCGGAAUACCGUAAAUAUUCUGACUAUGUAUA ((((.((((…((.((((…….)))).))……..(.(((((…….))))).)..))))))))
Options:
-i=False, --init=False
 prepare _folder with seq and ss
-e=False, --helices=False
 produce h(E)lices
-r=False, --rosetta=False
 prepare rosetta files (still you need `go` to send jobs to a cluster)
-g=False, --go=False
 send jobs to a cluster(run qsubMINI)
-m, --motif include a motif file, e.g. -s E-loop_1q9a_mutated_no_flanks_renumber.pdb
-n=10000, --nstruc=10000
 # of structures you want to get
-c=200, --cpus=200
 # of cpus to be used
--sandbox=/home/magnus/rosetta-runs
 where to run it (default: RNA_ROSETTA_RUN_ROOT_DIR_MODELING

Get a number of structures

rna_roseta_n.py - show me # of structure in a silent file

Example:

$ rna_rosetta_n.py ade.out
21594

usage: rna_rosetta_n.py [-h] [-v] file
Positional arguments:
file ade.out
Options:
-v=False, --verbose=False
 Undocumented

Get a head of a Rosetta silent file

rna_roseta_head.py - get a head of a Rosetta silent file.

Example:

$ rna_rosetta_head.py -n 10000 thf.out
# a new file will be created, thf_10000.out

Silent file:

[peyote2] thf head -n 100 thf.out
SEQUENCE: ggagaguagaugauucgcguuaagugugugugaaugggaugucgucacacaacgaagcgagagcgcggugaaucauugcauccgcucca
SCORE:     score    rna_data_backbone    rna_vdw    rna_base_backbone    rna_backbone_backbone    rna_repulsive    rna_base_pair    rna_base_axis    rna_base_stagger    rna_base_stack    rna_base_stack_axis     rna_rg    atom_pair_constraint    linear_chainbreak       N_WC      N_NWC       N_BS description
REMARK BINARY SILENTFILE
SCORE:  -601.975                0.000     31.113              -16.960                   -3.888           20.501         -302.742          -38.531            -158.004           -80.764               -110.053     23.750                   0.000               33.601         32          6         86    S_000001_000
FOLD_TREE  EDGE 1 4 -1  JEDGE 4 85 1  C4   C2   END  EDGE 4 5 -1  EDGE 85 80 -1  EDGE 85 89 -1  JEDGE 80 40 5  C4   C2   END  EDGE 80 78 -1  EDGE 40 43 -1  EDGE 40 33 -1  JEDGE 33 45 4  C4   C2   END  EDGE 45 54 -1  EDGE 45 44 -1  EDGE 33 20 -1  JEDGE 20 65 3  C2   C4   END  EDGE 65 67 -1  EDGE 20 17 -1  EDGE 65 63 -1  JEDGE 17 69 2  C4   C2   END  JEDGE 63 58 6  C4   C2   END  EDGE 17 6 -1  EDGE 58 59 -1  EDGE 63 60 -1  EDGE 69 77 -1  EDGE 58 55 -1  EDGE 69 68 -1  S_000001_000
RT -0.987743 0.139354 0.0703103 0.135963 0.989404 -0.0509304 -0.0766626 -0.0407466 -0.996224 6.25631 0.103544 0.0647696   S_000001_000
RT -0.98312 0.1587 -0.091045 0.166923 0.981743 -0.0912024 0.074909 -0.10486 -0.991662 5.89962 -1.95819 -0.1075   S_000001_000
RT -0.987645 0.154078 0.0285994 0.153854 0.988044 -0.00989514 -0.0297821 -0.00537275 -0.999542 6.13138 1.047 0.115722   S_000001_000
RT -0.989884 0.140722 0.0180554 0.140036 0.989532 -0.0348618 -0.0227723 -0.0319807 -0.999229 6.21787 0.201588 -0.0264223   S_000001_000
RT -0.988455 0.134013 0.0706914 0.134924 0.990822 0.00825457 -0.0689364 0.0176973 -0.997464 6.19447 0.189237 0.125791   S_000001_000
RT -0.990412 0.138036 0.00546261 0.137927 0.990299 -0.0168644 -0.00773751 -0.0159492 -0.999843 6.25456 0.0842849 -0.0135807   S_000001_000
ANNOTATED_SEQUENCE: g[RGU:LowerRNA:Virtual_Phosphate]gaga[RAD:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]uagaugauucgcguuaagugugugugaaugggauguc[RCY:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]ucacacaacg[RGU:rna_cutpoint_lower]a[RAD:rna_cutpoint_upper]agcg[RGU:rna_cutpoint_lower]a[RAD:rna_cutpoint_upper]gagcgcg[RGU:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]ugaaucauu[URA:rna_cutpoint_lower]g[RGU:rna_cutpoint_upper]cauccgcucca[RAD:UpperRNA] S_000001_000
Le3nY9smsa4zEMGdvAA+z+e

It seems to work:

-rw-rw-r--   1 magnus users 474M 2017-08-06 05:25 thf_10000.out
-rw -rw-r--   1 magnus users 947M 2017-08-06 04:54 thf.out

[peyote2] thf rna_rosetta_n.py thf_10000.out
10000

usage: rna_rosetta_n.py [-h] [-v] [-n NSTRUC] file
Positional arguments:
file ade.out
Options:
-v=False, --verbose=False
 Undocumented
-n=10000, --nstruc=10000
 Undocumented

Cluster

A wrapper to ROSETTA tools for RNA modeling

Based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http://www.sciencedirect.com/science/article/pii/S0076687914000524

rna_rosetta_cluster.py ade_min.out 20000

Take n * 0.005 (.5%) of all frames and put them into selected.out. Then the tool clusters this selected.out.

rna_tools.tools.rna_rosetta.rna_rosetta_cluster.cluster(radius, limit_clusters)[source]

Internal function of cluster_loop: It removes cluster.out first.

rna_tools.tools.rna_rosetta.rna_rosetta_cluster.cluster_loop(ns, radius_inc_step, limit_clusters)[source]

Go from radius 1 to get 1/6 of structures of ns (# of selected structures) in the first cluster, then it stops.

rna_tools.tools.rna_rosetta.rna_rosetta_cluster.extract()[source]
rna_tools.tools.rna_rosetta.rna_rosetta_cluster.get_no_structures(file)[source]
rna_tools.tools.rna_rosetta.rna_rosetta_cluster.get_no_structures_in_first_cluster(fn)[source]

Get # of structures in a silent file.

Parameters:fn (string) – a filename to a silent file
Returns
int: # of structures in a silent file
rna_tools.tools.rna_rosetta.rna_rosetta_cluster.get_parser()[source]
rna_tools.tools.rna_rosetta.rna_rosetta_cluster.get_selected(file, nc)[source]

Get selected for clustering

rna_tools.tools.rna_rosetta.rna_rosetta_cluster.run()[source]

Pipline for modeling RNA

Minimize

rna_rosetta_min.py - a script to do minimization

The script takes the number of structures and the analyzed silence file and does the maths.

Job names will be as your silent file preceding with ~, .e.g ~tha.

http://www.sciencedirect.com/science/article/pii/S0076687914000524

ade$ rna_rosetta_cluster.py ade.out

The first number states how many processors to use for the run, while the second number is 1/6 the total number of previously generated FARNA models. If you are running on a supercomputer that only allows specific multiples of processors, use an appropriate number for the first input.:

rosetta_submit.py min_cmdline min_out 1 24

rosetta_submit.py min_cmdline min_out [1] [16] The first number states how many processors to use for each line in min_cmdline. Here, enter 1 for the first input so that the total number of processors used will be equal to the number of processors entered with the “-proc” flag in command line [12], above. The second number states the maximum time each job will be allowed to run (walltime). Start the run with the appropriate command listed by the out- put above (e.g., source qsubMPI for the Stampede cluster).

E.g. for 20k silet file, 1/6 will be minimized = 3.3k:

parallel_min_setup.py -silent rp21cr62.out -tag rp21cr62_min  -proc 200 -nstruct 3200 -out_folder mo -out_script MINIMIZE " -ignore_zero_occupancy false "
rosetta_submit.py MINIMIZE mo 1 100 m

[peyote2] rp21 easy_cat.py mo
Catting into:  rp21_min.out ... from 200 primary files. Found 3200  decoys.

# on 200 cpus it took around ~30min

usage: rna_rosetta_min.py [-h] [-g] [-c CPUS] file
Positional arguments:
file ade.out
Options:
-g=False, --go=False
 Undocumented
-c=200, --cpus=200
 default: 200

Extract lowscore decoy

rna_rosetta_extract_lowscore_decoys.py - a simple wrapper to extract_lowscore_decoys.py

To be used in Jupter notebooks and other scripts.

usage: rna_rosetta_extract_lowscore_decoys.py [-h] [-v] nstruc file
Positional arguments:
nstruc # of low score structures to obtained
file silent file
Options:
-v=False, --verbose=False
 be verbose

Check progress

rna_rosetta_cluster.py - a script to cluster a silent file

Example:

[peyote2] rosetta_jobs rna_rosetta_check_progress.py .
         jobs  #curr  #todo  #decoys done
0  ./rp17s223    200      0      407  [ ]
1   ./rp17hcf      0      0        0  [ ]
# curr  232 #todo  0

usage: rna_rosetta_cluster.py [-h] [-v] [-m] [-s SELECT] [-k] dir
Positional arguments:
dir directory with rosetta runs, define by RNA_ROSETTA_RUN_ROOT_DIR_MODELING right now: /home/magnus/rosetta-runs
Options:
-v=False, --verbose=False
 be verbose
-m=False, --min-only=False
 check only for mo folder
-s=, --select= select for analysis only jobs with this phrase, .e.g., evoseq_
-k=False, --kill=False
 kill (qdel) jobs if your reach limit (nstruc) of structure that you want, right now is 10000 structures

SimRNA

Select low energy frames

rna_simrna_lowest.py

Select lowest energy frames out of a SimRNA trajectory file. This code uses heavily the SimRNATrajectory class. Be default 100 lowest energy frames is exported.

usage: rna_simrna_lowest.py [-h] [-n NSTRUC] trafl
Positional arguments:
trafl SimRNA trafl file
Options:
-n=100, --nstruc=100
 SimRNA trafl file

Extract

rna_simrna_extra.py - extract full atom structures from a SimRNA trajectory file.

Options:

SIMRNA_DATA_PATH has to be properly defined in rpt_config_local.

usage: rna_simrna_extract.py [-h] -t TEMPLATE -f TRAFL [-c]
                             [-n NUMBER_OF_STRUCTURES]
Options:
-t, --template template PDB file used for reconstruction to full atom models
-f, --trafl SimRNA trafl file
-c=False, --cleanup=False
 Keep only *_AA.pdb files, move *.ss_detected and *.pdbto _<traj name folder>
-n=100, --number_of_structures=100
 Undocumented

SimRNAweb

Download files of a SimRNAweb run

rna_simrnaweb_download_job.py - download model files, trajectory for a given SimRNAweb job.

Usage:

rp17pk$ rna_pdb_download_simrna_job.py 27b5093d -m -t -x
# download more clusters, trajectory, extract100

cp771_pk$ rna_pdb_download_simrna_job.py -t -x -m cf8f8bb2 -p cp771_pk
# download with a trajectory, and cluster #4 and #5, add to all pdb files
# prefix: cp771_pk

$ rna_simrnaweb_download_job.py --web-models rp17_well_d10_e1-a43d3ab5 --prefix tar
# prefix added will be tar_XXXX

Example:

rna_pdb_download_simrna_job.py -t -x -m 20569fa1 -p zmp_pk

[mm] zmp_pk ls
20569fa1_ALL_100low.trafl
_20569fa1-thrs7.10A_clust04
_20569fa1-thrs7.10A_clust05
_20569fa1_ALL_100low
data
rna_simrna_extract.log
subset.png
zmp_pk_20569fa1-thrs7.10A_clust01X.pdb
zmp_pk_20569fa1-thrs7.10A_clust02X.pdb
zmp_pk_20569fa1-thrs7.10A_clust03X.pdb
zmp_pk_20569fa1-thrs7.10A_clust04X.pdb
zmp_pk_20569fa1-thrs7.10A_clust05X.pdb

usage: rna_simrnaweb_download_job.py [-h] [-p PREFIX] [-n NSTRUC] [-e] [-m]
                                     [-r] [-c] [-d] [--top100] [--top200]
                                     [--web-models]
                                     job_id
Positional arguments:
job_id job_id
Options:
-p, --prefix prefix to the name, withouth _, be careful with this.If you have already some files with the given folder, their names mightbe changed.
-n=100, --nstruc=100
 extract nstruc the lowest energy, this option must go with –web
-e=False, --extract=False
 extract nstruc the lowest energy, this option must go with –web
-m=False, --more_clusters=False
 download also cluster 4 and 5
-r=False, --remove-trajectory=False
 remove trajectory after analysis
-c=False, --cluster=False
 get trajectory from cluster OR local on your computer (mdfind for macOS)
-d=False, --download-trajectory=False
 web
--top100=False download top100 trajectory
--top200=False download top200 trajectory
--web-models=False
 web models download

SimRNATrajectory

SimRNATrajectory module.

SimRNATrajectory / Frame / Residue / Atom

class rna_tools.tools.simrna_trajectory.simrna_trajectory.Atom(name, x, y, z)[source]

x y z coord

get_coord()[source]

Return coords (np.array).

class rna_tools.tools.simrna_trajectory.simrna_trajectory.Frame(id, header, coords, top_level=False)[source]
Syntax of header:
  • write_number
  • replica_id
  • total_energy
  • energy_without_restraints
  • temperature

Warning

If there is an invalid frame, please use repair_trafl.py to fix the trajectory first.

class rna_tools.tools.simrna_trajectory.simrna_trajectory.Residue(id, p, c4p, n1n9, b1, b2)[source]

Create Residue object.

Each residue in SimRNA coarse-grained represantation consists only 5 coarse-grained atoms:

  • backbone: p = phospate group, c4p = sugar moiety
  • nucleotide: n1n9 = N1 for pyrimidines, N9 for purines, b1 = C2 for purines and pyrimidines, b2 = C4 for pyrimidines, C6 for purines
get_atoms()[source]

Return all atoms

get_center()[source]

Return MB for residue `((self.n1n9 + self.b2) / 2)`

class rna_tools.tools.simrna_trajectory.simrna_trajectory.SimRNATrajectory[source]
load_from_file(fn, debug_break=False, top_level=False, only_first_frame=False)[source]

Create a trajectory based on give filename.

Parameters:top_level

top_level = True, don’t make a huge tree of objects (Residues/Atoms) == amazing speed up! Useful if you need only frames, energies and coords as text. You only get the info that is in header of each frame.

top_level = False, makes huge tree of objects (Residues/Atoms) == very slow for a huge trajectories

Warning

Loads up whole trafl file into memory, and get stuck. Use this if you want to compute e.g. distances between atoms, get the positions of specified atoms etc. If you can not process your trajectory

use top_level=True or look at load_from_string() to load a frame by frame from a file.

h(eader), l(line), f(ile)

load_from_list(frames)[source]
load_from_string(c, txt)[source]

Create a trajectory based on given string (txt) with id given by c.

Faster method, loads only one frame at a time to memory, and after computations loads the next frame (memory efficient).

plot_energy(plotfn='plot.png')[source]

Plots the SimRNA energy of the trajetory.

_images/simrnatrajectory.png
save(fn, verbose=True)[source]

Save the trajectory to file.

sort(inplace=True)[source]

Sort frames within the trajectory according to energy.

RNA Refinement (QRNAS)

rna_refinement - RNA refinement with QRNAS.

Models of RNA 3D structures obtained by modeling methods often suffer from local inaccuracies such as clashes or physically improbable bond lengths, backbone conformations, or sugar puckers. To ensure high quality of models, a procedure of re nement should be applied as a nal step in the modeling pipeline. The software tool QRNAS was developed in our laboratory to perform local re nement of nucleic acid structures based on an extended version of the AMBER force field. The extensions consist of energy terms associated with introduction of explicit hydrogen bonds, idealization of base pair planarity and regularization of backbone conformation.

Read more: Piatkowski, P., Kasprzak, J. M., Kumar, D., Magnus, M., Chojnowski, G., & Bujnicki, J. M. (2016). RNA 3D Structure Modeling by Combination of Template-Based Method ModeRNA, Template-Free Folding with SimRNA, and Refinement with QRNAS. Methods in Molecular Biology (Clifton, N.J.), 1490(Suppl), 217-235. http://doi.org/10.1007/978-1-4939-6433-8_14

Right now, there is 20k steps of refinement.

_images/qrnas_0k.png

The initial structure, 179c48aa-c0d3-4bd6-8e06-12081da22998_ALL_thrs6.20A_clust01-000001_AA.pdb.

_images/qrnas_3k.png

after 3k, ~10min

_images/qrnas_10k.png

after 10k steps, around 30min

_images/qrnas_20k.png

after 20k steps, around 1h.

Installation of QRNAS

Download the QRNAS package from http://genesilico.pl/qrnas/, unzip the archive, and compile it with the following command:

./qrnamake sequential

This should create an executable version of QRNAS.

Warning

Please, change the name of the binary file from QRNA to QRNAS!

Be default the script searches QRNAS in <rna-pdb-tools>/opt/qrnas/ .

Usage of QRNA:

QRNA - Quick Refinement of Nucleic Acids (0.2 alpha)
     by Juliusz Stasiewicz (jstasiewicz@genesilico.pl)

To use type:
  QRNA -i <input PDBfile> [-o <output PDBfile>] [-c <configfile>] [-p] [-m <restraintsfile>]
OR specify <input PDBfile>, <output PDBfile> and <restraintsfile> in <configfile> and type just:
  QRNA -c <configfile>

Installation of this util

Set up in your bashrc:

export QRNAS_PATH=<your path to qrnas> # e.g. /home/magnus/src/rna-pdb-tools/opt/qrnas

but default rna-pdb-tools searches for qrnas in <rna-pdb-tools>/opt/qrnas.

QRNAS at Peyote2

There is no problem to run QRNAS at our Genesilico cluster, peyote2. Tested by mmagnus –170822. Copy files of QRNAS to peyote and run ./qrnamake sequential.

To run it at a cluster with the Sun Grid Engine queuing system (this one with qusb ;-)):

for p in *.pdb; do echo "rna_refinement.py $p >& ${p}.log" | qsub -cwd -V -pe mpi 1 -N "r_$p" ; done

DONE:

  • [x] clean up the output structure
  • [x] configuration should not be hardcoded

usage: rna_refinement.py [-h] [-s STEPS] [-o OUTPUT_FILE] fn
Positional arguments:
fn input pdb file
Options:
-s=20000, --steps=20000
 # of steps, default: 20k
-o, --output_file
 output pdb file

diffpdb

diffpdb - a simple tool to compare text-content of PDB files

The method is quick-and-dirty, but works!

The script takes first 31 characters of lines (or only atom names and residue names) starting with HETATM or ATOM and save these lines to a <filename>.out file.

One file is created per pdb. In the final step DIFF_TOOL is executed on these two output files. You get a diff output. That’s it! Enjoy!

Configuration:

  • DIFF_TOOL="open -a diffmerge" or DIFF_TOOL="kompare" to set up what tool would you like to use to diff files in the file rna-pdb-tools/tools/diffpdb/diffpdb_conf.py (create it if needed)
_images/screenshot.png

./diffpdb.py --names test_data/4/1duq.pdb test_data/4/1duq_decoy0171_amb_clx.pdb

_images/screenshot2.png

and on the Mac (using diffmerge):

_images/diffpdb_osx_diffmerge.png

One of the difference that can be detected with the script is variants of atoms.

_images/atom-variants.png

or a detection of missing atom.

_images/missing-atoms.png

or a detection of missing OP3 at the beginning.

_images/missing-op3.png

RNA clustering with CLANS (clanstix)

rna_clanstix - a tool for visualizing RNA 3D structures based on pairwise structural similarity with Clans.

We hacked Clans thus instead of BLAST-based distances between sequences, you can analyze distances between structures described as p-values of rmsd (based on the method from the Dokholyan lab.)

Quickref:

rna_clanstix.py --groups-auto 10 --color-by-homolog --shape-by-source  thf_ref_mapping_pk_refX.txt input2.clans

Running Clans: To run CLANS you need to have Java 1.4 or better installed (java can be downloaded HERE). For full functionality you will also need the NCBI BLAST,PSI-BLAST and formatdb executables (NCBI). For command line parameters and basic help please refer to the README file. (source: http://www.eb.tuebingen.mpg.de/research/departments/protein-evolution/software/clans.html)

_images/yndSrLTb7l.gif

The RMSDs between structures are converted into p-values based on the method from the Dokholyan lab or some hacky way developed by mmagnus .

Color groups

You can color your groups:

_images/rna_clanstix.png

To get colors, run a cmd like this:

rna_clastix.py rnapz17_matrix_farfar_HelSeedCst.txt --groups 20:seq1+20+20+20+20+20+20:seq10

where with the + sign you separate groups. Each group has to have a number of structures. Optionally it can have a name, e.g., 20:seq1, use : as a separator. If a provided name is native then this group will be shown as starts.

Get inspiration for more colors (http://www.rapidtables.com/web/color/RGB_Color.htm)

How to use ClanstixRNA?

  1. Get a matrix of distances, save it as e.g. matrix.txt (see Comment below)

  2. run ClanstixRNA on this matrix to get an input file to Clans (e.g. clans_rna.txt):

    rna_clanstix.py test_data/matrix.txt # clans.input will be created by default
    
  3. open CLANS and click File -> Load run and load clans_run.txt

  4. You’re done! :-)

Comment: To get this matrix you can use for example another tool from the rna-pdb-tools packages:

rna_calc_rmsd_all_vs_all.py -i rp18 -o rp18_rmsd.csv
rna_clastix.py --groups 1:native+5:3dRNA+
      5:Chen+3:Dokh+5:Feng+5:LeeASModel+
      5:Lee+5:RNAComposer+10:RW3D+5:Rhiju+
      1:YagoubAli+3:SimRNA  rp18_rmsd.csv clans.in

rna_clastix.py --groups 100+100+100+100+100+100+100+100+100+100+1:native  rp18_rmsd.csv

where rp18 is a folder with structure and rp18_rmsd.csv is a matrix of all-vs-all rmsds.

_images/rp18_clanstix.png

Hajdin, C. E., Ding, F., Dokholyan, N. V, & Weeks, K. M. (2010). On the significance of an RNA tertiary structure prediction. RNA (New York, N.Y.), 16(7), 1340–9. doi:10.1261/rna.1837410

An output of this tool can be viewed using CLANS.

Frickey, T., & Lupas, A. (2004). CLANS: a Java application for visualizing protein families based on pairwise similarity. Bioinformatics (Oxford, England), 20(18), 3702–4. doi:10.1093/bioinformatics/bth444

class rna_tools.tools.clanstix.rna_clanstix.RNAStructClans(n=10, dotsize=10)[source]

Clans run.

Usage:

>>> f = open('matrix.txt')
>>> ids = f.readline().replace('#','').split()
>>> c = RNAStructClans(n=len(ids)) # 200?
>>> c.add_ids(ids)
>>> c.dist_from_matrix(f)
>>> print(c.txt)
add_ids(ids)[source]
dist_from_matrix(rmsds, matrix=0, use_pv=False, dont_calc=False, debug=False)[source]
dist_from_matrix_mp(output_pmatrix_fn, max, min, lines, pmat=False, use_pv=False, debug=False)[source]
rna_tools.tools.clanstix.rna_clanstix.check_symmetric(a, rtol=1e-05, atol=1e-08)[source]

https://stackoverflow.com/questions/42908334/checking-if-a-matrix-is-symmetric-in-numpy

rna_tools.tools.clanstix.rna_clanstix.get_parser()[source]

Misc

Plotting

rna_plot_hist.py - generate a histogram

Don’t open Excel, Jupyter. Simple plot a histogram of one column and save it to a file.

Example:

# file
                                                  fn  rmsd_all
0  19_Bujnicki_Human_4_rpr_n0-000001.pdb-000001_A...     14.73
1  19_Bujnicki_Human_4_rpr_n0-000001.pdb.trafl_19...      0.46
2  19_Bujnicki_Human_4_rpr_n0-000001.pdb.trafl_19...     14.73
3  19_Bujnicki_Human_4_rpr_n0-000001.pdb_thrs0.50...      0.73
4  19_Bujnicki_Human_4_rpr_n0-000001.pdb_thrs0.50...      0.83

$ rna_plot_hist.py rmsds.csv --column rmsd_all
_images/rmsds_hist.png

usage: rna_plot_hist [-h] [--column COLUMN] [--sep SEP] [-o OUTPUT]
                     [--bins BINS]
                     file
Positional arguments:
file rmsd.txt
Options:
--column column of file to plot
--sep=, separator, be default
-o, --output Undocumented
--bins=10 Undocumented

rna_plot_density.py - generate a density plot

Don’t open Excel, Jupyter. Simple plot a density of one column and save it to a file.

Example:

# file
                                                  fn  rmsd_all
0  19_Bujnicki_Human_4_rpr_n0-000001.pdb-000001_A...     14.73
1  19_Bujnicki_Human_4_rpr_n0-000001.pdb.trafl_19...      0.46
2  19_Bujnicki_Human_4_rpr_n0-000001.pdb.trafl_19...     14.73
3  19_Bujnicki_Human_4_rpr_n0-000001.pdb_thrs0.50...      0.73
4  19_Bujnicki_Human_4_rpr_n0-000001.pdb_thrs0.50...      0.83

$ rna_plot_hist.py rmsds.csv --column rmsd_all
_images/rmsds_dens.png

usage: rna_plot_density [-h] [--column COLUMN] [--sep SEP] [-o OUTPUT] file
Positional arguments:
file rmsd.txt
Options:
--column column of file to plot
--sep=, separator, be default
-o, --output Undocumented

rna_sali2dotbracket

usage: rna_sali2dotbracket [-h] filename
Positional arguments:
filename file in the Sali format

This beauty here will go to sali notation and convert it to dotbracket notation. The file name should be xxxx.sali

Author: Catarina Almeida

rna_tools.tools.rna_sali2dotbracket.rna_sali2dotbracket.convert_sali2dotbracket(fn)[source]

The function needs a filename in the Sali format. This function will get the secondary structure of the sequence, then get its identifier and then the sequence itself.

To get the ss

The line with the secondary structure is a list and will look like this:

['', '', '', '', '', '', '', '', '', '', '--...<<<[[...]]..>>>>', '', '', '\n']

In this case, the ss is in the 11th position. But in some files it may be in the 12th, 13th, 10th, etc..

If the longest element from the list is extracted, then this problem is overcomed.

The ss will some times have patterns of repeated gaps, which will come in the form of:

  1. x
  2. xnt
  3. ( x )

With x being any number, from 1 to 1000. These must be converted to the correspondent number of gaps (-) in the converted ss. This conversion is done by:

1 - Identifying the pattern with regex

2 - Replacing it with repl function.

As such, the following expressions will replace the previously mentioned patterns:

  1. re.sub(r'\d*\d', repl, temp)
  2. re.sub(r'\d*\dnt', repl, temp)
  3. re.sub(r'(?P<smthBeautiful>\(\d+\))', repl, temp)

To get the sequence

The sequence, much like the ss, can sometimes be in a different position in the list. Like in the ss, the longest element will be selected. Also, like in the ss, patterns for repeated gaps appear. So these must also be removed.

rna_tools.tools.rna_sali2dotbracket.rna_sali2dotbracket.get_parser()[source]
rna_tools.tools.rna_sali2dotbracket.rna_sali2dotbracket.repl(m)[source]

This function will replace the length of a given string by the correspondent number of dashes. The expression qwerty will be replaced by -----.

Cluster load

A very simple tool to see your cluster load per user:

MAX_JOBS: 1000
#jobs cluster 917 load:  0.917  to use: 83
#jobs you     749 load:  0.749  to use: 251
{'deepak': 160, 'azyla': 8, 'magnus': 749}
1 azyla        r 8
20 magnus       r 10
16 deepak       r 10
329 magnus       r 1
22 magnus       qw 10

A super simple script to get some statistics of who is running at a cluster

Set MAX_JOBS to calc % of usage, it’s an approximation of max number of jobs, e.g. peyote ~1k (rather 700, e.g. FARNA runs.).

rna_tools.tools.cluster_load.cluster_load.get_parser()[source]
rna_tools.tools.cluster_load.cluster_load.per_user()[source]

get stats (#cpus) per user

rna_tools.tools.cluster_load.cluster_load.stats_for_cluster()[source]

get stats (#jobs) per cluster

rna_tools.tools.cluster_load.cluster_load.stats_for_user()[source]

get stats (#jobs) per user

RNAkb

RNAkb (previous Gromacs) utils.

A module with different functions needed for Gromacs/RNAkb merriage.

Marcin Magnus Albert Bogdanowicz

  1. prepare groups and then (2) mdp score file.
rna_tools.tools.rnakb_utils.rnakb_utils.fix_gromacs_gro(path, verbose=False)[source]

It’s probably a bug in GROMACS, but box coordinates in gro files are not always separated by spaces. This function guesses how it should be separated and inserts spaces.

Parameters:path = path to gro file (*) –
Output:
  • file is overwritten with a corrected one
rna_tools.tools.rnakb_utils.rnakb_utils.fix_gromacs_ndx(path)[source]

Sometimes, GROMACS index has some atoms in more than one group, or doesn’t have all the groups grompp requires. This function fixes that.

Parameters:path = path to index file (*) –
Output:
  • index is overwritten with a corrected one
rna_tools.tools.rnakb_utils.rnakb_utils.format_score_mdp(mdp_out, energygrps, seq, verbose=False)[source]

Get a template score mdp and replace energygrps (it can be generated with prepare_groups) and energygrp_table

rna_tools.tools.rnakb_utils.rnakb_utils.get_res_code(line)[source]

Get residue code from a line of a PDB file

rna_tools.tools.rnakb_utils.rnakb_utils.get_res_num(line)[source]

Extract residue number from a line of PDB file

Parameters:line = ATOM line from a PDB file (*) –
Output:
  • residue number as an integer
rna_tools.tools.rnakb_utils.rnakb_utils.make_rna_gromacs_ready(pdb_string, verbose=False)[source]

GROMACS has some special requirements for PDB files.

Parameters:pdb_string = contents of PDB file as a string (*) –
Output:
  • new PDB returned as a string

(!!!) # hmm… [ RA5 ] will not be detected based on it (!?) Hmm.. becase it detects if the structure is already prepared.

rna_tools.tools.rnakb_utils.rnakb_utils.make_rna_rnakb_ready(pdb_string, verbose=False)[source]

RNAkb read (difference between this function and make_rna_gromacs_ready is ignoring R5U etc. RNAkb does not treat them differently so there is no point to distinguish them.

Parameters:pdb_string = contents of PDB file as a string (*) –
Output:
  • new PDB returned as a string
rna_tools.tools.rnakb_utils.rnakb_utils.prepare_groups(fn, gr_fn, potential='aa', verbose=False)[source]

Prepare an index for fn file. gr_fn is a file where gtxt is saved in.

Get seq and uniq & sort it. ['RG5', 'RA', 'RA', 'RA', 'RG', 'RU', 'RA', 'RA', 'RC3'] set(['RU', 'RG', 'RC3', 'RG5', 'RA'])

@todo RG5 etc – done!

gtxt:

del 1
r RU & a C1'
name 1 uC1s
r RU & a C2
name 2 uC2
r RU & a C2'
name 3 uC2s
...

return, gtxt (groups_txt), energygrps . The result is saved to g_fn. energygrps: [‘uP’, ‘uC4s’, ‘uC2’, ‘uC4’, ‘uC6’, ‘gP’, ‘gC4s’, ‘gC2’, ‘gC4’, ‘gC6’, ‘aP’, ‘aC4s’, ‘aC2’, ‘aC4’, ‘aC6’] gtxt: RA del 1 r RU & a P name 1 uP r RU & a C4s name 2 uC4s r RU & a C2 name 3 uC2 r RU & a C4 [..] r RA & a C6 name 15 aC6 1|2|3|4|5|6|7|8|9|10|11|12|13|14|15 name 16 RNA_5pt 0 & ! 16 name 17 other q

rna_tools.tools.rnakb_utils.rnakb_utils.set_res_code(line, code)[source]

Set residue code from a line of a PDB file