Utils

Class in the package (@todo).

RNA Sequence

@todo should be renamed to RNASeq, and merged with RNASeq class from RNAalignment.

Seq and secondary structure prediction.

Installation:

Q: does it work for more than one chain??? Hmm.. I think it’s not.

class rna_pdb_tools.Seq.Seq(seq)[source]

Seq.

Usage:

>>> seq = Seq("CCCCUUUUGGGG")
>>> seq.name = 'RNA03'
>>> print(seq.predict_ss("RNAfold", constraints="((((....))))"))
>RNA03
CCCCUUUUGGGG
((((....)))) ( -6.40)
predict_ss(method='RNAfold', constraints='', verbose=False)[source]

Predict secondary structure of the seq.

It creates a seq fasta file and runs various methods for secondary structure prediction. You can provide also a constraints file for RNAfold and RNAsubopt.

ContextFold:

$ java -cp bin contextFold.app.Predict in:CCCCUUUGGGGG
CCCCUUUGGGGG
((((....))))

It seems that a seq has to be longer than 9. Otherwise:

$ java -cp bin contextFold.app.Predict in:UUUUUUGGG
Exception in thread "main" java.lang.ArrayIndexOutOfBoundsException: 10

# this is OK
$ java -cp bin contextFold.app.Predict in:CCCCUUUGGG
CCCCUUUGGG
.(((...)))

BlastPDB

A super-simple wrapper around Blast on the PDB db (online).

RfamSearch

A super-simple wrapper around cmscan (Infernal) on local RFAM.

RNA Alignment

RNASeq

RNAalignment

CMAlign

RChie

RNA clustering with CLANS (clanstix)

A tool for visualizing RNA 3D structures based on pairwise structural similarity.

_images/yndSrLTb7l.gif

The RMSDs between structures are converted into p-values based on the method from the Dokholyan lab.

How to use ClanstixRNA?

  1. Get a matrix of distances, save it as e.g. matrix.txt

  2. run ClanstixRNA on this matrix to get an input file to Clans (e.g. clans_rna.txt):

    clanstix.py test_data/matrix.txt > clans_run.txt
    
  3. open CLANS and click File -> Load run and load clans_run.txt

  4. You’re done! :-)

Hajdin, C. E., Ding, F., Dokholyan, N. V, & Weeks, K. M. (2010). On the significance of an RNA tertiary structure prediction. RNA (New York, N.Y.), 16(7), 1340–9. doi:10.1261/rna.1837410

An output of this tool can be viewed using CLANS.

Frickey, T., & Lupas, A. (2004). CLANS: a Java application for visualizing protein families based on pairwise similarity. Bioinformatics (Oxford, England), 20(18), 3702–4. doi:10.1093/bioinformatics/bth444

RNA Helix Vis (draw helices using PyMOL)

Calculate Root Mean Square Deviation (RMSD)

rna_calc_rmsd

rna_pdb_tools.utils.rna_calc_rmsd.rna_calc_rmsd.calc_rmsd(a, b, target_selection, target_ignore_selection, model_selection, model_ignore_selection, verbose)[source]

a is model b is target

Params:a = filename of structure a
Params:b = filename of structure b
Returns:rmsd, number of atoms
rna_pdb_tools.utils.rna_calc_rmsd.rna_calc_rmsd.calc_rmsd_pymol(pdb1, pdb2, method)[source]

Calculate rmsd using PyMOL. Two methods are available: align and fit

See:

Align can return a list with 7 items:

RMSD after refinement Number of aligned atoms after refinement Number of refinement cycles RMSD before refinement Number of aligned atoms before refinement Raw alignment score Number of residues aligned

in this version of function, the function returns RMSD before refinement.

Install on OSX: brew install homebrew/science/pymol and set PYTHONPATH to your PyMOL packages, .e.g

PYTHONPATH=$PYTHONPATH:/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages

If problem:

Match-Error: unable to open matrix file '/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/data/pymol/matrices/BLOSUM62'.

then define PYMOL_PATH in your .bashrc, e.g.:

export PYMOL_PATH=/opt/local/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/pymol/
rna_pdb_tools.utils.rna_calc_rmsd.rna_calc_rmsd.get_rna_models_from_dir(files)[source]
Parameters:models – a list of filenames

Example of the list:

['test_data/rp17/2_restr1_Michal1.pdb_clean.pdb', 'test_data/rp17/2a_nonrestr2_Michal1.pdb_clean.pdb', 
'test_data/rp17/3_nonrestr1_Michal1.pdb_clean.pdb', 'test_data/rp17/5_restr1_Michal3.pdb_clean.pdb']
rna_pdb_tools.utils.rna_calc_rmsd.rna_calc_rmsd.sort_nicely(l)[source]

Sort the given list in the way that humans expect.

http://blog.codinghorror.com/sorting-for-humans-natural-sort-order/

rna_calc_rmsd_all_vs_all

Calculate Interaction Network Fidelity (INF) and not only

rna_calc_inf

usage: rna_calc_inf [-h] [-t TARGET_FN] [-m NT] [-s SS] [-f] [-v] [-o OUT_FN]
                    files [files ...]

Positional Arguments

files files, .e.g folder_with_pdbs/*pdbs

Named Arguments

-t, --target_fn
 

pdb file

Default: “”

-m, --number_of_threads
 

number of threads used for multiprocessing, if 1 then mp is not used (useful for debugging)!

Default: 8

-s, --ss

A:(([[))]]

Default: “”

-f, --force

force to run ClaRNA even if <pdb>.outCR file is there

Default: False

-v, --verbose

be verbose, tell me more what’re doing

Default: False

-o, --out_fn

out csv file, be default inf.csv

Default: “inf.csv”

A tool to calc inf_all, inf_stack, inf_WC, inf_nWC, SNS_WC, PPV_WC, SNS_nWC, PPV_nWC between two structures.

ClaRNA_play required! https://gitlab.genesilico.pl/RNA/ClaRNA_play (internal GS gitlab server). Contact <magnus@genesilico.pl>.

import progressbar (in version 2) is required!

rna_pdb_tools.utils.rna_calc_inf.rna_calc_inf.do_job(i)[source]

Run ClaRNA & Compare, add 1 to the counter, write output to csv file (keeping it locked)

diffpdb

_images/diffpdb_osx_diffmerge.png

rna_filter

Secondary structure format conversion

rna_convert_pseudoknot_formats

Run this as:

python rna-pk-simrna-to-one-line.py test_data/simrna.ss

Convert:

> a
....((.(..(((....)))..((((.(.........).)))....).)).......((((......))))
..............................((((...................))))..............

to:

> a
....((.(..(((....)))..((((.(..[[[[...).)))....).))...]]]]((((......))))

and:

>2 chains
(((((......)))))........(.((....(.......)..(((. .)))...)).)
.....((((((......................))))))........ ...........

to:

>2 chains
((((([[[[[[)))))........(.((....(]]]]]].)..(((. .)))...)).)

and:

> b
..(.......(((....)))..((((.(.........).))))).............((((......))))
....((.(......................................).)).....................
..............................((((...................))))..............

to:

> b
..(.[[.[..(((....)))..((((.(..{{{{...).)))))..].]]...}}}}((((......))))

and it works with VARNA:

_images/varna_2pk.png

Convert a secondary structure with a pk to the SimRNA format:

rna_convert_pseudoknot_formats git:(master) ✗ python rna_ss_pk_to_simrna.py test_data/ss_with_pk.ss
((((([[[[[[)))))........(.((....(]]]]]].)..(((. .)))...)).)

(((((......)))))........(.((....(.......)..(((. .)))...)).)
.....((((((......................))))))........ ...........

Secondary structure (secstruc)

ClaRNA (contacts classification)

If you want to calculate “Interaction Network Fidelity (INF) and not only” see rna_calc_inf in the Utils.

usage:

./clarna_app.py ../../input/5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
../../input/5k7c_clean_onechain_renumber_as_puzzle_srr.pdb
((((([[[[[[)))))........(.((....(]]]]]].)..(((......)))...)).)

Example:

from rna_pdb_tools.utils.clarna_app import clarna_app
if __name__ == '__main__':
    ss = '((((.[[[[[[.))))........((((.....]]]]]]...(((((....)))))..))))'
    fnCRref = clarna_app.get_ClaRNA_output_from_dot_bracket(ss)
    f = '../rna_calc_rmsd/test_data/pistol/5k7c_clean_onechain_renumber_as_puzzle_srr.pdb'
    fnCR = clarna_app.clarna_run(f, force=False)
    results = clarna_app.clarna_compare(fnCRref, fnCR)
    print results #
    #tmp_Z42i_..pdb.outCR     5k7c_clean_onechain_renumber_as_puzzle_srr.pdb.outCR      0.706      NA         0.865      NA         0.842      0.889      NA         0.000

Warning

Setup a bash variable: ClaRNA_play_path, and add ClaRNA_play to your $PATH (install ClaRNA_play https://gitlab.genesilico.pl/RNA/ClaRNA_play (internal GS gitlab server)

rna_pdb_tools.utils.clarna_app.clarna_app.clarna_compare(target_cl_fn, i_cl_fn, verbose=False)[source]

Run ClaRNA compare.

Returns:a list target, fn, scores

Scores:

inf_all      0.706
inf_stack -999.999 -> NA
inf_WC       0.865 
inf_nWC   -999.999 -> NA
SNS_WC       0.842
PPV_WC       0.889
SNS_nWC     NA
PPV_nWC      0.000

Example of the list:

5k7c_clean_onechain_renumber_as_puzzle_srr.pdb     pistol_thrs0.50A_clust01-000001_AA.pdb
 0.642      NA         0.874      0.000      0.944      0.810      0.000      0.000s

use results.split()[4] to get inf_WC

rna_pdb_tools.utils.clarna_app.clarna_app.clarna_run(fn, force=True)[source]

Run ClaRNA run

fn: str
filename to analyze
Returns:a filename to ClaRNA output (fn + ‘.outCR’)
rna_pdb_tools.utils.clarna_app.clarna_app.get_ClaRNA_output_from_dot_bracket(ss, temp=True, verbose=False)[source]

Get dummy ClaRNA output out of dat bracket secondary structure (ss)

Returns:a filename to ClaRNA output
rna_pdb_tools.utils.clarna_app.clarna_app.get_dot_bracket_from_ClaRNAoutput(inCR, verbose=False)[source]

In inCR file

SimRNA

Download files of a SimRNAweb run

Select low energy frames

Extract

SimRNATrajectory

ROSETTA

Run (modeling)

usage: rna_rosetta_run.py [-h] [-i] [-e] [-r] [-g] [-c CPUS] file

Positional Arguments

file file: >a04 UAUAACAUAUAAUUUUGACAAUAUGGGUCAUAAGUUUCUACCGGAAUACCGUAAAUAUUCUGACUAUGUAUA ((((.((((…((.((((…….)))).))……..(.(((((…….))))).)..))))))))

Named Arguments

-i, --init

prepare _folder with seq and ss

Default: False

-e, --helices

produce h(E)lices

Default: False

-r, --rosetta

prepare rosetta files (still you need go to send jobs to a cluster)

Default: False

-g, --go

send jobs to a cluster (run qsubMINI)

Default: False

-c, --cpus

default: 200

Default: 200

run_rosetta - wrapper to ROSETTA tools for RNA modeling

Based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http://www.sciencedirect.com/science/article/pii/S0076687914000524

The script makes (1) a folder for you job, with seq.fa, ss.fa, input file is copied as input.fa to the folder (2) make helices (3) prepare rosetta input files (4) sends jobs to the cluster.

Mind that headers will be trimmed to 5 characters (header[:5]). The header is take from the fast file (>/header/) not from the filename of your Fasta file.

If one of the helices is missing you will get:

IOError: [Errno 2] No such file or directory: 'helix1.out'
rosetta_submit.py README_FARFAR o 500 100 taf
Could not find:  README_FARFAR

and the problem was a1 and g8 pairing:

outputting command line to:  helix0.RUN # previous helix #0
Sequence:  AUGG CCGG
Secstruc:  (((( ))))
Not complementary at positions a1 and g8!
Sequence:  GUGGG CCCAU
Secstruc:  ((((( )))))

Writing to fasta file:  helix2.fasta # next helix #2

My case with a modeling of rp12

Sequence: cc gc Secstruc: (( )) Not complementary at positions 1 and 4!

edit the secondary structure, run the program with -i (init, to overwrite seq.fa, ss.fa) and then it works.

rna_pdb_tools.utils.rna_rosetta.rna_rosetta_run.go()[source]

send jobs to a cluster (run qsubMINI)

rna_pdb_tools.utils.rna_rosetta.rna_rosetta_run.main()[source]

Pipline for modeling RNA

rna_pdb_tools.utils.rna_rosetta.rna_rosetta_run.prepare_folder(args, header, seq, ss, path)[source]

Make folder for you job, with seq.fa, ss.fa, input file is copied as input.fa to the folder

rna_pdb_tools.utils.rna_rosetta.rna_rosetta_run.prepare_helices()[source]

Make helices (wrapper around ‘helix_preassemble_setup.py’)

Warning

I think multiprocessing of helixX.run does not work.

rna_pdb_tools.utils.rna_rosetta.rna_rosetta_run.prepare_rosetta(header, cpus)[source]

Repare ROSETTA using rna_denovo_setup.py

cpus is used to calc nstruc per job to get 20k structures per full run:

nstruct = int(math.floor(20000/cpus))
e.g. 
40 (nstruc) = 20k / 500 (cpus) 

Get a number of structures

usage: rna_rosetta_n.py [-h] file

Positional Arguments

file ade.out

Usage:

$ rna_rosetta_n.py ade.out
21594
rna_pdb_tools.utils.rna_rosetta.rna_rosetta_n.run()[source]

Pipline for modeling RNA

Minimize

usage: rna_rosetta_min.py [-h] [-g] [--cpus CPUS] file

Positional Arguments

file ade.out

Named Arguments

-g, --go Default: False
--cpus

default: 200

Default: 200

run_rosetta - wrapper to ROSETTA tools for RNA modeling

Based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http://www.sciencedirect.com/science/article/pii/S0076687914000524

ade$ rna_rosetta_cluster.py ade.out

The first number states how many processors to use for the run, while the second number is 1/6 the total number of previously generated FARNA models. If you are running on a supercomputer that only allows specific multiples of processors, use an appropriate number for the first input.

rosetta_submit.py min_cmdline min_out 1 24

rosetta_submit.py min_cmdline min_out [1] [16] The first number states how many processors to use for each line in min_cmdline. Here, enter 1 for the first input so that the total number of processors used will be equal to the number of processors entered with the “-proc” flag in command line [12], above. The second number states the maximum time each job will be allowed to run (walltime). Start the run with the appropriate command listed by the out- put above (e.g., source qsubMPI for the Stampede cluster).

rna_pdb_tools.utils.rna_rosetta.rna_rosetta_min.min(f, take_n, cpus, go)[source]

Run parallel_min_setup (to MINIMIZE file), rosetta_submit.py, and qsubMINI.

Fix on the way, qsub files:

-out:file:silent mo/0/mo/123/tha_min.out -> -out:file:silent mo/123/tha_min.out

I don’t know why mo/0/ is there. I might be because of my changes in rosetta_submit.py (?).

Cluster

usage: rna_rosetta_cluster.py [-h] file n

Positional Arguments

file ade.out
n # of total structures

run_rosetta - wrapper to ROSETTA tools for RNA modeling

Based on C. Y. Cheng, F. C. Chou, and R. Das, Modeling complex RNA tertiary folds with Rosetta, 1st ed., vol. 553. Elsevier Inc., 2015. http://www.sciencedirect.com/science/article/pii/S0076687914000524

rna_rosetta_cluster.py ade_min.out 20000
rna_pdb_tools.utils.rna_rosetta.rna_rosetta_cluster.cluster(radius=1)[source]

Internal function of cluster_loop: It removes cluster.out first.

rna_pdb_tools.utils.rna_rosetta.rna_rosetta_cluster.cluster_loop(ns)[source]

Go from radius 1 to get 1/6 of structures of ns (# of selected structures) in the first cluster, then it stops.

rna_pdb_tools.utils.rna_rosetta.rna_rosetta_cluster.get_selected(file, nc)[source]

Get selected for clustering

rna_pdb_tools.utils.rna_rosetta.rna_rosetta_cluster.run()[source]

Pipline for modeling RNA

PyMOL

Install PyMOL plugin to view the interactions with PyMOL:

run <path>rna-pdb-tools/utils/pymol_drawing/pymol_dists.py

and type:

draw_dists([[29, 41], [7, 66], [28, 42], [51, 63], [50, 64], [2, 71], [5, 68], [3, 70], [31, 39], [4, 69], [6, 67], [12, 23], [52, 62], [30, 40], [49, 65], [27, 43], [11, 24], [1, 72], [10, 25], [15, 48], [53, 61], [19, 56], [13, 22], [36, 37], [18, 19], [22, 46], [35, 73], [32, 38], [9, 13], [19, 20], [18, 20], [54, 60], [9, 23], [34, 35], [36, 38], [53, 54], [20, 56], [9, 12], [26, 44], [18, 55], [54, 61], [32, 36]])
_images/pymol_dists.png

Misc

rna_sali2dotbracket

rna_add_chain

Cluster load

A very simple tool to see your cluster load per user:

MAX_JOBS: 1000
#jobs cluster 917 load:  0.917  to use: 83
#jobs you     749 load:  0.749  to use: 251
{'deepak': 160, 'azyla': 8, 'magnus': 749}
1 azyla        r 8
20 magnus       r 10
16 deepak       r 10
329 magnus       r 1
22 magnus       qw 10

A super simple script to get some statistics of who is running at a cluster

Set MAX_JOBS to calc % of usage, it’s an approximation of max number of jobs, e.g. peyote ~1k

rna_pdb_tools.utils.cluster_load.cluster_load.per_user()[source]

get stats (#cpus) per user

rna_pdb_tools.utils.cluster_load.cluster_load.stats_for_cluster()[source]

get stats (#jobs) per cluster

rna_pdb_tools.utils.cluster_load.cluster_load.stats_for_user()[source]

get stats (#jobs) per user