This document describes RNAPATH, a pipeline for scaffolding fragmented
genomes using RNA-seq data. The code should run on any Unix-like system
with Python 2.6 or later. It is developed on Mac OS X with Python 2.6.

Note that RNAPATH is not currently optimized for machines with small or
medium amounts of RAM. A minimum of 32 GB is currently recommended.

1. COMMAND LINE OPTIONS
2. MAPPING THE READS AND BUILDING THE RDS FILES
3. GETTING THE SCAFFOLDING READS

1. COMMAND LINE OPTIONS

To see the command-line options for a script, run it without arguments:

    python $ERANGEPATH/<scriptname>

Note that all ERANGE command-line options are case-sensitive, and that
the scripts typically ignore command-line arguments that they do not
recognize!

2. MAPPING THE READS AND BUILDING THE RDS FILES

Before running the RNAPATH script on a genome (assumed to be in FASTA
format), you must first map the RNA-seq reads using BLAT and import those
reads into an RDS file, as described in README.build-rds.

3. GETTING THE SCAFFOLDING READS

Once you have an indexed RDS file, use the script distalPairs.py to output
the list of paired reads that do not map to the same contig. To do this,
pass distalPairs.py a distance that is greater than the length of the
longest existing genomic contig. For example:

    python ../commoncode/distalPairs.py 20000 rna_on_genomic.rds rna_on_genomic.crosspairs --splices --cache 20000000
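
Picking the distance requires knowing the length of the longest contig in
the assembly. The following is a minimal sketch of one way to find it,
not part of ERANGE. The function names longest_contig_length and
suggest_distance are hypothetical, as is the default margin of 1, and the
sketch assumes that the distance is measured in bases.

```python
def longest_contig_length(fasta_path):
    """Return the length of the longest sequence in a FASTA file."""
    longest = 0
    current = 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                # A header line closes the previous contig.
                longest = max(longest, current)
                current = 0
            else:
                # Sequence lines may be wrapped, so add them up.
                current += len(line)
    # Account for the last contig in the file.
    return max(longest, current)


def suggest_distance(fasta_path, margin=1):
    """Return a distance strictly greater than the longest contig.

    The margin is an assumption made for this sketch, not an ERANGE
    requirement; any value of 1 or more satisfies "greater than".
    """
    return longest_contig_length(fasta_path) + margin
```

For example, suggest_distance("genomic_contigs.fa") returns a value that
could be used in place of the 20000 in the command above.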

You can now run RNAPATH.py. Optionally, I suggest using the included
script processvelvet.py to rename the contigs; if you do, rename them
before running BLAT and generating the crosspair data.

Example:

    $ERANGEPATH/rnapath/RNAPATH.py genomic_contigs.fa rna_on_genomic.crosspairs RNAPATH.log genome.RNAPATH.fa
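
The two steps can be scripted together. The following is a rough sketch
that only assembles the two command lines shown in this README; it does
not run anything by itself. The function build_commands and its prefix
parameter are hypothetical. It assumes that distalPairs.py lives in
$ERANGEPATH/commoncode, and that the four RNAPATH.py arguments are the
contig FASTA file, the crosspairs file, a log file, and the output FASTA
file, as the file names in the example suggest.

```python
import os


def build_commands(erange_path, contigs_fa, rds_file, distance, prefix):
    """Build argument lists for distalPairs.py and RNAPATH.py.

    Only the options that appear in this README are used.
    """
    crosspairs = prefix + ".crosspairs"
    distal_cmd = [
        "python",
        # Assumed location: $ERANGEPATH/commoncode/distalPairs.py
        os.path.join(erange_path, "commoncode", "distalPairs.py"),
        str(distance), rds_file, crosspairs,
        "--splices", "--cache", "20000000",
    ]
    rnapath_cmd = [
        "python",
        os.path.join(erange_path, "rnapath", "RNAPATH.py"),
        contigs_fa, crosspairs,
        # Assumed argument roles: log file, then output FASTA file.
        prefix + ".RNAPATH.log", prefix + ".RNAPATH.fa",
    ]
    return [distal_cmd, rnapath_cmd]
```

Each list can be passed in order to subprocess.call(), stopping if a
command returns a nonzero exit status.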

version 3.3 November 2010 - updated command line options
version 3.2 May 2010 - first release