8 Last updated: Oct 16th, 2006
10 Updated to Mussagl build: 287? (In process to 424)
14 * New features / change log
15 * Comment out anything that isn't implemented yet.
16 * (DONE) List of features that will be implemented in the future.
17 * Look into the homology mapping of UCSC.
18 * Add toggle to genomes.
19 * Document why one FASTA record per region.
20 * How to deal with the hazards of small utrs vis motif finder. (Add warning)
21 * Add warning about saving FASTA file.
22 * Add a general principles section near the top
23 * Using comparison algorithm which will pickup all repeats
24 * Add info about repeatmasking
25 * Check upstream and downstream genes to make sure you are in the right regions.
26 * Later on: look into Ensembl
27 * Look into method of homology instead of blating.
28 * Mention advantages of using mupa.
29 * Mention the difference between using arrows and scroll bar
30 * Document the color for motifs
31 * Update for Mac user left-click
33 * Wormbase/Flybase/mirBASE tutorials
46 * Analysis "Save As" feature
51 .. INSERT CHANGE LOG HERE
52 .. END INSERT CHANGE LOG
54 Features to be Implemented
55 --------------------------
57 * Motif editor supporting more than 10 motifs
58 (Status: http://woldlab.caltech.edu/cgi-bin/mussa/ticket/122)
59 * Save motifs from Mussagl
60 (Status: http://woldlab.caltech.edu/cgi-bin/mussa/ticket/133)
62 For an up-to-date list of features to be implemented visit:
63 http://woldlab.caltech.edu/cgi-bin/mussa/roadmap
72 Mussa is an N-way version of the FamilyRelations (which is a part of
73 the Cartwheel project) 2-way comparative sequence analysis
74 software. Given DNA sequence from N species, Mussa uses all possible
75 pairwise comparisons to derive an N-wise comparison. For example, given
76 sequences 1,2,3, and 4, Mussa makes 6 2-way comparisons: 1vs2, 1vs3,
77 1vs4, 2vs3, 2vs4, and 3vs4. It then compares all the links between
78 these comparisons, saving those that satisfy a transitivity
79 requirement. The saved paths are then displayed in an interactive
82 Short History of Mussa
83 ----------------------
85 Mussa Python/PMW Prototype
86 ~~~~~~~~~~~~~~~~~~~~~~~~~~
88 First Python/PMW based prototype.
93 A rewrite for speed purposes using C++ and FLTK GUI toolkit.
98 Refactored version using the more elegant Qt GUI framework and
99 OpenGL for hardware acceleration for those who have better graphics
108 Mussagl has been released open source under the `GPL v2
116 You have the option of building from source or downloading prebuilt
117 binaries. Most people will want the prebuilt versions.
121 * Mac OS X (binary or source)
122 * Windows XP (binary or source)
128 Mussagl in binary form for OS X and Windows and/or source can be
129 downloaded from http://mussa.caltech.edu/.
136 Once you have downloaded the .dmg file, double click on it and follow
137 the install instructions.
139 FIXME: Mention how to launch the program.
144 Once you have downloaded the Mussagl installer, double click on the
145 installer and follow the install instructions.
147 To start Mussagl, launch the program from Start > Programs > Mussagl >
153 Currently we do not have a binary installer for Linux. You will have
154 to build from source. See the 'build from source' section below.
160 Instructions for building from source can be found `build page
161 <http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild>`_ on the
170 If you already have your data, you can skip ahead to the `Using
173 Let's say you have a gene of interest called 'SMN1' and you want to
174 know how the sequence surrounding the gene in multiple species is
175 conserved. That is exactly what we are going to do here: retrieve the
176 DNA sequence for SMN1 and prepare it for use in Mussa.
178 For more information about SMN1 visit `NCBI's OMIM
179 <http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=609682>`_.
181 UCSC Genome Browser Method
182 --------------------------
184 There are many methods of retrieving DNA sequence, but for this
185 example we will retrieve SMN1 through the UCSC genome browser located
186 at http://genome.ucsc.edu/.
188 We have made the SMN1 data available at
189 http://woldlab.caltech.edu/cgi-bin/mussa/wiki/ExampleData if you
190 prefer to skip this section.
192 .. image:: images/ucsc_genome_browser_home.png
193 :alt: UCSC Genome Browser
199 The first step in finding SMN1 is to use the **Gene Sorter** menu
200 option which I have highlighted in orange below:
202 .. image:: images/ucsc_menu_bar_gene_sorter.png
203 :alt: Gene Sorter Menu Option
208 .. image:: images/ucsc_gene_sorter.png
212 We will start by looking for SMN1 in the **Human Genome** and **sorting by name similarity**.
214 .. image:: images/ucsc_gs_sort_name_sim.png
215 :alt: Gene Sorter - Name Similarity
218 After you have selected **Human Genome** and **sorting by name similarity**, type *SMN1* into the search box.
220 .. image:: images/ucsc_gs_smn1.png
224 Press **Go!** and you should see the following page:
226 .. image:: images/ucsc_gs_found.png
230 Click on **SMN1** and you will be taken to the gene expression atlas
233 .. image:: images/ucsc_gs_genome_position.png
234 :alt: Gene expression atlas
237 Click on **chr5 70,270,558** found in the **SMN1 row**, **Genome
240 Now we have found the location of SMN1 in the human genome!
242 .. image:: images/ucsc_gb_smn1_human.png
243 :alt: Genome Browser - SMN1 (human)
247 Step 2 - Download CDS/UTR sequence for annotations
248 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
250 Since we have found **SMN1**, this would be a convenient time to extract
251 the DNA sequence for the CDS and UTRs of the gene to use it as an
252 annotation_ in Mussa.
254 **Click on SMN1**, located **between** the **two orange arrows** shown
257 .. image:: images/ucsc_gb_smn1_human_click_smn1.png
258 :alt: Genome Browser - SMN1 (human) - Orange Arrows
261 You should find yourself at the SMN1 description page.
263 .. image:: images/ucsc_gb_smn1_description_page.png
264 :alt: Genome Browser - SMN1 (human) - Description page
267 **Scroll down** until you get to the **Sequence section** and click on
268 **Genomic (chr5:70,256,524-70,284,592)**.
270 .. image:: images/ucsc_gb_smn1_human_sequence.png
271 :alt: Genome Browser - SMN1 (human) - Sequence
274 You should now be at the **Genomic sequence near gene** page:
276 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence.png
277 :alt: Genome Browser - SMN1 (human) - Get genomic sequence
280 Make the following changes (highlighted in orange in the screenshot
283 1. UNcheck **introns**.
284 (We only want to annotate CDS and UTRs.)
285 2. Select **one FASTA record** per **region**.
286 (Mussa needs each CDS and UTR to be represented by its own FASTA record.)
287 3. Select **CDS in upper case, UTR in lower case.**
289 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_diff.png
290 :alt: Genome Browser - SMN1 (human) - Get genomic sequence setup
293 Now click the **submit** button. You will then see a FASTA file with
294 many FASTA records representing the CDS and UTRs.
296 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_submit.png
297 :alt: Genome Browser - SMN1 (human) - CDS/UTR sequence
300 Now you need to save the FASTA records to a **text file**. If you are
301 using **Firefox** or **Internet Explorer 6+** click on the **File >
302 Save As** menu option.
304 **IMPORTANT:** Make sure you select **Text Files** and **NOT**, I
305 repeat **NOT Webpage Complete** (see screenshot below.)
307 Type in **smn1_human_annot.txt** for the file name.
309 .. image:: images/smn1_human_annot.png
310 :alt: Genome Browser - SMN1 (human) - sequence annotation file
313 **IMPORTANT:** You should open the file with a text editor and make
314 sure **no HTML** was saved... If you find any HTML markup, delete
315 the markup and save the file.
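If you would rather not check the saved file by eye, the check can be sketched in a few lines of Python. This is only an illustration: the function name ``find_html_lines`` is made up for this example, and the pattern assumes HTML remnants look like ordinary ``<tag>`` markup.

```python
import re

# Matches things like <PRE>, </PRE> or <A HREF="...">.
TAG_PATTERN = re.compile(r"</?[A-Za-z][^>]*>")


def find_html_lines(text):
    """Return (line_number, line) pairs that appear to contain HTML tags."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        # FASTA headers start with '>' but normally contain no '<tag>' markup,
        # so they are not reported.
        if TAG_PATTERN.search(line):
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    saved = "<PRE>\n>record_1\nACGT\n</PRE>\n"
    for number, line in find_html_lines(saved):
        print(f"line {number} looks like HTML: {line}")
```

Any line the sketch reports is a candidate for deletion before you load the file into Mussa.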
317 Now we are going to **modify the file** you just saved to **add the
318 name of the species** to the **annotation file**. All you have to do
319 is **add a new line** at the **top of the file** with the word **'Human'** as
322 .. image:: images/smn1_human_annot_plus_human.png
323 :alt: Genome Browser - SMN1 (human) - sequence annotation file
326 You can add more annotations to this file if you wish. See the
327 `annotation file format`_ section for details of the file format. By
328 including FASTA records in the annotation_ file, Mussa searches your
329 DNA sequence for an exact match of the sequence in the annotation_
330 file. If found, it will be marked as an annotation_ within Mussa.
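The exact-match idea can be sketched as follows. This is not Mussa's code, just a minimal illustration; the function name is hypothetical, and treating upper and lower case as equal is an assumption made here because the UCSC export mixes the two (CDS in upper case, UTR in lower case).

```python
def find_exact_matches(sequence, annotation):
    """Return the 0-based start offsets where annotation occurs in sequence.

    Assumption for this sketch: case is ignored when comparing.
    """
    haystack = sequence.upper()
    needle = annotation.upper()
    starts = []
    if not needle:
        return starts
    position = haystack.find(needle)
    while position != -1:
        starts.append(position)
        # Step forward by one so overlapping occurrences are also found.
        position = haystack.find(needle, position + 1)
    return starts


if __name__ == "__main__":
    print(find_exact_matches("ttACGTacgtTT", "acgt"))
```

Each offset returned marks a place where an annotation would be drawn on the sequence.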
333 Step 3 - Download gene and upstream/downstream sequence
334 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
336 Use the back button in your web browser to get back to the **genome
337 browser view** of **SMN1** as shown below.
339 .. image:: images/ucsc_gb_smn1_human.png
340 :alt: Genome Browser - SMN1 (human)
343 There are two options for getting additional sequence around your
344 gene. The more complex way is to zoom out until the sequence you
345 want is shown in the genome browser and then follow the directions
346 for the second option.
348 The second option, which we will choose, is to leave the genome
349 browser zoomed exactly at the location of SMN1 and click on the
350 **DNA** option on the menu bar (shown with orange arrows in the
353 .. image:: images/ucsc_gb_smn1_human_dna_option.png
354 :alt: Genome Browser - SMN1 (human) - DNA Option
357 Now in the **get dna in window** page, let's add an arbitrary amount of
358 extra sequence on to each end of the gene, let's say 5000 base pairs.
360 .. image:: images/ucsc_gb_smn1_human_get_dna.png
361 :alt: Genome Browser - SMN1 (human) - Get DNA
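The UCSC form does the padding for you, but if you want to double check which coordinates you should end up with, the arithmetic is simple. The sketch below is only an illustration; the function name is made up, and the example coordinates are the ones from the **Genomic** link in step 2, which may differ from your current browser window.

```python
def pad_region(start, end, padding):
    """Extend a 1-based region by `padding` base pairs on each side."""
    if start > end:
        raise ValueError("start must not be greater than end")
    # Never go below position 1 at the start of a chromosome.
    return max(1, start - padding), end + padding


if __name__ == "__main__":
    new_start, new_end = pad_region(70_256_524, 70_284_592, 5000)
    print(f"chr5:{new_start:,}-{new_end:,}")
```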
364 Click the **get DNA** button.
366 .. image:: images/ucsc_gb_smn1_human_dna.png
367 :alt: Genome Browser - SMN1 (human) - DNA
370 Save the DNA sequence to a text file called 'smn1_human_dna.fa' as we
371 did in step 2 with the annotation file.
373 **IMPORTANT:** Make sure the file is saved as a text file and not an
374 HTML file. Open the file with a text editor and remove any HTML markup
378 Step 4 - Same/similar/related gene other species.
379 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
381 What good is a multiple sequence alignment viewer without multiple
382 sequences? Let's find a similar gene in a few more species.
384 Use the back button on your web browser until you get to the **genome
385 browser view** of **SMN1** as shown below.
387 .. image:: images/ucsc_genome_browser_home.png
388 :alt: UCSC Genome Browser
391 **Click on SMN1**, located **between** the **two orange arrows** shown
394 .. image:: images/ucsc_gb_smn1_human_click_smn1.png
395 :alt: Genome Browser - SMN1 (human) - Orange Arrows
398 You should find yourself at the SMN1 description page.
400 .. image:: images/ucsc_gb_smn1_description_page.png
401 :alt: Genome Browser - SMN1 (human) - Description page
404 **Scroll down** until you get to the **Sequence section** and click on
405 **Protein (262 aa)**.
407 .. image:: images/ucsc_gb_smn1_human_sequence.png
408 :alt: Genome Browser - SMN1 (human) - Sequence
411 Copy the SMN1 protein sequence by highlighting it and selecting the **Edit
412 > Copy** option from the menu.
414 .. image:: images/smn1_human_protein.png
415 :alt: Genome Browser - SMN1 (human) - Protein
418 Press the back button on the web browser once and then scroll to the
419 top of the page and click on the **BLAT** option on the menu bar
420 (shown below with orange arrows).
422 .. image:: images/ucsc_gb_smn1_human_blat.png
423 :alt: Genome Browser - SMN1 (human) - Blat
426 **Paste** in the **protein sequence** and **change** the **genome** to
427 **mouse** as shown below and then click **submit**.
429 .. image:: images/ucsc_gb_smn1_human_blat_paste.png
430 :alt: Genome Browser - SMN1 (human) - Blat paste protein
433 Notice that we have two hits, one of which looks pretty good at 89.9%
436 .. image:: images/ucsc_gb_smn1_human_blat_hits.png
437 :alt: Genome Browser - SMN1 (human) - Blat hits
440 **Click** on the **browser** link next to the 89.9% match. Notice in
441 the genome browser (shown below) that there is an annotated gene
442 called SMN1 for mouse which matches the line called **your sequence
443 from blat search**. This means we are fairly confident we found the
444 right location in the mouse genome.
446 .. image:: images/ucsc_gb_smn1_human_blat_to_browser.png
447 :alt: Genome Browser - SMN1 (human) - Blat to browser
450 Follow steps 1 through 3 for mouse and then repeat step 4 with the
451 human protein sequence to find **SMN1** in the following species (if
462 Make sure to save the extended DNA sequence and annotation file for
471 Launch Mussagl... It should look similar to the screen shot below.
473 .. image:: images/opened.png
480 ----------------------
482 Currently there are three ways to load a Mussa experiment.
484 1. `Create a new analysis`_
485 2. `Load a mussa parameter file`_ (.mupa)
486 3. `Load an analysis`_
490 Create a new analysis
491 ~~~~~~~~~~~~~~~~~~~~~
493 To create a new analysis select 'Define analysis' from the 'File'
494 menu. You should see a dialog box similar to the one below. For this
495 demo we will use the example sequences that come with Mussagl.
497 .. image:: images/define_analysis.png
498 :alt: Define Analysis
503 1. **Give the experiment a name**, for this demo, we'll use
504 'demo_w30_t20'. Mussa will create a folder with this name to store
505 the analysis files in once it has been run.
507 2. Choose a `window size`_. For this demo **choose 30**.
509 3. Choose a threshold_. For this demo **choose 20**. See the
510 Threshold_ section for more detailed information.
512 4. Choose the number of sequences_ you would like. For this demo
515 .. image:: images/define_analysis_step1a.png
519 Now click on the 'Browse' button next to the sequence input box and
520 then select the /examples/seq/human_mck_pro.fa file. Do the same in the
521 next two sequence input boxes selecting mouse_mck_pro.fa and
522 rabbit_mck_pro.fa as shown below. Note that you can create annotation
523 files using the mussa `Annotation File Format`_ to add annotations to
526 .. image:: images/define_analysis_step2.png
527 :alt: Choose sequences
530 Click the **create** button and in a few moments you should see
531 something similar to the following screen shot.
533 .. image:: images/demo.png
537 This analysis is now saved in a directory called **demo_w30_t20** in
538 the current working directory. If you close and reopen Mussagl, you
539 can reload the saved analysis. See the `Load an analysis`_ section below
543 Load a mussa parameter file
544 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
546 If you prefer, you can define your Mussa analysis using the Mussa
547 parameter file. See the `Parameter File Format`_ section for details
548 on creating a .mupa file.
550 Once you have a .mupa file created, load Mussagl and select the **File >
551 Load Mussa Parameters** menu option. Select the .mupa file and click
554 .. image:: images/load_mupa_menu.png
555 :alt: Load Mussa Parameters
558 If you would like to see an example, you can load the
559 **mck3test.mupa** file in the examples directory that comes with
562 .. image:: images/load_mupa_dialog.png
563 :alt: Load Mussa Parameters Dialog
570 To load a previously run analysis open Mussagl and select the **File >
571 Load Analysis** menu option. Select an analysis **directory** and
574 .. image:: images/load_analysis_menu.png
575 :alt: Load Analysis Menu
584 .. Screen-shot with numbers showing features.
586 .. image:: images/window_overview.png
592 1. `DNA Sequence (Black bars)`_
598 4. `Conservation tracks`_
602 6. `Zoom Factor`_ (Base pairs per pixel)
604 7. `Dynamic Threshold`_
606 8. `Sequence Information Bar`_
608 9. `Sequence Scroll Bar`_
611 DNA Sequence (black bars)
612 ~~~~~~~~~~~~~~~~~~~~~~~~~
614 .. image:: images/sequence_bar.png
618 Each of the black bars represents one of the loaded sequences, in this
619 case the sequence around the gene 'MCK' in human, mouse, and rabbit.
621 FIXME: Should I mention the repeats here?
627 .. figure:: images/annotation.png
631 Annotation shown in green on sequence bar.
634 Annotations can be included on any of the sequences using the `Load a
635 mussa parameter file`_ method of loading your sequences. You can
636 define annotations by location or using an exact sub-sequence and you
637 may also choose any color for display of the annotation; see the
638 `Annotation File Format`_ section for details.
640 Note: Currently there is no way to add annotations using the GUI (only
641 via the .mupa file). We plan to add this feature in the future, but it
642 likely will not make it into the first release.
648 .. figure:: images/motif.png
652 Motif shown in light blue on sequence bar.
654 The only real difference between an annotation and motif in Mussagl is
655 that you can define motifs from within the GUI. See the `Motifs`_
656 section for more information.
662 .. figure:: images/conservation_tracks.png
663 :alt: Conservation Tracks
666 Conservation tracks shown as red and blue lines between sequence
669 The **red lines** between the sequence bars represent conservation
670 between the sequences and **blue lines** represent **reverse
671 complement** conservation. The amount of sequence conservation shown
672 will depend on the relatedness of your sequences and the `dynamic
673 threshold`_ you are using. Sequences with lots of repeats will cause
674 major slowdowns in calculating the matches.
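If the term is new to you, a reverse complement is the sequence read from the opposite DNA strand. The sketch below shows how it is computed; it is a generic illustration and not taken from the Mussa source.

```python
# Translation table that swaps each base for its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Complement every base, then reverse the order of the bases."""
    return sequence.translate(COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(reverse_complement("AAGCT"))
```

A blue line therefore connects a window in one sequence to a window in another whose reverse complement matches it.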
680 .. image:: images/motif_toggle.png
684 Toggles motifs on and off. This will not turn on and off annotations.
686 Note: As of the current build (#200), this feature hasn't been
693 .. image:: images/zoom_factor.png
697 The zoom factor represents the number of base pairs represented per
698 pixel. When you zoom in far enough the display will switch from
699 a black bar, representing the sequence, to the actual sequence
700 (well, an ASCII representation of the sequence).
706 .. image:: images/dynamic_threshold.png
707 :alt: Dynamic Threshold
710 You can dynamically change the threshold for how strong a match you
711 consider the conservation to be with one of two options:
713 1. Number of base pair matches out of window size.
715 2. Percent base pair conservation.
717 See the Threshold_ section for more information.
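The two options describe the same cutoff in different units. The sketch below shows the conversion under one assumption: when going from a percentage back to a match count, the result is rounded up. How Mussagl itself rounds is not documented here, and both function names are made up for this example.

```python
import math


def matches_to_percent(matches, window_size):
    """Express 'matches out of window_size' as a percentage."""
    return 100.0 * matches / window_size


def percent_to_matches(percent, window_size):
    """Smallest match count that reaches the given percentage (assumed rounding up)."""
    return math.ceil(percent * window_size / 100.0)


if __name__ == "__main__":
    # 20 matches in a 30 bp window, written as a percentage.
    print(f"{matches_to_percent(20, 30):.1f}%")
    # A 70% cutoff on a 30 bp window, written as a match count.
    print(percent_to_matches(70, 30))
```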
720 Sequence Information Bar
721 ~~~~~~~~~~~~~~~~~~~~~~~~
723 .. image:: images/seq_info_bar.png
724 :alt: Sequence Information Bar
727 The sequence information bars can be found to the left and right sides
728 of Mussagl. Next to each sequence you will find the following
731 1. Species (If it has been defined)
732 2. Total Size of Sequence
733 3. Current base pair position
739 .. image:: images/scroll_bar.png
740 :alt: Sequence Scroll Bar
743 The scroll bar allows you to scroll through the sequence which is
744 useful when you have zoomed in using the `zoom factor`_.
753 Currently annotations can be added to a sequence using the mussa
754 `annotation file format`_ and can be loaded by selecting the
755 annotation file when defining a new analysis (see `Create a new
756 analysis`_ section) or by defining a .mupa file pointing to your
757 annotation file (see `Load a mussa parameter file`_ section).
762 Load Motifs from File
763 *********************
765 It is possible to load motifs from a file which was saved from a
766 previous run or by defining your own motif file. See the `Motif File
767 Format`_ section for details.
769 NOTE: Valid motif list file extensions are:
774 To load a motif file, select **Load Motif List** item from the
775 **File** menu and select a motif list file.
777 .. image:: images/load_motif.png
778 :alt: Load Motif List
785 Note: Currently not implemented
794 * Allow for toggling individual motifs on and off.
797 * Field added for naming motifs.
799 Mussa has the ability to find lab motifs using the `IUPAC Nucleotide
800 Code`_ for defining a motif. To define a motif, select **Edit > Edit
801 Motifs** menu item as shown below.
803 .. image:: images/view_edit_motifs.png
804 :alt: "View > Edit Motifs" Menu
807 You will see a dialog box appear with a "set motifs" button and 10
808 rows for defining motifs and the color that will be displayed on the
809 sequence. By default all 10 motifs start off with white as the
810 color. In the image below, I changed the color from white to blue to
811 make it easier to see. The first text box is for the motif and the
812 second box is for the name of the motif. The check box defines whether
813 the motif is displayed or not.
815 .. image:: images/motif_dialog_start.png
819 Now let's make a motif **'AT[C or G]CT'**. Using the `IUPAC Nucleotide
820 Code`_, type **'ATSCT'** into the first box and 'My Motif' for the
821 name in the second box as shown below.
823 .. image:: images/motif_dialog_enter_motif.png
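To see what an IUPAC motif such as 'ATSCT' actually matches, it helps to expand it into a regular expression. The sketch below is an illustration only and not how Mussagl is implemented: it searches the forward strand only, ignores case, and the function names are hypothetical.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT",
    "N": "ACGT",
}


def motif_to_regex(motif):
    """Expand an IUPAC motif into a regular expression string."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_motif(sequence, motif):
    """Return 0-based start offsets of the motif on the forward strand."""
    # The lookahead lets overlapping matches be reported as well.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [match.start() for match in pattern.finditer(sequence.upper())]


if __name__ == "__main__":
    print(motif_to_regex("ATSCT"))
    print(find_motif("ggATCCTaaATGCTcc", "ATSCT"))
```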
827 Now choose a color for your motif by clicking on the colored area to
828 the left of the motif. In the image above, you would click on the blue
829 square, but by default the squares will be white. Remember to choose a
830 color that will show up well with a black bar as the background.
832 .. image:: images/color_chooser.png
836 Once you have selected the color for your motif, click on the 'set
837 motifs' button. Notice that any matches Mussa finds to your motif will
838 now show up in the main Mussagl window.
842 .. image:: images/motif_dialog_bar_before.png
843 :alt: Sequence bar before motif
848 .. image:: images/motif_dialog_bar_after.png
849 :alt: Sequence bar after motif
853 View Mussa Alignments
854 ---------------------
856 Mussagl allows you to zoom in on Mussa alignments by selecting the set
857 of alignment(s) of interest. To do this, move the mouse near the
858 alignment you are interested in viewing and then **PRESS** and
859 **HOLD** the **LEFT mouse button** and **drag the mouse** to the other
860 side of the conservation track so that you see a bounding box
861 overlapping the alignment(s) of interest and then **let go** of the
864 In the example below, I started by left-clicking on the area marked by
865 a red dot (upper left corner of bounding box) and dragging the mouse to
866 the area marked by a blue dot (lower right corner of the bounding box)
867 and letting go of the left mouse button.
869 .. image:: images/select_sequence.png
870 :alt: Select Sequence
873 All of the lines which were not selected should be washed out as shown
876 .. image:: images/washed_out.png
877 :alt: Tracks washed out
880 With a selection made, go to the **View** menu and select **View mussa alignment**.
882 .. image:: images/view_mussa_alignment.png
883 :alt: View mussa alignment
886 You should see the alignment at the base-pair level as shown below.
888 .. image:: images/mussa_alignment.png
889 :alt: Mussa alignment
896 To run a sub-analysis **highlight** a section of sequence and *right
897 click* on it and select **Add to subanalysis**. Do the same for the
898 sequences shown in orange in the screenshot below. Note that you **are
899 NOT limited** to selecting a single subsequence from the same
902 .. image:: images/subanalysis_select_seqs.png
903 :alt: Subanalysis sequence selection
906 Once you have added your sequences for subanalysis, choose a `window size`_ and `threshold`_ and click **Ok**.
908 .. image:: images/subanalysis_dialog.png
909 :alt: Subanalysis Dialog
912 A new Mussa window will appear with the subanalysis of your sequences
913 once it's done running. This may take a while if you selected large
914 chunks of sequence with a loose threshold.
916 .. image:: images/subanalysis_done.png
917 :alt: Subanalysis complete
921 Copying sequence to clipboard
922 -----------------------------
924 To copy a sequence to the clipboard, highlight a section of sequence,
925 as shown in the screen shot below, and do one of the following:
927 * Select **Copy as FASTA** from the **Edit** menu.
928 * **Right-Click (Left-click + Apple/Command Key on Mac)** on the highlighted sequence and select **Copy as FASTA**.
929 * Press **Ctrl + C (on PC)** or **Apple/Command Key + C (on Mac)** on the keyboard.
931 .. image:: images/copy_sequence.png
936 ---------------------------------
938 * Updated to build 419.
940 To save your current mussa view to an image, select **File > Save to
941 image...** as shown below.
943 .. image:: images/save_to_image_menu.png
944 :alt: File > Save to image...
947 You can define the width and the height of the image to save. By
948 default it will use the same size of your current view. Since the
949 Mussa view is implemented using vectors, if you choose a larger size
950 than your current view, Mussa will redraw at the higher resolution
951 when saving. In other words, you get higher quality images when saving
952 at a higher resolution.
954 If you check the "Lock aspect ratio" check box, which I have circled
955 in red, then when you change one value, say width, the other, height,
956 will update automatically to keep the same aspect ratio.
958 .. image:: images/save_to_image_dialog.png
959 :alt: Save to image dialog
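The locked aspect ratio is plain proportional scaling. As a sketch, assuming the result is rounded to the nearest whole pixel (the rounding Mussagl uses is not stated here, and the function name is made up):

```python
def locked_height(original_width, original_height, new_width):
    """Height that keeps the original aspect ratio for a new width."""
    return round(new_width * original_height / original_width)


if __name__ == "__main__":
    # Doubling the width of an 800 x 600 view doubles the height too.
    print(locked_height(800, 600, 1600))
```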
962 Click save and choose a location and filename for your file.
964 The valid image formats are:
966 * .png (default if no extension specified.)
976 The threshold of an analysis is the minimum number of base pair matches
977 that must be met in order to be kept as a match. Note that you can vary
978 the threshold from within Mussagl. For example, if you choose a
979 `window size`_ of **30** and a **threshold** of **20** the mussa nway
980 transitive algorithm will store all matches that are 20 out of 30 bp
981 matches or better and pass it on to Mussagl. Mussagl will then allow
982 you to dynamically choose a threshold from 20 to 30 base pairs. A
983 threshold of 30 bps would only show 30 out of 30 bp matches. A
984 threshold of 20 bps would show all matches of 20 out of 30 bps or
985 better. If you would like to see results for matches lower than 20 out
986 of 30, you will need to rerun the analysis with a lower threshold.
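To make the "20 out of 30" idea concrete, here is a brute-force sketch of comparing two sequences window by window. It is only meant to illustrate what window size and threshold mean; it is not Mussa's algorithm, it covers a single 2-way forward-strand comparison, and the function names are made up.

```python
def count_matches(window_a, window_b):
    """Count positions where two equal-length windows agree."""
    return sum(1 for a, b in zip(window_a, window_b) if a == b)


def window_matches(seq1, seq2, window_size, threshold):
    """Return (pos1, pos2, matches) for every window pair meeting the threshold.

    Positions are 0-based window starts. Every window of seq1 is compared
    with every window of seq2, which is slow but easy to follow.
    """
    hits = []
    for i in range(len(seq1) - window_size + 1):
        window1 = seq1[i:i + window_size]
        for j in range(len(seq2) - window_size + 1):
            matches = count_matches(window1, seq2[j:j + window_size])
            if matches >= threshold:
                hits.append((i, j, matches))
    return hits


if __name__ == "__main__":
    # Two toy sequences that differ at a single position.
    for hit in window_matches("ACGTACGTAA", "ACGTTCGTAA", window_size=8, threshold=7):
        print(hit)
```

Raising the threshold to the full window size in this toy example leaves no hits, which mirrors how a stricter dynamic threshold hides weaker matches in Mussagl.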
991 The typical sizes people tend to choose are between 20 and 30. You
992 will likely need to experiment with this setting depending on your
993 needs and input sequence.
999 Mussa reads in sequences which are formatted in the FASTA_
1000 format. Mussa may take a long time to run (>10 minutes) if the total
1001 bp length is near 280 kb. Once Mussa has run, you can reload
1002 previously run analyses.
1004 FIXME: We have learned more about how much sequence and how many to
1005 put in Mussagl, this information should be documented here.
1013 Parameter File Format
1014 ~~~~~~~~~~~~~~~~~~~~~
1016 **File Format (.mupa):**
1020 # name of analysis directory and stem for associated files
1021 ANA_NAME <analysis_name>
1023 # if APPEND vars true, a _wXX and/or _tYY added to analysis name
1024 # where XX = WINDOW and YY = THRESHOLD
1025 # Highly recommended with use of command line override of WINDOW or THRESHOLD
1026 APPEND_WIN <true/false>
1027 APPEND_THRES <true/false>
1029 # how many sequences are being analyzed
1032 # first sequence info
1033 SEQUENCE <FASTA_file_path>
1034 ANNOTATION <annotation_file_path>
1035 SEQ_START <sequence_start>
1037 # the second sequence info
1038 SEQUENCE <FASTA_file_path>
1039 # ANNOTATION <annotation_file_path>
1040 SEQ_START <sequence_start>
1041 # SEQ_END <sequence_end>
1043 # third sequence info
1044 SEQUENCE <FASTA_file_path>
1045 # ANNOTATION <annotation_file_path>
1047 # analysis parameters: command line args -w -t will override these
1051 .. csv-table:: Parameter File Options:
1052 :header: "Option Name", "Value", "Default", "Required", "Description"
1053 :widths: 30 30 30 30 60
1055 "ANA_NAME", "string", "N/A", "true", "Name of analysis (Also
1056 name of directory where analysis will be saved)."
1057 "APPEND_WIN", "true/false", "?", "?", "Appends _w## to ANA_NAME"
1058 "APPEND_THRES", "true/false", "?", "?", "Appends _t## to ANA_NAME"
1059 "SEQUENCE_NUM", "integer", "N/A", "true", "The number of sequences
1061 "SEQUENCE", "/FASTA/filepath.fa", "N/A", "true", "Must define one
1062 sequence per SEQUENCE_NUM."
1063 "ANNOTATION", "/annotation/filepath.txt", "N/A", "false", "Optional
1064 annotation file. See `annotation file format`_ section for more
1066 "SEQ_START", "integer", "1", "false", "Optional index into FASTA file"
1067 "SEQ_END", "integer", "1", "false", "Optional index into FASTA file"
1068 "WINDOW", "integer", "N/A", "true", "`Window Size`_"
1069 "THRESHOLD", "integer", "N/A", "true", "`Threshold`_"
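If you have many analyses to set up, you may prefer to generate the .mupa text from a script. The sketch below writes the options in the same order as the template above, one ``KEY value`` pair per line. The function name, the dictionary keys and the file paths in the example are all placeholders invented for this illustration, and separating key and value with a single space is an assumption based on the template.

```python
def build_mupa(name, window, threshold, sequences,
               append_win=True, append_thres=True):
    """Return the text of a .mupa file.

    `sequences` is a list of dicts with a required 'fasta' key and
    optional 'annotation', 'start' and 'end' keys.
    """
    lines = [
        f"ANA_NAME {name}",
        f"APPEND_WIN {'true' if append_win else 'false'}",
        f"APPEND_THRES {'true' if append_thres else 'false'}",
        f"SEQUENCE_NUM {len(sequences)}",
    ]
    for seq in sequences:
        lines.append(f"SEQUENCE {seq['fasta']}")
        if "annotation" in seq:
            lines.append(f"ANNOTATION {seq['annotation']}")
        if "start" in seq:
            lines.append(f"SEQ_START {seq['start']}")
        if "end" in seq:
            lines.append(f"SEQ_END {seq['end']}")
    lines.append(f"WINDOW {window}")
    lines.append(f"THRESHOLD {threshold}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder paths; point these at your own FASTA and annotation files.
    text = build_mupa(
        "demo", 30, 20,
        [
            {"fasta": "seq/human.fa", "annotation": "annot/human.txt"},
            {"fasta": "seq/mouse.fa", "start": 1},
        ],
    )
    print(text, end="")
```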
1073 Annotation File Format
1074 ~~~~~~~~~~~~~~~~~~~~~~
1076 The first line in the file is the sequence name. Each line thereafter
1077 is a **space** separated annotation.
1079 New as of build 198:
1081 * The annotation format now supports FASTA sequences embedded in the
1082 annotation file as shown in the format example below. Mussagl will
1083 take this sequence and look for an exact match of this sequence in
1084 your sequences. If a match is found, it will label it with the name
1085 from the FASTA header.
1091 <species/sequence_name>
1092 <start> <stop> <annotation_name> <annotation_type>
1093 <start> <stop> <annotation_name> <annotation_type>
1094 <start> <stop> <annotation_name> <annotation_type>
1095 <start> <stop> <annotation_name> <annotation_type>
1097 ACTGACTGACGTACGTAGCTAGCTAGCTAGCACG
1098 ACGTACGTACGTACGTAGCTGTCATACGCTAGCA
1099 TGCGTAGAGGATCTCGGATGCTAGCGCTATCGAT
1100 ACGTACGGCAGTACGCGGTCAGA
1101 <start> <stop> <annotation_name> <annotation_type>
1109 251 500 Glorp Glorptype
1110 751 1000 Glorp Glorptype
1111 1251 1500 Glorp Glorptype
1112 >My favorite DNA sequence
1114 1751 2000 Glorp Glorptype
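To show how the pieces of an annotation file fit together, here is a small parsing sketch. It is not Mussa's parser. It assumes that a line starting with two integers is an interval annotation, that a line starting with ``>`` opens an embedded FASTA record, and that the lines following a header are its sequence until the next interval or header. The function name and the sample text are made up for this example.

```python
def parse_annotation_file(text):
    """Split annotation text into (name, intervals, fasta_records)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    name = lines[0]          # first line: species or sequence name
    intervals = []           # (start, stop, annotation_name, annotation_type)
    fasta_records = []       # {"name": ..., "sequence": ...}
    current = None           # FASTA record currently being read, if any
    for line in lines[1:]:
        if line.startswith(">"):
            current = {"name": line[1:].strip(), "sequence": ""}
            fasta_records.append(current)
            continue
        fields = line.split()
        if len(fields) >= 2 and fields[0].isdigit() and fields[1].isdigit():
            current = None   # an interval line ends any open FASTA record
            label = fields[2] if len(fields) > 2 else ""
            kind = fields[3] if len(fields) > 3 else ""
            intervals.append((int(fields[0]), int(fields[1]), label, kind))
        elif current is not None:
            current["sequence"] += line
    return name, intervals, fasta_records


if __name__ == "__main__":
    sample = (
        "Human\n"
        "251 500 Glorp Glorptype\n"
        ">My favorite DNA sequence\n"
        "ACTGACTG\n"
        "ACGTACGT\n"
        "1751 2000 Glorp Glorptype\n"
    )
    name, intervals, records = parse_annotation_file(sample)
    print(name, intervals, records)
```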
1117 .. _motif_file_format:
1124 <motif> <red> <green> <blue>
1132 IUPAC Nucleotide Code
1133 ~~~~~~~~~~~~~~~~~~~~~~
1135 For your convenience, below is a table of the IUPAC Nucleotide Code.
1137 The following table is table 1 from "Nomenclature for Incompletely
1138 Specified Bases in Nucleic Acid Sequences" which can be found at
1139 http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html.
1141 ====== ================= ===================================
1142 Symbol Meaning Origin of designation
1143 ====== ================= ===================================
1152 S G or C Strong interaction (3 H bonds)
1153 W A or T Weak interaction (2 H bonds)
1154 H A or C or T not-G, H follows G in the alphabet
1155 B G or T or C not-A, B follows A
1156 V G or C or A not-T (not-U), V follows U
1157 D G or A or T not-C, D follows C
1158 N G or A or T or C aNy
1159 ====== ================= ===================================
1162 .. Define links below
1165 .. _GPL: http://www.opensource.org/licenses/gpl-license.php
1166 .. _wiki: http://mussa.caltech.edu
1167 .. _build: http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild
1168 .. _FASTA: http://en.wikipedia.org/wiki/fasta_format
1169 .. _wpDnaMotif: http://en.wikipedia.org/wiki/DNA_motif