8 Last updated: Oct 20th, 2006
10 Updated to Mussagl build: (In process to 424)
14 * New features / change log
* (DONE) Comment out anything that isn't implemented yet.
16 * (DONE) List of features that will be implemented in the future.
17 * Look into the homology mapping of UCSC.
18 * Add toggle to genomes.
* Document why one FASTA record per region.
20 * How to deal with the hazards of small utrs vis motif finder. (Add warning)
21 * Add warning about saving FASTA file.
22 * Add a general principles section near the top
23 * Using comparison algorithm which will pickup all repeats
24 * Add info about repeatmasking
* Check upstream and downstream genes to make sure you are in the right regions.
26 * Later on: look into Ensembl
27 * Look into method of homology instead of blating.
28 * Mention advantages of using mupa.
29 * Mention the difference between using arrows and scroll bar
30 * Document the color for motifs
31 * Update for Mac user left-click
33 * Wormbase/Flybase/mirBASE tutorials
46 * Analysis "Save As" feature
51 .. INSERT CHANGE LOG HERE
52 .. END INSERT CHANGE LOG
54 Features to be Implemented
55 --------------------------
57 For an up-to-date list of features to be implemented visit:
58 http://woldlab.caltech.edu/cgi-bin/mussa/roadmap
67 Mussa is an N-way version of the FamilyRelations (which is a part of
68 the Cartwheel project) 2-way comparative sequence analysis
69 software. Given DNA sequence from N species, Mussa uses all possible
pairwise comparisons to derive an N-wise comparison. For example, given
71 sequences 1,2,3, and 4, Mussa makes 6 2-way comparisons: 1vs2, 1vs3,
72 1vs4, 2vs3, 2vs4, and 3vs4. It then compares all the links between
73 these comparisons, saving those that satisfy a transitivity
requirement. The saved paths are then displayed in an interactive viewer.
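The pairwise step described above can be sketched in a few lines of Python. This is only an illustration of how the list of 2-way comparisons grows with N; the function name ``pairwise_comparisons`` is made up for this example and is not part of Mussa.

```python
from itertools import combinations


def pairwise_comparisons(n_sequences):
    """Return every unordered pair of 1-based sequence indices."""
    return list(combinations(range(1, n_sequences + 1), 2))


pairs = pairwise_comparisons(4)
print(len(pairs))  # 6
print(", ".join(f"{a}vs{b}" for a, b in pairs))
# 1vs2, 1vs3, 1vs4, 2vs3, 2vs4, 3vs4
```

The number of pairs is N * (N - 1) / 2, so each extra sequence adds noticeably more work.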
77 Short History of Mussa
78 ----------------------
80 Mussa Python/PMW Prototype
81 ~~~~~~~~~~~~~~~~~~~~~~~~~~
First Python/PMW based prototype.
88 A rewrite for speed purposes using C++ and FLTK GUI toolkit.
93 Refactored version using the more elegant Qt GUI framework and
94 OpenGL for hardware acceleration for those who have better graphics
103 Mussagl has been released open source under the `GPL v2
111 You have the option of building from source or downloading prebuilt
112 binaries. Most people will want the prebuilt versions.
116 * Mac OS X (binary or source)
117 * Windows XP (binary or source)
123 Mussagl in binary form for OS X and Windows and/or source can be
124 downloaded from http://mussa.caltech.edu/.
131 Once you have downloaded the .dmg file, double click on it and follow
132 the install instructions.
134 FIXME: Mention how to launch the program.
139 Once you have downloaded the Mussagl installer, double click on the
140 installer and follow the install instructions.
142 To start Mussagl, launch the program from Start > Programs > Mussagl >
148 Currently we do not have a binary installer for Linux. You will have
149 to build from source. See the 'build from source' section below.
155 Instructions for building from source can be found `build page
156 <http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild>`_ on the
If you already have your data, you can skip ahead to the `Using
Let's say you have a gene of interest called 'SMN1' and you want to
know how well the sequence surrounding the gene is conserved across
multiple species. That is what we will do here: retrieve the DNA
sequence for SMN1 and prepare it for use in Mussa.
173 For more information about SMN1 visit `NCBI's OMIM
174 <http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=609682>`_.
176 The SMN1 data retrieved in this section can be downloaded from the
178 <http://woldlab.caltech.edu/cgi-bin/mussa/wiki/ExampleData>`_ page if
179 you prefer to skip this section of the manual.
182 UCSC Genome Browser Method
183 --------------------------
185 There are many methods of retrieving DNA sequence, but for this
186 example we will retrieve SMN1 through the UCSC genome browser located
187 at http://genome.ucsc.edu/.
190 .. image:: images/ucsc_genome_browser_home.png
191 :alt: UCSC Genome Browser
197 The first step in finding SMN1 is to use the **Gene Sorter** menu
198 option which I have highlighted in orange below:
200 .. image:: images/ucsc_menu_bar_gene_sorter.png
201 :alt: Gene Sorter Menu Option
206 .. image:: images/ucsc_gene_sorter.png
210 We will start by looking for SMN1 in the **Human Genome** and **sorting by name similarity**.
212 .. image:: images/ucsc_gs_sort_name_sim.png
213 :alt: Gene Sorter - Name Similarity
216 After you have selected **Human Genome** and **sorting by name similarity**, type *SMN1* into the search box.
218 .. image:: images/ucsc_gs_smn1.png
222 Press **Go!** and you should see the following page:
224 .. image:: images/ucsc_gs_found.png
Click on **SMN1** and you will be taken to the gene expression atlas
231 .. image:: images/ucsc_gs_genome_position.png
232 :alt: Gene expression atlas
235 Click on **chr5 70,270,558** found in the **SMN1 row**, **Genome
Now we have found the location of SMN1 in the human genome!
240 .. image:: images/ucsc_gb_smn1_human.png
241 :alt: Genome Browser - SMN1 (human)
245 Step 2 - Download CDS/UTR sequence for annotations
246 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
248 Since we have found **SMN1**, this would be a convenient time to extract
249 the DNA sequence for the CDS and UTRs of the gene to use it as an
250 annotation_ in Mussa.
252 **Click on SMN1** shown **between** the **two orange arrows** shown
255 .. image:: images/ucsc_gb_smn1_human_click_smn1.png
256 :alt: Genome Browser - SMN1 (human) - Orange Arrows
259 You should find yourself at the SMN1 description page.
261 .. image:: images/ucsc_gb_smn1_description_page.png
262 :alt: Genome Browser - SMN1 (human) - Description page
265 **Scroll down** until you get to the **Sequence section** and click on
266 **Genomic (chr5:70,256,524-70,284,592)**.
268 .. image:: images/ucsc_gb_smn1_human_sequence.png
269 :alt: Genome Browser - SMN1 (human) - Sequence
272 You should now be at the **Genomic sequence near gene** page:
274 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence.png
275 :alt: Genome Browser - SMN1 (human) - Get genomic sequence
278 Make the following changes (highlighted in orange in the screenshot
281 1. UNcheck **introns**.
282 (We only want to annotate CDS and UTRs.)
283 2. Select **one FASTA record** per **region**.
284 (Mussa needs each CDS and UTR represented by one FASTA record per CDS/UTR).
285 3. Select **CDS in upper case, UTR in lower case.**
287 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_diff.png
288 :alt: Genome Browser - SMN1 (human) - Get genomic sequence setup
291 Now click the **submit** button. You will then see a FASTA file with
many FASTA records representing the CDS and UTRs.
294 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_submit.png
295 :alt: Genome Browser - SMN1 (human) - CDS/UTR sequence
298 Now you need to save the FASTA records to a **text file**. If you are
299 using **Firefox** or **Internet Explorer 6+** click on the **File >
300 Save As** menu option.
302 **IMPORTANT:** Make sure you select **Text Files** and **NOT**, I
303 repeat **NOT Webpage Complete** (see screenshot below.)
305 Type in **smn1_human_annot.txt** for the file name.
307 .. image:: images/smn1_human_annot.png
308 :alt: Genome Browser - SMN1 (human) - sequence annotation file
311 **IMPORTANT:** You should open the file with a text editor and make
312 sure **no HTML** was saved... If you find any HTML markup, delete
313 the markup and save the file.
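If you would rather not inspect the file by eye, a short script can flag suspicious lines. The sketch below assumes that anything shaped like ``<tag>`` or ``</tag>`` is HTML markup; the helper name ``lines_with_html`` and the sample text are hypothetical.

```python
import re

# Assumption: anything shaped like <tag ...> or </tag> counts as HTML markup.
# FASTA header lines start with ">" and normally contain no "<", so they
# are not flagged.
HTML_TAG = re.compile(r"</?[A-Za-z][^>]*>")


def lines_with_html(text):
    """Return (line_number, line) pairs that appear to contain HTML tags."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if HTML_TAG.search(line)
    ]


# Hypothetical saved file with stray markup around one FASTA record.
sample = "<PRE>\n>example_record\nACGTACGTAC\n</PRE>\n"
print(lines_with_html(sample))  # [(1, '<PRE>'), (4, '</PRE>')]
```

To check a real file, pass its contents instead of ``sample``, for example ``lines_with_html(open("smn1_human_annot.txt").read())``.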
315 Now we are going to **modify the file** you just saved to **add the
316 name of the species** to the **annotation file**. All you have to do
317 is **add a new line** at the **top of the file** with the word **'Human'** as
320 .. image:: images/smn1_human_annot_plus_human.png
321 :alt: Genome Browser - SMN1 (human) - sequence annotation file
324 You can add more annotations to this file if you wish. See the
325 `annotation file format`_ section for details of the file format. By
326 including FASTA records in the annotation_ file, Mussa searches your
327 DNA sequence for an exact match of the sequence in the annotation_
328 file. If found, it will be marked as an annotation_ within Mussa.
331 Step 3 - Download gene and upstream/downstream sequence
332 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Use the back button in your web browser to get back to the **genome
335 browser view** of **SMN1** as shown below.
337 .. image:: images/ucsc_gb_smn1_human.png
338 :alt: Genome Browser - SMN1 (human)
There are two options for getting additional sequence around your
gene. The more complex way is to zoom out until the genome browser
shows all of the sequence you want, and then follow the directions
for the method described next.
346 The second option, which we will choose, is to leave the genome
347 browser zoomed exactly at the location of SMN1 and click on the
348 **DNA** option on the menu bar (shown with orange arrows in the
351 .. image:: images/ucsc_gb_smn1_human_dna_option.png
352 :alt: Genome Browser - SMN1 (human) - DNA Option
355 Now in the **get dna in window** page, let's add an arbitrary amount of
356 extra sequence on to each end of the gene, let's say 5000 base pairs.
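To see which coordinates the padded region should cover, the arithmetic can be sketched as follows. It assumes 1-based inclusive coordinates and that the browser window matches the genomic range quoted on the SMN1 description page in Step 2; ``pad_region`` is a hypothetical helper, not a UCSC or Mussa function.

```python
def pad_region(start, end, padding=5000):
    """Extend a 1-based inclusive region by `padding` bases on each side."""
    # Clamp at 1 so the padded start never falls before the chromosome start.
    return max(1, start - padding), end + padding


# Genomic range of SMN1 quoted in Step 2: chr5:70,256,524-70,284,592.
start, end = pad_region(70_256_524, 70_284_592)
print(f"chr5:{start:,}-{end:,}")  # chr5:70,251,524-70,289,592
```

Comparing these numbers with the header of the downloaded FASTA record is a quick way to confirm that the extra sequence was included.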
358 .. image:: images/ucsc_gb_smn1_human_get_dna.png
359 :alt: Genome Browser - SMN1 (human) - Get DNA
362 Click the **get DNA** button.
364 .. image:: images/ucsc_gb_smn1_human_dna.png
365 :alt: Genome Browser - SMN1 (human) - DNA
368 Save the DNA sequence to a text file called 'smn1_human_dna.fa' as we
369 did in step 2 with the annotation file.
371 **IMPORTANT:** Make sure the file is saved as a text file and not an
372 HTML file. Open the file with a text editor and remove any HTML markup
376 Step 4 - Same/similar/related gene other species.
377 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
379 What good is a multiple sequence alignment viewer without multiple
sequences? Let's find a similar gene in a few more species.
Use the back button on your web browser until you get to the **genome
383 browser view** of **SMN1** as shown below.
385 .. image:: images/ucsc_genome_browser_home.png
386 :alt: UCSC Genome Browser
389 **Click on SMN1** shown **between** the **two orange arrows** shown
392 .. image:: images/ucsc_gb_smn1_human_click_smn1.png
393 :alt: Genome Browser - SMN1 (human) - Orange Arrows
396 You should find yourself at the SMN1 description page.
398 .. image:: images/ucsc_gb_smn1_description_page.png
399 :alt: Genome Browser - SMN1 (human) - Description page
402 **Scroll down** until you get to the **Sequence section** and click on
403 **Protein (262 aa)**.
405 .. image:: images/ucsc_gb_smn1_human_sequence.png
406 :alt: Genome Browser - SMN1 (human) - Sequence
Copy the SMN1 protein sequence by highlighting it and selecting the **Edit
> Copy** option from the menu.
412 .. image:: images/smn1_human_protein.png
413 :alt: Genome Browser - SMN1 (human) - Protein
416 Press the back button on the web browser once and then scroll to the
417 top of the page and click on the **BLAT** option on the menu bar
418 (shown below with orange arrows).
420 .. image:: images/ucsc_gb_smn1_human_blat.png
421 :alt: Genome Browser - SMN1 (human) - Blat
424 **Paste** in the **protein sequence** and **change** the **genome** to
425 **mouse** as shown below and then click **submit**.
427 .. image:: images/ucsc_gb_smn1_human_blat_paste.png
428 :alt: Genome Browser - SMN1 (human) - Blat paste protein
431 Notice that we have two hits, one of which looks pretty good at 89.9%
434 .. image:: images/ucsc_gb_smn1_human_blat_hits.png
435 :alt: Genome Browser - SMN1 (human) - Blat hits
**Click** on the **browser** link next to the 89.9% match. Notice in
439 the genome browser (shown below) that there is an annotated gene
440 called SMN1 for mouse which matches the line called **your sequence
from blat search**. This means we are fairly confident we found the
442 right location in the mouse genome.
444 .. image:: images/ucsc_gb_smn1_human_blat_to_browser.png
445 :alt: Genome Browser - SMN1 (human) - Blat to browser
448 Follow steps 1 through 3 for mouse and then repeat step 4 with the
449 human protein sequence to find **SMN1** in the following species (if
460 Make sure to save the extended DNA sequence and annotation file for
469 Launch Mussagl... It should look similar to the screen shot below.
471 .. image:: images/opened.png
478 ----------------------
480 Currently there are three ways to load a Mussa experiment.
482 1. `Create a new analysis`_
483 2. `Load a mussa parameter file`_ (.mupa)
484 3. `Load an analysis`_
488 Create a new analysis
489 ~~~~~~~~~~~~~~~~~~~~~
491 To create a new analysis select 'Define analysis' from the 'File'
492 menu. You should see a dialog box similar to the one below. For this
493 demo we will use the example sequences that come with Mussagl.
495 .. image:: images/define_analysis.png
496 :alt: Define Analysis
501 1. **Give the experiment a name**, for this demo, we'll use
502 'demo_w30_t20'. Mussa will create a folder with this name to store
503 the analysis files in once it has been run.
505 2. Choose a threshold_... for this demo **choose 20**. See the
506 Threshold_ section for more detailed information.
508 3. Choose a `window size`_. For this demo **choose 30**.
511 4. Choose the number of sequences_ you would like. For this demo
514 .. image:: images/define_analysis_step1a.png
518 First enter the species name of "Human" in the first "Species" text
519 box. Now click on the 'Browse' button next to the sequence input box
520 and then select /examples/seq/human_mck_pro.fa file. Do the same in
521 the next two sequence input boxes selecting mouse_mck_pro.fa and
522 rabbit_mck_pro.fa as shown below. Make sure to give them a species
523 name as well. Note that you can create annotation files using the
524 mussa `Annotation File Format`_ to add annotations to your sequence.
526 .. image:: images/define_analysis_step2.png
527 :alt: Choose sequences
530 Click the **create** button and in a few moments you should see
531 something similar to the following screen shot.
533 .. image:: images/demo.png
537 By default your analysis is NOT saved. If you try to close an analysis
538 without saving, you will be prompted with a dialog box asking you if
you would like to save your analysis. See the `Saving`_ section for
details on saving your analysis. When saving, choose a directory and
541 give the analysis the name **demo_w30_t20**. If you close and reopen
542 Mussagl, you will then be able to load the saved analysis. See `Load
543 an analysis`_ section below for details.
546 Load a mussa parameter file
547 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
549 If you prefer, you can define your Mussa analysis using the Mussa
550 parameter file. See the `Parameter File Format`_ section for details
551 on creating a .mupa file.
553 Once you have a .mupa file created, load Mussagl and select the **File >
554 Create Analysis from File** menu option. Select the .mupa file and click
557 .. image:: images/load_mupa_menu.png
558 :alt: Load Mussa Parameters
561 If you would like to see an example, you can load the
562 **mck3test.mupa** file in the examples directory that comes with
565 .. image:: images/load_mupa_dialog.png
566 :alt: Load Mussa Parameters Dialog
573 To load a previously run analysis open Mussagl and select the **File >
574 Open Existing Analysis** menu option. Select an analysis **directory** and
577 .. image:: images/load_analysis_menu.png
578 :alt: Load Analysis Menu
587 .. Screen-shot with numbers showing features.
589 .. image:: images/window_overview.png
595 1. `DNA Sequence (Black bars)`_
601 4. `Red conservation tracks`_
603 5. `Blue conservation tracks`_
605 6. `Zoom Factor`_ (Base pairs per pixel)
607 7. `Dynamic Threshold`_
609 8. `Sequence Information Bar`_
611 9. `Sequence Scroll Bar`_
614 DNA Sequence (black bars)
615 ~~~~~~~~~~~~~~~~~~~~~~~~~
617 .. image:: images/sequence_bar.png
621 Each of the black bars represents one of the loaded sequences, in this
622 case the sequence around the gene 'MCK' in human, mouse, and rabbit.
628 .. figure:: images/annotation.png
632 Annotation shown in green on sequence bar.
635 Annotations can be included on any of the sequences using the `Load a
636 mussa parameter file`_ or `Create a new analysis`_ method of loading
637 your sequences. You can define annotations by location or using an
638 exact sub-sequence or a FASTA sequence of the section of DNA you wish
639 to annotate. See the `Annotation File Format`_ section for details.
645 .. figure:: images/motif.png
649 Motif shown in light blue on sequence bar.
651 The only real difference between an annotation and motif in Mussagl is
652 that you can define motifs and choose a color from within the GUI. See
653 the `Motifs`_ section for more information.
656 Red conservation tracks
657 ~~~~~~~~~~~~~~~~~~~~~~~
659 .. figure:: images/conservation_tracks.png
660 :alt: Conservation Tracks
Conservation tracks shown as red and blue lines between sequence
666 The **red lines** between the sequence bars represent conservation
667 between the sequences (i.e. not reverse complement matches)
669 The amount of sequence conservation shown will depend on how much your
670 sequences are related and the `dynamic threshold`_ you are using.
673 Blue conservation tracks
674 ~~~~~~~~~~~~~~~~~~~~~~~~
676 .. figure:: images/conservation_tracks.png
677 :alt: Conservation Tracks
Conservation tracks shown as red and blue lines between sequence
683 **Blue lines** represent **reverse complement** conservation relative
684 to the sequence attached to the top of the blue line.
686 The amount of sequence conservation shown will depend on how much your
687 sequences are related and the `dynamic threshold`_ you are using.
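For readers unfamiliar with the term, the reverse complement of a sequence is obtained by complementing each base and reading the result backwards. The snippet below is a generic illustration and is not taken from the Mussa source.

```python
# Translation table mapping each base to its complement, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("ATGCC"))  # GGCAT
```

A blue line therefore links a window in one sequence to a window whose reverse complement matches it in the other.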
693 .. image:: images/zoom_factor.png
The zoom factor represents the number of base pairs represented per
pixel. When you zoom in far enough, the display switches from a black
bar representing the sequence to the actual sequence (well, an ASCII
representation of the sequence).
706 .. image:: images/dynamic_threshold.png
707 :alt: Dynamic Threshold
710 You can dynamically change the threshold for how strong a match you
711 consider the conservation to be by changing the value in the dynamic
714 The value you enter is the minimum number of base pairs that have to
715 be matched in order to be considered conserved. The second number that
716 you can't change is the `window size`_ you used when creating the
717 experiment. The last number is the percent match.
719 See the Threshold_ section for more information.
722 Sequence Information Bar
723 ~~~~~~~~~~~~~~~~~~~~~~~~
725 .. image:: images/seq_info_bar.png
726 :alt: Sequence Information Bar
729 The sequence information bars can be found to the left and right sides
730 of Mussagl. Next to each sequence you will find the following
733 1. Species (If it has been defined)
734 2. Total Size of Sequence
735 3. Current base pair position
737 Note that you can **update the species** text box. Make sure to **save your
738 experiment** after making this change by selecting **File > Save
739 Analysis** from the menu.
744 .. image:: images/scroll_bar.png
745 :alt: Sequence Scroll Bar
748 The scroll bar allows you to scroll through the sequence which is
749 useful when you have zoomed in using the `zoom factor`_.
Whenever you create a new analysis or make a change such as
759 adding/editing a motif or changing a species name, an asterisk (*)
760 will appear in the title of the window showing that there are changes
761 that have not been saved. If you close a Mussa window without saving
762 changes, Mussa will ask you if you would like to save the changes that
768 After making changes, such as updating species names or adding/editing
769 motifs, you can save these changes by selecting the **File > Save
770 analysis** menu option or pressing **CTRL + S** (PC) or
771 **Apple/Command Key + S** (on Mac).
773 .. image:: images/save_analysis.png
780 To save a copy of your analysis to a new location, select the **File >
781 Save analysis as** menu option and choose a new location and name for
784 .. image:: images/save_analysis_as.png
791 See `Save Motifs to File`_ in the `Motifs`_ section.
794 Viewing Multiple Analyses
795 -------------------------
Sometimes it is useful to view more than one analysis at a time. To
accomplish this, Mussa allows you to open a new Mussa window by
799 selecting the **File > New Mussa Window** menu option.
801 .. image:: images/new_mussa_window_menu.png
802 :alt: New Mussa Window Menu Option
805 A new Mussa window will pop up.
807 .. figure:: images/new_mussa_window.png
808 :alt: New Mussa Window
811 A new Mussa window on the right, in which a second analysis has
814 Now you can create or load an existing analysis, in this new window,
815 as described in the `Create/Load Analysis`_ section.
817 You can view as many analyses as you can fit on your screen or until
818 you run out of available RAM. If you notice a rapid decrease in
819 performance and hear lots of noise coming from your hard drive, you
820 probably ran out of RAM and are now using virtual memory (i.e. much
821 much slower). If this happens, you may need to avoid opening as many
822 analyses at one time.
831 Currently annotations can be added to a sequence using the mussa
832 `annotation file format`_ and can be loaded by selecting the
833 annotation file when defining a new analysis (see `Create a new
834 analysis`_ section) or by defining a .mupa file pointing to your
835 annotation file (see `Load a mussa parameter file`_ section).
840 Load Motifs from File
841 *********************
843 It is possible to load motifs from a file which was saved from a
844 previous run or by defining your own motif file. See the `Motif File
845 Format`_ section for details.
847 NOTE: Valid motif list file extensions are:
852 To load a motif file, select **Load Motif List** item from the
853 **File** menu and select a motif list file.
855 .. image:: images/load_motif.png
856 :alt: Load Motif List
863 Motifs from the `Motif Dialog`_ can be saved to file for use with
864 other analyses. If you just want your motifs to be saved with your
865 analysis, see the `save analysis`_ section for details.
867 To save a motif list, select **File > Save Motifs** menu option. By
868 default, Mussa will append .mtl if you do not provide a file extension
869 (valid file extensions: .mtl & .txt).
871 .. image:: images/save_motifs.png
879 Mussa has the ability to find lab motifs using the `IUPAC Nucleotide
880 Code`_ for defining a motif. To define a motif, select **Edit > Edit
881 Motifs** menu item as shown below.
883 .. image:: images/view_edit_motifs.png
884 :alt: "View > Edit Motifs" Menu
You will see a dialog box appear with an "apply" button in the bottom
right and one row for defining a motif and the color that will be
displayed on the sequence. When you start adding your first motif, an
additional row will be added. The check box in the first column
defines whether the motif is displayed or not. The second column is
the motif display color. The third column is for the name of your
motif and finally, the fourth column is the motif itself.
895 .. image:: images/motif_dialog_start.png
899 Now let's make a motif **'AT[C or G]CT'**. Using the `IUPAC Nucleotide
900 Code`_, type in **'ATSCT'** into the motif field and **'My Motif'** for
901 the name in the name field as shown below.
903 Notice how a second row appeared when you started to add the first
904 motif. Every time you add a new motif, a new row will appear allowing
905 you to add as many motifs as you need.
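To make the meaning of an IUPAC motif concrete, the sketch below expands a motif into a regular expression and lists where it occurs in a sequence. It uses the standard IUPAC code table, searches the forward strand only, and is not how Mussa itself is implemented; ``motif_to_regex``, ``find_motif`` and the test sequence are invented for this example.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT", "H": "ACT", "B": "CGT",
    "V": "ACG", "D": "AGT", "N": "ACGT",
}


def motif_to_regex(motif):
    """Expand each IUPAC symbol into a single base or a character class."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_motif(motif, sequence):
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A lookahead lets the regex engine report overlapping occurrences.
    pattern = re.compile(f"(?={motif_to_regex(motif)})")
    return [match.start() for match in pattern.finditer(sequence.upper())]


print(motif_to_regex("ATSCT"))               # AT[CG]CT
print(find_motif("ATSCT", "GGATCCTATGCTA"))  # [2, 7]
```

Here ``S`` expands to ``[CG]``, so both ``ATCCT`` and ``ATGCT`` count as hits.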
907 .. image:: images/motif_dialog_enter_motif.png
911 Now choose a color for your motif by clicking on the colored area to
912 the left of the name field. Remember to choose a color that will show
913 up well with a black bar as the background. A good tool for picking a
914 color is the `Colour Contrast Analyser
915 <http://juicystudio.com/services/colourcontrast.php>`_ by
916 `juicystudio.com <http://juicystudio.com/>`_.
918 .. image:: images/color_chooser.png
922 Once you have selected the color for your motif, click on the
**'apply'** button. Any matches Mussa finds to your motif
will now show up in the main Mussa window.
928 .. image:: images/motif_dialog_bar_before.png
929 :alt: Sequence bar before motif
934 .. image:: images/motif_dialog_bar_after.png
935 :alt: Sequence bar after motif
938 To save your motifs with your analysis, see the `save analysis`_
939 section. To save your motifs to a file, see the `save motifs to file`_
945 To delete a motif, remove all text from the name and sequence columns
946 and close the motif editor.
948 View Mussa Alignments
949 ---------------------
951 Mussagl allows you to zoom in on Mussa alignments by selecting the set
952 of alignment(s) of interest. To do this, move the mouse near the
953 alignment you are interested in viewing and then **PRESS** and
954 **HOLD** the **LEFT mouse button** and **drag the mouse** to the other
955 side of the conservation track so that you see a bounding box
overlapping the alignment(s) of interest and then **let go** of the
959 In the example below, I started by left-clicking on the area marked by
960 a red dot (upper left corner of bounding box) and dragging the mouse to
961 the area marked by a blue dot (lower right corner of the bounding box)
962 and letting go of the left mouse button.
964 .. image:: images/select_sequence.png
965 :alt: Select Sequence
968 All of the lines which were not selected should be washed out as shown
971 .. image:: images/washed_out.png
972 :alt: Tracks washed out
With a selection made, go to the **View** menu and select **View mussa alignment**.
977 .. image:: images/view_mussa_alignment.png
978 :alt: View mussa alignment
981 You should see the alignment at the base-pair level as shown below.
983 .. image:: images/mussa_alignment.png
984 :alt: Mussa alignment
991 To run a sub-analysis **highlight** a section of sequence and *right
click* on it and select **Add to subanalysis**. Do the same for the
993 sequences shown in orange in the screenshot below. Note that you **are
994 NOT limited** to selecting only one subsequence from the same
997 .. image:: images/subanalysis_select_seqs.png
998 :alt: Subanalysis sequence selection
1001 Once you have added your sequences for subanalysis, choose a `window size`_ and `threshold`_ and click **Ok**.
1003 .. image:: images/subanalysis_dialog.png
1004 :alt: Subanalysis Dialog
1007 A new Mussa window will appear with the subanalysis of your sequences
1008 once it's done running. This may take a while if you selected large
1009 chunks of sequence with a loose threshold.
1011 .. image:: images/subanalysis_done.png
   :alt: Subanalysis complete
1016 Copying sequence to clipboard
1017 -----------------------------
1019 To copy a sequence to the clipboard, highlight a section of sequence,
1020 as shown in the screen shot below, and do one of the following:
1022 * Select **Copy as FASTA** from the **Edit** menu.
1023 * **Right-Click (Left-click + Apple/Command Key on Mac)** on the highlighted sequence and select **Copy as FASTA**.
1024 * Press **Ctrl + C (on PC)** or **Apple/Command Key + C (on Mac)** on the keyboard.
1026 .. image:: images/copy_sequence.png
1032 ---------------------------------
1034 To save your current mussa view to an image, select **File > Save to
1035 image...** as shown below.
1037 .. image:: images/save_to_image_menu.png
1038 :alt: File > Save to image...
1041 You can define the width and the height of the image to save. By
1042 default it will use the same size of your current view. Since the
1043 Mussa view is implemented using vectors, if you choose a larger size
than your current view, Mussa will redraw at the higher resolution
1045 when saving. In other words, you get higher quality images when saving
1046 at a higher resolution.
1048 If you check the "Lock aspect ratio" check box, which I have circled
1049 in red, then when you change one value, say width, the other, height,
1050 will update automatically to keep the same aspect ratio.
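The locked aspect ratio boils down to one proportion. The sketch below shows the calculation for a changed width; the function name and the pixel sizes are hypothetical, and the exact rounding Mussa applies is not documented here.

```python
def locked_height(width, height, new_width):
    """Return the height that keeps width:height constant for new_width."""
    return round(new_width * height / width)


# Doubling the width of an 800x600 view doubles the height as well.
print(locked_height(800, 600, 1600))  # 1200
```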
1052 .. image:: images/save_to_image_dialog.png
1053 :alt: Save to image dialog
1056 Click save and choose a location and filename for your file.
1058 The valid image formats are:
1060 * .png (default if no extension specified.)
1064 Detailed Information
1065 --------------------
The threshold of an analysis is the minimum number of base pair matches
that must be met for a window to be kept as a match. Note that you can vary
1072 the threshold from within Mussagl. For example, if you choose a
1073 `window size`_ of **30** and a **threshold** of **20** the mussa nway
1074 transitive algorithm will store all matches that are 20 out of 30 bp
1075 matches or better and pass it on to Mussagl. Mussagl will then allow
1076 you to dynamically choose a threshold from 20 to 30 base pairs. A
1077 threshold of 30 bps would only show 30 out of 30 bp matches. A
1078 threshold of 20 bps would show all matches of 20 out of 30 bps or
1079 better. If you would like to see results for matches lower than 20 out
1080 of 30, you will need to rerun the analysis with a lower threshold.
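The relationship between window size, stored threshold and dynamic threshold can be illustrated with a brute-force sketch. This is not Mussa's comparison algorithm: it only compares the forward strand, it is far too slow for real sequences, and the names ``window_matches`` and ``apply_dynamic_threshold`` are made up for the example.

```python
def window_matches(seq_a, seq_b, window=30, threshold=20):
    """Yield (pos_a, pos_b, score) for window pairs scoring >= threshold."""
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Score = number of identical bases at the same window offset.
            score = sum(
                a == b
                for a, b in zip(seq_a[i:i + window], seq_b[j:j + window])
            )
            if score >= threshold:
                yield i, j, score


def apply_dynamic_threshold(matches, threshold):
    """Keep only stored matches at or above the stricter threshold."""
    return [match for match in matches if match[2] >= threshold]


# Tiny example: a window of 5 with a stored threshold of 4.
stored = list(window_matches("ACGTA", "ACGTT", window=5, threshold=4))
print(stored)                              # [(0, 0, 4)]
print(apply_dynamic_threshold(stored, 5))  # []
```

Raising the threshold only filters what was stored, which is why seeing weaker matches requires rerunning the analysis with a lower threshold.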
1085 The typical sizes people tend to choose are between 20 and 30. You
1086 will likely need to experiment with this setting depending on your
1087 needs and input sequence.
1093 Mussa reads in sequences which are formatted in the FASTA_
format. Mussa may take a long time to run (>10 minutes) if the total
sequence length nears 280 kb. Once Mussa has run, you can reload
previously run analyses.
1098 FIXME: We have learned more about how much sequence and how many to
1099 put in Mussagl, this information should be documented here.
1107 Parameter File Format
1108 ~~~~~~~~~~~~~~~~~~~~~
1110 **File Format (.mupa):**
1114 # name of analysis directory and stem for associated files
1115 ANA_NAME <analysis_name>
1117 # if APPEND vars true, a _wXX and/or _tYY added to analysis name
1118 # where XX = WINDOW and YY = THRESHOLD
# Highly recommended with use of command line override of WINDOW or THRESHOLD
1120 APPEND_WIN <true/false>
1121 APPEND_THRES <true/false>
1123 # how many sequences are being analyzed
1126 # first sequence info
1127 SEQUENCE <FASTA_file_path>
1128 ANNOTATION <annotation_file_path>
1129 SEQ_START <sequence_start>
1131 # the second sequence info
1132 SEQUENCE <FASTA_file_path>
1133 # ANNOTATION <annotation_file_path>
1134 SEQ_START <sequence_start>
1135 # SEQ_END <sequence_end>
1137 # third sequence info
1138 SEQUENCE <FASTA_file_path>
1139 # ANNOTATION <annotation_file_path>
# analysis parameters: command line args -w -t will override these
1145 .. csv-table:: Parameter File Options:
1146 :header: "Option Name", "Value", "Default", "Required", "Description"
1147 :widths: 30 30 30 30 60
"ANA_NAME", "string", "N/A", "true", "Name of analysis (also the
name of the directory where the analysis will be saved)."
1151 "APPEND_WIN", "true/false", "?", "?", "Appends _w## to ANA_NAME"
"APPEND_THRES", "true/false", "?", "?", "Appends _t## to ANA_NAME"
1153 "SEQUENCE_NUM", "integer", "N/A", "true", "The number of sequences
1155 "SEQUENCE", "/FASTA/filepath.fa", "N/A", "true", "Must define one
1156 sequence per SEQUENCE_NUM."
1157 "ANNOTATION", "/annotation/filepath.txt", "N/A", "false", "Optional
1158 annotation file. See `annotation file format`_ section for more
1160 "SEQ_START", "integer", "1", "false", "Optional index into FASTA file"
1161 "SEQ_END", "integer", "1", "false", "Optional index into FASTA file"
1162 "WINDOW", "integer", "N/A", "true", "`Window Size`_"
1163 "THRESHOLD", "integer", "N/A", "true", "`Threshold`_"
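If you want to check a .mupa file before loading it, a small reader along the following lines may help. It is a sketch based only on the layout shown above (one ``OPTION value`` pair per line, ``#`` for comments, per-sequence options following their ``SEQUENCE`` line); it is not Mussa's own parser, and the file names in the example are hypothetical.

```python
PER_SEQUENCE_KEYS = ("ANNOTATION", "SEQ_START", "SEQ_END")


def parse_mupa(text):
    """Return (settings, sequences) parsed from .mupa-style text."""
    settings, sequences = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) == 2 else ""
        if key == "SEQUENCE":
            sequences.append({"SEQUENCE": value})
        elif key in PER_SEQUENCE_KEYS:
            if not sequences:
                raise ValueError(f"{key} appears before any SEQUENCE line")
            sequences[-1][key] = value
        else:
            settings[key] = value
    return settings, sequences


# Hypothetical parameter file for illustration only.
example = """
ANA_NAME demo
SEQUENCE_NUM 2
SEQUENCE human.fa
ANNOTATION human_annot.txt
SEQUENCE mouse.fa
SEQ_START 100
WINDOW 30
THRESHOLD 20
"""
settings, sequences = parse_mupa(example)
print(settings["WINDOW"], settings["THRESHOLD"])  # 30 20
print(len(sequences) == int(settings["SEQUENCE_NUM"]))  # True
```

The last line shows one useful sanity check: the number of ``SEQUENCE`` entries should equal ``SEQUENCE_NUM``.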
1167 Annotation File Format
1168 ~~~~~~~~~~~~~~~~~~~~~~
The first line in the file is the sequence name. Each line thereafter
1171 is a **space** separated annotation.
1173 New as of build 198:
1175 * The annotation format now supports FASTA sequences embedded in the
1176 annotation file as shown in the format example below. Mussagl will
1177 take this sequence and look for an exact match of this sequence in
1178 your sequences. If a match is found, it will label it with the name
from the FASTA header.
1185 <species/sequence_name>
1186 <start> <stop> <annotation_name> <annotation_type>
1187 <start> <stop> <annotation_name> <annotation_type>
1188 <start> <stop> <annotation_name> <annotation_type>
1189 <start> <stop> <annotation_name> <annotation_type>
1191 ACTGACTGACGTACGTAGCTAGCTAGCTAGCACG
1192 ACGTACGTACGTACGTAGCTGTCATACGCTAGCA
1193 TGCGTAGAGGATCTCGGATGCTAGCGCTATCGAT
1194 ACGTACGGCAGTACGCGGTCAGA
1195 <start> <stop> <annotation_name> <annotation_type>
1203 251 500 Glorp Glorptype
1204 751 1000 Glorp Glorptype
1205 1251 1500 Glorp Glorptype
1206 >My favorite DNA sequence
1208 1751 2000 Glorp Glorptype
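The format can be read with a few lines of Python. The sketch below assumes only what is described above: a name on the first line, space-separated ``start stop name type`` lines, and optional FASTA records that begin with ``>`` and run until the next annotation line. ``parse_annotations`` is a hypothetical helper, and the DNA in the sample is made up.

```python
def parse_annotations(text):
    """Return (name, intervals, fasta_records) from annotation-file text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("annotation file is empty")
    name, intervals, records = lines[0], [], []
    for line in lines[1:]:
        if line.startswith(">"):
            # Start of an embedded FASTA record; header text follows ">".
            records.append({"header": line[1:].strip(), "sequence": ""})
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            kind = fields[3] if len(fields) > 3 else ""
            intervals.append((int(fields[0]), int(fields[1]), fields[2], kind))
        elif records:
            # Any other line continues the most recent FASTA record.
            records[-1]["sequence"] += line
        else:
            raise ValueError(f"unrecognised line: {line!r}")
    return name, intervals, records


sample = """Human
251 500 Glorp Glorptype
>My favorite DNA sequence
ACGTACGT
TTGA
1751 2000 Glorp Glorptype
"""
name, intervals, records = parse_annotations(sample)
print(name)       # Human
print(intervals)  # [(251, 500, 'Glorp', 'Glorptype'), (1751, 2000, 'Glorp', 'Glorptype')]
print(records)    # [{'header': 'My favorite DNA sequence', 'sequence': 'ACGTACGTTTGA'}]
```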
1211 .. _motif_file_format:
1218 <motif> <red> <green> <blue>
1226 IUPAC Nucleotide Code
1227 ~~~~~~~~~~~~~~~~~~~~~~
1229 For your convenience, below is a table of the IUPAC Nucleotide Code.
1231 The following table is table 1 from "Nomenclature for Incompletely
1232 Specified Bases in Nucleic Acid Sequences" which can be found at
1233 http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html.
1235 ====== ================= ===================================
1236 Symbol Meaning Origin of designation
1237 ====== ================= ===================================
1246 S G or C Strong interaction (3 H bonds)
1247 W A or T Weak interaction (2 H bonds)
1248 H A or C or T not-G, H follows G in the alphabet
1249 B G or T or C not-A, B follows A
1250 V G or C or A not-T (not-U), V follows U
1251 D G or A or T not-C, D follows C
1252 N G or A or T or C aNy
1253 ====== ================= ===================================
1267 FIXME: Include seqcomp algorithm info.
1269 FIXME: Include transitivity info.
The algorithm Mussa uses to find conserved sequences is sensitive to
repeated DNA segments, which occur frequently in most genomes. The
problem with repeats is that one repeat from one sequence can show up
many times in another sequence. Every connection Mussa makes takes up
memory and CPU time to process.
1280 The formula for the number of connections, C, that will be made for R
1281 instances of a single repeat (meaning R copies of one repeat in each
1282 sequence) and S sequences is:
1286 Table of example situations:
1313 After the connections, C, are found, they are passed on to the
1314 transitivity filter, which is a C^2 algorithm (FIXME: confirm
algorithm is C^2). This means that 50 repeats in each of 2 sequences give
you a C of 2500, which results in a C^2 of 6,250,000.
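The two-sequence figures above are easy to reproduce, and doing so for a few repeat counts shows how quickly the cost grows. The sketch covers only the two-sequence case from the text and takes the C^2 cost of the transitivity filter at face value, which the text itself flags as unconfirmed; ``two_sequence_cost`` is a made-up name.

```python
def two_sequence_cost(repeats):
    """Return (connections, filter_steps) for R copies in each of 2 sequences."""
    connections = repeats * repeats   # every copy pairs with every copy
    filter_steps = connections ** 2   # assumed C^2 transitivity filter
    return connections, filter_steps


for repeats in (10, 50, 100):
    connections, steps = two_sequence_cost(repeats)
    print(f"R={repeats}: C={connections:,}, C^2={steps:,}")
# R=10: C=100, C^2=10,000
# R=50: C=2,500, C^2=6,250,000
# R=100: C=10,000, C^2=100,000,000
```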
1318 **Conclusion: repeats cause the processing time of Mussa to skyrocket.**
If you have many repeats in your sequences, you can do any of the
following: use shorter sequences; repeat mask one or more of your
sequences; or increase the threshold.
1327 Case: Conservation track suddenly stops
1328 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1331 .. Define links below
1334 .. _GPL: http://www.opensource.org/licenses/gpl-license.php
1335 .. _wiki: http://mussa.caltech.edu
1336 .. _build: http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild
1337 .. _FASTA: http://en.wikipedia.org/wiki/fasta_format
1338 .. _wpDnaMotif: http://en.wikipedia.org/wiki/DNA_motif