8 Last updated: May 18th, 2006
10 Updated to Mussagl build: 141 (Update to 200 in progress)
23 Short History of Mussa
24 ----------------------
27 Mussa Python/PMW Prototype
28 ~~~~~~~~~~~~~~~~~~~~~~~~~~
45 Mussagl has been released open source under the `GPL v2
53 You have the option of building from source or downloading prebuilt
54 binaries. Most people will want the prebuilt versions.
58 * Mac OS X (binary or source)
59 * Windows XP (binary or source)
65 Mussagl in binary form for OS X and Windows and/or source can be
66 downloaded from http://mussa.caltech.edu/.
73 Once you have downloaded the .dmg file, double-click on it and follow
74 the install instructions.
76 FIXME: Mention how to launch the program.
81 Once you have downloaded the Mussagl installer, double click on the
82 installer and follow the install instructions.
84 To start mussagl, launch the program from Start > Programs > Mussagl >
90 Currently we do not have a binary installer for Linux. You will have
91 to build from source. See the 'build from source' section below.
97 Instructions for building from source can be found on the `build page
98 <http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild>`_ on the
110 Launch Mussagl... It should look similar to the screen shot below.
112 .. image:: images/opened.png
119 ----------------------
121 Currently there are three ways to load a mussa experiment.
123 1. `Create a new analysis`_
124 2. `Load a mussa parameter file`_ (.mupa)
125 3. `Load an analysis`_
129 Create a new analysis
130 ~~~~~~~~~~~~~~~~~~~~~
132 To create a new analysis select 'Define analysis' from the 'File'
133 menu. You should see a dialog box similar to the one below. For this
134 demo we will use the example sequences that come with Mussagl.
136 .. image:: images/define_analysis.png
137 :alt: Define Analysis
142 1. **Give the experiment a name**. For this demo, we'll use
143 'demo_w30_t20'. Mussa will create a folder with this name to store
144 the analysis files in once it has been run.
146 2. Choose a `window size`_. For this demo **choose 30**.
148 3. Choose a threshold_. For this demo, **choose 20**. See the
149 Threshold_ section for more detailed information.
151 4. Choose the number of sequences_ you would like. For this demo
154 .. image:: images/define_analysis_step1a.png
158 Now click on the 'Browse' button next to the sequence input box and
159 then select the /examples/seq/human_mck_pro.fa file. Do the same in the
160 next two sequence input boxes selecting mouse_mck_pro.fa and
161 rabbit_mck_pro.fa as shown below. Note that you can create annotation
162 files using the mussa `Annotation File Format`_ to add annotations to
165 .. image:: images/define_analysis_step2.png
166 :alt: Choose sequences
169 Click the **create** button and in a few moments you should see
170 something similar to the following screen shot.
172 .. image:: images/demo.png
176 This analysis is now saved in a directory called **demo_w30_t20** in
177 the current working directory. If you close and reopen Mussagl, you
178 can reload the saved analysis. See the `Load an analysis`_ section below
182 Load a mussa parameter file
183 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
185 If you prefer, you can define your Mussa analysis using the Mussa
186 parameter file. See the `Parameter File Format`_ section for details
187 on creating a .mupa file.
189 Once you have created a .mupa file, launch Mussagl and select the **File >
190 Load Mussa Parameters** menu option. Select the .mupa file and click
193 .. image:: images/load_mupa_menu.png
194 :alt: Load Mussa Parameters
197 If you would like to see an example, you can load the
198 **mck3test.mupa** file in the examples directory that comes with
201 .. image:: images/load_mupa_dialog.png
202 :alt: Load Mussa Parameters Dialog
209 To load a previously run analysis open Mussagl and select the **File >
210 Load Analysis** menu option. Select an analysis **directory** and
213 .. image:: images/load_analysis_menu.png
214 :alt: Load Analysis Menu
223 .. Screenshot with numbers showing features.
225 .. image:: images/window_overview.png
231 1. `DNA Sequence (Black bars)`_
237 4. `Conservation tracks`_
241 6. `Zoom Factor`_ (Base pairs per pixel)
243 7. `Dynamic Threshold`_
245 8. `Sequence Information Bar`_
247 9. `Sequence Scroll Bar`_
250 DNA Sequence (black bars)
251 ~~~~~~~~~~~~~~~~~~~~~~~~~
253 .. image:: images/sequence_bar.png
257 Each of the black bars represents one of the loaded sequences, in this
258 case the sequence around the gene 'MCK' in human, mouse, and rabbit.
260 FIXME: Should I mention the repeats here?
266 .. figure:: images/annotation.png
270 Annotation shown in green on sequence bar.
273 Annotations can be included on any of the sequences using the `Load a
274 mussa parameter file`_ method of loading your sequences. You can
275 define annotations by location or using an exact subsequence and you
276 may also choose any color for display of the annotation; see the
277 `Annotation File Format`_ section for details.
279 Note: Currently there is no way to add annotations using the GUI (only
280 via the .mupa file). We plan to add this feature in the future, but it
281 likely will not make it into the first release.
287 .. figure:: images/motif.png
291 Motif shown in light blue on sequence bar.
293 The only real difference between an annotation and a motif in Mussagl is
294 that you can define motifs from within the GUI. See the `Motifs`_
295 section for more information.
301 .. figure:: images/conservation_tracks.png
302 :alt: Conservation Tracks
305 Conservation tracks shown as red and blue lines between sequence
308 The **red lines** between the sequence bars represent conservation
309 between the sequences and **blue lines** represent **reverse
310 complement** conservation. The amount of sequence conservation shown
311 will depend on the relatedness of your sequences and the `dynamic
312 threshold`_ you are using. Sequences with lots of repeats will cause
313 major slowdowns in calculating the matches.
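To make the idea of reverse complement conservation concrete, here is a minimal Python sketch. The helper name ``reverse_complement`` is hypothetical and is not taken from the Mussa source; it only illustrates the sequence transformation that a blue line refers to.

```python
# Illustrative helper only; this is not Mussa's own code.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A<->T, C<->G)."""
    return seq.translate(COMPLEMENT)[::-1]


forward = "GATTACA"
print(reverse_complement(forward))  # TGTAATC
```

A region drawn with a blue line would match the other sequence only after one of the two has been transformed this way.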
319 .. image:: images/motif_toggle.png
323 Toggles motifs on and off. This will not turn on and off annotations.
325 Note: As of the current build (#200), this feature hasn't been
332 .. image:: images/zoom_factor.png
336 The zoom factor is the number of base pairs represented per
337 pixel. When you zoom in far enough, the display switches from
338 a black bar representing the sequence to the actual sequence
339 (well, an ASCII representation of the sequence).
345 .. image:: images/dynamic_threshold.png
346 :alt: Dynamic Threshold
349 You can dynamically change how strong a match must be for it to
350 count as conservation, using one of two options:
352 1. Number of base pair matches out of window size.
354 2. Percent base pair conservation.
356 See the Threshold_ section for more information.
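The two options describe the same cutoff in different units. The sketch below shows one way to convert between them; the function names are hypothetical, and rounding up when converting a percentage is an assumption, not documented Mussagl behavior.

```python
import math


def matches_to_percent(matches: int, window: int) -> float:
    """Express a base pair cutoff as percent conservation of the window."""
    return 100.0 * matches / window


def percent_to_matches(percent: float, window: int) -> int:
    """Express a percent cutoff as base pair matches.

    Assumption: round up, so the base pair cutoff is never looser
    than the requested percentage.
    """
    return math.ceil(percent * window / 100.0)


print(round(matches_to_percent(20, 30), 1))  # 66.7
print(percent_to_matches(80, 30))            # 24
```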
359 Sequence Information Bar
360 ~~~~~~~~~~~~~~~~~~~~~~~~
362 .. image:: images/seq_info_bar.png
363 :alt: Sequence Information Bar
366 The sequence information bars can be found on the left and right sides
367 of the Mussagl window. Next to each sequence you will find the following
370 1. Species (If it has been defined)
371 2. Total Size of Sequence
372 3. Current base pair position
378 .. image:: images/scroll_bar.png
379 :alt: Sequence Scroll Bar
382 The scroll bar allows you to scroll through the sequence which is
383 useful when you have zoomed in using the `zoom factor`_.
392 Currently annotations can be added to a sequence using the mussa
393 `annotation file format`_ and can be loaded by selecting the
394 annotation file when defining a new analysis (see `Create a new
395 analysis`_ section) or by defining a .mupa file pointing to your
396 annotation file (see `Load a mussa parameter file`_ section).
401 Load Motifs from File
402 *********************
404 You can load motifs from a file that was saved from a previous run
405 or from a motif file you define yourself. See the `Motif File
406 Format`_ section for details.
408 To load a motif file, select the **Load Motif List** item from the
409 **File** menu and select a motif list file.
411 .. image:: images/load_motif.png
412 :alt: Load Motif List
419 Note: Currently not implemented
425 Mussa has the ability to find lab motifs using the `IUPAC Nucleotide
426 Code`_ for defining a motif. To define a motif, select the **View > Edit
427 Motifs** menu item as shown below.
429 .. image:: images/view_edit_motifs.png
430 :alt: "View > Edit Motifs" Menu
433 You will see a dialog box appear with a "set motifs" button and 10
434 rows for defining motifs and the color that will be displayed on the
435 sequence. By default, all 10 motifs start off with white as the color.
437 .. image:: images/motif_dialog_start.png
448 The threshold of an analysis is the minimum number of base pair matches
449 that must be met for a window to be kept as a match. Note that you can vary
450 the threshold from within Mussagl. For example, if you choose a
451 `window size`_ of **30** and a **threshold** of **20** the mussa nway
452 transitive algorithm will store all matches that are 20 out of 30 bp
453 matches or better and pass them on to Mussagl. Mussagl will then allow
454 you to dynamically choose a threshold from 20 to 30 base pairs. A
455 threshold of 30 bps would only show 30 out of 30 bp matches. A
456 threshold of 20 bps would show all matches of 20 out of 30 bps or
457 better. If you would like to see results for matches lower than 20 out
458 of 30, you will need to rerun the analysis with a lower threshold.
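The relationship between the analysis threshold and the dynamic threshold can be sketched in Python. This is a brute-force comparison of two sequences for illustration only. It is not the mussa nway transitive algorithm, and the function names are hypothetical.

```python
def window_matches(seq_a, seq_b, window, threshold):
    """Return (start_a, start_b, score) for each ungapped window pair
    whose number of identical positions is at least `threshold`."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            pairs = zip(seq_a[i:i + window], seq_b[j:j + window])
            score = sum(a == b for a, b in pairs)
            if score >= threshold:
                hits.append((i, j, score))
    return hits


def apply_dynamic_threshold(hits, threshold):
    """Filter stored hits with a stricter cutoff, without recomputing."""
    return [hit for hit in hits if hit[2] >= threshold]


# Tiny example: window of 4, keep matches of 3 out of 4 or better.
stored = window_matches("ACGTAC", "ACGAAC", window=4, threshold=3)
print(stored)                              # [(0, 0, 3), (1, 1, 3), (2, 2, 3)]
print(apply_dynamic_threshold(stored, 4))  # []
```

Because only hits at or above the analysis threshold are stored, the later filter can be raised but never lowered below it, which is why seeing weaker matches requires rerunning the analysis.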
463 The typical sizes people tend to choose are between 20 and 30. You
464 will likely need to experiment with this setting depending on your
465 needs and input sequence.
471 Mussa reads in sequences that are formatted in the fasta_
472 format. Mussa may take a long time to run (>10 minutes) if the total
473 bp length nears 280 Kb. Once an analysis has been run, you can
474 reload it later instead of rerunning it.
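If you want to check the total length of your input before starting a run, a short Python sketch like the following can help. It assumes plain FASTA text; the function name and the example records are hypothetical, and the 280 Kb figure is simply the rough guide given above.

```python
import io

SLOW_TOTAL_BP = 280_000  # rough guide from the text above, not a hard limit


def fasta_lengths(handle):
    """Return {header: length in bp} for a FASTA-formatted text stream."""
    lengths = {}
    name = None
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


# In practice you would pass open("your_sequence.fa") instead.
example = io.StringIO(">human_mck\nACGTACGT\nACGT\n>mouse_mck\nGGCC\n")
lengths = fasta_lengths(example)
total_bp = sum(lengths.values())
print(lengths, total_bp)
if total_bp >= SLOW_TOTAL_BP:
    print("Expect a long run time.")
```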
476 FIXME: We have learned more about how much sequence and how many to
477 put in mussagl, this information should be documented here.
485 Parameter File Format
486 ~~~~~~~~~~~~~~~~~~~~~
488 **File Format (.mupa):**
492 # name of analysis directory and stem for associated files
493 ANA_NAME <analysis_name>
495 # if APPEND vars true, a _wXX and/or _tYY added to analysis name
496 # where XX = WINDOW and YY = THRESHOLD
497 # Highly recommended with use of command line override of WINDOW or THRESHOLD
498 APPEND_WIN <true/false>
499 APPEND_THRES <true/false>
501 # how many sequences are being analyzed
504 # first sequence info
505 SEQUENCE <fasta_file_path>
506 ANNOTATION <annotation_file_path>
507 SEQ_START <sequence_start>
509 # the second sequence info
510 SEQUENCE <fasta_file_path>
511 # ANNOTATION <annotation_file_path>
512 SEQ_START <sequence_start>
513 # SEQ_END <sequence_end>
515 # third sequence info
516 SEQUENCE <fasta_file_path>
517 # ANNOTATION <annotation_file_path>
519 # analysis parameters: command line args -w -t will override these
523 .. csv-table:: Parameter File Options:
524 :header: "Option Name", "Value", "Default", "Required", "Description"
525 :widths: 30 30 30 30 60
527 "ANA_NAME", "string", "N/A", "true", "Name of analysis (Also
528 name of directory where analysis will be saved)."
529 "APPEND_WIN", "true/false", "?", "?", "Appends _w## to ANA_NAME"
530 "APPEND_THRES", "true/false", "?", "?", "Appends _t## to ANA_NAME"
531 "SEQUENCE_NUM", "integer", "N/A", "true", "The number of sequences
533 "SEQUENCE", "/fasta/filepath.fa", "N/A", "true", "Must define one
534 sequence per SEQUENCE_NUM."
535 "ANNOTATION", "/annotation/filepath.txt", "N/A", "false", "Optional
536 annotation file. See `annotation file format`_ section for more
538 "SEQ_START", "integer", "1", "false", "Optional index into fasta file"
539 "SEQ_END", "integer", "1", "false", "Optional index into fasta file"
540 "WINDOW", "integer", "N/A", "true", "`Window Size`_"
541 "THRESHOLD", "integer", "N/A", "true", "`Threshold`_"
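As a rough illustration of how the options fit together, the following Python sketch reads a .mupa file into a dictionary of analysis parameters plus one entry per sequence. It is not Mussa's own parser. It assumes that ANNOTATION, SEQ_START, and SEQ_END apply to the most recent SEQUENCE line, as the layout of the format example suggests, and the file paths in the sample text are hypothetical.

```python
PER_SEQUENCE_KEYS = ("ANNOTATION", "SEQ_START", "SEQ_END")


def parse_mupa(text):
    """Split .mupa text into (analysis parameters, list of sequence entries)."""
    params = {}
    sequences = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "SEQUENCE":
            sequences.append({"SEQUENCE": value})
        elif key in PER_SEQUENCE_KEYS:
            if not sequences:
                raise ValueError(key + " appears before any SEQUENCE line")
            sequences[-1][key] = value
        else:
            params[key] = value
    return params, sequences


sample = """\
# hypothetical example values
ANA_NAME demo
APPEND_WIN true
APPEND_THRES true
SEQUENCE_NUM 2
SEQUENCE seq/human_mck_pro.fa
SEQ_START 1
SEQUENCE seq/mouse_mck_pro.fa
WINDOW 30
THRESHOLD 20
"""
params, sequences = parse_mupa(sample)
print(params["ANA_NAME"], params["WINDOW"], params["THRESHOLD"])
print(len(sequences) == int(params["SEQUENCE_NUM"]))
```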
545 Annotation File Format
546 ~~~~~~~~~~~~~~~~~~~~~~
548 The first line in the file is the sequence name. Each line thereafter
549 is a **space**-separated annotation.
553 * The annotation format now supports fasta sequences embedded in the
554 annotation file as shown in the format example below. Mussagl will
555 take this sequence and look for an exact match of this sequence in
556 your sequences. If a match is found, it will label it with the name
557 from the fasta header.
563 <species/sequence_name>
564 <start> <stop> <annotation_name> <annotation_type>
565 <start> <stop> <annotation_name> <annotation_type>
566 <start> <stop> <annotation_name> <annotation_type>
567 <start> <stop> <annotation_name> <annotation_type>
569 ACTGACTGACGTACGTAGCTAGCTAGCTAGCACG
570 ACGTACGTACGTACGTAGCTGTCATACGCTAGCA
571 TGCGTAGAGGATCTCGGATGCTAGCGCTATCGAT
572 ACGTACGGCAGTACGCGGTCAGA
573 <start> <stop> <annotation_name> <annotation_type>
581 251 500 Glorp Glorptype
582 751 1000 Glorp Glorptype
583 1251 1500 Glorp Glorptype
584 >My favorite DNA sequence
586 1751 2000 Glorp Glorptype
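The format can be read with a few lines of Python, sketched below. This is not Mussa's own parser. It assumes that annotation lines always begin with a digit and that any other line following a ``>`` header belongs to that embedded sequence. The sequence name and the embedded bases in the sample are hypothetical.

```python
def parse_annotations(text):
    """Return (sequence name, interval annotations, embedded fasta records)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name, body = lines[0], lines[1:]
    intervals = []
    embedded = {}
    current = None  # header of the fasta record being read, if any
    for line in body:
        if line.startswith(">"):
            current = line[1:].strip()
            embedded[current] = ""
        elif line[0].isdigit():
            start, stop, label, kind = line.split(None, 3)
            intervals.append((int(start), int(stop), label, kind))
            current = None
        elif current is not None:
            embedded[current] += line
        else:
            raise ValueError("unrecognized line: " + line)
    return name, intervals, embedded


sample = """\
Human
251 500 Glorp Glorptype
>My favorite DNA sequence
GATTACA
GATTACA
1751 2000 Glorp Glorptype
"""
name, intervals, embedded = parse_annotations(sample)
print(name, intervals, embedded)
```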
589 .. _motif_file_format:
596 <motif> <red> <green> <blue>
604 IUPAC Nucleotide Code
605 ~~~~~~~~~~~~~~~~~~~~~
607 For your convenience, below is a table of the IUPAC Nucleotide Code.
609 The following table is table 1 from "Nomenclature for Incompletely
610 Specified Bases in Nucleic Acid Sequences" which can be found at
611 http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html.
613 ====== ================= ===================================
614 Symbol Meaning           Origin of designation
615 ====== ================= ===================================
624 S      G or C            Strong interaction (3 H bonds)
625 W      A or T            Weak interaction (2 H bonds)
626 H      A or C or T       not-G, H follows G in the alphabet
627 B      G or T or C       not-A, B follows A
628 V      G or C or A       not-T (not-U), V follows U
629 D      G or A or T       not-C, D follows C
630 N      G or A or T or C  aNy
631 ====== ================= ===================================
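One way to see how an IUPAC motif maps onto concrete sequence is to expand each symbol into a regular expression character class. The Python sketch below does this with the standard IUPAC codes. The function names are hypothetical, only the forward strand is searched, and this is not necessarily how Mussa performs its own motif search.

```python
import re

# Standard IUPAC nucleotide codes and the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "M": "AC", "K": "GT",
    "S": "CG", "W": "AT",
    "H": "ACT", "B": "CGT", "V": "ACG", "D": "AGT",
    "N": "ACGT",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif into a regex that also finds overlapping hits."""
    body = "".join("[" + IUPAC[symbol] + "]" for symbol in motif.upper())
    return re.compile("(?=(" + body + "))")


def find_motif(motif, sequence):
    """Return 0-based start offsets of the motif on the forward strand."""
    pattern = motif_to_regex(motif)
    return [m.start() for m in pattern.finditer(sequence.upper())]


print(find_motif("CANNTG", "TTCAGCTGAACACGTGTT"))  # [2, 10]
```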
634 .. Define links below
637 .. _GPL: http://www.opensource.org/licenses/gpl-license.php
638 .. _wiki: http://mussa.caltech.edu
639 .. _build: http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild
640 .. _fasta: http://en.wikipedia.org/wiki/FASTA_format
641 .. _wpDnaMotif: http://en.wikipedia.org/wiki/DNA_motif