8 Last updated: May 18th, 2006
10 Updated to Mussagl build: 141 (Update to 200 in progress)
23 Short History of Mussa
24 ----------------------
27 Mussa Python/PMW Prototype
28 ~~~~~~~~~~~~~~~~~~~~~~~~~~
45 Mussagl has been released open source under the `GPL v2
53 You have the option of building from source or downloading prebuilt
54 binaries. Most people will want the prebuilt versions.
58 * Mac OS X (binary or source)
59 * Windows XP (binary or source)
65 Mussagl can be downloaded from http://mussa.caltech.edu/.
72 Once you have downloaded the .dmg file, double click on it and follow
73 the install instructions.
75 FIXME: Mention how to launch the program.
80 Once you have downloaded the Mussagl installer, double click on the
81 installer and follow the install instructions.
83 To start mussagl, launch the program from Start > Programs > Mussagl >
89 Currently we do not have a binary installer for Linux. You will have
90 to build from source. See the 'build from source' section below.
96 Instructions for building from source can be found `build page
97 <http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild>`_ on the
109 Launch Mussagl... It should look similar to the screen shot below.
111 .. image:: images/opened.png
118 ----------------------
120 Currently there are three ways to load a mussa experiment.
122 1. `Create a new analysis`_
123 2. `Load a mussa parameter file`_ (.mupa)
124 3. `Load an analysis`_
128 Create a new analysis
129 ~~~~~~~~~~~~~~~~~~~~~
131 To create a new analysis select 'Define analysis' from the 'File'
132 menu. You should see a dialog box similar to the one below. For this
133 demo we will use the example sequences that come with Mussagl.
135 .. image:: images/define_analysis.png
136 :alt: Define Analysis
141 1. **Give the experiment a name**; for this demo, we'll use
142 'demo_w30_t20'. Mussa will create a folder with this name to store
143 the analysis files in once it has been run.
145 2. Choose a `window size`_. For this demo **choose 30**.
147 3. Choose a threshold_... for this demo **choose 20**. See the
148 Threshold_ section for more detailed information.
150 4. Choose the number of sequences_ you would like. For this demo
153 .. image:: images/define_analysis_step1a.png
157 Now click on the 'Browse' button next to the sequence input box and
158 then select /examples/seq/human_mck_pro.fa file. Do the same in the
159 next two sequence input boxes selecting mouse_mck_pro.fa and
160 rabbit_mck_pro.fa as shown below.
162 .. image:: images/define_analysis_step2.png
163 :alt: Choose sequences
166 Click the **create** button and in a few moments you should see
167 something similar to the following screen shot.
169 .. image:: images/demo.png
173 This analysis is now saved in a directory called **demo_w30_t20** in
174 the current working directory. If you close and reopen Mussagl, you
175 can reload the saved analysis. See `Load an analysis`_ section below
179 Load a mussa parameter file
180 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
182 If you prefer, you can define your Mussa analysis using the Mussa
183 parameter file. See the `Parameter File Format`_ section for details
184 on creating a .mupa file.
186 Once you have a .mupa file created, load Mussagl and select the **File >
187 Load Mussa Parameters** menu option. Select the .mupa file and click
190 .. image:: images/load_mupa_menu.png
191 :alt: Load Mussa Parameters
194 If you would like to see an example, you can load the
195 **mck3test.mupa** file in the examples directory that comes with
198 .. image:: images/load_mupa_dialog.png
199 :alt: Load Mussa Parameters Dialog
206 To load a previously run analysis open Mussagl and select the **File >
207 Load Analysis** menu option. Select an analysis **directory** and
210 .. image:: images/load_analysis_menu.png
211 :alt: Load Analysis Menu
220 .. Screenshot with numbers showing features.
222 .. image:: images/window_overview.png
228 1. `DNA Sequence (Black bars)`_
234 4. `Conservation tracks`_
238 6. `Zoom Factor`_ (Base pairs per pixel)
240 7. `Dynamic Threshold`_
242 8. `Sequence Information Bar`_
244 9. `Sequence Scroll Bar`_
247 DNA Sequence (black bars)
248 ~~~~~~~~~~~~~~~~~~~~~~~~~
250 .. image:: images/sequence_bar.png
254 Each of the black bars represents one of the loaded sequences, in this
255 case the sequence around the gene 'MCK' in human, mouse, and rabbit.
257 FIXME: Should I mention the repeats here?
263 .. figure:: images/annotation.png
267 Annotation shown in green on sequence bar.
270 Annotations can be included on any of the sequences using the `Load a
271 mussa parameter file`_ method of loading your sequences. You can
272 define annotations by location or using an exact subsequence and you
273 may also choose any color for display of the annotation; see the
274 `Annotation File Format`_ section for details.
276 Note: Currently there is no way to add annotations using the GUI (only
277 via the .mupa file). We plan to add this feature in the future, but it
278 likely will not make it into the first release.
284 .. figure:: images/motif.png
288 Motif shown in light blue on sequence bar.
290 The only real difference between an annotation and motif in mussagl is
291 that you can define motifs from within the GUI. See the `Motifs`_
292 section for more information.
298 .. figure:: images/conservation_tracks.png
299 :alt: Conservation Tracks
302 Conservation tracks shown as red lines between sequence bars.
304 The red lines between the sequence bars represent conservation between
305 the sequences. The amount of sequence conservation shown will depend
306 on the relatedness of your sequences and the `dynamic threshold`_ you
307 are using. Sequences with lots of repeats will cause major slowdowns
308 in calculating the matches.
314 .. image:: images/motif_toggle.png
318 Toggles motifs on and off. This will not turn on and off annotations.
320 Note: As of the current build (#200), this feature hasn't been
327 .. image:: images/zoom_factor.png
331 The zoom factor represents the number of base pairs represented per
332 pixel. When you zoom in far enough the display will switch from
333 showing a black bar, representing the sequence, to the actual sequence
334 (well, an ASCII representation of the sequence).
340 .. image:: images/dynamic_threshold.png
341 :alt: Dynamic Threshold
344 You can dynamically change the threshold for how strong a match you
345 consider the conservation to be with one of two options:
347 1. Number of base pair matches out of window size.
349 2. Percent base pair conservation.
351 See the Threshold_ section for more information.
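
The two options describe the same cutoff in different units. The sketch below converts between them for a given window size. The function names are hypothetical and not part of Mussagl, and rounding a percentage up to the next whole match is an assumption about how a fractional cutoff would be handled.

```python
import math


def matches_to_percent(matches, window_size):
    """Convert a base pair match count into percent conservation."""
    return 100.0 * matches / window_size


def percent_to_matches(percent, window_size):
    """Return the smallest whole match count that reaches `percent`.

    Assumption: a fractional cutoff is rounded up to the next match.
    The inner round() only guards against floating point noise.
    """
    return math.ceil(round(percent * window_size / 100.0, 9))


# With a window size of 30, 21 matches corresponds to 70 percent.
print(matches_to_percent(21, 30))
print(percent_to_matches(70, 30))
```
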
354 Sequence Information Bar
355 ~~~~~~~~~~~~~~~~~~~~~~~~
357 .. image:: images/seq_info_bar.png
358 :alt: Sequence Information Bar
361 The sequence information bars can be found on the left and right sides
362 of mussagl. Next to each sequence you will find the following
365 1. Species (If it has been defined)
366 2. Total Size of Sequence
367 3. Current base pair position
373 .. image:: images/scroll_bar.png
374 :alt: Sequence Scroll Bar
377 The scroll bar allows you to scroll through the sequence which is
378 useful when you have zoomed in using the `zoom factor`_.
390 Load Motifs from File
391 *********************
403 The threshold of an analysis is the minimum number of base pair matches
404 a window must contain in order to be kept as a match. Note that you can vary
405 the threshold from within Mussagl. For example, if you choose a
406 `window size`_ of **30** and a **threshold** of **20** the mussa nway
407 transitive algorithm will store all matches that are 20 out of 30 bp
408 matches or better and pass them on to Mussagl. Mussagl will then allow
409 you to dynamically choose a threshold from 20 to 30 base pairs. A
410 threshold of 30 bps would only show 30 out of 30 bp matches. A
411 threshold of 20 bps would show all matches of 20 out of 30 bps or
412 better. If you would like to see results for matches lower than 20 out
413 of 30, you will need to rerun the analysis with a lower threshold.
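
To make the relationship between window size, stored threshold, and dynamic threshold concrete, here is a minimal sketch. It is a simplified pairwise comparison written for illustration only: it is not the mussa nway transitive algorithm, it ignores reverse complement matches, and the function names are hypothetical.

```python
def window_matches(seq_a, seq_b, window_size, threshold):
    """Return (start_a, start_b, score) for every window pair whose
    number of matching positions is at least `threshold`."""
    hits = []
    for i in range(len(seq_a) - window_size + 1):
        for j in range(len(seq_b) - window_size + 1):
            # Count identical bases at the same offset within both windows.
            score = sum(
                1 for k in range(window_size) if seq_a[i + k] == seq_b[j + k]
            )
            if score >= threshold:
                hits.append((i, j, score))
    return hits


def filter_hits(hits, dynamic_threshold):
    """Keep only stored hits that meet a stricter threshold chosen later."""
    return [hit for hit in hits if hit[2] >= dynamic_threshold]


# Store everything at 3 out of 4 or better, then view only perfect windows.
stored = window_matches("ACGTACGT", "ACGTTCGT", window_size=4, threshold=3)
perfect = filter_hits(stored, 4)
```

Because only hits at or above the stored threshold are kept, filtering can raise the cutoff but never lower it, which is why viewing weaker matches requires rerunning the analysis.
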
418 The typical sizes people tend to choose are between 20 and 30. You
419 will likely need to experiment with this setting depending on your
420 needs and input sequence.
426 Mussa reads in sequences which are formatted in the fasta_
427 format. Mussa may take a long time to run (>10 minutes) if the total
428 bp length nears 280Kb. After mussa has run once, you can reload
429 previously run analyses.
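
If you want to check the combined length of your input before starting a run, a short script along these lines may help. It is a sketch that assumes plain FASTA files in which header lines start with ``>``; the function names are hypothetical and not part of Mussa.

```python
def fasta_length(path):
    """Count sequence characters in a FASTA file, skipping header lines."""
    total = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            # Header lines begin with '>' and carry no sequence data.
            if line and not line.startswith(">"):
                total += len(line)
    return total


def total_bp(paths):
    """Sum the sequence lengths of several FASTA files."""
    return sum(fasta_length(path) for path in paths)
```

For example, ``total_bp(["human_mck_pro.fa", "mouse_mck_pro.fa", "rabbit_mck_pro.fa"])`` would give the combined length of the demo sequences, assuming those files are in the current directory. The result can then be compared against the rough 280Kb figure mentioned above.
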
431 FIXME: We have learned more about how much sequence and how many to
432 put in mussagl, this information should be documented here.
440 Parameter File Format
441 ~~~~~~~~~~~~~~~~~~~~~
443 **File Format (.mupa):**
447 # name of analysis directory and stem for associated files
448 ANA_NAME <analysis_name>
450 # if APPEND vars true, a _wXX and/or _tYY added to analysis name
451 # where XX = WINDOW and YY = THRESHOLD
452 # Highly recommended with use of command line override of WINDOW or THRESHOLD
453 APPEND_WIN <true/false>
454 APPEND_THRES <true/false>
456 # how many sequences are being analyzed
459 # first sequence info
460 SEQUENCE <fasta_file_path>
461 ANNOTATION <annotation_file_path>
462 SEQ_START <sequence_start>
464 # the second sequence info
465 SEQUENCE <fasta_file_path>
466 # ANNOTATION <annotation_file_path>
467 SEQ_START <sequence_start>
468 # SEQ_END <sequence_end>
470 # third sequence info
471 SEQUENCE <fasta_file_path>
472 # ANNOTATION <annotation_file_path>
474 # analyses parameters: command line args -w -t will override these
478 .. csv-table:: Parameter File Options:
479 :header: "Option Name", "Value", "Default", "Required", "Description"
480 :widths: 30 30 30 30 60
482 "ANA_NAME", "string", "N/A", "true", "Name of analysis (also
483 name of directory where analysis will be saved)."
484 "APPEND_WIN", "true/false", "?", "?", "Appends _w## to ANA_NAME"
485 "APPEND_THRES", "true/false", "?", "?", "Appends _t## to ANA_NAME"
486 "SEQUENCE_NUM", "integer", "N/A", "true", "The number of sequences
488 "SEQUENCE", "/fasta/filepath.fa", "N/A", "true", "Must define one
489 sequence per SEQUENCE_NUM."
490 "ANNOTATION", "/annotation/filepath.txt", "N/A", "false", "Optional
491 annotation file. See `annotation file format`_ section for more
493 "SEQ_START", "integer", "1", "false", "Optional index into fasta file"
494 "SEQ_END", "integer", "1", "false", "Optional index into fasta file"
495 "WINDOW", "integer", "N/A", "true", "`Window Size`_"
496 "THRESHOLD", "integer", "N/A", "true", "`Threshold`_"
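
As an illustration of how the options fit together, the following sketch generates the text of a .mupa file from Python. It assumes each option is written as a keyword followed by a space and its value, as in the template above. The ``build_mupa`` helper and its arguments are hypothetical and not part of Mussa.

```python
def build_mupa(name, sequences, window, threshold,
               append_win=False, append_thres=False):
    """Return the text of a .mupa parameter file.

    `sequences` is a list of dicts with a required 'fasta' key and
    optional 'annotation', 'start', and 'end' keys.
    """
    lines = [
        f"ANA_NAME {name}",
        f"APPEND_WIN {str(append_win).lower()}",
        f"APPEND_THRES {str(append_thres).lower()}",
        f"SEQUENCE_NUM {len(sequences)}",
    ]
    for seq in sequences:
        lines.append(f"SEQUENCE {seq['fasta']}")
        # Optional settings are written only when supplied.
        if "annotation" in seq:
            lines.append(f"ANNOTATION {seq['annotation']}")
        if "start" in seq:
            lines.append(f"SEQ_START {seq['start']}")
        if "end" in seq:
            lines.append(f"SEQ_END {seq['end']}")
    lines.append(f"WINDOW {window}")
    lines.append(f"THRESHOLD {threshold}")
    return "\n".join(lines) + "\n"


# Paths are placeholders; adjust them to where your fasta files live.
demo = build_mupa(
    "demo",
    [
        {"fasta": "human_mck_pro.fa"},
        {"fasta": "mouse_mck_pro.fa"},
        {"fasta": "rabbit_mck_pro.fa"},
    ],
    window=30,
    threshold=20,
    append_win=True,
    append_thres=True,
)
print(demo)
```

Writing the returned text to a file ending in ``.mupa`` should give something you can try loading through **File > Load Mussa Parameters**.
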
500 Annotation File Format
501 ~~~~~~~~~~~~~~~~~~~~~~
503 The first line in the file is the sequence name. Each line thereafter
504 is a **space** separated annotation.
510 <species/sequence_name>
511 <start> <stop> <annotation_name> <annotation_type>
512 <start> <stop> <annotation_name> <annotation_type>
513 <start> <stop> <annotation_name> <annotation_type>
514 <start> <stop> <annotation_name> <annotation_type>
522 251 500 Glorp Glorptype
523 751 1000 Glorp Glorptype
524 1251 1500 Glorp Glorptype
525 1751 2000 Glorp Glorptype
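
If you need to read or check annotation files outside of Mussagl, the format is simple enough to parse by hand. The sketch below is a hypothetical helper, not part of Mussa. It assumes that blank lines can be skipped and that annotation names and types contain no spaces.

```python
def parse_annotations(text):
    """Parse annotation file text into (sequence_name, records).

    Each record is a (start, stop, annotation_name, annotation_type) tuple.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    sequence_name = lines[0]
    records = []
    for line in lines[1:]:
        # Fields are space separated: start, stop, name, type.
        start, stop, ann_name, ann_type = line.split()
        records.append((int(start), int(stop), ann_name, ann_type))
    return sequence_name, records


# "example_sequence" is a made-up name used only for this illustration.
sample = "example_sequence\n251 500 Glorp Glorptype\n751 1000 Glorp Glorptype\n"
name, records = parse_annotations(sample)
```
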
528 .. _motif_file_format:
535 <motif> <red> <green> <blue>
542 .. Define links below
545 .. _GPL: http://www.opensource.org/licenses/gpl-license.php
546 .. _wiki: http://mussa.caltech.edu
547 .. _build: http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild
548 .. _fasta: http://en.wikipedia.org/wiki/FASTA_format
549 .. _wpDnaMotif: http://en.wikipedia.org/wiki/DNA_motif