==============
Mussagl Manual
==============
-------------------
-By Brandon W. King
-------------------
+---------------
+Brandon W. King
+---------------
Last updated: May 18th, 2006
Download
--------
-Mussagl can be downloaded from http://mussa.caltech.edu/.
+Binary releases of Mussagl for OS X and Windows, as well as the source
+code, can be downloaded from http://mussa.caltech.edu/.
Install
-------
Mac OS X
~~~~~~~~
-Once you have downloaded the .dmg file, dubble click on it and follow
+Once you have downloaded the .dmg file, double click on it and follow
the install instructions.
FIXME: Mention how to launch the program.
Once you have downloaded the Mussagl installer, double click on the
installer and follow the install instructions.
-To start mussagl, launch the program from Start > Programs > Mussagl >
-Mussgl.
+To start Mussagl, launch the program from Start > Programs > Mussagl >
+Mussagl.
Linux
Create/Load Analysis
----------------------
-Currently there are three ways to load a mussa experiment.
+Currently there are three ways to load a Mussa experiment.
1. `Create a new analysis`_
2. `Load a mussa parameter file`_ (.mupa)
Instructions:
- 1. **Give the experiement a name**, for this demo, we'll use
+ 1. **Give the experiment a name**. For this demo, we'll use
'demo_w30_t20'. Mussa will create a folder with this name to store
the analysis files in once it has been run.
Now click on the 'Browse' button next to the sequence input box and
then select the /examples/seq/human_mck_pro.fa file. Do the same in the
next two sequence input boxes, selecting mouse_mck_pro.fa and
-rabbit_mck_pro.fa as shown below.
+rabbit_mck_pro.fa as shown below. Note that you can also add
+annotations to your sequences by creating annotation files in the
+Mussa `Annotation File Format`_.
.. image:: images/define_analysis_step2.png
:alt: Choose sequences
parameter file. See the `Parameter File Format`_ section for details
on creating a .mupa file.
-Once you have a .mupa file created, load Mussgl and select the **File >
+Once you have created a .mupa file, start Mussagl and select the **File >
Load Mussa Parameters** menu option. Select the .mupa file and click
Open.
Overview
~~~~~~~~
-.. Screenshot with numbers showing features.
+.. Screenshot with numbers showing features.
.. image:: images/window_overview.png
:alt: Mussa Window
:align: center
Each of the black bars represents one of the loaded sequences, in this
-case the sequence around the gene 'MCK' in human, mouse, and rabit.
+case the sequence around the gene 'MCK' in human, mouse, and rabbit.
FIXME: Should I mention the repeats here?
Annotations can be included on any of the sequences, either when you
`Create a new analysis`_ or by using the `Load a mussa parameter file`_
method of loading your sequences. You can
-define annotations by location or using an exact subsequence and you
-may also choose any color for display of the annoation; see the
+define annotations by location or by an exact subsequence, and you
+may also choose the color used to display each annotation; see the
`Annotation File Format`_ section for details.
Note: Currently there is no way to add annotations using the GUI (only
Motif shown in light blue on sequence bar.
-The only real difference between an annotation and motif in mussagl is
+The only real difference between an annotation and a motif in Mussagl is
that you can define motifs from within the GUI. See the `Motifs`_
section for more information.
:alt: Conservation Tracks
:align: center
- Conservations tracks shown as red lines between sequence bars.
+ Conservation tracks shown as red and blue lines between sequence
+ bars.
-The red lines between the sequence bars represent conservation between
-the sequences. The amount of sequence conservation shown will depend
-on the relatedness of your sequences and the `dynamic threshold` you
-are using. Sequences with lots of repeats will cause major slow downs
-in calculating the matches.
+The **red lines** between the sequence bars represent conservation
+between the sequences and **blue lines** represent **reverse
+complement** conservation. The amount of sequence conservation shown
+will depend on the relatedness of your sequences and the `dynamic
+threshold` you are using. Sequences with many repeats can greatly slow
+down the calculation of matches.
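
As a rough illustration of the red/blue distinction, the Python sketch
below compares two equal-length windows directly (a red line) and
against the reverse complement of the second window (a blue line). It is
a simplified assumption of how such a comparison might work, not Mussa's
actual matching code; the function names and the simple per-position
match count are hypothetical.

```python
# Hypothetical sketch of forward vs. reverse-complement window comparison.
# Not Mussa's real implementation.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Complement each base (A<->T, C<->G) and reverse; other letters pass through."""
    return seq.translate(COMPLEMENT)[::-1]


def count_matches(a: str, b: str) -> int:
    """Number of positions where two equal-length windows have the same base."""
    if len(a) != len(b):
        raise ValueError("windows must have the same length")
    return sum(1 for x, y in zip(a.upper(), b.upper()) if x == y)


def classify(window_a: str, window_b: str, threshold: int):
    """Return 'forward' (red), 'reverse' (blue) or None for a pair of windows."""
    if count_matches(window_a, window_b) >= threshold:
        return "forward"
    if count_matches(window_a, reverse_complement(window_b)) >= threshold:
        return "reverse"
    return None
```

For example, ``classify("ACGTAC", "GTACGT", 6)`` returns ``"reverse"``
because the reverse complement of GTACGT is ACGTAC. The sketch checks
the forward strand first, so a pair conserved both ways is reported as
forward.
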
Motif Toggle
You can dynamically change the threshold that determines how strong a
match must be to count as conservation, using one of two options:
- 1. Number of base pair matchs out of window size.
+ 1. Number of base pair matches out of window size.
2. Percent base pair conservation.
-See the Threshold_ section for more infromation.
+See the Threshold_ section for more information.
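
The two options describe the same cutoff in different units. Assuming
that percent conservation simply means matching bases divided by the
window size, the conversion is a one-liner; for example, a window of 30
with a threshold of 20 (as the 'demo_w30_t20' name suggests) would
correspond to roughly 66.7% conservation. The helper name below is
hypothetical.

```python
def threshold_to_percent(threshold: int, window: int) -> float:
    """Express 'threshold matching bases out of window bp' as percent conservation.

    Assumption: percent conservation = matching bases / window size * 100.
    """
    if window <= 0:
        raise ValueError("window size must be positive")
    if not 0 <= threshold <= window:
        raise ValueError("threshold must be between 0 and the window size")
    return 100.0 * threshold / window


print(f"{threshold_to_percent(20, 30):.1f}%")  # window 30, threshold 20 -> 66.7%
```
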
Sequence Information Bar
:alt: Sequence Information Bar
:align: center
-The sequence infomation bars can be found to the left and right sides
-of mussagl. Next to each sequence you will find the following
+The sequence information bars can be found on the left and right sides
+of the Mussagl window. Next to each sequence you will find the following
information:
1. Species (If it has been defined)
Annotations
~~~~~~~~~~~
+Currently, annotations can be added to a sequence using the Mussa
+`annotation file format`_ and can be loaded by selecting the
+annotation file when defining a new analysis (see `Create a new
+analysis`_ section) or by defining a .mupa file pointing to your
+annotation file (see `Load a mussa parameter file`_ section).
+
Motifs
~~~~~~
Load Motifs from File
*********************
+You can load motifs from a file saved during a previous run or from a
+motif file that you write yourself. See the `Motif File Format`_
+section for details.
+
+To load a motif file, select the **Load Motif List** item from the
+**File** menu and choose a motif list file.
+
+.. image:: images/load_motif.png
+ :alt: Load Motif List
+ :align: center
+
+
+Save Motifs to File
+*******************
+
+Note: This feature is currently not implemented.
+
+
Motif Dialog
************
+Mussa can search your sequences for motifs that you define using the
+`IUPAC Nucleotide Code`_. To define a motif, select the **View > Edit
+Motifs** menu item as shown below.
+
+.. image:: images/view_edit_motifs.png
+ :alt: "View > Edit Motifs" Menu
+ :align: center
+
+A dialog box will appear with a 'set motifs' button and 10 rows, each
+for defining a motif and the color used to display it on the
+sequence. By default, all 10 motifs start off with white as their
+color. In the image below, I changed the color from white to blue to
+make it easier to see.
+
+.. image:: images/motif_dialog_start.png
+ :alt: Motif Dialog
+ :align: center
+
+Now let's make the motif **'AT[C or G]CT'**. Using the `IUPAC Nucleotide
+Code`_, type **'ATSCT'** into the first box as shown below.
+
+.. image:: images/motif_dialog_enter_motif.png
+ :alt: Enter Motif
+ :align: center
+
+Now choose a color for your motif by clicking on the colored area to
+the left of the motif. In the image above, you would click on the blue
+square, but by default the squares will be white. Remember to choose a
+color that will show up well against the black sequence bar.
+
+.. image:: images/color_chooser.png
+ :alt: Color Chooser
+ :align: center
+
+Once you have selected the color for your motif, click the 'set
+motifs' button. Any matches Mussa finds for your motif will now show
+up in the main Mussagl window.
+
+Before Motif:
+
+.. image:: images/motif_dialog_bar_before.png
+ :alt: Sequence bar before motif
+ :align: center
+
+After Motif:
+
+.. image:: images/motif_dialog_bar_after.png
+ :alt: Sequence bar after motif
+ :align: center
+
-Detailed Info
--------------
+Detailed Information
+--------------------
Threshold
~~~~~~~~~
Sequences
~~~~~~~~~
-Mussa reads in sequences which are formated in the fasta_
+Mussa reads sequences stored in the fasta_
format. Mussa may take a long time to run (>10 minutes) if the total
sequence length is near 280 kb. Once Mussa has run, you can reload
previously run analyses.
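
Because run time grows with the total amount of sequence, it can help to
check the combined length of your fasta files before starting an
analysis. The Python sketch below is a minimal fasta reader (not part of
Mussa) that sums sequence lengths; the 280 kb constant is only the rough
figure quoted above, not a hard limit enforced by Mussa, and the
function names are hypothetical.

```python
# Minimal fasta reader for a quick size check; not part of Mussa.
SLOW_TOTAL_BP = 280_000  # rough figure from this manual, not a hard limit


def read_fasta(text: str) -> list[tuple[str, str]]:
    """Parse fasta text into (header, sequence) pairs.

    Blank lines and any lines before the first '>' header are ignored.
    """
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def total_bp(fasta_texts) -> int:
    """Combined sequence length of every record in every given fasta text."""
    return sum(len(seq) for text in fasta_texts for _, seq in read_fasta(text))
```

Pass ``total_bp`` the text of each fasta file you plan to load and
compare the result with ``SLOW_TOTAL_BP`` to see whether a long run is
likely.
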
FIXME: We have learned more about how much sequence and how many to
-put in mussagl, this information should be documented here.
+put in Mussagl, this information should be documented here.
Mussa File Formats
::
- # name of anaylsis directory and stem for associated files
+ # name of analysis directory and stem for associated files
ANA_NAME <analysis_name>
# if APPEND vars true, a _wXX and/or _tYY added to analysis name
SEQUENCE <fasta_file_path>
# ANNOTATION <annotation_file_path>
- # analyses parameters: command line args -w -t will override these
+ # analysis parameters: command line args -w -t will override these
WINDOW <num>
THRESHOLD <num>
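
To make the format concrete, here is a minimal Python sketch of a .mupa
reader. It is not Mussa's own parser: it assumes one ``KEY value`` pair
per line, treats ``#`` lines as comments, attaches each ANNOTATION to
the SEQUENCE line before it, and builds the analysis name from the
APPEND options as the comment above describes. The function names are
hypothetical.

```python
def parse_mupa(text):
    """Parse .mupa text into a dict of KEY -> value plus an ordered 'sequences' list."""
    params = {"sequences": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "SEQUENCE":
            params["sequences"].append({"fasta": value, "annotation": None})
        elif key == "ANNOTATION":
            # Assumption: ANNOTATION applies to the most recent SEQUENCE line.
            if not params["sequences"]:
                raise ValueError("ANNOTATION given before any SEQUENCE")
            params["sequences"][-1]["annotation"] = value
        else:
            params[key] = value
    expected = params.get("SEQUENCE_NUM")
    if expected is not None and int(expected) != len(params["sequences"]):
        raise ValueError("SEQUENCE_NUM does not match the number of SEQUENCE lines")
    return params


def analysis_name(params):
    """ANA_NAME with optional _w<WINDOW> and _t<THRESHOLD> suffixes appended."""
    name = params["ANA_NAME"]
    if params.get("APPEND_WIN", "false").lower() == "true":
        name += "_w" + params["WINDOW"]
    if params.get("APPEND_THRES", "false").lower() == "true":
        name += "_t" + params["THRESHOLD"]
    return name
```

With ANA_NAME demo, WINDOW 30, THRESHOLD 20 and both APPEND options set
to true, ``analysis_name`` returns ``demo_w30_t20``.
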
"APPEND_WIN", "true or false", "?", "?", "Appends _w## to ANA_NAME"
"APPEND_THRES", "true or false", "?", "?", "Appends _t## to ANA_NAME"
"SEQUENCE_NUM", "integer", "N/A", "true", "The number of sequences
- to analyse"
+ to analyze"
"SEQUENCE", "/fasta/filepath.fa", "N/A", "true", "Must define one
sequence per SEQUENCE_NUM."
"ANNOTATION", "/annotation/filepath.txt", "N/A", "false", "Optional
~~~~~~~~~~~~~~~~~~~~~~
The first line in the file is the sequence name. Each line thereafter
-is a **space** seperated annotation.
+is a **space**-separated annotation.
+
+New as of build 198:
+
+ * The annotation format now supports fasta sequences embedded in the
+   annotation file, as shown in the format example below. Mussagl
+   searches your sequences for an exact match to each embedded
+   sequence and, if it finds one, labels the match with the name from
+   the fasta header.
Format:
<start> <stop> <annotation_name> <annotation_type>
<start> <stop> <annotation_name> <annotation_type>
<start> <stop> <annotation_name> <annotation_type>
+ >Fasta Header
+ ACTGACTGACGTACGTAGCTAGCTAGCTAGCACG
+ ACGTACGTACGTACGTAGCTGTCATACGCTAGCA
+ TGCGTAGAGGATCTCGGATGCTAGCGCTATCGAT
+ ACGTACGGCAGTACGCGGTCAGA
+ <start> <stop> <annotation_name> <annotation_type>
...
Example:
251 500 Glorp Glorptype
751 1000 Glorp Glorptype
1251 1500 Glorp Glorptype
+ >My favorite DNA sequence
+ GATTACA
1751 2000 Glorp Glorptype
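
The Python sketch below shows one way to read this format, including
the embedded fasta sequences added in build 198. It is not Mussagl's own
reader: it assumes that annotation lines start with a number, that
embedded sequence lines contain only letters, that the third field is
the annotation name and any remaining fields form the type, and it
locates embedded sequences with a plain forward-strand, 0-based string
search. All function names are hypothetical.

```python
import re

# Hypothetical reader for the annotation format above; not Mussagl's own code.
SEQUENCE_LINE = re.compile(r"^[A-Za-z]+$")


def parse_annotations(text):
    """Return the sequence name, (start, stop, name, type) tuples and embedded fasta records."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("empty annotation file")
    annotations, fasta = [], []
    current = None  # [header, chunks] of the embedded fasta record being read
    for line in lines[1:]:
        if line.startswith(">"):
            current = [line[1:].strip(), []]
            fasta.append(current)
        elif current is not None and SEQUENCE_LINE.match(line):
            current[1].append(line)
        else:
            current = None  # an annotation line ends any embedded sequence
            fields = line.split()
            start, stop = int(fields[0]), int(fields[1])
            name = fields[2] if len(fields) > 2 else ""
            kind = " ".join(fields[3:]) or None
            annotations.append((start, stop, name, kind))
    return {
        "sequence_name": lines[0],
        "annotations": annotations,
        "fasta": [(header, "".join(chunks)) for header, chunks in fasta],
    }


def locate(fragment, sequence):
    """0-based start of the first exact (case-insensitive) forward-strand match, or -1."""
    return sequence.upper().find(fragment.upper())
```
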
GGCC 0.0 1 1
+
+IUPAC Nucleotide Code
+~~~~~~~~~~~~~~~~~~~~~~
+
+For your convenience, the table below reproduces Table 1 of
+"Nomenclature for Incompletely Specified Bases in Nucleic Acid
+Sequences", which can be found at
+http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html.
+
+====== ================= ===================================
+Symbol Meaning           Origin of designation
+====== ================= ===================================
+G      G                 Guanine
+A      A                 Adenine
+T      T                 Thymine
+C      C                 Cytosine
+R      G or A            puRine
+Y      T or C            pYrimidine
+M      A or C            aMino
+K      G or T            Keto
+S      G or C            Strong interaction (3 H bonds)
+W      A or T            Weak interaction (2 H bonds)
+H      A or C or T       not-G, H follows G in the alphabet
+B      G or T or C       not-A, B follows A
+V      G or C or A       not-T (not-U), V follows U
+D      G or A or T       not-C, D follows C
+N      G or A or T or C  aNy
+====== ================= ===================================
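
The codes in this table map naturally onto regular-expression character
classes. The Python sketch below (an illustration, not how Mussa itself
searches) turns a motif such as ATSCT into the pattern AT[CG]CT and
reports every 0-based position where it matches, including overlapping
matches. It only searches the strand as given and does not accept U;
the function names are hypothetical.

```python
import re

# IUPAC nucleotide codes from the table above, as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]",
    "H": "[ACT]", "B": "[CGT]", "V": "[ACG]", "D": "[AGT]",
    "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> str:
    """Translate an IUPAC motif (e.g. 'ATSCT') into a regex string (e.g. 'AT[CG]CT')."""
    try:
        return "".join(IUPAC[base] for base in motif.upper())
    except KeyError as err:
        raise ValueError(f"not an IUPAC nucleotide code: {err.args[0]}") from None


def find_motif(motif: str, sequence: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    # A zero-width lookahead lets finditer report overlapping matches.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [m.start() for m in pattern.finditer(sequence.upper())]
```
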
+
+
.. Define links below
------------------