Conservation tracks + IUPAC Nucleotide Table

diff --git a/doc/manual/mussagl_manual.rst b/doc/manual/mussagl_manual.rst

index ebf16d36a4be94c6beb344f215dca9e2d2d4c754..3f46aaa3d29f572da0404c4023c6481d466285e6 100644
--- a/doc/manual/mussagl_manual.rst
+++ b/doc/manual/mussagl_manual.rst
@@ -62,7 +62,8 @@ Supported Platforms:
  Download
  --------
  
-Mussagl can be downloaded from http://mussa.caltech.edu/.
+Mussagl binaries for OS X and Windows, as well as the source code, can be
+downloaded from http://mussa.caltech.edu/.
  
  Install
  -------
@@ -157,7 +158,9 @@ Instructions:
  Now click on the 'Browse' button next to the sequence input box and
  then select /examples/seq/human_mck_pro.fa file. Do the same in the
  next two sequence input boxes selecting mouse_mck_pro.fa and
-rabbit_mck_pro.fa as shown below.
+rabbit_mck_pro.fa as shown below. Note that you can create annotation
+files using the mussa `Annotation File Format`_ to add annotations to
+your sequences.
  
  .. image:: images/define_analysis_step2.png
     :alt: Choose sequences
@@ -299,13 +302,15 @@ Conservation tracks
     :alt: Conservation Tracks
     :align: center
     
-   Conservations tracks shown as red lines between sequence bars.
+   Conservation tracks shown as red and blue lines between sequence
+   bars.
  
-The red lines between the sequence bars represent conservation between
-the sequences. The amount of sequence conservation shown will depend
-on the relatedness of your sequences and the `dynamic threshold` you
-are using. Sequences with lots of repeats will cause major slow downs
-in calculating the matches.
+The **red lines** between the sequence bars represent conservation
+between the sequences, and **blue lines** represent **reverse
+complement** conservation. The amount of sequence conservation shown
+will depend on the relatedness of your sequences and the `dynamic
+threshold` you are using. Sequences with many repeats will cause
+major slowdowns in calculating the matches.
  
  
  Motif Toggle
@@ -384,15 +389,55 @@ Annotations / Motifs
  Annotations
  ~~~~~~~~~~~
  
+Currently annotations can be added to a sequence using the mussa
+`annotation file format`_ and can be loaded by selecting the
+annotation file when defining a new analysis (see `Create a new
+analysis`_ section) or by defining a .mupa file pointing to your
+annotation file (see `Load a mussa parameter file`_ section).
+
  Motifs
  ~~~~~~
  
  Load Motifs from File
  *********************
  
+You can load motifs from a file that was saved during a previous
+run or from a motif file you define yourself. See the `Motif File
+Format`_ section for details.
+
+To load a motif file, select the **Load Motif List** item from the
+**File** menu and choose a motif list file.
+
+.. image:: images/load_motif.png
+   :alt: Load Motif List
+   :align: center
+
+
+Save Motifs to File
+*******************
+
+Note: Currently not implemented
+
+
  Motif Dialog
  ************
  
+Mussa can find lab motifs that are defined using the `IUPAC Nucleotide
+Code`_. To define a motif, select the **View > Edit Motifs** menu
+item as shown below.
+
+.. image:: images/view_edit_motifs.png
+   :alt: "View > Edit Motifs" Menu
+   :align: center
+
+You will see a dialog box appear with a "set motifs" button and 10
+rows for defining motifs and the color that will be displayed on the
+sequence. By default, all 10 motifs start off with white as their color.
+
+.. image:: images/motif_dialog_start.png
+   :alt: Motif Dialog
+   :align: center
+
  
  Detailed Info
  -------------
@@ -501,7 +546,15 @@ Annotation File Format
  ~~~~~~~~~~~~~~~~~~~~~~
  
  The first line in the file is the sequence name. Each line there after
-is a **space** seperated annotation.
+is a **space** separated annotation.
+
+New as of build 198:
+ 
+ * The annotation format now supports fasta sequences embedded in the
+   annotation file as shown in the format example below. Mussagl will
+   take this sequence and look for an exact match of this sequence in
+   your sequences. If a match is found, it will label it with the name
+   from the fasta header.
  
  Format:
  
@@ -512,6 +565,12 @@ Format:
    <start> <stop> <annotation_name> <annotation_type>
    <start> <stop> <annotation_name> <annotation_type>
    <start> <stop> <annotation_name> <annotation_type>
+  >Fasta Header
+  ACTGACTGACGTACGTAGCTAGCTAGCTAGCACG
+  ACGTACGTACGTACGTAGCTGTCATACGCTAGCA
+  TGCGTAGAGGATCTCGGATGCTAGCGCTATCGAT
+  ACGTACGGCAGTACGCGGTCAGA
+  <start> <stop> <annotation_name> <annotation_type>
    ...
  
  Example:
@@ -522,6 +581,8 @@ Example:
    251 500 Glorp Glorptype
    751 1000 Glorp Glorptype
    1251 1500 Glorp Glorptype
+  >My favorite DNA sequence
+  GATTACA
    1751 2000 Glorp Glorptype
  
  
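The annotation file layout shown in this hunk (a sequence name on the first line, then space separated annotation lines with optional embedded fasta records) can be read with a few lines of Python. The following is only a minimal sketch and is not Mussagl's own parser. The function name ``parse_annotations`` and the sequence name ``mouse`` are hypothetical, and the rule used to recognise fasta sequence lines (a single alphabetic token following a ``>`` header) is an assumption.

```python
def parse_annotations(text):
    """Parse annotation file text into (sequence_name, annotations, fasta_records).

    Assumption: inside a fasta record, sequence lines are single alphabetic
    tokens; the first line that does not look like that ends the record.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty annotation file")

    sequence_name = lines[0]      # first line is the sequence name
    annotations = []              # (start, stop, name, type) tuples
    fasta_records = []            # [header, sequence] pairs, built in place
    current = None                # fasta record currently being read, if any

    for line in lines[1:]:
        if line.startswith(">"):
            current = [line[1:].strip(), ""]
            fasta_records.append(current)
            continue
        fields = line.split()
        if current is not None and len(fields) == 1 and fields[0].isalpha():
            current[1] += fields[0]   # continuation of the fasta sequence
            continue
        current = None                # anything else ends the fasta record
        if len(fields) != 4:
            raise ValueError("expected 4 fields, got: %r" % line)
        start, stop, name, kind = fields
        annotations.append((int(start), int(stop), name, kind))

    return sequence_name, annotations, [tuple(r) for r in fasta_records]


# Hypothetical input modelled on the example in the manual.
example = """mouse
251 500 Glorp Glorptype
>My favorite DNA sequence
GATTACA
1751 2000 Glorp Glorptype
"""
name, annotations, fasta = parse_annotations(example)
```

The sketch only collects the embedded fasta records. Locating the exact match within the loaded sequences, as the manual describes, is left to the caller.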
@@ -539,6 +600,37 @@ Example:
    GGCC 0.0 1 1
  
  
+
+IUPAC Nucleotide Code
+~~~~~~~~~~~~~~~~~~~~~
+
+For your convenience, below is a table of the IUPAC Nucleotide Code.
+
+The following table is table 1 from "Nomenclature for Incompletely
+Specified Bases in Nucleic Acid Sequences" which can be found at
+http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html.
+
+====== ================== ===================================
+Symbol Meaning            Origin of designation
+====== ================== ===================================
+G      G                  Guanine
+A      A                  Adenine
+T      T                  Thymine
+C      C                  Cytosine
+R      G or A             puRine
+Y      T or C             pYrimidine
+M      A or C             aMino
+K      G or T             Keto
+S      G or C             Strong interaction (3 H bonds)
+W      A or T             Weak interaction (2 H bonds)
+H      A or C or T        not-G, H follows G in the alphabet
+B      G or T or C        not-A, B follows A
+V      G or C or A        not-T (not-U), V follows U
+D      G or A or T        not-C, D follows C
+N      G or A or T or C   aNy
+====== ================== ===================================
+
+
  .. Define links below
     ------------------