Manual: UCSC Genome Browser section

diff --git a/doc/manual/mussagl_manual.rst b/doc/manual/mussagl_manual.rst

index 1a5fb6a40737cdb9cacf18002c03388b8e4ee28b..1855192804ef7878eab68fa3d64ce351c275cd51 100644
--- a/doc/manual/mussagl_manual.rst
+++ b/doc/manual/mussagl_manual.rst
@@ -1,13 +1,13 @@
  ==============
  Mussagl Manual
  ==============
-------------------
-By Brandon W. King
-------------------
+---------------
+Brandon W. King
+---------------
  
-Last updated: March 23rd, 2006
+Last updated: May 23rd, 2006
  
-Updated to Mussagl build: 141
+Updated to Mussagl build: 200 (Update to 230 in progress)
  
  
  .. contents::
@@ -62,14 +62,15 @@ Supported Platforms:
  Download
  --------
  
-Mussagl can be downloaded from http://mussa.caltech.edu/.
+Mussagl can be downloaded from http://mussa.caltech.edu/ as a binary
+for OS X and Windows and/or as source.
  
  Install
  -------
  
  Mac OS X
  ~~~~~~~~
-Once you have downloaded the .dmg file, dubble click on it and follow
+Once you have downloaded the .dmg file, double click on it and follow
  the install instructions. 
  
  FIXME: Mention how to launch the program.
@@ -80,8 +81,8 @@ Windows XP
  Once you have downloaded the Mussagl installer, double click on the
  installer and follow the install instructions.
  
-To start mussagl, launch the program from Start > Programs > Mussagl >
-Mussgl.
+To start Mussagl, launch the program from Start > Programs > Mussagl >
+Mussagl.
  
  
  Linux
@@ -100,6 +101,301 @@ Instructions for building from source can be found `build page
  __ wiki_
  
  
+Obtaining Input Data
+====================
+
+If you already have your data, you can skip ahead to the `Using
+Mussagl`_ section.
+
+Let's say you have a gene of interest called 'SMN1' and you want to
+know how the sequence surrounding the gene is conserved across
+multiple species. In this section we will retrieve the DNA sequence
+for SMN1 and prepare it for use in Mussa.
+
+For more information about SMN1 visit `NCBI's OMIM
+<http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=609682>`_.
+
+UCSC Genome Browser Method
+--------------------------
+
+There are many methods of retrieving DNA sequence, but for this
+example we will retrieve SMN1 through the UCSC genome browser located
+at http://genome.ucsc.edu/.
+
+.. image:: images/ucsc_genome_browser_home.png
+   :alt: UCSC Genome Browser
+   :align: center
+
+Step 1 - Find SMN1
+~~~~~~~~~~~~~~~~~~
+
+The first step in finding SMN1 is to use the **Gene Sorter** menu
+option which I have highlighted in orange below:
+
+.. image:: images/ucsc_menu_bar_gene_sorter.png
+   :alt: Gene Sorter Menu Option
+   :align: center
+
+Gene Sorter page:
+
+.. image:: images/ucsc_gene_sorter.png
+   :alt: Gene Sorter
+   :align: center
+
+We will start by looking for SMN1 in the **Human Genome** and **sorting by name similarity**.
+
+.. image:: images/ucsc_gs_sort_name_sim.png
+   :alt: Gene Sorter - Name Similarity
+   :align: center
+
+After you have selected **Human Genome** and **sorting by name similarity**, type *SMN1* into the search box.
+
+.. image:: images/ucsc_gs_smn1.png
+   :alt: Gene
+   :align: center
+
+Press **Go!** and you should see the following page:
+
+.. image:: images/ucsc_gs_found.png
+   :alt: Found SMN1
+   :align: center
+
+Click on **SMN1** and you will be taken to the gene expression atlas
+page.
+
+.. image:: images/ucsc_gs_genome_position.png
+   :alt: Gene expression atlas
+   :align: center
+
+Click on **chr5 70,270,558** found in the **SMN1 row**, **Genome
+position column**.
+
+Now we have found the location of SMN1 in the human genome!
+
+.. image:: images/ucsc_gb_smn1_human.png
+   :alt: Genome Browser - SMN1 (human)
+   :align: center
+
+
+Step 2 - Download CDS/UTR sequence for annotations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Since we have found **SMN1**, this would be a convenient time to extract
+the DNA sequence for the CDS and UTRs of the gene to use it as an
+annotation_ in Mussa.
+
+**Click on SMN1**, located **between** the **two orange arrows** shown
+below.
+
+.. image:: images/ucsc_gb_smn1_human_click_smn1.png
+   :alt: Genome Browser - SMN1 (human) - Orange Arrows
+   :align: center
+
+You should find yourself at the SMN1 description page.
+
+.. image:: images/ucsc_gb_smn1_description_page.png
+   :alt: Genome Browser - SMN1 (human) - Description page
+   :align: center
+
+**Scroll down** until you get to the **Sequence section** and click on
+**Genomic (chr5:70,256,524-70,284,592)**.
+
+.. image:: images/ucsc_gb_smn1_human_sequence.png
+   :alt: Genome Browser - SMN1 (human) - Sequence
+   :align: center
+
+You should now be at the **Genomic sequence near gene** page:
+
+.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence.png
+   :alt: Genome Browser - SMN1 (human) - Get genomic sequence
+   :align: center
+
+Make the following changes (highlighted in orange in the screenshot
+below):
+
+ 1. UNcheck **introns**. 
+    (We only want to annotate CDS and UTRs.)
+ 2. Select **one fasta record** per **region**. 
+    (Mussa needs each CDS and UTR represented by one fasta record per CDS/UTR).
+ 3. Select **split UTR and CDS parts of an exon into separate FASTA records**.
+    (Breaks up **exons** into CDSs and UTRs.)
+
+.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_diff.png
+   :alt: Genome Browser - SMN1 (human) - Get genomic sequence setup
+   :align: center
+
+Now click the **submit** button. You will then see a fasta file with
+many fasta records representing the CDS and UTRs.
+
+.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_submit.png
+   :alt: Genome Browser - SMN1 (human) - CDS/UTR sequence
+   :align: center
+
+Now you need to save the fasta records to a **text file**. If you are
+using **Firefox** or **Internet Explorer 6+** click on the **File >
+Save As** menu option. 
+
+**IMPORTANT:** Make sure you select **Text Files** and **NOT**, I
+repeat **NOT Webpage Complete** (see screenshot below.)
+
+Type in **smn1_human_annot.txt** for the file name.
+
+.. image:: images/smn1_human_annot.png
+   :alt: Genome Browser - SMN1 (human) - sequence annotation file
+   :align: center
+
+**IMPORTANT:** You should open the file with a text editor and make
+sure **no HTML** was saved. If you find any HTML markup, delete
+the markup and save the file.
+
+Now we are going to **modify the file** you just saved to **add the
+name of the species** to the **annotation file**. All you have to do
+is **add a new line** at the **top of the file** with the word **'Human'** as
+shown below:
+
+.. image:: images/smn1_human_annot_plus_human.png
+   :alt: Genome Browser - SMN1 (human) - sequence annotation file
+   :align: center
+
+You can add more annotations to this file if you wish. See the
+`annotation file format`_ section for details of the file format. By
+including fasta records in the annotation_ file, Mussa searches your
+DNA sequence for an exact match of the sequence in the annotation_
+file. If found, it will be marked as an annotation_ within Mussa.
+
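The two manual edits above (removing stray HTML and adding the species
name as the first line) can also be scripted. The following is a minimal
sketch, not part of Mussa: the function names ``prepare_annotation`` and
``rewrite_file`` are hypothetical, and it assumes that nothing in the
fasta records themselves looks like an ``<...>`` HTML tag.

```python
import re


def prepare_annotation(text, species):
    """Return annotation-file text: species name first, then the fasta records."""
    # Drop anything that looks like an HTML tag such as <PRE> or </HTML>.
    # Assumption: fasta headers and sequence lines never contain such tags.
    cleaned = re.sub(r"</?[A-Za-z][^>]*>", "", text)
    # Discard blank lines left behind by the removed markup.
    lines = [ln.rstrip() for ln in cleaned.splitlines() if ln.strip()]
    return "\n".join([species] + lines) + "\n"


def rewrite_file(path, species):
    """Clean the saved file in place (hypothetical helper, not called here)."""
    with open(path) as handle:
        text = handle.read()
    with open(path, "w") as handle:
        handle.write(prepare_annotation(text, species))
```

For the file saved in this step, the call would be
``rewrite_file("smn1_human_annot.txt", "Human")``. It is still worth
opening the result in a text editor to confirm it looks right.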
+
+Step 3 - Download gene and upstream/downstream sequence
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Use the back button in your web browser to get back the **genome
+browser view** of **SMN1** as shown below.
+
+.. image:: images/ucsc_gb_smn1_human.png
+   :alt: Genome Browser - SMN1 (human)
+   :align: center
+
+There are two options for getting additional sequence around your
+gene. The more complex way is to zoom out so that you have the
+sequence you want being shown in the genome browser and then follow
+the directions for the following method.
+
+The second option, which we will choose, is to leave the genome
+browser zoomed exactly at the location of SMN1 and click on the
+**DNA** option on the menu bar (shown with orange arrows in the
+screenshot below.)
+
+.. image:: images/ucsc_gb_smn1_human_dna_option.png
+   :alt: Genome Browser - SMN1 (human) - DNA Option
+   :align: center
+
+Now in the **get dna in window** page, let's add an arbitrary amount of
+extra sequence on to each end of the gene, let's say 5000 base pairs.
+
+.. image:: images/ucsc_gb_smn1_human_get_dna.png
+   :alt: Genome Browser - SMN1 (human) - Get DNA 
+   :align: center
+
+Click the **get DNA** button.
+
+.. image:: images/ucsc_gb_smn1_human_dna.png
+   :alt: Genome Browser - SMN1 (human) - DNA 
+   :align: center
+
+Save the DNA sequence to a text file called 'smn1_human_dna.fa' as we
+did in step 2 with the annotation file.
+
+**IMPORTANT:** Make sure the file is saved as a text file and not an
+HTML file. Open the file with a text editor and remove any HTML markup
+you find.
+
+
+Step 4 - Same/similar/related gene in other species
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+What good is a multiple sequence alignment viewer without multiple
+sequences? Let's find a similar gene in a few more species.
+
+Use the back button on your web browser until you get the **genome
+browser view** of **SMN1** as shown below.
+
+.. image:: images/ucsc_genome_browser_home.png
+   :alt: UCSC Genome Browser
+   :align: center
+
+**Click on SMN1**, located **between** the **two orange arrows** shown
+below.
+
+.. image:: images/ucsc_gb_smn1_human_click_smn1.png
+   :alt: Genome Browser - SMN1 (human) - Orange Arrows
+   :align: center
+
+You should find yourself at the SMN1 description page.
+
+.. image:: images/ucsc_gb_smn1_description_page.png
+   :alt: Genome Browser - SMN1 (human) - Description page
+   :align: center
+
+**Scroll down** until you get to the **Sequence section** and click on
+**Protein (262 aa)**.
+
+.. image:: images/ucsc_gb_smn1_human_sequence.png
+   :alt: Genome Browser - SMN1 (human) - Sequence
+   :align: center
+
+Copy the SMN1 protein sequence by highlighting it and selecting the **Edit
+> Copy** option from the menu.
+
+.. image:: images/smn1_human_protein.png
+   :alt: Genome Browser - SMN1 (human) - Protein
+   :align: center
+
+Press the back button on the web browser once and then scroll to the
+top of the page and click on the **BLAT** option on the menu bar
+(shown below with orange arrows).
+
+.. image:: images/ucsc_gb_smn1_human_blat.png
+   :alt: Genome Browser - SMN1 (human) - Blat
+   :align: center
+
+**Paste** in the **protein sequence** and **change** the **genome** to
+**mouse** as shown below and then click **submit**.
+
+.. image:: images/ucsc_gb_smn1_human_blat_paste.png
+   :alt: Genome Browser - SMN1 (human) - Blat paste protein
+   :align: center
+
+Notice that we have two hits, one of which looks pretty good at 89.9%
+match.
+
+.. image:: images/ucsc_gb_smn1_human_blat_hits.png
+   :alt: Genome Browser - SMN1 (human) - Blat hits
+   :align: center
+
+**Click** on the **browser** link next to the 89.9% match. Notice in
+the genome browser (shown below) that there is an annotated gene
+called SMN1 for mouse which matches the line called **your sequence
+from blat search**. This means we can be fairly confident that we found the
+right location in the mouse genome. 
+
+.. image:: images/ucsc_gb_smn1_human_blat_to_browser.png
+   :alt: Genome Browser - SMN1 (human) - Blat to browser
+   :align: center
+
+Follow steps 1 through 3 for mouse and then repeat step 4 with the
+human protein sequence to find **SMN1** in the following species (if
+you find a match):
+
+ 1. Rat
+ 2. Rabbit
+ 3. Dog
+ 4. Armadillo
+ 5. Elephant
+ 6. Opossum
+ 7. x_tropicalis
+
+Make sure to save the extended DNA sequence and annotation file for
+each one.
+
  Using Mussagl
  =============
  
@@ -117,7 +413,7 @@ Launch Mussagl... It should look similar to the screen shot below.
  Create/Load Analysis
  ----------------------
  
-Currently there are three ways to load a mussa experiment.
+Currently there are three ways to load a Mussa experiment.
  
   1. `Create a new analysis`_
   2. `Load a mussa parameter file`_ (.mupa)
@@ -138,7 +434,7 @@ demo we will use the example sequences that come with Mussagl.
  
  Instructions:
  
- 1. **Give the experiement a name**, for this demo, we'll use
+ 1. **Give the experiment a name**, for this demo, we'll use
      'demo_w30_t20'. Mussa will create a folder with this name to store
      the analysis files in once it has been run.
  
@@ -157,7 +453,9 @@ Instructions:
  Now click on the 'Browse' button next to the sequence input box and
  then select /examples/seq/human_mck_pro.fa file. Do the same in the
  next two sequence input boxes selecting mouse_mck_pro.fa and
-rabbit_mck_pro.fa as shown below.
+rabbit_mck_pro.fa as shown below. Note that you can create annotation
+files using the mussa `Annotation File Format`_ to add annotations to
+your sequence.
  
  .. image:: images/define_analysis_step2.png
     :alt: Choose sequences
@@ -183,7 +481,7 @@ If you prefer, you can define your Mussa analysis using the Mussa
  parameter file. See the `Parameter File Format`_ section for details
  on creating a .mupa file.
  
-Once you have a .mupa file created, load Mussgl and select the **File >
+Once you have a .mupa file created, load Mussagl and select the **File >
  Load Mussa Parameters** menu option. Select the .mupa file and click
  open. 
  
@@ -212,45 +510,348 @@ click open.
     :align: center
  
  
-Detailed Info
--------------
+Main Window
+-----------
+
+Overview
+~~~~~~~~
+.. Screen-shot with numbers showing features.
+
+.. image:: images/window_overview.png
+   :alt: Mussa Window
+   :align: center
+
+Legend:
+
+ 1. `DNA Sequence (Black bars)`_
+ 
+ 2. Annotation_
+
+ 3. Motif_
+
+ 4. `Conservation tracks`_
+
+ 5. `Motif Toggle`_
+
+ 6. `Zoom Factor`_ (Base pairs per pixel)
+
+ 7. `Dynamic Threshold`_
+
+ 8. `Sequence Information Bar`_
+
+ 9. `Sequence Scroll Bar`_
+
+
+DNA Sequence (black bars)
+~~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. image:: images/sequence_bar.png
+   :alt: Sequence Bar
+   :align: center
+
+Each of the black bars represents one of the loaded sequences, in this
+case the sequence around the gene 'MCK' in human, mouse, and rabbit.
+
+FIXME: Should I mention the repeats here?
+
+
+Annotation
+~~~~~~~~~~
+
+.. figure:: images/annotation.png
+   :alt: Annotation
+   :align: center
+   
+   Annotation shown in green on sequence bar.
+
+
+Annotations can be included on any of the sequences using the `Load a
+mussa parameter file`_ method of loading your sequences. You can
+define annotations by location or using an exact sub-sequence and you
+may also choose any color for display of the annotation; see the
+`Annotation File Format`_ section for details.
+
+Note: Currently there is no way to add annotations using the GUI (only
+via the .mupa file). We plan to add this feature in the future, but it
+likely will not make it into the first release.
+
+
+Motif
+~~~~~
+
+.. figure:: images/motif.png
+   :alt: Motif
+   :align: center
+
+   Motif shown in light blue on sequence bar.
+
+The only real difference between an annotation and motif in Mussagl is
+that you can define motifs from within the GUI. See the `Motifs`_
+section for more information.
+
+
+Conservation tracks
+~~~~~~~~~~~~~~~~~~~
+
+.. figure:: images/conservation_tracks.png
+   :alt: Conservation Tracks
+   :align: center
+   
+   Conservation tracks shown as red and blue lines between sequence
+   bars.
+
+The **red lines** between the sequence bars represent conservation
+between the sequences and **blue lines** represent **reverse
+complement** conservation. The amount of sequence conservation shown
+will depend on the relatedness of your sequences and the `dynamic
+threshold`_ you are using. Sequences with lots of repeats will cause
+major slowdowns in calculating the matches.
+
+
+Motif Toggle
+~~~~~~~~~~~~
+
+.. image:: images/motif_toggle.png
+   :alt: Motif Toggle
+   :align: center
+
+Toggles motifs on and off. This will not turn on and off annotations.
+
+Note: As of the current build (#200), this feature hasn't been
+implemented.
+
+
+Zoom Factor
+~~~~~~~~~~~
+
+.. image:: images/zoom_factor.png
+   :alt: Zoom Factor
+   :align: center
+
+The zoom factor is the number of base pairs represented per
+pixel. When you zoom in far enough, the display switches from a black
+bar representing the sequence to the actual sequence (well, an ASCII
+representation of the sequence).
+
+
+Dynamic Threshold
+~~~~~~~~~~~~~~~~~
+
+.. image:: images/dynamic_threshold.png
+   :alt: Dynamic Threshold
+   :align: center
+
+You can dynamically change the threshold for how strong a match you
+consider the conservation to be with one of two options:
+
+ 1. Number of base pair matches out of window size.
+ 
+ 2. Percent base pair conservation.
+
+See the Threshold_ section for more information.
+
+
+Sequence Information Bar
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. image:: images/seq_info_bar.png
+   :alt: Sequence Information Bar
+   :align: center
+
+The sequence information bars can be found to the left and right sides
+of Mussagl. Next to each sequence you will find the following
+information:
+
+ 1. Species (If it has been defined)
+ 2. Total Size of Sequence
+ 3. Current base pair position
+
+
+Sequence Scroll Bar
+~~~~~~~~~~~~~~~~~~~
+
+.. image:: images/scroll_bar.png
+   :alt: Sequence Scroll Bar
+   :align: center
+
+The scroll bar allows you to scroll through the sequence which is
+useful when you have zoomed in using the `zoom factor`_.
+
+
+Annotations / Motifs
+--------------------
+
+Annotations
+~~~~~~~~~~~
+
+Currently annotations can be added to a sequence using the mussa
+`annotation file format`_ and can be loaded by selecting the
+annotation file when defining a new analysis (see `Create a new
+analysis`_ section) or by defining a .mupa file pointing to your
+annotation file (see `Load a mussa parameter file`_ section).
+
+Motifs
+~~~~~~
+
+Load Motifs from File
+*********************
+
+It is possible to load motifs from a file which was saved from a
+previous run or by defining your own motif file. See the `Motif File
+Format`_ section for details.
+
+To load a motif file, select **Load Motif List** item from the
+**File** menu and select a motif list file.
+
+.. image:: images/load_motif.png
+   :alt: Load Motif List
+   :align: center
+
+
+Save Motifs to File
+*******************
+
+Note: Currently not implemented
+
+
+Motif Dialog
+************
+
+Mussa has the ability to find lab motifs using the `IUPAC Nucleotide
+Code`_ for defining a motif. To define a motif, select the **View > Edit
+Motifs** menu item as shown below.
+
+.. image:: images/view_edit_motifs.png
+   :alt: "View > Edit Motifs" Menu
+   :align: center
+
+You will see a dialog box appear with a "set motifs" button and 10
+rows for defining motifs and the color that will be displayed on the
+sequence. By default all 10 motifs start off with white as the
+color. In the image below, I changed the color from white to blue to
+make it easier to see.
+
+.. image:: images/motif_dialog_start.png
+   :alt: Motif Dialog
+   :align: center
+
+Now let's make a motif **'AT[C or G]CT'**. Using the `IUPAC Nucleotide
+Code`_, type **'ATSCT'** into the first box as shown below.
+
+.. image:: images/motif_dialog_enter_motif.png
+   :alt: Enter Motif
+   :align: center
+
+Now choose a color for your motif by clicking on the colored area to
+the left of the motif. In the image above, you would click on the blue
+square, but by default the squares will be white. Remember to choose a
+color that will show up well with a black bar as the background.
+
+.. image:: images/color_chooser.png
+   :alt: Color Chooser
+   :align: center
+
+Once you have selected the color for your motif, click on the 'set
+motifs' button. If Mussa finds matches to your motif, they will
+now show up in the main Mussagl window.
+
+Before Motif:
+
+.. image:: images/motif_dialog_bar_before.png
+   :alt: Sequence bar before motif
+   :align: center
+
+After Motif:
+
+.. image:: images/motif_dialog_bar_after.png
+   :alt: Sequence bar after motif
+   :align: center
+
+
+View Mussa Alignments
+----------------------
+
+Mussagl allows you to zoom in on Mussa alignments by selecting the set
+of alignment(s) of interest. To do this, move the mouse near the
+alignment you are interested in viewing and then **PRESS** and
+**HOLD** the **LEFT mouse button** and **drag the mouse** to the other
+side of the conservation track so that you see a bounding box
+overlapping the alignment(s) of interest and then **let go** of the
+*left mouse button*.
+
+In the example below, I started by left clicking on the area marked by
+a red dot (upper left corner of bounding box) and dragging the mouse to
+the area marked by a blue dot (lower right corner of the bounding box)
+and letting go of the left mouse button.
+
+.. image:: images/select_sequence.png
+   :alt: Select Sequence
+   :align: center
+
+All of the lines which were not selected should be washed out as shown
+below:
+
+.. image:: images/washed_out.png
+   :alt: Tracks washed out
+   :align: center
+
+With a selection made, go to the **View** menu and select **View mussa alignment**.
+
+.. image:: images/view_mussa_alignment.png
+   :alt: View mussa alignment
+   :align: center
+
+You should see the alignment at the base-pair level as shown below.
+
+.. image:: images/mussa_alignment.png
+   :alt: Mussa alignment
+   :align: center
+
+
+
+
+Saving to an Image
+---------------------------------
+
+FIXME: Need to write this section
+
+
+Detailed Information
+--------------------
  
  Threshold
  ~~~~~~~~~
  
-The threshold of an analysis is in minimum number of base pair
-matches must be meet to in order to be kept as a match. Note that you
-can vary the threshold from within Mussagl. For example, if you
-choose a `window size`_ of **30** and a **threshold** of **20** the mussa
-nway transitive algorithm will store all matches that are 20 out of 30
-bp matches or better and pass it on to Mussagl. Mussagl will
-then allow you to dynamically choose a threshold from 10 to 30 base
-pairs. A threshold of 30 bps would only show 30 out of 30 bp
-matches. A threshold of 20 bps would show all matches of 20 out of 30
-bps or better. Choosing a threshold below 20 in this case won't have
-an effect [*]_ because the mussa algorithm didn't report and matches below
-this threshold.
-
-.. [*] In the future, Mussagl will automatically detect the minimum
-   threshold which was used when defining an analysis and not allow
-   you to select a threshold below the minimum. See `ticket #52
-   <http://woldlab.caltech.edu/cgi-bin/mussa/ticket/52>`_ for more
-   info.
+The threshold of an analysis is the minimum number of base pair
+matches that must be met within a window for it to be kept as a match.
+Note that you can vary the threshold from within Mussagl. For example,
+if you choose a `window size`_ of **30** and a **threshold** of **20**,
+the mussa nway transitive algorithm will store all matches that are 20
+out of 30 bp or better and pass them on to Mussagl. Mussagl will then
+allow you to dynamically choose a threshold from 20 to 30 base pairs. A
+threshold of 30 bps would only show 30 out of 30 bp matches. A
+threshold of 20 bps would show all matches of 20 out of 30 bps or
+better. If you would like to see results for matches lower than 20 out
+of 30, you will need to rerun the analysis with a lower threshold.
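
To make the window/threshold arithmetic concrete, here is a small
sketch. It is only an illustration of the idea of "N matches out of a
window of W", not the mussa nway transitive algorithm itself, and all
function names are hypothetical.

```python
def count_matches(window_a, window_b):
    """Count positions where two equal-length windows carry the same base."""
    if len(window_a) != len(window_b):
        raise ValueError("windows must have the same length")
    return sum(a == b for a, b in zip(window_a.upper(), window_b.upper()))


def bp_to_percent(matches, window_size):
    """Express a base pair threshold as percent conservation of the window."""
    return 100.0 * matches / window_size


def visible_matches(scores, threshold, stored_threshold):
    """Keep the stored match scores that meet the chosen dynamic threshold.

    Scores below stored_threshold were never saved by the analysis, so a
    lower dynamic threshold cannot reveal them.
    """
    if threshold < stored_threshold:
        raise ValueError("rerun the analysis with a lower threshold")
    return [score for score in scores if score >= threshold]
```

With a window size of 30, a threshold of 20 corresponds to roughly 66.7%
conservation, and raising the dynamic threshold to 25 would hide stored
matches that scored 20 to 24.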
  
  Window Size
  ~~~~~~~~~~~
  
-The typical sizes people tend to choose are between 20 and 30. Feel
-free to analysis with this setting depending on your needs.
+The typical sizes people tend to choose are between 20 and 30. You
+will likely need to experiment with this setting depending on your
+needs and input sequence.
  
  
  Sequences
  ~~~~~~~~~
  
-Mussa reads in sequences which are formated in the fasta_
+Mussa reads in sequences which are formatted in the fasta_
  format. Mussa may take a long time to run (>10 minutes) if the total
  bp length near 280Kb. Once mussa has run once, you can reload
-previously run analyses.
+previously run analyses.
+
+FIXME: We have learned more about how much sequence and how many to
+put in Mussagl, this information should be documented here.
  
  
  Mussa File Formats
@@ -265,7 +866,7 @@ Parameter File Format
  
  ::
  
-  # name of anaylsis directory and stem for associated files
+  # name of analysis directory and stem for associated files
    ANA_NAME <analysis_name>
    
    # if APPEND vars true, a _wXX and/or _tYY added to analysis name
@@ -292,7 +893,7 @@ Parameter File Format
    SEQUENCE <fasta_file_path>
    # ANNOTATION <annotation_file_path>
    
-  # analyses parameters: command line args -w -t will override these
+  # analysis parameters: command line args -w -t will override these
    WINDOW <num>
    THRESHOLD <num>
  
@@ -305,7 +906,7 @@ Parameter File Format
     "APPEND_WIN", "true/false", "?", "?", "Appends _w## to ANA_NAME"
     "APPEND_THRES", "true or false", "?", "?", "Appends _t## to ANA_NAME"
     "SEQUENCE_NUM", "integer", "N/A", "true", "The number of sequences
-   to analyse" 
+   to analyze" 
     "SEQUENCE", "/fasta/filepath.fa", "N/A", "true", "Must define one
     sequence per SEQUENCE_NUM." 
     "ANNOTATION", "/annotation/filepath.txt", "N/A", "false", "Optional
@@ -322,7 +923,15 @@ Annotation File Format
  ~~~~~~~~~~~~~~~~~~~~~~
  
  The first line in the file is the sequence name. Each line there after
-is a **space** seperated annotation.
+is a **space** separated annotation. 
+
+New as of build 198:
+ 
+ * The annotation format now supports fasta sequences embedded in the
+   annotation file as shown in the format example below. Mussagl will
+   take this sequence and look for an exact match of this sequence in
+   your sequences. If a match is found, it will label it with the name
+   from the fasta header.
  
  Format:
  
@@ -333,6 +942,12 @@ Format:
    <start> <stop> <annotation_name> <annotation_type>
    <start> <stop> <annotation_name> <annotation_type>
    <start> <stop> <annotation_name> <annotation_type>
+  >Fasta Header
+  ACTGACTGACGTACGTAGCTAGCTAGCTAGCACG
+  ACGTACGTACGTACGTAGCTGTCATACGCTAGCA
+  TGCGTAGAGGATCTCGGATGCTAGCGCTATCGAT
+  ACGTACGGCAGTACGCGGTCAGA
+  <start> <stop> <annotation_name> <annotation_type>
    ...
  
  Example:
@@ -343,10 +958,12 @@ Example:
    251 500 Glorp Glorptype
    751 1000 Glorp Glorptype
    1251 1500 Glorp Glorptype
+  >My favorite DNA sequence
+  GATTACA
    1751 2000 Glorp Glorptype
  
  
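As a reading aid for the format above, the sketch below parses an
annotation file into its three parts. It is not Mussagl's own parser:
the function name is hypothetical, and it assumes that annotation lines
always begin with a digit while embedded fasta sequence lines do not.

```python
def parse_annotations(text):
    """Split annotation-file text into (sequence name, intervals, fasta records)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    name, intervals, fastas = lines[0], [], []
    current = None  # fasta record being collected, if any
    for ln in lines[1:]:
        if ln.startswith(">"):
            # New embedded fasta record: header text, then sequence lines.
            current = [ln[1:].strip(), ""]
            fastas.append(current)
        elif ln[0].isdigit():
            # <start> <stop> <annotation_name> <annotation_type>
            current = None
            start, stop, label, kind = ln.split()[:4]
            intervals.append((int(start), int(stop), label, kind))
        elif current is not None:
            # Continuation of the current fasta sequence.
            current[1] += ln
    return name, intervals, [tuple(record) for record in fastas]
```

Applied to a file whose first line is ``Human`` followed by the example
lines above, this would return the name, the start/stop annotations, and
one fasta record named ``My favorite DNA sequence`` with the sequence
``GATTACA``.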
-.. _motif:
+.. _motif_file_format:
  
  Motif File Format
  ~~~~~~~~~~~~~~~~~
@@ -354,9 +971,43 @@ Motif File Format
  Format:
  
    <motif> <red> <green> <blue>
+  
+Example:
+
    GGCC 0.0 1 1
  
  
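A motif file line can be read with a few lines of code. This sketch
uses a hypothetical function name and assumes the three color values
are red, green and blue components written as plain numbers, as in the
example above.

```python
def parse_motif_line(line):
    """Split '<motif> <red> <green> <blue>' into a motif and a color tuple."""
    motif, red, green, blue = line.split()
    return motif, (float(red), float(green), float(blue))
```

For the example line, ``parse_motif_line("GGCC 0.0 1 1")`` gives the
motif ``GGCC`` with the color ``(0.0, 1.0, 1.0)``.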
+
+IUPAC Nucleotide Code
+~~~~~~~~~~~~~~~~~~~~~~
+
+For your convenience, below is a table of the IUPAC Nucleotide Code.
+
+The following table is table 1 from "Nomenclature for Incompletely
+Specified Bases in Nucleic Acid Sequences" which can be found at
+http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html.
+
+======  =================  ===================================
+Symbol  Meaning            Origin of designation
+======  =================  ===================================
+G       G                  Guanine
+A       A                  Adenine
+T       T                  Thymine
+C       C                  Cytosine
+R       G or A             puRine
+Y       T or C             pYrimidine
+M       A or C             aMino
+K       G or T             Keto
+S       G or C             Strong interaction (3 H bonds)
+W       A or T             Weak interaction (2 H bonds)
+H       A or C or T        not-G, H follows G in the alphabet
+B       G or T or C        not-A, B follows A
+V       G or C or A        not-T (not-U), V follows U
+D       G or A or T        not-C, D follows C
+N       G or A or T or C   aNy
+======  =================  ===================================
+
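The table can be turned directly into a lookup for expanding a motif
such as 'ATSCT'. The sketch below is one possible way to do this with
regular expressions and is not how Mussa searches for motifs; the names
are hypothetical, and it searches the forward strand only.

```python
import re

# Symbol -> bases it stands for, copied from the table above.
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT",
    "S": "GC", "W": "AT", "H": "ACT", "B": "GTC",
    "V": "GCA", "D": "GAT", "N": "GATC",
}


def motif_to_regex(motif):
    """Translate an IUPAC motif into a regex with one character class per symbol."""
    return "".join("[%s]" % IUPAC[symbol] for symbol in motif.upper())


def find_motif(motif, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    # The lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=%s)" % motif_to_regex(motif))
    return [m.start() for m in pattern.finditer(sequence.upper())]
```

Here ``motif_to_regex("ATSCT")`` yields ``[A][T][GC][C][T]``, which
matches both ``ATCCT`` and ``ATGCT``.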
+
  .. Define links below
     ------------------
  
@@ -364,4 +1015,4 @@ Format:
  .. _wiki: http://mussa.caltech.edu
  .. _build: http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild
  .. _fasta: http://en.wikipedia.org/wiki/FASTA_format
-
+.. _wpDnaMotif: http://en.wikipedia.org/wiki/DNA_motif
+\ No newline at end of file