Mussagl Manual: Updates for Docs 1.0

author Brandon King <kingb@caltech.edu>

Sat, 28 Oct 2006 00:15:33 +0000 (00:15 +0000)

committer Brandon King <kingb@caltech.edu>

Sat, 28 Oct 2006 00:15:33 +0000 (00:15 +0000)
diff --git a/doc/manual/images/smn1_dir_structure.png b/doc/manual/images/smn1_dir_structure.png

new file mode 100644 (file)

index 0000000..18f6fdc

Binary files /dev/null and b/doc/manual/images/smn1_dir_structure.png differ
diff --git a/doc/manual/images/threshold_change.gif b/doc/manual/images/threshold_change.gif

new file mode 100644 (file)

index 0000000..2dba416

Binary files /dev/null and b/doc/manual/images/threshold_change.gif differ
diff --git a/doc/manual/mussagl_manual.rst b/doc/manual/mussagl_manual.rst

index 5e551e72f37260e67f29c15aee89cf32b9a67bd3..ed4642d9dccc273d2d3df9ea9d1f7b1bad2e0477 100644 (file)
--- a/doc/manual/mussagl_manual.rst
+++ b/doc/manual/mussagl_manual.rst
@@ -5,9 +5,9 @@ Mussagl Manual
  Brandon W. King
  ---------------
  
-Last updated: Oct 20th, 2006
+Last updated: Oct 27th, 2006
  
-Updated to Mussagl build: (In process to 424)
+Documentation for Mussagl v1.0
  
  
  .. Things to add
@@ -162,1101 +162,1250 @@ __ wiki_
  Obtaining Input Data
  ====================
  
-If you already have your data, you can skip ahead to the the `Using
+If you would like help obtaining data for use with Mussagl, you can
+skip ahead to the `Obtaining Input Data - Continued`_ section.
+
+If you would like a tour of the software, continue with the `Using
  Mussagl`_ section.
  
-Let's say you have a gene of interest called 'SMN1' and you want to
-know how the sequence surrounding the gene in multiple species is
-conserved. Guess what, that's what we are going to do, retrieve the
-DNA sequence for SMN1 and prepare it for using in Mussa.
  
-For more information about SMN1 visit `NCBI's OMIM
-<http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=609682>`_.
+Using Mussagl
+=============
  
-The SMN1 data retrieved in this section can be downloaded from the
-`Mussa Example Data
-<http://woldlab.caltech.edu/cgi-bin/mussa/wiki/ExampleData>`_ page if
-you prefer to skip this section of the manual.
  
+Launch Mussagl
+--------------
+Launch Mussagl... It should look similar to the screen shot below.
  
-UCSC Genome Browser Method
---------------------------
+.. image:: images/opened.png
+   :alt: Launch Mussa
+   :align: center
  
-There are many methods of retrieving DNA sequence, but for this
-example we will retrieve SMN1 through the UCSC genome browser located
-at http://genome.ucsc.edu/.
  
  
-.. image:: images/ucsc_genome_browser_home.png
-   :alt: UCSC Genome Browser
-   :align: center
+Create/Load Analysis
+----------------------
  
-Step 1 - Find SMN1
-~~~~~~~~~~~~~~~~~~
+Currently there are three ways to load a Mussa experiment.
  
-The first step in finding SMN1 is to use the **Gene Sorter** menu
-option which I have highlighted in orange below:
+ 1. `Create a new analysis`_
+ 2. `Load a mussa parameter file`_ (.mupa)
+ 3. `Load an analysis`_
  
-.. image:: images/ucsc_menu_bar_gene_sorter.png
-   :alt: Gene Sorter Menu Option
-   :align: center
+.. _createnew:
  
-Gene Sorter page:
+Create a new analysis
+~~~~~~~~~~~~~~~~~~~~~
  
-.. image:: images/ucsc_gene_sorter.png
-   :alt: Gene Sorter
+To create a new analysis select 'Define analysis' from the 'File'
+menu. You should see a dialog box similar to the one below. For this
+demo we will use the example sequences that come with Mussagl.
+
+.. image:: images/define_analysis.png
+   :alt: Define Analysis
     :align: center
  
-We will start by looking for SMN1 in the **Human Genome** and **sorting by name similarity**.
+Instructions:
  
-.. image:: images/ucsc_gs_sort_name_sim.png
-   :alt: Gene Sorter - Name Similarity
-   :align: center
+ 1. **Give the experiment a name**, for this demo, we'll use
+    'demo_w30_t20'. Mussa will create a folder with this name to store
+    the analysis files in once it has been run.
  
-After you have selected **Human Genome** and **sorting by name similarity**, type *SMN1* into the search box.
+ 2. Choose a threshold_... for this demo **choose 20**. See the
+    Threshold_ section for more detailed information.
  
-.. image:: images/ucsc_gs_smn1.png
-   :alt: Gene
-   :align: center
+ 3. Choose a `window size`_. For this demo **choose 30**.
  
-Press **Go!** and you should see the following page:
  
-.. image:: images/ucsc_gs_found.png
-   :alt: Found SMN1
+ 4. Choose the number of sequences_ you would like. For this demo
+    **choose 3**.
+
+.. image:: images/define_analysis_step1a.png
+   :alt: Steps 1-4
     :align: center
  
-Click on **SMN1** and you will be taking the gene expression atlas
-page.
+First enter the species name of "Human" in the first "Species" text
+box. Now click on the 'Browse' button next to the sequence input box
+and then select /examples/seq/human_mck_pro.fa file. Do the same in
+the next two sequence input boxes selecting mouse_mck_pro.fa and
+rabbit_mck_pro.fa as shown below. Make sure to give them a species
+name as well. Note that you can create annotation files using the
+mussa `Annotation File Format`_ to add annotations to your sequence.
  
-.. image:: images/ucsc_gs_genome_position.png
-   :alt: Gene expression atlas
+.. image:: images/define_analysis_step2.png
+   :alt: Choose sequences
     :align: center
  
-Click on **chr5 70,270,558** found in the **SMN1 row**, **Genome
-position column**.
-
-Now we have found the location of SMN1 on human!
+Click the **create** button and in a few moments you should see
+something similar to the following screen shot.
  
-.. image:: images/ucsc_gb_smn1_human.png
-   :alt: Genome Browser - SMN1 (human)
+.. image:: images/demo.png
+   :alt: Mussagl Demo
     :align: center
  
+By default your analysis is NOT saved. If you try to close an analysis
+without saving, you will be prompted with a dialog box asking you if
+you would like to save your analysis. See the `Saving`_ section for
+details on saving your analysis. When saving, choose a directory and
+give the analysis the name **demo_w30_t20**. If you close and reopen
+Mussagl, you will then be able to load the saved analysis. See `Load
+an analysis`_ section below for details.
  
-Step 2 - Download CDS/UTR sequence for annotations
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
-Since we have found **SMN1**, this would be a convenient time to extract
-the DNA sequence for the CDS and UTRs of the gene to use it as an
-annotation_ in Mussa.
+Load a mussa parameter file
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
-**Click on SMN1** shown **between** the **two orange arrows** shown
-below.
+If you prefer, you can define your Mussa analysis using the Mussa
+parameter file. See the `Parameter File Format`_ section for details
+on creating a .mupa file.
  
-.. image:: images/ucsc_gb_smn1_human_click_smn1.png
-   :alt: Genome Browser - SMN1 (human) - Orange Arrows
+Once you have a .mupa file created, load Mussagl and select the **File >
+Create Analysis from File** menu option. Select the .mupa file and click
+open. 
+
+.. image:: images/load_mupa_menu.png
+   :alt: Load Mussa Parameters
     :align: center
  
-You should find yourself at the SMN1 description page.
+If you would like to see an example, you can load the
+**mck3test.mupa** file in the examples directory that comes with
+Mussagl.
  
-.. image:: images/ucsc_gb_smn1_description_page.png
-   :alt: Genome Browser - SMN1 (human) - Description page
+.. image:: images/load_mupa_dialog.png
+   :alt: Load Mussa Parameters Dialog
     :align: center
  
-**Scroll down** until you get to the **Sequence section** and click on
-**Genomic (chr5:70,256,524-70,284,592)**.
  
-.. image:: images/ucsc_gb_smn1_human_sequence.png
-   :alt: Genome Browser - SMN1 (human) - Sequence
-   :align: center
+Load an analysis
+~~~~~~~~~~~~~~~~
  
-You should now be at the **Genomic sequence near gene** page:
+To load a previously run analysis open Mussagl and select the **File >
+Open Existing Analysis** menu option. Select an analysis **directory** and
+click open.
  
-.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence.png
-   :alt: Genome Browser - SMN1 (human) - Get genomic sequence
+.. image:: images/load_analysis_menu.png
+   :alt: Load Analysis Menu
     :align: center
  
-Make the following changes (highlighted in orange in the screenshot
-below):
-
- 1. UNcheck **introns**. 
-    (We only want to annotate CDS and UTRs.)
- 2. Select **one FASTA record** per **region**. 
-    (Mussa needs each CDS and UTR represented by one FASTA record per CDS/UTR).
- 3. Select **CDS in upper case, UTR in lower case.**
  
-.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_diff.png
-   :alt: Genome Browser - SMN1 (human) - Get genomic sequence setup
-   :align: center
+Main Window
+-----------
  
-Now click the **submit** button. You will then see a FASTA file with
-many FASTA records representing the CDS and UTRS.
+Overview
+~~~~~~~~
+.. Screen-shot with numbers showing features.
  
-.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_submit.png
-   :alt: Genome Browser - SMN1 (human) - CDS/UTR sequence
+.. image:: images/window_overview.png
+   :alt: Mussa Window
     :align: center
  
-Now you need to save the FASTA records to a **text file**. If you are
-using **Firefox** or **Internet Explorer 6+** click on the **File >
-Save As** menu option. 
+Legend:
  
-**IMPORTANT:** Make sure you select **Text Files** and **NOT**, I
-repeat **NOT Webpage Complete** (see screenshot below.)
+ 1. `DNA Sequence (Black bars)`_
+ 
+ 2. Annotation_
  
-Type in **smn1_human_annot.txt** for the file name.
+ 3. Motif_
  
-.. image:: images/smn1_human_annot.png
-   :alt: Genome Browser - SMN1 (human) - sequence annotation file
-   :align: center
+ 4. `Red conservation tracks`_
  
-**IMPORTANT:** You should open the file with a text editor and make
-  sure **no HTML** was saved... If you find any HTML markup, delete
-  the markup and save the file.
+ 5. `Blue conservation tracks`_
  
-Now we are going to **modify the file** you just saved to **add the
-name of the species** to the **annotation file**. All you have to do
-is **add a new line** at the **top of the file** with the word **'Human'** as
-shown below:
+ 6. `Zoom Factor`_ (Base pairs per pixel)
  
-.. image:: images/smn1_human_annot_plus_human.png
-   :alt: Genome Browser - SMN1 (human) - sequence annotation file
-   :align: center
+ 7. `Dynamic Threshold`_
  
-You can add more annotations to this file if you wish. See the
-`annotation file format`_ section for details of the file format. By
-including FASTA records in the annotation_ file, Mussa searches your
-DNA sequence for an exact match of the sequence in the annotation_
-file. If found, it will be marked as an annotation_ within Mussa.
+ 8. `Sequence Information Bar`_
  
+ 9. `Sequence Scroll Bar`_
  
-Step 3 - Download gene and upstream/downstream sequence
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
-Use the back button in your web browser to get back the **genome
-browser view** of **SMN1** as shown below.
+DNA Sequence (black bars)
+~~~~~~~~~~~~~~~~~~~~~~~~~
  
-.. image:: images/ucsc_gb_smn1_human.png
-   :alt: Genome Browser - SMN1 (human)
+.. image:: images/sequence_bar.png
+   :alt: Sequence Bar
     :align: center
  
-There are two options for getting additional sequence around your
-gene. The more complex way is to zoom out so that you have the
-sequence you want being shown in the genome browser and then follow
-the directions for the following method.
+Each of the black bars represents one of the loaded sequences, in this
+case the sequence around the gene 'MCK' in human, mouse, and rabbit.
  
-The second option, which we will choose, is to leave the genome
-browser zoomed exactly at the location of SMN1 and click on the
-**DNA** option on the menu bar (shown with orange arrows in the
-screenshot below.)
  
-.. image:: images/ucsc_gb_smn1_human_dna_option.png
-   :alt: Genome Browser - SMN1 (human) - DNA Option
+Annotation
+~~~~~~~~~~
+
+.. figure:: images/annotation.png
+   :alt: Annotation
     :align: center
+   
+   Annotation shown in green on sequence bar.
  
-Now in the **get dna in window** page, let's add an arbitrary amount of
-extra sequence on to each end of the gene, let's say 5000 base pairs.
  
-.. image:: images/ucsc_gb_smn1_human_get_dna.png
-   :alt: Genome Browser - SMN1 (human) - Get DNA 
-   :align: center
+Annotations can be included on any of the sequences using the `Load a
+mussa parameter file`_ or `Create a new analysis`_ method of loading
+your sequences. You can define annotations by location or using an
+exact sub-sequence or a FASTA sequence of the section of DNA you wish
+to annotate. See the `Annotation File Format`_ section for details.
  
-Click the **get DNA** button.
  
-.. image:: images/ucsc_gb_smn1_human_dna.png
-   :alt: Genome Browser - SMN1 (human) - DNA 
+Motif
+~~~~~
+
+.. figure:: images/motif.png
+   :alt: Motif
     :align: center
  
-Save the DNA sequence to a text file called 'smn1_human_dna.fa' as we
-did in step 2 with the annotation file.
+   Motif shown in light blue on sequence bar.
  
-**IMPORTANT:** Make sure the file is saved as a text file and not an
-HTML file. Open the file with a text editor and remove any HTML markup
-you find.
+The only real difference between an annotation and motif in Mussagl is
+that you can define motifs and choose a color from within the GUI. See
+the `Motifs`_ section for more information.
  
  
-Step 4 - Same/similar/related gene other species.
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+Red conservation tracks
+~~~~~~~~~~~~~~~~~~~~~~~
  
-What good is a multiple sequence alignment viewer without multiple
-sequences? Let'S find a similar gene in a few more species.
+.. figure:: images/conservation_tracks.png
+   :alt: Conservation Tracks
+   :align: center
+   
+   Conservation tracks shown as red and blue lines between sequence
+   bars.
  
-Use the back button on your web browser until you get the **genome
-browser view** of **SMN1** as shown below.
+The **red lines** between the sequence bars represent conservation
+between the sequences (i.e. not reverse complement matches).
  
-.. image:: images/ucsc_genome_browser_home.png
-   :alt: UCSC Genome Browser
-   :align: center
+The amount of sequence conservation shown will depend on how much your
+sequences are related and the `dynamic threshold`_ you are using.
  
-**Click on SMN1** shown **between** the **two orange arrows** shown
-below.
  
-.. image:: images/ucsc_gb_smn1_human_click_smn1.png
-   :alt: Genome Browser - SMN1 (human) - Orange Arrows
+Blue conservation tracks
+~~~~~~~~~~~~~~~~~~~~~~~~
+
+.. figure:: images/conservation_tracks.png
+   :alt: Conservation Tracks
     :align: center
+   
+   Conservation tracks shown as red and blue lines between sequence
+   bars.
  
-You should find yourself at the SMN1 description page.
+**Blue lines** represent **reverse complement** conservation relative
+to the sequence attached to the top of the blue line.
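As a small illustration of what "reverse complement" means here, the sketch below reverses a DNA string and swaps each base for its pairing partner. The function name ``reverse_complement`` is hypothetical and is not taken from Mussa; it only handles the four unambiguous bases.

```python
# Illustrative sketch only; not Mussa's implementation.
# Translation table pairing each base with its complement (A<->T, C<->G).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    # Complement every base, then reverse the resulting string.
    return sequence.translate(_COMPLEMENT)[::-1]
```

A blue track therefore links a window on one sequence to a window whose reverse complement matches it on the other sequence.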
  
-.. image:: images/ucsc_gb_smn1_description_page.png
-   :alt: Genome Browser - SMN1 (human) - Description page
-   :align: center
+The amount of sequence conservation shown will depend on how much your
+sequences are related and the `dynamic threshold`_ you are using.
  
-**Scroll down** until you get to the **Sequence section** and click on
-**Protein (262 aa)**.
  
-.. image:: images/ucsc_gb_smn1_human_sequence.png
-   :alt: Genome Browser - SMN1 (human) - Sequence
+Zoom Factor
+~~~~~~~~~~~
+
+.. image:: images/zoom_factor.png
+   :alt: Zoom Factor
     :align: center
  
-Copy the SMN1 protein seqeunce by highlighting it and selecting **Edit
-> Copy** option from the menu.
+The zoom factor is the number of base pairs represented per pixel.
+When you zoom in far enough, the display will switch from a black
+bar, representing the sequence, to the actual sequence (well, an
+ASCII representation of the sequence).
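To make the "base pairs per pixel" idea concrete, here is a rough sketch of how wide a sequence bar would be at a given zoom factor. The helper ``bar_width_in_pixels`` is a hypothetical name, and the rounding rule (round up) is an assumption rather than Mussagl's documented behavior.

```python
import math


def bar_width_in_pixels(sequence_length: int, bp_per_pixel: float) -> int:
    """Estimate the on-screen width of a sequence bar.

    Assumes each pixel covers `bp_per_pixel` base pairs and that a
    partially filled last pixel is still drawn (hence the ceiling).
    """
    if sequence_length < 0 or bp_per_pixel <= 0:
        raise ValueError("length must be >= 0 and bp_per_pixel must be > 0")
    return math.ceil(sequence_length / bp_per_pixel)
```

For example, under these assumptions a 3000 bp sequence at a zoom factor of 10 would span 300 pixels, and lowering the zoom factor makes the bar wider.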
  
-.. image:: images/smn1_human_protein.png
-   :alt: Genome Browser - SMN1 (human) - Protein
-   :align: center
  
-Press the back button on the web browser once and then scroll to the
-top of the page and click on the **BLAT** option on the menu bar
-(shown below with orange arrows).
+Dynamic Threshold
+~~~~~~~~~~~~~~~~~
  
-.. image:: images/ucsc_gb_smn1_human_blat.png
-   :alt: Genome Browser - SMN1 (human) - Blat
+.. image:: images/dynamic_threshold.png
+   :alt: Dynamic Threshold
     :align: center
  
-**Paste** in the **protein sequence** and **change** the **genome** to
-**mouse** as shown below and then click **submit**.
+You can dynamically change the threshold for how strong a match you
+consider the conservation to be by changing the value in the dynamic
+threshold box. 
  
-.. image:: images/ucsc_gb_smn1_human_blat_paste.png
-   :alt: Genome Browser - SMN1 (human) - Blat paste protein
-   :align: center
+The value you enter is the minimum number of base pairs that have to
+match within a window for it to be considered conserved. The second
+number, which you cannot change, is the `window size`_ you used when
+creating the experiment. The last number is the percent match.
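The relationship between the three numbers can be sketched as follows. This is a minimal illustration, assuming the percent match is simply the threshold divided by the window size and that a window counts as conserved when enough positions are identical; the function names are hypothetical and this is not Mussa's actual comparison code.

```python
def percent_match(threshold: int, window_size: int) -> float:
    """Percent of positions in a window that must match."""
    if not 0 < threshold <= window_size:
        raise ValueError("threshold must be between 1 and window_size")
    return 100.0 * threshold / window_size


def window_is_conserved(window_a: str, window_b: str, threshold: int) -> bool:
    """True if two equal-length windows share at least `threshold` identical bases."""
    if len(window_a) != len(window_b):
        raise ValueError("windows must have the same length")
    # Count positions where both windows carry the same base (case-insensitive).
    matches = sum(1 for a, b in zip(window_a.upper(), window_b.upper()) if a == b)
    return matches >= threshold
```

With the demo settings (threshold 20, window size 30), ``percent_match(20, 30)`` gives roughly 66.7, so raising the dynamic threshold toward 30 keeps only the most strongly conserved windows.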
  
-Notice that we have two hits, one of which looks pretty good at 89.9%
-match.
+Below is an animation of the dynamic threshold being increased over
+time.
  
-.. image:: images/ucsc_gb_smn1_human_blat_hits.png
-   :alt: Genome Browser - SMN1 (human) - Blat hits
+.. image:: images/threshold_change.gif
+   :alt: Animated Dynamic Threshold
     :align: center
  
-**Click** on the **brower** link next to the 89.9% match. Notice in
-the genome browser (shown below) that there is an annotated gene
-called SMN1 for mouse which matches the line called **your sequence
-from blat search**. This means we are fairly confidant we found the
-right location in the mouse genome. 
+See the Threshold_ section for more information.
  
-.. image:: images/ucsc_gb_smn1_human_blat_to_browser.png
-   :alt: Genome Browser - SMN1 (human) - Blat to browser
-   :align: center
  
-Follow steps 1 through 3 for mouse and then repeat step 4 with the
-human protein sequence to find **SMN1** in the following species (if
-you find a match):
+Sequence Information Bar
+~~~~~~~~~~~~~~~~~~~~~~~~
  
- 1. Rat
- 2. Rabbit
- 3. Dog
- 4. Armadillo
- 5. Elephant
- 6. Opposum
- 7. x_tropicalis
+.. image:: images/seq_info_bar.png
+   :alt: Sequence Information Bar
+   :align: center
  
-Make sure to save the extended DNA sequence and annotation file for
-each one.
+The sequence information bars can be found to the left and right sides
+of Mussagl. Next to each sequence you will find the following
+information:
  
-Using Mussagl
-=============
+ 1. Species (If it has been defined)
+ 2. Total Size of Sequence
+ 3. Current base pair position
  
+Note that you can **update the species** text box. Make sure to **save your
+experiment** after making this change by selecting **File > Save
+Analysis** from the menu.
  
-Launch Mussagl
---------------
-Launch Mussagl... It should look similar to the screen shot below.
+Sequence Scroll Bar
+~~~~~~~~~~~~~~~~~~~
  
-.. image:: images/opened.png
-   :alt: Launch Mussa
+.. image:: images/scroll_bar.png
+   :alt: Sequence Scroll Bar
     :align: center
  
+The scroll bar allows you to scroll through the sequence which is
+useful when you have zoomed in using the `zoom factor`_.
  
  
-Create/Load Analysis
-----------------------
-
-Currently there are three ways to load a Mussa experiment.
+Saving
+------
  
- 1. `Create a new analysis`_
- 2. `Load a mussa parameter file`_ (.mupa)
- 3. `Load an analysis`_
+Save on Close
+~~~~~~~~~~~~~
  
-.. _createnew:
+Whenever you create a new analysis or make a change such as
+adding/editing a motif or changing a species name, an asterisk (*)
+will appear in the title of the window showing that there are changes
+that have not been saved. If you close a Mussa window without saving
+changes, Mussa will ask you if you would like to save the changes that
+have been made.
  
-Create a new analysis
-~~~~~~~~~~~~~~~~~~~~~
+Save Analysis
+~~~~~~~~~~~~~
  
-To create a new analysis select 'Define analysis' from the 'File'
-menu. You should see a dialog box similar to the one below. For this
-demo we will use the example sequences that come with Mussagl.
+After making changes, such as updating species names or adding/editing
+motifs, you can save these changes by selecting the **File > Save
+analysis** menu option or pressing **CTRL + S** (PC) or
+**Apple/Command Key + S** (on Mac).
  
-.. image:: images/define_analysis.png
-   :alt: Define Analysis
+.. image:: images/save_analysis.png
+   :alt: Save analysis
     :align: center
  
-Instructions:
+Save Analysis As
+~~~~~~~~~~~~~~~~
  
- 1. **Give the experiment a name**, for this demo, we'll use
-    'demo_w30_t20'. Mussa will create a folder with this name to store
-    the analysis files in once it has been run.
+To save a copy of your analysis to a new location, select the **File >
+Save analysis as** menu option and choose a new location and name for
+your analysis.
  
- 2. Choose a threshold_... for this demo **choose 20**. See the
-    Threshold_ section for more detailed information.
+.. image:: images/save_analysis_as.png
+   :alt: Save analysis
+   :align: center
  
- 3. Choose a `window size`_. For this demo **choose 30**.
+Save Motif List
+~~~~~~~~~~~~~~~
  
+See `Save Motifs to File`_ in the `Motifs`_ section.
  
- 4. Choose the number of sequences_ you would like. For this demo
-    **choose 3**.
  
-.. image:: images/define_analysis_step1a.png
-   :alt: Steps 1-4
-   :align: center
+Viewing Multiple Analyses
+-------------------------
  
-First enter the species name of "Human" in the first "Species" text
-box. Now click on the 'Browse' button next to the sequence input box
-and then select /examples/seq/human_mck_pro.fa file. Do the same in
-the next two sequence input boxes selecting mouse_mck_pro.fa and
-rabbit_mck_pro.fa as shown below. Make sure to give them a species
-name as well. Note that you can create annotation files using the
-mussa `Annotation File Format`_ to add annotations to your sequence.
+Sometimes it is useful to view more than one analysis at a time. To
+accomplish this, Mussa allows you to open a new Mussa window by
+selecting the **File > New Mussa Window** menu option.
  
-.. image:: images/define_analysis_step2.png
-   :alt: Choose sequences
+.. image:: images/new_mussa_window_menu.png
+   :alt: New Mussa Window Menu Option
     :align: center
  
-Click the **create** button and in a few moments you should see
-something similar to the following screen shot.
+A new Mussa window will pop up.
  
-.. image:: images/demo.png
-   :alt: Mussagl Demo
+.. figure:: images/new_mussa_window.png
+   :alt: New Mussa Window
     :align: center
  
-By default your analysis is NOT saved. If you try to close an analysis
-without saving, you will be prompted with a dialog box asking you if
-you would like to save your analysis. The `Saving`_ section for
-details on saving your analysis. When saving, choose directory and
-give the analysis the name **demo_w30_t20**. If you close and reopen
-Mussagl, you will then be able to load the saved analysis. See `Load
-an analysis`_ section below for details.
+   A new Mussa window on the right, in which a second analysis has
+   been loaded.
  
+Now you can create a new analysis or load an existing one in this new window,
+as described in the `Create/Load Analysis`_ section. 
  
-Load a mussa parameter file
-~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+You can view as many analyses as you can fit on your screen or until
+you run out of available RAM. If you notice a rapid decrease in
+performance and hear lots of noise coming from your hard drive, you
+probably ran out of RAM and are now using virtual memory (i.e. much
+much slower). If this happens, you may need to avoid opening as many
+analyses at one time.
  
-If you prefer, you can define your Mussa analysis using the Mussa
-parameter file. See the `Parameter File Format`_ section for details
-on creating a .mupa file.
  
-Once you have a .mupa file created, load Mussagl and select the **File >
-Create Analysis from File** menu option. Select the .mupa file and click
-open. 
+Annotations / Motifs
+--------------------
  
-.. image:: images/load_mupa_menu.png
-   :alt: Load Mussa Parameters
-   :align: center
+Annotations
+~~~~~~~~~~~
  
-If you would like to see an example, you can load the
-**mck3test.mupa** file in the examples directory that comes with
-Mussagl.
+Currently annotations can be added to a sequence using the mussa
+`annotation file format`_ and can be loaded by selecting the
+annotation file when defining a new analysis (see `Create a new
+analysis`_ section) or by defining a .mupa file pointing to your
+annotation file (see `Load a mussa parameter file`_ section).
  
-.. image:: images/load_mupa_dialog.png
-   :alt: Load Mussa Parameters Dialog
+Motifs
+~~~~~~
+
+Load Motifs from File
+*********************
+
+It is possible to load motifs from a file which was saved from a
+previous run or by defining your own motif file. See the `Motif File
+Format`_ section for details.
+
+NOTE: Valid motif list file extensions are:
+  
+  * .mtl
+  * .txt
+
+To load a motif file, select **Load Motif List** item from the
+**File** menu and select a motif list file.
+
+.. image:: images/load_motif.png
+   :alt: Load Motif List
     :align: center
  
  
-Load an analysis
-~~~~~~~~~~~~~~~~
+Save Motifs to File
+*******************
  
-To load a previously run analysis open Mussagl and select the **File >
-Open Existing Analysis** menu option. Select an analysis **directory** and
-click open.
+Motifs from the `Motif Dialog`_ can be saved to file for use with
+other analyses. If you just want your motifs to be saved with your
+analysis, see the `save analysis`_ section for details.
  
-.. image:: images/load_analysis_menu.png
-   :alt: Load Analysis Menu
+To save a motif list, select **File > Save Motifs** menu option. By
+default, Mussa will append .mtl if you do not provide a file extension
+(valid file extensions: .mtl & .txt).
+
+.. image:: images/save_motifs.png
+   :alt: Save Motifs
     :align: center
  
  
-Main Window
------------
+Motif Dialog
+************
  
-Overview
-~~~~~~~~
-.. Screen-shot with numbers showing features.
+Mussa has the ability to find lab motifs using the `IUPAC Nucleotide
+Code`_ for defining a motif. To define a motif, select **Edit > Edit
+Motifs** menu item as shown below.
  
-.. image:: images/window_overview.png
-   :alt: Mussa Window
+.. image:: images/view_edit_motifs.png
+   :alt: "View > Edit Motifs" Menu
     :align: center
  
-Legend:
+You will see a dialog box appear with an "apply" button in the bottom
+right and one row for defining a motif and the color that will be
+displayed on the sequence. When you start adding your first motif, an
+additional row will be added. The check box in the first column
+defines whether the motif is displayed or not. The second column is
+the motif display color. The third column is for the name of your
+motif and finally, the fourth column is the motif itself.
  
- 1. `DNA Sequence (Black bars)`_
- 
- 2. Annotation_
+.. image:: images/motif_dialog_start.png
+   :alt: Motif Dialog
+   :align: center
  
- 3. Motif_
+Now let's make a motif **'AT[C or G]CT'**. Using the `IUPAC Nucleotide
+Code`_, type in **'ATSCT'** into the motif field and **'My Motif'** for
+the name in the name field as shown below. 
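To show how an IUPAC motif such as 'ATSCT' expands, the sketch below turns a motif into a regular expression and lists where it matches. It uses the standard IUPAC nucleotide codes, but the function names are hypothetical, it only scans the given strand, and it is not how Mussa itself searches for motifs.

```python
import re

# Standard IUPAC nucleotide codes mapped to regex character classes.
IUPAC_CODES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]",
    "B": "[CGT]", "D": "[AGT]", "H": "[ACT]", "V": "[ACG]",
    "N": "[ACGT]",
}


def motif_to_regex(motif: str) -> str:
    """Expand an IUPAC motif into a regular expression string."""
    return "".join(IUPAC_CODES[code] for code in motif.upper())


def find_motif(motif: str, sequence: str) -> list:
    """Return the 0-based start positions of every match, overlaps included."""
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]
```

Here 'ATSCT' expands to ``AT[CG]CT``, so it matches both ATCCT and ATGCT but not ATACT.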
  
- 4. `Red conservation tracks`_
+Notice how a second row appeared when you started to add the first
+motif. Every time you add a new motif, a new row will appear allowing
+you to add as many motifs as you need.
  
- 5. `Blue conservation tracks`_
+.. image:: images/motif_dialog_enter_motif.png
+   :alt: Enter Motif
+   :align: center
  
- 6. `Zoom Factor`_ (Base pairs per pixel)
+Now choose a color for your motif by clicking on the colored area to
+the left of the name field. Remember to choose a color that will show
+up well with a black bar as the background. A good tool for picking a
+color is the `Colour Contrast Analyser
+<http://juicystudio.com/services/colourcontrast.php>`_ by
+`juicystudio.com <http://juicystudio.com/>`_.
  
- 7. `Dynamic Threshold`_
+.. image:: images/color_chooser.png
+   :alt: Color Chooser
+   :align: center
  
- 8. `Sequence Information Bar`_
+Once you have selected the color for your motif, click on the
+**'apply'** button. If Mussa finds matches to your motif, they
+will now show up in the main Mussa window.
  
- 9. `Sequence Scroll Bar`_
+Before Motif:
  
+.. image:: images/motif_dialog_bar_before.png
+   :alt: Sequence bar before motif
+   :align: center
  
-DNA Sequence (black bars)
-~~~~~~~~~~~~~~~~~~~~~~~~~
+After Motif:
  
-.. image:: images/sequence_bar.png
-   :alt: Sequence Bar
+.. image:: images/motif_dialog_bar_after.png
+   :alt: Sequence bar after motif
     :align: center
  
-Each of the black bars represents one of the loaded sequences, in this
-case the sequence around the gene 'MCK' in human, mouse, and rabbit.
+To save your motifs with your analysis, see the `save analysis`_
+section. To save your motifs to a file, see the `save motifs to file`_
+section.
  
+Deleting a Motif
+^^^^^^^^^^^^^^^^
  
-Annotation
-~~~~~~~~~~
+To delete a motif, remove all text from the name and sequence columns
+and close the motif editor.
  
-.. figure:: images/annotation.png
-   :alt: Annotation
+View Mussa Alignments
+---------------------
+
+Mussagl allows you to zoom in on Mussa alignments by selecting the set
+of alignment(s) of interest. To do this, move the mouse near the
+alignment you are interested in viewing and then **PRESS** and
+**HOLD** the **LEFT mouse button** and **drag the mouse** to the other
+side of the conservation track so that you see a bounding box
+overlapping the alignment(s) of interest and then **let go** of the
+*left mouse button*.
+
+In the example below, I started by left-clicking on the area marked by
+a red dot (upper left corner of bounding box) and dragging the mouse to
+the area marked by a blue dot (lower right corner of the bounding box)
+and letting go of the left mouse button.
+
+.. image:: images/select_sequence.png
+   :alt: Select Sequence
     :align: center
-   
-   Annotation shown in green on sequence bar.
  
+All of the lines which were not selected should be washed out as shown
+below:
  
-Annotations can be included on any of the sequences using the `Load a
-mussa parameter file`_ or `Create a new analysis`_ method of loading
-your sequences. You can define annotations by location or using an
-exact sub-sequence or a FASTA sequence of the section of DNA you wish
-to annotate. See the `Annotation File Format`_ section for details.
+.. image:: images/washed_out.png
+   :alt: Tracks washed out
+   :align: center
  
+With a selection made, go to the **View** menu and select **View mussa alignment**.
  
-Motif
-~~~~~
+.. image:: images/view_mussa_alignment.png
+   :alt: View mussa alignment
+   :align: center
  
-.. figure:: images/motif.png
-   :alt: Motif
+You should see the alignment at the base-pair level as shown below.
+
+.. image:: images/mussa_alignment.png
+   :alt: Mussa alignment
     :align: center
  
-   Motif shown in light blue on sequence bar.
  
-The only real difference between an annotation and motif in Mussagl is
-that you can define motifs and choose a color from within the GUI. See
-the `Motifs`_ section for more information.
+Sub-analysis
+------------
  
+To run a sub-analysis **highlight** a section of sequence and *right
+click* on it and select **Add to subanalysis**. Do the same for the
+sequences shown in orange in the screenshot below. Note that you **are
+NOT limited** to selecting only one subsequence from the same
+sequence.
  
-Red conservation tracks
-~~~~~~~~~~~~~~~~~~~~~~~
+.. image:: images/subanalysis_select_seqs.png
+   :alt: Subanalysis sequence selection
+   :align: center
  
-.. figure:: images/conservation_tracks.png
-   :alt: Conservation Tracks
+Once you have added your sequences for subanalysis, choose a `window size`_ and `threshold`_ and click **Ok**.
+
+.. image:: images/subanalysis_dialog.png
+   :alt: Subanalysis Dialog
     :align: center
-   
-   Conservations tracks shown as red and blue lines between sequence
-   bars.
  
-The **red lines** between the sequence bars represent conservation
-between the sequences (i.e. not reverse complement matches)
+A new Mussa window will appear with the subanalysis of your sequences
+once it's done running. This may take a while if you selected large
+chunks of sequence with a loose threshold.
  
-The amount of sequence conservation shown will depend on how much your
-sequences are related and the `dynamic threshold`_ you are using.
+.. image:: images/subanalysis_done.png
+   :alt: Subanalysis complete
+   :align: center
  
  
-Blue conservation tracks
-~~~~~~~~~~~~~~~~~~~~~~~~
+Copying sequence to clipboard
+-----------------------------
  
-.. figure:: images/conservation_tracks.png
-   :alt: Conservation Tracks
+To copy a sequence to the clipboard, highlight a section of sequence,
+as shown in the screen shot below, and do one of the following:
+
+ * Select **Copy as FASTA** from the **Edit** menu.
+ * **Right-Click (Left-click + Apple/Command Key on Mac)** on the highlighted sequence and select **Copy as FASTA**.
+ * Press **Ctrl + C (on PC)** or **Apple/Command Key + C (on Mac)** on the keyboard.
+
+.. image:: images/copy_sequence.png
+   :alt: Copy sequence
     :align: center
-   
-   Conservations tracks shown as red and blue lines between sequence
-   bars.
  
-**Blue lines** represent **reverse complement** conservation relative
-to the sequence attached to the top of the blue line.
  
-The amount of sequence conservation shown will depend on how much your
-sequences are related and the `dynamic threshold`_ you are using.
+Saving to an Image
+---------------------------------
  
+To save your current mussa view to an image, select **File > Save to
+image...** as shown below.
  
-Zoom Factor
-~~~~~~~~~~~
+.. image:: images/save_to_image_menu.png
+   :alt: File > Save to image...
+   :align: center
  
-.. image:: images/zoom_factor.png
-   :alt: Zoom Factor
+You can define the width and the height of the image to save. By
+default it will use the same size as your current view. Since the
+Mussa view is implemented using vectors, if you choose a larger size
+than your current view, Mussa will redraw at the higher resolution
+when saving. In other words, you get higher quality images when saving
+at a higher resolution.
+
+If you check the "Lock aspect ratio" check box, which I have circled
+in red, then when you change one value, say width, the other, height,
+will update automatically to keep the same aspect ratio.
+
+.. image:: images/save_to_image_dialog.png
+   :alt: Save to image dialog
     :align: center
  
-The zoom factor represents the number of base pairs represented per
-pixel. When you zoom in far enough the sequence will switch from
-seeing a black bar, representing the sequence, to the actual sequence
-(well, ASCII representation of sequence).
+Click save and choose a location and filename for your file.
  
+The valid image formats are:
  
-Dynamic Threshold
-~~~~~~~~~~~~~~~~~
+  * .png (default if no extension specified.)
+  * .jpg
  
-.. image:: images/dynamic_threshold.png
-   :alt: Dynamic Threshold
-   :align: center
  
-You can dynamically change the threshold for how strong a match you
-consider the conservation to be by changing the value in the dynamic
-threshold box. 
+Detailed Information
+--------------------
  
-The value you enter is the minimum number of base pairs that have to
-be matched in order to be considered conserved. The second number that
-you can't change is the `window size`_ you used when creating the
-experiment. The last number is the percent match.
+Threshold
+~~~~~~~~~
  
-See the Threshold_ section for more information.
+The threshold of an analysis is the minimum number of base pair
+matches that must be met in order for a window to be kept as a match.
+Note that you can vary
+the threshold from within Mussagl. For example, if you choose a
+`window size`_ of **30** and a **threshold** of **20** the mussa nway
+transitive algorithm will store all matches that are 20 out of 30 bp
+matches or better and pass it on to Mussagl. Mussagl will then allow
+you to dynamically choose a threshold from 20 to 30 base pairs. A
+threshold of 30 bps would only show 30 out of 30 bp matches. A
+threshold of 20 bps would show all matches of 20 out of 30 bps or
+better. If you would like to see results for matches lower than 20 out
+of 30, you will need to rerun the analysis with a lower threshold.
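To make the relationship between window size and threshold concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it compares two sequences pairwise with a brute-force sliding window, which is not the mussa nway transitive algorithm, and the function names ``window_matches`` and ``apply_dynamic_threshold`` are hypothetical.

```python
def window_matches(seq_a, seq_b, window, threshold):
    """Return (pos_a, pos_b, score) for each pair of windows that agree
    at `threshold` or more positions out of `window` (hypothetical helper)."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Count identical bases position by position inside the window.
            score = sum(a == b for a, b in
                        zip(seq_a[i:i + window], seq_b[j:j + window]))
            if score >= threshold:
                hits.append((i, j, score))
    return hits


def apply_dynamic_threshold(hits, threshold):
    """Keep only stored hits scoring at or above a stricter threshold."""
    return [hit for hit in hits if hit[2] >= threshold]


if __name__ == "__main__":
    # Store everything at 3 out of 4 or better, then tighten to 4 out of 4.
    stored = window_matches("ACGTTGCA", "ACGATGCA", window=4, threshold=3)
    print(len(stored), len(apply_dynamic_threshold(stored, 4)))
```

The stored hits can be filtered to any stricter threshold without recomputing anything, but hits below the analysis threshold were never stored, which is why a looser threshold requires rerunning the analysis.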
  
+Window Size
+~~~~~~~~~~~
  
-Sequence Information Bar
-~~~~~~~~~~~~~~~~~~~~~~~~
+The typical sizes people tend to choose are between 20 and 30. You
+will likely need to experiment with this setting depending on your
+needs and input sequence.
  
-.. image:: images/seq_info_bar.png
-   :alt: Sequence Information Bar
-   :align: center
  
-The sequence information bars can be found to the left and right sides
-of Mussagl. Next to each sequence you will find the following
-information:
+Sequences
+~~~~~~~~~
  
- 1. Species (If it has been defined)
- 2. Total Size of Sequence
- 3. Current base pair position
+Mussa reads in sequences which are formatted in the FASTA_
+format. Mussa may take a long time to run (>10 minutes) if the total
+bp length nears 280Kb. Once Mussa has run, you can reload
+previously run analyses.
  
-Note that you can **update the species** text box. Make sure to **save your
-experiment** after making this change by selecting **File > Save
-Analysis** from the menu.
+FIXME: We have learned more about how much sequence and how many to
+put in Mussagl, this information should be documented here.
  
-Sequence Scroll Bar
-~~~~~~~~~~~~~~~~~~~
  
-.. image:: images/scroll_bar.png
-   :alt: Sequence Scroll Bar
-   :align: center
+Mussa File Formats
+------------------
  
-The scroll bar allows you to scroll through the sequence which is
-useful when you have zoomed in using the `zoom factor`_.
+.. _param:
  
+Parameter File Format
+~~~~~~~~~~~~~~~~~~~~~
  
-Saving
-------
+Note that for the comment character '#' to work, it must be followed
+by a space (i.e. '# ').
  
-Save on Close
-~~~~~~~~~~~~~
+**File Format (.mupa):**
  
-When ever you create a new analysis or make a change such as
-adding/editing a motif or changing a species name, an asterisk (*)
-will appear in the title of the window showing that there are changes
-that have not been saved. If you close a Mussa window without saving
-changes, Mussa will ask you if you would like to save the changes that
-have been made.
+::
  
-Save Analysis
-~~~~~~~~~~~~~
+  # name of analysis directory and stem for associated files
+  ANA_NAME <analysis_name>
+  
+  # if APPEND vars true, a _wXX and/or _tYY added to analysis name
+  # where XX = WINDOW and YY = THRESHOLD
+  # Highly recommended with use of command line override of WINDOW or THRESHOLD
+  APPEND_WIN <true/false>
+  APPEND_THRES <true/false>
+  
+  # first sequence info
+  SEQUENCE <FASTA_file_path>
+  ANNOTATION <annotation_file_path>
+  SEQ_START <sequence_start>
+  
+  # the second sequence info
+  SEQUENCE <FASTA_file_path>
+  # ANNOTATION <annotation_file_path>
+  SEQ_START <sequence_start>
+  # SEQ_END <sequence_end>
  
-After making changes, such as updating species names or adding/editing
-motifs, you can save these changes by selecting the **File > Save
-analysis** menu option or pressing **CTRL + S** (PC) or
-**Apple/Command Key + S** (on Mac).
+  # third sequence info
+  SEQUENCE <FASTA_file_path>
+  # ANNOTATION <annotation_file_path>
+  
+  # analysis parameters: command line args -w -t will override these
+  WINDOW <num>
+  THRESHOLD <num>
  
-.. image:: images/save_analysis.png
-   :alt: Save analysis
-   :align: center
+.. csv-table:: Parameter File Options:
+   :header: "Option Name", "Value", "Default", "Required", "Description"
+   :widths: 30 30 30 30 60
  
-Save Analysis As
-~~~~~~~~~~~~~~~~
+   "ANA_NAME", "string", "N/A", "true", "Name of analysis (also the
+   name of the directory where the analysis will be saved)."
+   "APPEND_WIN", "true/false", "?", "?", "Appends _w## to ANA_NAME"
+   "APPEND_THRES", "true/false", "?", "?", "Appends _t## to ANA_NAME"
+   "SEQUENCE", "/FASTA/filepath.fa", "N/A", "true", "Absolute/Relative file
+   path to sequence." 
+   "ANNOTATION", "/annotation/filepath.txt", "N/A", "false", "Optional
+   annotation file. See `annotation file format`_ section for more
+   information." 
+   "SEQ_START", "integer", "1", "false", "Optional index into FASTA file"
+   "SEQ_END", "integer", "1", "false", "Optional index into FASTA file"
+   "WINDOW", "integer", "N/A", "true", "`Window Size`_"
+   "THRESHOLD", "integer", "N/A", "true", "`Threshold`_"
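As a rough illustration of how the parameter file is laid out, the sketch below reads .mupa text into a dictionary of analysis-wide settings and a list of per-sequence records. It is not Mussa's own parser: the function name ``parse_mupa`` is hypothetical, and it assumes that ANNOTATION, SEQ_START and SEQ_END apply to the most recent SEQUENCE line, as the layout above suggests.

```python
def parse_mupa(text):
    """Split .mupa text into (settings, sequences); a hypothetical helper."""
    settings = {}
    sequences = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # The manual asks for '# ' (hash plus space); this sketch is more
        # lenient and skips any line that starts with '#'.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(" ")
        value = value.strip()
        if key == "SEQUENCE":
            # Each SEQUENCE line starts a new per-sequence record.
            sequences.append({"SEQUENCE": value})
        elif key in ("ANNOTATION", "SEQ_START", "SEQ_END"):
            if not sequences:
                raise ValueError(key + " appears before any SEQUENCE line")
            sequences[-1][key] = value
        else:
            # ANA_NAME, APPEND_WIN, APPEND_THRES, WINDOW, THRESHOLD, ...
            settings[key] = value
    return settings, sequences
```

All values are kept as strings here; converting WINDOW and THRESHOLD to integers is left to the caller.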
  
-To save a copy of your analysis to a new location, select the **File >
-Save analysis as** menu option and choose a new location and name for
-your analysis.
+.. _annot:
  
-.. image:: images/save_analysis_as.png
-   :alt: Save analysis
-   :align: center
+Annotation File Format
+~~~~~~~~~~~~~~~~~~~~~~
  
-Save Motif List
-~~~~~~~~~~~~~~~
+The first line in the file is the sequence name. Each line thereafter
+is a **space** separated annotation. 
  
-See `Save Motifs to File`_ in the `Motifs`_ section.
+Update:
+ 
+ * The annotation format now supports FASTA sequences embedded in the
+   annotation file as shown in the format example below. Mussagl will
+   take this sequence and look for an exact match of this sequence in
+   your sequences. If a match is found, it will label it with the name
+   from the FASTA header.
  
+Format:
  
-Viewing Multiple Analyses
--------------------------
+::
+  
+  <species/sequence_name>
+  <start> <stop> <annotation_name> <annotation_type>
+  <start> <stop> <annotation_name> <annotation_type>
+  <start> <stop> <annotation_name> <annotation_type>
+  <start> <stop> <annotation_name> <annotation_type>
+  >FASTA Header
+  ACTGACTGACGTACGTAGCTAGCTAGCTAGCACG
+  ACGTACGTACGTACGTAGCTGTCATACGCTAGCA
+  TGCGTAGAGGATCTCGGATGCTAGCGCTATCGAT
+  ACGTACGGCAGTACGCGGTCAGA
+  <start> <stop> <annotation_name> <annotation_type>
+  ...
  
-Some times it is useful to view more than one analysis at a time. To
-do accomplish this, Mussa allows you to open a new Mussa window by
-selecting the **File > New Mussa Window** menu option.
+Example:
  
-.. image:: images/new_mussa_window_menu.png
-   :alt: New Mussa Window Menu Option
-   :align: center
+::
  
-A new Mussa window will pop up.
+  Mouse
+  251 500 Glorp Glorptype
+  751 1000 Glorp Glorptype
+  1251 1500 Glorp Glorptype
+  >My favorite DNA sequence
+  GATTACA
+  1751 2000 Glorp Glorptype
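The format can be read with a few lines of Python. The sketch below is an assumption-laden illustration, not Mussa's reader: ``parse_annotation_file`` is a hypothetical name, and it treats any line whose first two fields are integers as a location annotation, a line starting with ``>`` as a FASTA header, and everything else after a header as sequence.

```python
def parse_annotation_file(text):
    """Return (name, intervals, fasta_records) from annotation file text."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("annotation file is empty")
    name = lines[0]  # first line: species/sequence name
    intervals = []
    fasta_records = []
    in_fasta = False
    for line in lines[1:]:
        if line.startswith(">"):
            # Start of an embedded FASTA record.
            fasta_records.append({"header": line[1:].strip(), "sequence": ""})
            in_fasta = True
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            # <start> <stop> <annotation_name> <annotation_type>
            intervals.append({
                "start": int(fields[0]),
                "stop": int(fields[1]),
                "name": fields[2],
                "type": fields[3] if len(fields) > 3 else "",
            })
            in_fasta = False
        elif in_fasta:
            # Continuation of the current FASTA sequence.
            fasta_records[-1]["sequence"] += line
        else:
            raise ValueError("unrecognized annotation line: " + line)
    return name, intervals, fasta_records
```

Run on the example above, this yields the name ``Mouse``, four location annotations, and one FASTA record with the sequence ``GATTACA``.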
  
-.. figure:: images/new_mussa_window.png
-   :alt: New Mussa Window
-   :align: center
  
-   A new Mussa window on the right, in which a second analysis has
-   been loaded.
+.. _motif_file_format:
+
+Motif File Format
+~~~~~~~~~~~~~~~~~
+
+Format:
+
+::
+
+  <motif> <red> <green> <blue>
+  
+Example:
+
+::
+
+  GGCC 0.0 1 1
+
+
+
+IUPAC Nucleotide Code
+~~~~~~~~~~~~~~~~~~~~~~
+
+For your convenience, below is a table of the IUPAC Nucleotide Code.
+
+The following table is table 1 from "Nomenclature for Incompletely
+Specified Bases in Nucleic Acid Sequences" which can be found at
+http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html.
+
+====== =================  ===================================
+Symbol Meaning            Origin of designation
+====== =================  ===================================
+G      G                  Guanine
+A      A                  Adenine
+T      T                  Thymine
+C      C                  Cytosine
+R      G or A             puRine
+Y      T or C             pYrimidine
+M      A or C             aMino
+K      G or T             Keto
+S      G or C             Strong interaction (3 H bonds)
+W      A or T             Weak interaction (2 H bonds)
+H      A or C or T        not-G, H follows G in the alphabet
+B      G or T or C        not-A, B follows A
+V      G or C or A        not-T (not-U), V follows U
+D      G or A or T        not-C, D follows C
+N      G or A or T or C   aNy
+====== =================  ===================================
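One way to see how an IUPAC motif expands is to turn it into a regular expression. The following Python sketch does that using the table above; the helper names ``motif_to_regex`` and ``find_motif`` are hypothetical, and it searches only the sequence as given. Whether Mussagl also reports reverse complement matches is not covered here.

```python
import re

# Symbol -> bases, taken from the IUPAC table above.
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT",
    "S": "GC", "W": "AT", "H": "ACT", "B": "GTC",
    "V": "GCA", "D": "GAT", "N": "GATC",
}


def motif_to_regex(motif):
    """Expand each ambiguity code into a character class."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_motif(motif, sequence):
    """Return the 0-based start of every match, overlapping ones included."""
    # A lookahead keeps the scan from skipping overlapping matches.
    pattern = re.compile("(?=(" + motif_to_regex(motif) + "))")
    return [match.start() for match in pattern.finditer(sequence.upper())]
```

For example, ``motif_to_regex("ATSCT")`` gives ``AT[GC]CT``, which matches both ``ATGCT`` and ``ATCCT``.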
+
+
+Obtaining Input Data - Continued
+--------------------------------
+
+If you already have your data, you may want to go to the `Using Mussagl`_
+section of the manual.
+
+Let's say you have a gene of interest called 'SMN1' and you want to
+know how the sequence surrounding the gene in multiple species is
+conserved. That is exactly what we are going to do: retrieve the
+DNA sequence for SMN1 and prepare it for use in Mussa.
+
+For more information about SMN1 visit `NCBI's OMIM
+<http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=609682>`_.
+
+The SMN1 data retrieved in this section can be downloaded from the
+`Mussa Example Data
+<http://woldlab.caltech.edu/cgi-bin/mussa/wiki/ExampleData>`_ page if
+you prefer to skip this section of the manual.
+
+UCSC Genome Browser Method
+--------------------------
+
+There are many methods of retrieving DNA sequence, but for this
+example we will retrieve SMN1 through the UCSC genome browser located
+at http://genome.ucsc.edu/.
+
  
-Now you can create or load an existing analysis, in this new window,
-as described in the `Create/Load Analysis`_ section. 
+.. image:: images/ucsc_genome_browser_home.png
+   :alt: UCSC Genome Browser
+   :align: center
  
-You can view as many analyses as you can fit on your screen or until
-you run out of available RAM. If you notice a rapid decrease in
-performance and hear lots of noise coming from your hard drive, you
-probably ran out of RAM and are now using virtual memory (i.e. much
-much slower). If this happens, you may need to avoid opening as many
-analyses at one time.
+Step 1 - Find SMN1
+~~~~~~~~~~~~~~~~~~
  
+The first step in finding SMN1 is to use the **Gene Sorter** menu
+option which I have highlighted in orange below:
  
-Annotations / Motifs
---------------------
+.. image:: images/ucsc_menu_bar_gene_sorter.png
+   :alt: Gene Sorter Menu Option
+   :align: center
  
-Annotations
-~~~~~~~~~~~
+Gene Sorter page:
  
-Currently annotations can be added to a sequence using the mussa
-`annotation file format`_ and can be loaded by selecting the
-annotation file when defining a new analysis (see `Create a new
-analysis`_ section) or by defining a .mupa file pointing to your
-annotation file (see `Load a mussa parameter file`_ section).
+.. image:: images/ucsc_gene_sorter.png
+   :alt: Gene Sorter
+   :align: center
  
-Motifs
-~~~~~~
+We will start by looking for SMN1 in the **Human Genome** and **sorting by name similarity**.
  
-Load Motifs from File
-*********************
+.. image:: images/ucsc_gs_sort_name_sim.png
+   :alt: Gene Sorter - Name Similarity
+   :align: center
  
-It is possible to load motifs from a file which was saved from a
-previous run or by defining your own motif file. See the `Motif File
-Format`_ section for details.
+After you have selected **Human Genome** and **sorting by name similarity**, type *SMN1* into the search box.
  
-NOTE: Valid motif list file extensions are:
-  
-  * .mtl
-  * .txt
+.. image:: images/ucsc_gs_smn1.png
+   :alt: Gene
+   :align: center
  
-To load a motif file, select **Load Motif List** item from the
-**File** menu and select a motif list file.
+Press **Go!** and you should see the following page:
  
-.. image:: images/load_motif.png
-   :alt: Load Motif List
+.. image:: images/ucsc_gs_found.png
+   :alt: Found SMN1
     :align: center
  
+Click on **SMN1** and you will be taken to the gene expression atlas
+page.
  
-Save Motifs to File
-*******************
+.. image:: images/ucsc_gs_genome_position.png
+   :alt: Gene expression atlas
+   :align: center
  
-Motifs from the `Motif Dialog`_ can be saved to file for use with
-other analyses. If you just want your motifs to be saved with your
-analysis, see the `save analysis`_ section for details.
+Click on **chr5 70,270,558** found in the **SMN1 row**, **Genome
+position column**.
  
-To save a motif list, select **File > Save Motifs** menu option. By
-default, Mussa will append .mtl if you do not provide a file extension
-(valid file extensions: .mtl & .txt).
+Now we have found the location of SMN1 on human!
  
-.. image:: images/save_motifs.png
-   :alt: Save Motifs
+.. image:: images/ucsc_gb_smn1_human.png
+   :alt: Genome Browser - SMN1 (human)
     :align: center
  
  
-Motif Dialog
-************
+Step 2 - Download CDS/UTR sequence for annotations
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
-Mussa has the ability to find lab motifs using the `IUPAC Nucleotide
-Code`_ for defining a motif. To define a motif, select **Edit > Edit
-Motifs** menu item as shown below.
+Since we have found **SMN1**, this would be a convenient time to extract
+the DNA sequence for the CDS and UTRs of the gene to use it as an
+annotation_ in Mussa.
  
-.. image:: images/view_edit_motifs.png
-   :alt: "View > Edit Motifs" Menu
+**Click on SMN1** shown **between** the **two orange arrows** shown
+below.
+
+.. image:: images/ucsc_gb_smn1_human_click_smn1.png
+   :alt: Genome Browser - SMN1 (human) - Orange Arrows
     :align: center
  
-You will see a dialog box appear with a "apply" button in the bottom
-right and one rows for defining motifs and the color that will be
-displayed on the sequence. When you start adding your first motif, an
-additional row will be added. The check box in the first column
-defines whether the motif is displayed or not. The second column is
-the motif display color. The third column is for the name of your
-motif and finally, the fourth column is motif itself.
+You should find yourself at the SMN1 description page.
  
-.. image:: images/motif_dialog_start.png
-   :alt: Motif Dialog
+.. image:: images/ucsc_gb_smn1_description_page.png
+   :alt: Genome Browser - SMN1 (human) - Description page
     :align: center
  
-Now let's make a motif **'AT[C or G]CT'**. Using the `IUPAC Nucleotide
-Code`_, type in **'ATSCT'** into the motif field and **'My Motif'** for
-the name in the name field as shown below. 
-
-Notice how a second row appeared when you started to add the first
-motif. Every time you add a new motif, a new row will appear allowing
-you to add as many motifs as you need.
+**Scroll down** until you get to the **Sequence section** and click on
+**Genomic (chr5:70,256,524-70,284,592)**.
  
-.. image:: images/motif_dialog_enter_motif.png
-   :alt: Enter Motif
+.. image:: images/ucsc_gb_smn1_human_sequence.png
+   :alt: Genome Browser - SMN1 (human) - Sequence
     :align: center
  
-Now choose a color for your motif by clicking on the colored area to
-the left of the name field. Remember to choose a color that will show
-up well with a black bar as the background. A good tool for picking a
-color is the `Colour Contrast Analyser
-<http://juicystudio.com/services/colourcontrast.php>`_ by
-`juicystudio.com <http://juicystudio.com/>`_.
+You should now be at the **Genomic sequence near gene** page:
  
-.. image:: images/color_chooser.png
-   :alt: Color Chooser
+.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence.png
+   :alt: Genome Browser - SMN1 (human) - Get genomic sequence
     :align: center
  
-Once you have selected the color for your motif, click on the
-**'apply'** button. Notice that if Mussa finds matches to your motif
-will now show up in the main Mussa window.
+Make the following changes (highlighted in orange in the screenshot
+below):
  
-Before Motif:
+ 1. UNcheck **introns**. 
+    (We only want to annotate CDS and UTRs.)
+ 2. Select **one FASTA record** per **region**. 
+    (Mussa needs each CDS and UTR represented by one FASTA record per CDS/UTR).
+ 3. Select **CDS in upper case, UTR in lower case.**
  
-.. image:: images/motif_dialog_bar_before.png
-   :alt: Sequence bar before motif
+.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_diff.png
+   :alt: Genome Browser - SMN1 (human) - Get genomic sequence setup
     :align: center
  
-After Motif:
+Now click the **submit** button. You will then see a FASTA file with
+many FASTA records representing the CDS and UTRs.
  
-.. image:: images/motif_dialog_bar_after.png
-   :alt: Sequence bar after motif
+.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_submit.png
+   :alt: Genome Browser - SMN1 (human) - CDS/UTR sequence
     :align: center
  
-To save your motifs with your analysis, see the `save analysis`_
-section. To save your motifs to a file, see the `save motifs to file`_
-section.
+Now you need to save the FASTA records to a **text file**. If you are
+using **Firefox** or **Internet Explorer 6+** click on the **File >
+Save As** menu option. 
  
-Deleting a Motif
-^^^^^^^^^^^^^^^^
+**IMPORTANT:** Make sure you select **Text Files** and **NOT**, I
+repeat **NOT Webpage Complete** (see screenshot below.)
  
-To delete a motif, remove all text from the name and sequence columns
-and close the motif editor.
+Type in **smn1_human_annot.txt** for the file name.
  
-View Mussa Alignments
----------------------
+.. image:: images/smn1_human_annot.png
+   :alt: Genome Browser - SMN1 (human) - sequence annotation file
+   :align: center
  
-Mussagl allows you to zoom in on Mussa alignments by selecting the set
-of alignment(s) of interest. To do this, move the mouse near the
-alignment you are interested in viewing and then **PRESS** and
-**HOLD** the **LEFT mouse button** and **drag the mouse** to the other
-side of the conservation track so that you see a bounding box
-overlaping the alienment(s) of interest and then **let go** of the
-*left mouse button*.
+**IMPORTANT:** You should open the file with a text editor and make
+sure **no HTML** was saved. If you find any HTML markup, delete
+the markup and save the file.
  
-In the example below, I started by left-clicking on the area marked by
-a red dot (upper left corner of bounding box) and dragging the mouse to
-the area marked by a blue dot (lower right corner of the bounding box)
-and letting go of the left mouse button.
+Now we are going to **modify the file** you just saved to **add the
+name of the species** to the **annotation file**. All you have to do
+is **add a new line** at the **top of the file** with the word **'Human'** as
+shown below:
  
-.. image:: images/select_sequence.png
-   :alt: Select Sequence
+.. image:: images/smn1_human_annot_plus_human.png
+   :alt: Genome Browser - SMN1 (human) - sequence annotation file
     :align: center
  
-All of the lines which were not selected should be washed out as shown
-below:
+You can add more annotations to this file if you wish. See the
+`annotation file format`_ section for details of the file format. By
+including FASTA records in the annotation_ file, Mussa searches your
+DNA sequence for an exact match of the sequence in the annotation_
+file. If found, it will be marked as an annotation_ within Mussa.
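If you need to prepare annotation files for many species, the clean-up described in this step could be scripted. The Python sketch below is one possible approach, not part of Mussa: ``prepare_annotation`` is a hypothetical helper, and the tag-stripping regular expression is deliberately crude, so you should still open the result in a text editor and check it.

```python
import re


def prepare_annotation(saved_text, species):
    """Strip stray HTML tags and put the species name on the first line."""
    # Crude tag removal: anything from '<' up to the next '>'. FASTA
    # headers begin with '>' and contain no '<', so they are left alone.
    without_tags = re.sub(r"<[^>]+>", "", saved_text)
    kept = [line.rstrip() for line in without_tags.splitlines() if line.strip()]
    return "\n".join([species] + kept) + "\n"
```

For the file saved above you would pass its contents along with ``"Human"`` and write the returned text back to **smn1_human_annot.txt**.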
  
-.. image:: images/washed_out.png
-   :alt: Tracks washed out
-   :align: center
  
-With a selection made, goto the **View** menu and select **View mussa alignment**.
+Step 3 - Download gene and upstream/downstream sequence
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
-.. image:: images/view_mussa_alignment.png
-   :alt: View mussa alignment
+Use the back button in your web browser to get back to the **genome
+browser view** of **SMN1** as shown below.
+
+.. image:: images/ucsc_gb_smn1_human.png
+   :alt: Genome Browser - SMN1 (human)
     :align: center
  
-You should see the alignment at the base-pair level as shown below.
+There are two options for getting additional sequence around your
+gene. The more complex way is to zoom out so that you have the
+sequence you want being shown in the genome browser and then follow
+the directions for the following method.
  
-.. image:: images/mussa_alignment.png
-   :alt: Mussa alignment
+The second option, which we will choose, is to leave the genome
+browser zoomed exactly at the location of SMN1 and click on the
+**DNA** option on the menu bar (shown with orange arrows in the
+screenshot below.)
+
+.. image:: images/ucsc_gb_smn1_human_dna_option.png
+   :alt: Genome Browser - SMN1 (human) - DNA Option
     :align: center
  
+Now in the **get dna in window** page, let's add an arbitrary amount of
+extra sequence on to each end of the gene, let's say 5000 base pairs.
  
-Sub-analysis
-------------
+.. image:: images/ucsc_gb_smn1_human_get_dna.png
+   :alt: Genome Browser - SMN1 (human) - Get DNA 
+   :align: center
  
-To run a sub-analysis **highlight** a section of sequence and *right
-click* on it and select **Add to subanalysis**. To the same for the
-sequences shown in orange in the screenshot below. Note that you **are
-NOT limited** to selecting only one subsequence from the same
-sequence.
+Click the **get DNA** button.
  
-.. image:: images/subanalysis_select_seqs.png
-   :alt: Subanalysis sequence selection
+.. image:: images/ucsc_gb_smn1_human_dna.png
+   :alt: Genome Browser - SMN1 (human) - DNA 
     :align: center
  
-Once you have added your sequences for subanalysis, choose a `window size`_ and `threshold`_ and click **Ok**.
+Save the DNA sequence to a text file called 'smn1_human_dna.fa' as we
+did in step 2 with the annotation file.
  
-.. image:: images/subanalysis_dialog.png
-   :alt: Subanalysis Dialog
-   :align: center
+**IMPORTANT:** Make sure the file is saved as a text file and not an
+HTML file. Open the file with a text editor and remove any HTML markup
+you find.
  
-A new Mussa window will appear with the subanalysis of your sequences
-once it's done running. This may take a while if you selected large
-chunks of sequence with a loose threshold.
  
-.. image:: images/subanalysis_done.png
-   :alt: Subalaysis complete
-   :align: center
+Step 4 - Same/similar/related gene in other species
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
+What good is a multiple sequence alignment viewer without multiple
+sequences? Let's find a similar gene in a few more species.
  
-Copying sequence to clipboard
------------------------------
+Use the back button on your web browser until you get to the **genome
+browser view** of **SMN1** as shown below.
  
-To copy a sequence to the clipboard, highlight a section of sequence,
-as shown in the screen shot below, and do one of the following:
+.. image:: images/ucsc_genome_browser_home.png
+   :alt: UCSC Genome Browser
+   :align: center
  
- * Select **Copy as FASTA** from the **Edit** menu.
- * **Right-Click (Left-click + Apple/Command Key on Mac)** on the highlighted sequence and select **Copy as FASTA**.
- * Press **Ctrl + C (on PC)** or **Apple/Command Key + C (on Mac)** on the keyboard.
+**Click on SMN1** shown **between** the **two orange arrows** shown
+below.
  
-.. image:: images/copy_sequence.png
-   :alt: Copy sequence
+.. image:: images/ucsc_gb_smn1_human_click_smn1.png
+   :alt: Genome Browser - SMN1 (human) - Orange Arrows
     :align: center
  
+You should find yourself at the SMN1 description page.
  
-Saving to an Image
----------------------------------
+.. image:: images/ucsc_gb_smn1_description_page.png
+   :alt: Genome Browser - SMN1 (human) - Description page
+   :align: center
  
-To save your current mussa view to an image, select **File > Save to
-image...** as shown below.
+**Scroll down** until you get to the **Sequence section** and click on
+**Protein (262 aa)**.
  
-.. image:: images/save_to_image_menu.png
-   :alt: File > Save to image...
+.. image:: images/ucsc_gb_smn1_human_sequence.png
+   :alt: Genome Browser - SMN1 (human) - Sequence
     :align: center
  
-You can define the width and the height of the image to save. By
-default it will use the same size of your current view. Since the
-Mussa view is implemented using vectors, if you choose a larger size
-then your current view, Mussa will redraw at the higher resolution
-when saving. In other words, you get higher quality images when saving
-at a higher resolution.
+Copy the SMN1 protein sequence by highlighting it and selecting the
+**Edit > Copy** option from the menu.
  
-If you check the "Lock aspect ratio" check box, which I have circled
-in red, then when you change one value, say width, the other, height,
-will update automatically to keep the same aspect ratio.
+.. image:: images/smn1_human_protein.png
+   :alt: Genome Browser - SMN1 (human) - Protein
+   :align: center
  
-.. image:: images/save_to_image_dialog.png
-   :alt: Save to image dialog
+Press the back button on the web browser once and then scroll to the
+top of the page and click on the **BLAT** option on the menu bar
+(shown below with orange arrows).
+
+.. image:: images/ucsc_gb_smn1_human_blat.png
+   :alt: Genome Browser - SMN1 (human) - Blat
     :align: center
  
-Click save and choose a location and filename for your file.
+**Paste** in the **protein sequence** and **change** the **genome** to
+**mouse** as shown below and then click **submit**.
  
-The valid image formats are:
+.. image:: images/ucsc_gb_smn1_human_blat_paste.png
+   :alt: Genome Browser - SMN1 (human) - Blat paste protein
+   :align: center
  
-  * .png (default if no extension specified.)
-  * .jpg
+Notice that we have two hits, one of which looks pretty good at 89.9%
+match.
  
+.. image:: images/ucsc_gb_smn1_human_blat_hits.png
+   :alt: Genome Browser - SMN1 (human) - Blat hits
+   :align: center
  
-Detailed Information
---------------------
+**Click** on the **browser** link next to the 89.9% match. Notice in
+the genome browser (shown below) that there is an annotated gene
+called SMN1 for mouse which matches the line called **your sequence
+from blat search**. This means we are fairly confident we found the
+right location in the mouse genome.
  
-Threshold
-~~~~~~~~~
+.. image:: images/ucsc_gb_smn1_human_blat_to_browser.png
+   :alt: Genome Browser - SMN1 (human) - Blat to browser
+   :align: center
  
-The threshold of an analysis is in minimum number of base pair matches
-must be meet to in order to be kept as a match. Note that you can vary
-the threshold from within Mussagl. For example, if you choose a
-`window size`_ of **30** and a **threshold** of **20** the mussa nway
-transitive algorithm will store all matches that are 20 out of 30 bp
-matches or better and pass it on to Mussagl. Mussagl will then allow
-you to dynamically choose a threshold from 20 to 30 base pairs. A
-threshold of 30 bps would only show 30 out of 30 bp matches. A
-threshold of 20 bps would show all matches of 20 out of 30 bps or
-better. If you would like to see results for matches lower than 20 out
-of 30, you will need to rerun the analysis with a lower threshold.
+Follow steps 1 through 3 for mouse and then repeat step 4 with the
+human protein sequence to find **SMN1** in the following species (if
+you find a match):
  
-Window Size
-~~~~~~~~~~~
+ 1. Rat
+ 2. Rabbit
+ 3. Dog
+ 4. Armadillo
+ 5. Elephant
+ 6. Opossum
+ 7. x_tropicalis
  
-The typical sizes people tend to choose are between 20 and 30. You
-will likely need to experiment with this setting depending on your
-needs and input sequence.
+Make sure to save the extended DNA sequence and annotation file for
+each one.
  
  
-Sequences
-~~~~~~~~~
+Step 5 - Create Analysis
+~~~~~~~~~~~~~~~~~~~~~~~~
  
-Mussa reads in sequences which are formatted in the FASTA_
-format. Mussa may take a long time to run (>10 minutes) if the total
-bp length near 280Kb. Once mussa has run once, you can reload
-previously run analyzes.
+At this point you should have the annotations and fasta files for each
+species. If you skipped the first four steps or are having trouble,
+you can download the example data from the `Mussa Example Data
+<http://woldlab.caltech.edu/cgi-bin/mussa/wiki/ExampleData>`_ page.
  
-FIXME: We have learned more about how much sequence and how many to
-put in Mussagl, this information should be documented here.
+There are two methods for creating an analysis. You can create a MUssa
+PArameter file (.mupa), or you can use the create analysis dialog. To
+use the analysis dialog, see the `create a new analysis`_ section.
  
+If you are planning on doing many analyses using the same sets of DNA
+sequence but with different parameters, annotations, and/or species,
+it is often best to set up a `mupa`_ file, so you can:
  
-Mussa File Formats
-------------------
+  * Change parameters and rerun analysis easily.
+  * Use the Mussa command line options to run batch analyses.
+  * Define an analysis for someone else to run.
  
-.. _param:
+Now, we will create a `mupa`_ file for smn1 for an analysis with
+Human, Mouse, and Cow. I'll start by showing you the `mupa`_ file and
+then walk you through it line by line.
  
-Parameter File Format
-~~~~~~~~~~~~~~~~~~~~~
+Start by creating a new text file called *smn1_human_mouse_cow.mupa*,
+in your smn1 directory. I decided to put the fasta and annotation
+files for each species in their own directory, so I will use that
+setup (see screen shot below).
  
-**File Format (.mupa):**
+.. image:: images/smn1_dir_structure.png
+   :alt: SMN1 directory structure
+   :align: center
  
+smn1_human_mouse_cow.mupa:
  ::
  
-  # name of analysis directory and stem for associated files
-  ANA_NAME <analysis_name>
-  
-  # if APPEND vars true, a _wXX and/or _tYY added to analysis name
-  # where XX = WINDOW and YY = THRESHOLD
-  # Highly recommeded with use of command line override of WINDOW or THRESHOLD
-  APPEND_WIN <true/false>
-  APPEND_THRES <true/false>
-  
-  # how many sequences are being analyzed
-  SEQUENCE_NUM <num>
-  
-  # first sequence info
-  SEQUENCE <FASTA_file_path>
-  ANNOTATION <annotation_file_path>
-  SEQ_START <sequence_start>
+  # Analysis name 
+  ANA_NAME smn1_human_mouse_cow
    
-  # the second sequence info
-  SEQUENCE <FASTA_file_path>
-  # ANNOTATION <annotation_file_path>
-  SEQ_START <sequence_start>
-  # SEQ_END <sequence_end>
-
-  # third sequence info
-  SEQUENCE <FASTA_file_path>
-  # ANNOTATION <annotation_file_path>
+  # Appending to analysis name
+  APPEND_WIN true
+  APPEND_THRES true
    
-  # analyzes parameters: command line args -w -t will override these
-  WINDOW <num>
-  THRESHOLD <num>
+  # Human sequence
+  SEQUENCE human/smn1_human_dna.fasta
+  ANNOTATION human/smn1_human_annotations.txt
  
-.. csv-table:: Parameter File Options:
-   :header: "Option Name", "Value", "Default", "Required", "Description"
-   :widths: 30 30 30 30 60
-
-   "ANA_NAME", "string", "N/A", "true", "Name of analysis (Also
-   name of directory where analysis will be saved." 
-   "APPEND_WIN", "true/false", "?", "?", "Appends _w## to ANA_NAME"
-   "APPEND_THRES", "true or false", "?", "?", "Appends _t## to ANA_NAME"
-   "SEQUENCE_NUM", "integer", "N/A", "true", "The number of sequences
-   to analyze" 
-   "SEQUENCE", "/FASTA/filepath.fa", "N/A", "true", "Must define one
-   sequence per SEQUENCE_NUM." 
-   "ANNOTATION", "/annotation/filepath.txt", "N/A", "false", "Optional
-   annotation file. See `annotation file format`_ section for more
-   information." 
-   "SEQ_START", "integer", "1", "false", "Optional index into FASTA file"
-   "SEQ_END", "integer", "1", "false", "Optional index into FASTA file"
-   "WINDOW", "integer", "N/A", "true", "`Window Size`_"
-   "THRESHOLD", "integer", "N/A", "true", "`Threshold`_"
-
-.. _annot:
-
-Annotation File Format
-~~~~~~~~~~~~~~~~~~~~~~
+  SEQUENCE mouse/smn1_mouse_dna.fasta
+  ANNOTATION mouse/smn1_mouse_annotations.txt
  
-The first line in the file is the sequence name. Each line there after
-is a **space** separated annotation. 
+  SEQUENCE cow/smn1_cow_dna.fasta
+  ANNOTATION cow/smn1_cow_annotations.txt
  
-New as of build 198:
- 
- * The annotation format now supports FASTA sequences embedded in the
-   annotation file as shown in the format example below. Mussagl will
-   take this sequence and look for an exact match of this sequence in
-   your sequences. If a match is found, it will label it with the name 
-   of from the FASTA header.
+  # Window size / Threshold
+  WINDOW 30
+  THRESHOLD 24
  
-Format:
+The first line is the analysis name. This will be the name of the
+directory the results will be saved in when using the Mussa `command
+line`_ option --no-gui to run an analysis. If you are using the Mussa
+GUI, then you will be prompted for a directory name as mentioned in
+the `saving`_ section.
  
  ::
    
-  <species/sequence_name>
-  <start> <stop> <annotation_name> <annotation_type>
-  <start> <stop> <annotation_name> <annotation_type>
-  <start> <stop> <annotation_name> <annotation_type>
-  <start> <stop> <annotation_name> <annotation_type>
-  >FASTA Header
-  ACTGACTGACGTACGTAGCTAGCTAGCTAGCACG
-  ACGTACGTACGTACGTAGCTGTCATACGCTAGCA
-  TGCGTAGAGGATCTCGGATGCTAGCGCTATCGAT
-  ACGTACGGCAGTACGCGGTCAGA
-  <start> <stop> <annotation_name> <annotation_type>
-  ...
+  # Analysis name 
+  ANA_NAME smn1_human_mouse_cow
  
-Example:
+If you provide APPEND_WIN and/or APPEND_THRES, and set them to
+true, the window size and threshold will be appended to the analysis
+name. In this example, using the --no-gui `command line`_ option, our
+directory name would be *smn1_human_mouse_cow_w30_t24*.
  
  ::
  
-  Mouse
-  251 500 Glorp Glorptype
-  751 1000 Glorp Glorptype
-  1251 1500 Glorp Glorptype
-  >My favorite DNA sequence
-  GATTACA
-  1751 2000 Glorp Glorptype
-
-
-.. _motif_file_format:
+  # Appending to analysis name
+  APPEND_WIN true
+  APPEND_THRES true
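
The naming rule described above can be sketched in a few lines of Python. This is only an illustration of the ``_wXX`` / ``_tYY`` suffix convention shown in the example, not Mussa's own code; the function name ``analysis_dir_name`` is hypothetical.

```python
def analysis_dir_name(ana_name, window, threshold,
                      append_win=False, append_thres=False):
    """Return the analysis name with the optional _wXX / _tYY suffixes.

    Hypothetical helper: it mirrors the naming shown in the manual,
    it is not taken from the Mussa source.
    """
    name = ana_name
    if append_win:
        name += "_w{}".format(window)
    if append_thres:
        name += "_t{}".format(threshold)
    return name


# Values from the smn1 example parameter file.
print(analysis_dir_name("smn1_human_mouse_cow", 30, 24,
                        append_win=True, append_thres=True))
```

With both options left at false, the analysis name is returned unchanged.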
  
-Motif File Format
-~~~~~~~~~~~~~~~~~
+The following six lines provide Mussa with the location of the
+sequence files and annotation files. The files can be provided with
+relative paths from the .mupa file. In other words, this .mupa file
+will provide the proper path to the human sequence only if there
+exists a directory called *human* in the same directory as this .mupa
+file.
  
-Format:
+To provide the species name for each species, you have to put the
+species name in the annotation files. See the `annotation file
+format`_ section for more details.
  
-  <motif> <red> <green> <blue>
-  
-Example:
+::
  
-  GGCC 0.0 1 1
+  # Human sequence
+  SEQUENCE human/smn1_human_dna.fasta
+  ANNOTATION human/smn1_human_annotations.txt
  
+  SEQUENCE mouse/smn1_mouse_dna.fasta
+  ANNOTATION mouse/smn1_mouse_annotations.txt
  
+  SEQUENCE cow/smn1_cow_dna.fasta
+  ANNOTATION cow/smn1_cow_annotations.txt
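
Because the paths are relative to the .mupa file, a missing or misplaced directory is an easy mistake to make. The following Python sketch checks that every SEQUENCE and ANNOTATION entry points at an existing file. It is a stand-alone helper with a hypothetical name (``resolve_mupa_paths``), and it assumes only what the example file shows: one ``KEYWORD value`` pair per line and ``#`` for comments. It is not Mussa's own parser.

```python
import os


def resolve_mupa_paths(mupa_path):
    """Return (keyword, absolute_path, exists) for each file entry.

    Assumption: paths are resolved against the directory that
    contains the .mupa file, as described in the manual.
    """
    base_dir = os.path.dirname(os.path.abspath(mupa_path))
    results = []
    with open(mupa_path) as handle:
        for line in handle:
            line = line.strip()
            # Skip blank lines and comments.
            if not line or line.startswith("#"):
                continue
            parts = line.split(None, 1)
            if len(parts) == 2 and parts[0] in ("SEQUENCE", "ANNOTATION"):
                full = os.path.normpath(
                    os.path.join(base_dir, parts[1].strip()))
                results.append((parts[0], full, os.path.isfile(full)))
    return results
```

Calling ``resolve_mupa_paths("smn1_human_mouse_cow.mupa")`` from any working directory returns one tuple per file entry; any tuple whose last element is ``False`` names a file that could not be found.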
  
-IUPAC Nucleotide Code
-~~~~~~~~~~~~~~~~~~~~~~
+And finally, the `window size`_ and `threshold`_ parameters.
  
-For your convenience, below is a table of the IUPAC Nucleotide Code.
+::
  
-The following table is table 1 from "Nomenclature for Incompletely
-Specified Bases in Nucleic Acid Sequences" which can be found at
-http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html.
+  # Window size / Threshold
+  WINDOW 30
+  THRESHOLD 24
  
-======  =================  ===================================
-Symbol Meaning            Origin of designation
-======  =================  ===================================
-G      G                  Guanine
-A      A                  Adenine
-T      T                  Thymine
-C      C                  Cytosine
-R      G or A             puRine
-Y      T or C             pYrimidine
-M      A or C             aMino
-K      G or T             Keto
-S      G or C             Strong interaction (3 H bonds)
-W      A or T             Weak interaction (2 H bonds)
-H      A or C or T        not-G, H follows G in the alphabet
-B      G or T or C        not-A, B follows A
-V      G or C or A        not-T (not-U), V follows U
-D      G or A or T        not-C, D follows C
-N      G or A or T or C   aNy
-======  =================  ===================================
+Next, open Mussagl and select the **File > Create Analysis from File**
+menu option. Mussagl should run your analysis if everything was set up
+properly.
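
If you plan to build several parameter files with the same layout, the text can be generated rather than typed. The Python sketch below reproduces the structure of the smn1 example for any list of species. The function name ``build_mupa`` is hypothetical, and the ``<species>/<gene>_<species>_dna.fasta`` naming pattern is simply the directory layout chosen in this tutorial, not something Mussa requires.

```python
def build_mupa(gene, species, window, threshold):
    """Return .mupa text for one gene across several species.

    Assumes the tutorial's layout: one directory per species holding
    <gene>_<species>_dna.fasta and <gene>_<species>_annotations.txt.
    """
    ana_name = "_".join([gene] + list(species))
    lines = [
        "# Analysis name",
        "ANA_NAME " + ana_name,
        "",
        "# Appending to analysis name",
        "APPEND_WIN true",
        "APPEND_THRES true",
        "",
    ]
    for name in species:
        lines.append("# {} sequence".format(name.capitalize()))
        lines.append("SEQUENCE {0}/{1}_{0}_dna.fasta".format(name, gene))
        lines.append(
            "ANNOTATION {0}/{1}_{0}_annotations.txt".format(name, gene))
        lines.append("")
    lines += [
        "# Window size / Threshold",
        "WINDOW {}".format(window),
        "THRESHOLD {}".format(threshold),
    ]
    return "\n".join(lines) + "\n"


text = build_mupa("smn1", ["human", "mouse", "cow"], 30, 24)
print(text)
```

Saving ``text`` as *smn1_human_mouse_cow.mupa* in the smn1 directory gives a file equivalent to the one written by hand above.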
  
  
  
  Understanding Mussa
  ===================
  
+Command Line
+------------
+
+Mussa has some very useful command line options that allow for
+loading an existing analysis or running a new analysis with or without
+launching the GUI.
+
+Mussa options:
+  --help                     help message
+  -p, --run-analysis arg     run an analysis defined by the mussa parameter file
+  --view-analysis arg        load a previously run analysis
+  --motifs arg               annotate analysis with motifs from this file
+  --no-gui                   terminate without running an analysis
+  --python                   launch as a `python interpreter`_
+
+Running an analysis using the --no-gui option is useful when you want
+to run many analyses on a compute server and save the results for
+viewing in the future.
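
A batch run on a compute server could be scripted along the following lines. This Python sketch only combines the ``--run-analysis`` and ``--no-gui`` options listed above. The executable name ``mussagl`` and the helper names are assumptions, so adjust them to match your installation.

```python
import subprocess

# Assumption: the Mussa executable is on the PATH under this name.
MUSSA_EXE = "mussagl"


def batch_commands(mupa_files, exe=MUSSA_EXE):
    """Build one command line per parameter file, without the GUI."""
    return [[exe, "--run-analysis", path, "--no-gui"]
            for path in mupa_files]


def run_batch(mupa_files, exe=MUSSA_EXE):
    """Run each analysis in turn, stopping at the first failure."""
    for cmd in batch_commands(mupa_files, exe):
        subprocess.run(cmd, check=True)
```

For example, ``run_batch(["smn1_human_mouse_cow.mupa"])`` would run the analysis from Step 5. Setting APPEND_WIN and APPEND_THRES to true in each parameter file keeps runs with different parameters in separate directories.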
+
  
  Performance
  -----------
@@ -1271,11 +1420,13 @@ FIXME: Include transitivity info.
  Repeats
  ~~~~~~~
  
-The algorithm Mussa uses to find conserved sequences is sensitive to
-repeated DNA segments, which are frequently occurring in most
-genomes. The problem with repeats, is that one repeat from one
-sequence can show up many times in another sequence. Every connection
-Mussa makes takes up memory and CPU time to process.
+Repeat masking of all input sequences, or at least of the "reference"
+genome, can be important for reducing compute time and for simplifying
+subsequent visual interpretation. Larger loci generally contain more
+repeat elements, and as their number grows so will the number of Mussa
+connections among them. If not repeat filtered, connectivity between
+shared repeat elements can obscure important relationships between
+single copy features.
  
  The formula for the number of connections, C, that will be made for R
  instances of a single repeat (meaning R copies of one repeat in each
@@ -1317,9 +1468,13 @@ you a C of 2500, ends up with a C^2 of 6,250,000.
  
  **Conclusion: repeats cause the processing time of Mussa to skyrocket.**
  
-One way to deal with a situation where you have many repeats in your
-sequences is do any of the following: user shorter sequence lengths;
-repeat mask one or more of your sequences; or increase the threshold.
+To deal with a situation where you have many repeats in your sequences,
+do any of the following:
+ 
+ * Use shorter sequence lengths.
+ * Repeat mask one or more of your sequences.
+ * Increase the threshold.
+
  
  Details
  -------
@@ -1327,6 +1482,18 @@ Details
  Case: Conservation track suddenly stops
  ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  
+Details about this potentially confusing case can be found `here
+<http://woldlab.caltech.edu/cgi-bin/mussa/wiki/OverlappingWindows>`_.
+
+Python Interpreter
+------------------
+
+Mussagl has some functionality for running a python interpreter for
+interacting with the internals of Mussagl and/or executing Python
+code. This feature is mostly experimental at this point in time. If
+you have interest in this feature or would like to know more about it,
+contact us using the contact information found at
+http://mussa.caltech.edu/.
  
  .. Define links below
     ------------------
@@ -1335,4 +1502,5 @@ Case: Conservation track suddenly stops
  .. _wiki: http://mussa.caltech.edu
  .. _build: http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild
  .. _FASTA: http://en.wikipedia.org/wiki/fasta_format
-.. _wpDnaMotif: http://en.wikipedia.org/wiki/DNA_motif
-\ No newline at end of file
+.. _wpDnaMotif: http://en.wikipedia.org/wiki/DNA_motif
+.. _mupa: `Parameter File Format`_
+\ No newline at end of file