Brandon W. King
---------------
-Last updated: May 23th, 2006
+Last updated: Sept 20th, 2006
+
+Updated to Mussagl build: 287 (Update to 419 in progress)
+
+
+.. Things to add
+ * New features / change log
+ * Comment out anything that isn't implemented yet.
+ * (DONE) List of features that will be implemented in the future.
+ * Look into the homology mapping of UCSC.
+ * Add toggle to genomes.
+ * Document why one FASTA record per region.
+ * How to deal with the hazards of small utrs vis motif finder. (Add warning)
+ * Add warning about saving FASTA file.
+ * Add a general principles section near the top
+ * Using comparison algorithm which will pickup all repeats
+ * Add info about repeatmasking
+ * Check upstream and downstream genes to make sure you are in the right regions.
+ * Later on: look into Ensembl
+ * Look into method of homology instead of blating.
+ * Mention advantages of using mupa.
+ * Mention the difference between using arrows and scroll bar
+ * Document the color for motifs
+ * Update for Mac user left-click
+
+ * Wormbase/Flybase/mirBASE tutorials
-Updated to Mussagl build: 200 (Update to 230 in progress)
.. contents::
+Status
+======
+
+Major New Features
+------------------
+
+ * Build 381
+ * Analysis "Save As" feature
+
+Change Log
+----------
+
+.. INSERT CHANGE LOG HERE
+.. END INSERT CHANGE LOG
+
+Features to be Implemented
+--------------------------
+
+ * Motif editor supporting more than 10 motifs
+ (Status: http://woldlab.caltech.edu/cgi-bin/mussa/ticket/122)
+ * Save motifs from Mussagl
+ (Status: http://woldlab.caltech.edu/cgi-bin/mussa/ticket/133)
+
+For an up-to-date list of features to be implemented visit:
+http://woldlab.caltech.edu/cgi-bin/mussa/roadmap
+
Introduction
============
What is Mussagl?
----------------
+Mussa is an N-way version of the FamilyRelations (which is a part of
+the Cartwheel project) 2-way comparative sequence analysis
+software. Given DNA sequence from N species, Mussa uses all possible
+pairwise comparisons to derive an N-wise comparison. For example, given
+sequences 1,2,3, and 4, Mussa makes 6 2-way comparisons: 1vs2, 1vs3,
+1vs4, 2vs3, 2vs4, and 3vs4. It then compares all the links between
+these comparisons, saving those that satisfy a transitivity
+requirement. The saved paths are then displayed in an interactive
+viewer.
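
As a rough illustration of the pairwise step described above, the sketch
below enumerates the 2-way comparisons for N sequences. It is not taken
from Mussa's source; the function name ``pairwise_comparisons`` is
hypothetical, and the transitivity filtering is not shown.

```python
from itertools import combinations


def pairwise_comparisons(num_sequences):
    """Return every unordered pair of 1-based sequence numbers."""
    # Hypothetical helper for illustration only, not Mussa's implementation.
    return list(combinations(range(1, num_sequences + 1), 2))


pairs = pairwise_comparisons(4)
print(len(pairs))  # 6
print(pairs)       # [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

In general, N sequences give N * (N - 1) / 2 pairwise comparisons, so the
number of comparisons grows quickly as sequences are added.
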
Short History of Mussa
----------------------
-
Mussa Python/PMW Prototype
~~~~~~~~~~~~~~~~~~~~~~~~~~
+First Python/PMW based prototype.
Mussa C++/FLTK
~~~~~~~~~~~~~~
+A rewrite for speed purposes using C++ and the FLTK GUI toolkit.
Mussagl C++/Qt/OpenGL
~~~~~~~~~~~~~~~~~~~~~
+Refactored version using the more elegant Qt GUI framework and
+OpenGL for hardware acceleration for those who have better graphics
+cards.
Getting Mussagl
===============
If you already have your data, you can skip ahead to the the `Using
Mussagl`_ section.
-Lets say you have a gene of interest called 'SMN1' and you want to
+Let's say you have a gene of interest called 'SMN1' and you want to
know how the sequence surrounding the gene in multiple species is
conserved. Guess what, that's what we are going to do, retrieve the
DNA sequence for SMN1 and prepare it for using in Mussa.
--------------------------
There are many methods of retrieving DNA sequence, but for this
-example we will retrieve SMN1 through the UCSC genome broswer located
+example we will retrieve SMN1 through the UCSC genome browser located
at http://genome.ucsc.edu/.
.. image:: images/ucsc_genome_browser_home.png
- :alt: UCSC Genome Broswer
+ :alt: UCSC Genome Browser
:align: center
Step 1 - Find SMN1
Step 2 - Download CDS/UTR sequence for annotations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Since we have found **SMN1**, this would be a convient time to extract
+Since we have found **SMN1**, this would be a convenient time to extract
the DNA sequence for the CDS and UTRs of the gene to use it as an
annotation_ in Mussa.
1. UNcheck **introns**.
(We only want to annotate CDS and UTRs.)
- 2. Select **one fasta record** per **region**.
- (Mussa needs each CDS and UTR represented by one fasta record per CDS/UTR).
- 3. Select **split UTR and CDS parts of an exon into separate FASTA records**.
- (Breaks up **exons** into CDSs and UTRs.)
+ 2. Select **one FASTA record** per **region**.
+ (Mussa needs each CDS and UTR represented by one FASTA record per CDS/UTR).
+ 3. Select **CDS in upper case, UTR in lower case.**
.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_diff.png
:alt: Genome Browser - SMN1 (human) - Get genomic sequence setup
:align: center
-Now click the **submit** button. You will then see a fasta file with
-many fasta records representing the CDS and UTRS.
+Now click the **submit** button. You will then see a FASTA file with
+many FASTA records representing the CDS and UTRs.
.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_submit.png
:alt: Genome Browser - SMN1 (human) - CDS/UTR sequence
:align: center
-Now you need to save the fasta records to a **text file**. If you are
+Now you need to save the FASTA records to a **text file**. If you are
using **Firefox** or **Internet Explorer 6+** click on the **File >
Save As** menu option.
:align: center
**IMPORTANT:** You should open the file with a text editor and make
- sure **no html** was saved... If you find any html markup, delete
+ sure **no HTML** was saved... If you find any HTML markup, delete
the markup and save the file.
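
If you prefer to check the saved file with a script instead of by eye, a
minimal sketch along these lines may help. The function name
``html_lines`` and the sample text are made up for illustration; the
check simply flags any line containing something that looks like an HTML
tag.

```python
import re

# Matches something shaped like an HTML tag, e.g. <PRE> or </PRE>.
TAG = re.compile(r"</?[A-Za-z][^>]*>")


def html_lines(text):
    """Return (line_number, line) pairs that appear to contain HTML markup."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), 1)
        if TAG.search(line)
    ]


# Made-up sample: a FASTA record wrapped in leftover HTML tags.
sample = "<PRE>\n>example_record\nATGGCGATGAGC\n</PRE>\n"
print(html_lines(sample))  # [(1, '<PRE>'), (4, '</PRE>')]
```

FASTA header lines start with ``>`` but normally contain no ``<``, so
they are not flagged by this check.
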
Now we are going to **modify the file** you just saved to **add the
You can add more annotations to this file if you wish. See the
`annotation file format`_ section for details of the file format. By
-including fasta records in the annotation_ file, Mussa searches your
+including FASTA records in the annotation_ file, Mussa searches your
DNA sequence for an exact match of the sequence in the annotation_
file. If found, it will be marked as an annotation_ within Mussa.
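
The exact-match lookup described above can be pictured with the short
sketch below. It is not Mussa's code: the function name
``find_annotation`` is hypothetical, and ignoring upper/lower case is an
assumption made here, not something this manual states.

```python
def find_annotation(sequence, annotation):
    """Return the 0-based (start, end) of the first exact match, or None."""
    # Assumption: case is ignored when matching; Mussa may behave differently.
    start = sequence.upper().find(annotation.upper())
    if start == -1:
        return None
    return start, start + len(annotation)


print(find_annotation("ggATGCCTaa", "atgcct"))  # (2, 8)
print(find_annotation("AAAA", "CG"))            # None
```
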
:alt: Genome Browser - SMN1 (human) - DNA Option
:align: center
-Now in the **get dna in window** page, lets add an arbitrary amount of
-extra sequence on to each end of the gene, lets say 5000 base pairs.
+Now in the **get dna in window** page, let's add an arbitrary amount of
+extra sequence on to each end of the gene, let's say 5000 base pairs.
.. image:: images/ucsc_gb_smn1_human_get_dna.png
:alt: Genome Browser - SMN1 (human) - Get DNA
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
What good is a multiple sequence alignment viewer without multiple
-sequences? Lets find a similar gene in a few more species.
+sequences? Let's find a similar gene in a few more species.
Use the back button on your web browser until you get the **genome
-broswer view** of **SMN1** as shown below.
+browser view** of **SMN1** as shown below.
.. image:: images/ucsc_genome_browser_home.png
- :alt: UCSC Genome Broswer
+ :alt: UCSC Genome Browser
:align: center
**Click on SMN1** shown **between** the **two orange arrows** shown
:alt: Dynamic Threshold
:align: center
-You can dynamically change the threshold for how strong of match you
+You can dynamically change the threshold for how strong a match you
consider the conservation to be with one of two options:
1. Number of base pair matches out of window size.
previous run or by defining your own motif file. See the `Motif File
Format`_ section for details.
+NOTE: Valid motif list file extensions are:
+
+ * .mtl
+ * .txt
+
To load a motif file, select **Load Motif List** item from the
**File** menu and select a motif list file.
Motif Dialog
************
+**New Features:**
+
+Build 276
+ * Allow for toggling individual motifs on and off.
+
+Build 269
+ * Field added for naming motifs.
+
Mussa has the ability to find lab motifs using the `IUPAC Nucleotide
-Code`_ for defining a motif. To define a motif, select **View > Edit
+Code`_ for defining a motif. To define a motif, select **Edit > Edit
Motifs** menu item as shown below.
.. image:: images/view_edit_motifs.png
rows for defining motifs and the color that will be displayed on the
sequence. By default all 10 motifs start off as with white as the
color. In the image below, I changed the color from white to blue to
-make it easier to see.
+make it easier to see. The first text box is for the motif and the
+second box is for the name of the motif. The check box defines whether
+the motif is displayed or not.
.. image:: images/motif_dialog_start.png
:alt: Motif Dialog
:align: center
-Now lets make a motif **'AT[C or G]CT'**. Using the `IUPAC Nucleotide
-Code`_, type in **'ATSCT'** into the first box as shown below.
+Now let's make a motif **'AT[C or G]CT'**. Using the `IUPAC Nucleotide
+Code`_, type in **'ATSCT'** into the first box and 'My Motif' for the
+name in the second box as shown below.
.. image:: images/motif_dialog_enter_motif.png
:alt: Enter Motif
:align: center
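
To see what a motif such as 'ATSCT' stands for, the sketch below expands
IUPAC codes into a regular expression and searches a made-up sequence.
It only illustrates the code table; ``motif_to_regex`` is a hypothetical
helper, only the forward strand is searched, and none of this is taken
from Mussagl's own motif finder.

```python
import re

# Standard IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT",
    "K": "GT", "M": "AC", "B": "CGT", "D": "AGT",
    "H": "ACT", "V": "ACG", "N": "ACGT",
}


def motif_to_regex(motif):
    """Compile an IUPAC motif into an equivalent regular expression."""
    parts = []
    for code in motif.upper():
        bases = IUPAC[code]
        # Ambiguity codes become a character class, e.g. S -> [CG].
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return re.compile("".join(parts))


pattern = motif_to_regex("ATSCT")
print(pattern.pattern)  # AT[CG]CT
print([m.start() for m in pattern.finditer("GGATCCTAAATGCTT")])  # [2, 9]
```
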
-View Mussa Alignements
-----------------------
+View Mussa Alignments
+---------------------
Mussagl allows you to zoom in on Mussa alignments by selecting the set
of alignment(s) of interest. To do this, move the mouse near the
overlaping the alienment(s) of interest and then **let go** of the
*left mouse button*.
-In the example below, I started by left clicking on the area marked by
-a red dot (upper left corner of bounding box) and draging the mouse to
+In the example below, I started by left-clicking on the area marked by
+a red dot (upper left corner of bounding box) and dragging the mouse to
the area marked by a blue dot (lower right corner of the bounding box)
and letting go of the left mouse button.
:align: center
+Sub-analysis
+------------
+
+To run a sub-analysis **highlight** a section of sequence and *right
+click* on it and select **Add to subanalysis**. Do the same for the
+sequences shown in orange in the screenshot below. Note that you **are
+NOT limited** to one subsequence per sequence; you may select more than
+one subsequence from the same sequence.
+
+.. image:: images/subanalysis_select_seqs.png
+ :alt: Subanalysis sequence selection
+ :align: center
+
+Once you have added your sequences for subanalysis, choose a `window size`_ and `threshold`_ and click **Ok**.
+
+.. image:: images/subanalysis_dialog.png
+ :alt: Subanalysis Dialog
+ :align: center
+
+A new Mussa window will appear with the subanalysis of your sequences
+once it's done running. This may take a while if you selected large
+chunks of sequence with a loose threshold.
+
+.. image:: images/subanalysis_done.png
+ :alt: Subanalysis complete
+ :align: center
+
+
+Copying sequence to clipboard
+-----------------------------
+
+To copy a sequence to the clipboard, highlight a section of sequence,
+as shown in the screenshot below, and do one of the following:
+
+ * Select **Copy as FASTA** from the **Edit** menu.
+ * **Right-Click (Left-click + Apple/Command Key on Mac)** on the highlighted sequence and select **Copy as FASTA**.
+ * Press **Ctrl + C (on PC)** or **Apple/Command Key + C (on Mac)** on the keyboard.
+
+.. image:: images/copy_sequence.png
+ :alt: Copy sequence
+ :align: center
Saving to an Image
---------------------------------
-FIXME: Need to write this section
+ * Updated to build 419.
+
+To save your current Mussa view to an image, select **File > Save to
+image...** as shown below.
+
+.. image:: images/save_to_image_menu.png
+ :alt: File > Save to image...
+ :align: center
+
+You can define the width and the height of the image to save. By
+default it will use the same size as your current view. Since the
+Mussa view is implemented using vectors, if you choose a larger size
+than your current view, Mussa will redraw at the higher resolution
+when saving. In other words, you get higher quality images when saving
+at a higher resolution.
+
+If you check the "Lock aspect ratio" check box, which I have circled
+in red, then when you change one value, say width, the other, height,
+will update automatically to keep the same aspect ratio.
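
The arithmetic behind a locked aspect ratio is simple. The sketch below
is only an illustration with a hypothetical function name
(``locked_height``); rounding to the nearest pixel is an assumption, not
a description of what Mussagl does internally.

```python
def locked_height(new_width, old_width, old_height):
    """Height that keeps the old_width:old_height ratio at new_width."""
    # Assumption: the result is rounded to the nearest whole pixel.
    return round(new_width * old_height / old_width)


# Doubling the width of an 800 x 600 view doubles the height as well.
print(locked_height(1600, 800, 600))  # 1200
```
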
+
+.. image:: images/save_to_image_dialog.png
+ :alt: Save to image dialog
+ :align: center
+
+Click save and choose a location and filename for your file.
+
+The valid image formats are:
+
+ * .png (default if no extension specified.)
+ * .jpg
Detailed Information
Sequences
~~~~~~~~~
-Mussa reads in sequences which are formatted in the fasta_
+Mussa reads in sequences which are formatted in the FASTA_
format. Mussa may take a long time to run (>10 minutes) if the total
bp length near 280Kb. Once mussa has run once, you can reload
previously run analyzes.
SEQUENCE_NUM <num>
# first sequence info
- SEQUENCE <fasta_file_path>
+ SEQUENCE <FASTA_file_path>
ANNOTATION <annotation_file_path>
SEQ_START <sequence_start>
# the second sequence info
- SEQUENCE <fasta_file_path>
+ SEQUENCE <FASTA_file_path>
# ANNOTATION <annotation_file_path>
SEQ_START <sequence_start>
# SEQ_END <sequence_end>
# third sequence info
- SEQUENCE <fasta_file_path>
+ SEQUENCE <FASTA_file_path>
# ANNOTATION <annotation_file_path>
# analyzes parameters: command line args -w -t will override these
"APPEND_THRES", "true or false", "?", "?", "Appends _t## to ANA_NAME"
"SEQUENCE_NUM", "integer", "N/A", "true", "The number of sequences
to analyze"
- "SEQUENCE", "/fasta/filepath.fa", "N/A", "true", "Must define one
+ "SEQUENCE", "/FASTA/filepath.fa", "N/A", "true", "Must define one
sequence per SEQUENCE_NUM."
"ANNOTATION", "/annotation/filepath.txt", "N/A", "false", "Optional
annotation file. See `annotation file format`_ section for more
information."
- "SEQ_START", "integer", "1", "false", "Optional index into fasta file"
- "SEQ_END", "integer", "1", "false", "Optional index into fasta file"
+ "SEQ_START", "integer", "1", "false", "Optional index into FASTA file"
+ "SEQ_END", "integer", "1", "false", "Optional index into FASTA file"
"WINDOW", "integer", "N/A", "true", "`Window Size`_"
"THRESHOLD", "integer", "N/A", "true", "`Threshold`_"
New as of build 198:
- * The annotation format now supports fasta sequences embedded in the
+ * The annotation format now supports FASTA sequences embedded in the
annotation file as shown in the format example below. Mussagl will
take this sequence and look for an exact match of this sequence in
your sequences. If a match is found, it will label it with the name
- of from the fasta header.
+ from the FASTA header.
Format:
<start> <stop> <annotation_name> <annotation_type>
<start> <stop> <annotation_name> <annotation_type>
<start> <stop> <annotation_name> <annotation_type>
- >Fasta Header
+ >FASTA Header
ACTGACTGACGTACGTAGCTAGCTAGCTAGCACG
ACGTACGTACGTACGTAGCTGTCATACGCTAGCA
TGCGTAGAGGATCTCGGATGCTAGCGCTATCGAT
.. _GPL: http://www.opensource.org/licenses/gpl-license.php
.. _wiki: http://mussa.caltech.edu
.. _build: http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild
-.. _fasta: http://en.wikipedia.org/wiki/FASTA_format
+.. _FASTA: http://en.wikipedia.org/wiki/FASTA_format
.. _wpDnaMotif: http://en.wikipedia.org/wiki/DNA_motif
\ No newline at end of file