doc/manual/mussagl_manual.rst

   1 ==============
   2 Mussagl Manual
   3 ==============
   4 ---------------
   5 Brandon W. King
   6 ---------------
   7
   8 Last updated: July 7th, 2006
   9
  10 Updated to Mussagl build: 287
  11
  12
  13 .. contents::
  14
  15 Introduction
  16 ============
  17
  18
  19 What is Mussagl?
  20 ----------------
  21
  22 Mussa is an N-way version of FamilyRelations, the 2-way comparative
  23 sequence analysis software that is part of the Cartwheel
  24 project. Given DNA sequences from N species, Mussa uses all possible
  25 pairwise comparisons to derive an N-way comparison. For example, given
  26 sequences 1,2,3, and 4, Mussa makes 6 2-way comparisons: 1vs2, 1vs3,
  27 1vs4, 2vs3, 2vs4, and 3vs4. It then compares all the links between
  28 these comparisons, saving those that satisfy a transitivity
  29 requirement. The saved paths are then displayed in an interactive
  30 viewer.
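
The enumeration of 2-way comparisons can be sketched in a few lines of Python. This is only an illustration of how the pairs are counted, not Mussa's actual implementation; the function name ``pairwise_comparisons`` is made up for this example.

```python
from itertools import combinations


def pairwise_comparisons(num_sequences):
    """Return every unordered pair of 1-based sequence numbers."""
    return list(combinations(range(1, num_sequences + 1), 2))


pairs = pairwise_comparisons(4)
print(len(pairs))  # 6
print(", ".join(f"{a}vs{b}" for a, b in pairs))
# 1vs2, 1vs3, 1vs4, 2vs3, 2vs4, 3vs4
```

In general N sequences give N*(N-1)/2 pairs, so the number of 2-way comparisons grows quickly as sequences are added.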
  31
  32 Short History of Mussa
  33 ----------------------
  34
  35
  36 Mussa Python/PMW Prototype
  37 ~~~~~~~~~~~~~~~~~~~~~~~~~~
  38
  39 First Python/PMW based prototype.
  40
  41 Mussa C++/FLTK
  42 ~~~~~~~~~~~~~~
  43
  44 A rewrite for speed purposes using C++ and FLTK GUI toolkit.
  45
  46 Mussagl C++/Qt/OpenGL
  47 ~~~~~~~~~~~~~~~~~~~~~
  48
  49 Refactored version using the more elegant Qt GUI framework and
  50 OpenGL for hardware acceleration for those who have better graphics
  51 cards.
  52
  53 Getting Mussagl
  54 ===============
  55
  56 License
  57 -------
  58
  59 Mussagl has been released open source under the `GPL v2
  60 license`__.
  61
  62 __ GPL_
  63
  64 Platforms
  65 ---------
  66
  67 You have the option of building from source or downloading prebuilt
  68 binaries. Most people will want the prebuilt versions.
  69
  70 Supported Platforms:
  71
  72  * Mac OS X (binary or source)
  73  * Windows XP (binary or source)
  74  * Linux (source)
  75
  76 Download
  77 --------
  78
  79 Mussagl in binary form for OS X and Windows and/or source can be
  80 downloaded from http://mussa.caltech.edu/.
  81
  82 Install
  83 -------
  84
  85 Mac OS X
  86 ~~~~~~~~
  87 Once you have downloaded the .dmg file, double click on it and follow
  88 the install instructions.
  89
  90 FIXME: Mention how to launch the program.
  91
  92
  93 Windows XP
  94 ~~~~~~~~~~
  95 Once you have downloaded the Mussagl installer, double click on the
  96 installer and follow the install instructions.
  97
  98 To start Mussagl, launch the program from Start > Programs > Mussagl >
  99 Mussagl.
 100
 101
 102 Linux
 103 ~~~~~
 104 Currently we do not have a binary installer for Linux. You will have
 105 to build from source. See the 'build from source' section below.
 106
 107
 108 Build from Source
 109 ~~~~~~~~~~~~~~~~~
 110
 111 Instructions for building from source can be found on the `build page
 112 <http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild>`_ of the
 113 `Mussa wiki`__.
 114
 115 __ wiki_
 116
 117
 118 Obtaining Input Data
 119 ====================
 120
 121 If you already have your data, you can skip ahead to the `Using
 122 Mussagl`_ section.
 123
 124 Let's say you have a gene of interest called 'SMN1' and you want to
 125 know how well the sequence surrounding the gene is conserved across
 126 multiple species. In this section we will retrieve the DNA sequence
 127 for SMN1 and prepare it for use in Mussa.
 128
 129 For more information about SMN1 visit `NCBI's OMIM
 130 <http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=609682>`_.
 131
 132 UCSC Genome Browser Method
 133 --------------------------
 134
 135 There are many methods of retrieving DNA sequence, but for this
 136 example we will retrieve SMN1 through the UCSC genome browser located
 137 at http://genome.ucsc.edu/.
 138
 139 .. image:: images/ucsc_genome_browser_home.png
 140    :alt: UCSC Genome Browser
 141    :align: center
 142
 143 Step 1 - Find SMN1
 144 ~~~~~~~~~~~~~~~~~~
 145
 146 The first step in finding SMN1 is to use the **Gene Sorter** menu
 147 option which I have highlighted in orange below:
 148
 149 .. image:: images/ucsc_menu_bar_gene_sorter.png
 150    :alt: Gene Sorter Menu Option
 151    :align: center
 152
 153 Gene Sorter page:
 154
 155 .. image:: images/ucsc_gene_sorter.png
 156    :alt: Gene Sorter
 157    :align: center
 158
 159 We will start by looking for SMN1 in the **Human Genome** and **sorting by name similarity**.
 160
 161 .. image:: images/ucsc_gs_sort_name_sim.png
 162    :alt: Gene Sorter - Name Similarity
 163    :align: center
 164
 165 After you have selected **Human Genome** and **sorting by name similarity**, type *SMN1* into the search box.
 166
 167 .. image:: images/ucsc_gs_smn1.png
 168    :alt: Gene
 169    :align: center
 170
 171 Press **Go!** and you should see the following page:
 172
 173 .. image:: images/ucsc_gs_found.png
 174    :alt: Found SMN1
 175    :align: center
 176
 177 Click on **SMN1** and you will be taken to the gene expression atlas
 178 page.
 179
 180 .. image:: images/ucsc_gs_genome_position.png
 181    :alt: Gene expression atlas
 182    :align: center
 183
 184 Click on **chr5 70,270,558** found in the **SMN1 row**, **Genome
 185 position column**.
 186
 187 Now we have found the location of SMN1 in the human genome!
 188
 189 .. image:: images/ucsc_gb_smn1_human.png
 190    :alt: Genome Browser - SMN1 (human)
 191    :align: center
 192
 193
 194 Step 2 - Download CDS/UTR sequence for annotations
 195 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 196
 197 Since we have found **SMN1**, this would be a convenient time to extract
 198 the DNA sequence for the CDS and UTRs of the gene to use it as an
 199 annotation_ in Mussa.
 200
 201 **Click on SMN1** shown **between** the **two orange arrows** shown
 202 below.
 203
 204 .. image:: images/ucsc_gb_smn1_human_click_smn1.png
 205    :alt: Genome Browser - SMN1 (human) - Orange Arrows
 206    :align: center
 207
 208 You should find yourself at the SMN1 description page.
 209
 210 .. image:: images/ucsc_gb_smn1_description_page.png
 211    :alt: Genome Browser - SMN1 (human) - Description page
 212    :align: center
 213
 214 **Scroll down** until you get to the **Sequence section** and click on
 215 **Genomic (chr5:70,256,524-70,284,592)**.
 216
 217 .. image:: images/ucsc_gb_smn1_human_sequence.png
 218    :alt: Genome Browser - SMN1 (human) - Sequence
 219    :align: center
 220
 221 You should now be at the **Genomic sequence near gene** page:
 222
 223 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence.png
 224    :alt: Genome Browser - SMN1 (human) - Get genomic sequence
 225    :align: center
 226
 227 Make the following changes (highlighted in orange in the screenshot
 228 below):
 229
 230  1. UNcheck **introns**.
 231     (We only want to annotate CDS and UTRs.)
 232  2. Select **one fasta record** per **region**.
 233     (Mussa needs each CDS and UTR in its own fasta record.)
 234  3. Select **CDS in upper case, UTR in lower case.**
 235
 236 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_diff.png
 237    :alt: Genome Browser - SMN1 (human) - Get genomic sequence setup
 238    :align: center
 239
 240 Now click the **submit** button. You will then see a fasta file with
 241 many fasta records representing the CDS and UTRs.
 242
 243 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_submit.png
 244    :alt: Genome Browser - SMN1 (human) - CDS/UTR sequence
 245    :align: center
 246
 247 Now you need to save the fasta records to a **text file**. If you are
 248 using **Firefox** or **Internet Explorer 6+** click on the **File >
 249 Save As** menu option.
 250
 251 **IMPORTANT:** Make sure you select **Text Files** and **NOT**, I
 252 repeat **NOT Webpage Complete** (see screenshot below.)
 253
 254 Type in **smn1_human_annot.txt** for the file name.
 255
 256 .. image:: images/smn1_human_annot.png
 257    :alt: Genome Browser - SMN1 (human) - sequence annotation file
 258    :align: center
 259
 260 **IMPORTANT:** You should open the file with a text editor and make
 261 sure **no HTML** was saved. If you find any HTML markup, delete
 262 the markup and save the file.
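
If you would rather check for leftover markup with a script than by eye, a minimal Python sketch follows. It assumes that anything shaped like ``<tag>`` is HTML residue; the function name ``find_html_lines`` and the sample text are made up for illustration.

```python
import re

# Matches text shaped like an HTML tag, e.g. <PRE>, </pre>, <a href="...">.
# A fasta header starts with '>' but does not contain a '<...>' pair.
HTML_TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def find_html_lines(text):
    """Return (line_number, line) pairs for lines containing HTML-like tags."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if HTML_TAG.search(line)
    ]


# Made-up sample: a fasta record accidentally wrapped in <PRE> tags.
sample = "<PRE>\n>example_record\nATGGCGATGAGC\n</PRE>\n"
for number, line in find_html_lines(sample):
    print(number, line)
# 1 <PRE>
# 4 </PRE>
```

An empty result means no tag-like text was found; it is still worth opening the file once to confirm it looks like plain fasta.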
 263
 264 Now we are going to **modify the file** you just saved to **add the
 265 name of the species** to the **annotation file**. All you have to do
 266 is **add a new line** at the **top of the file** with the word **'Human'** as
 267 shown below:
 268
 269 .. image:: images/smn1_human_annot_plus_human.png
 270    :alt: Genome Browser - SMN1 (human) - sequence annotation file
 271    :align: center
 272
 273 You can add more annotations to this file if you wish. See the
 274 `annotation file format`_ section for details of the file format. By
 275 including fasta records in the annotation_ file, Mussa searches your
 276 DNA sequence for an exact match of the sequence in the annotation_
 277 file. If found, it will be marked as an annotation_ within Mussa.
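
The exact-match lookup described above can be pictured with the following Python sketch. It is not Mussa's code: the function name is hypothetical, and ignoring upper/lower case is an assumption made here because the UCSC export mixes cases for CDS and UTR.

```python
def find_exact_matches(sequence, annotation):
    """Return 0-based start positions where annotation occurs in sequence."""
    if not annotation:
        return []
    # Assumption: case is ignored when comparing.
    sequence = sequence.upper()
    annotation = annotation.upper()
    positions = []
    start = sequence.find(annotation)
    while start != -1:
        positions.append(start)
        # Continue one base later so overlapping matches are also found.
        start = sequence.find(annotation, start + 1)
    return positions


print(find_exact_matches("ccGATTACAggGATTACA", "gattaca"))  # [2, 11]
```

Because the match must be exact, a single base difference between the annotation record and your DNA sequence means no annotation is drawn.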
 278
 279
 280 Step 3 - Download gene and upstream/downstream sequence
 281 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 282
 283 Use the back button in your web browser to get back to the **genome
 284 browser view** of **SMN1** as shown below.
 285
 286 .. image:: images/ucsc_gb_smn1_human.png
 287    :alt: Genome Browser - SMN1 (human)
 288    :align: center
 289
 290 There are two options for getting additional sequence around your
 291 gene. The more complex way is to zoom out so that you have the
 292 sequence you want being shown in the genome browser and then follow
 293 the directions for the following method.
 294
 295 The second option, which we will choose, is to leave the genome
 296 browser zoomed exactly at the location of SMN1 and click on the
 297 **DNA** option on the menu bar (shown with orange arrows in the
 298 screenshot below.)
 299
 300 .. image:: images/ucsc_gb_smn1_human_dna_option.png
 301    :alt: Genome Browser - SMN1 (human) - DNA Option
 302    :align: center
 303
 304 Now in the **get dna in window** page, let's add an arbitrary amount of
 305 extra sequence onto each end of the gene, say 5000 base pairs.
 306
 307 .. image:: images/ucsc_gb_smn1_human_get_dna.png
 308    :alt: Genome Browser - SMN1 (human) - Get DNA
 309    :align: center
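
As a quick check of the arithmetic, here is a small Python sketch of the region you end up with. It assumes the browser window matches the genomic range quoted in step 2 (chr5:70,256,524-70,284,592); the function name ``extend_region`` is made up for this example.

```python
def extend_region(start, end, flank):
    """Widen a 1-based inclusive region by `flank` bases on each side."""
    # Clamp at 1 so the start never runs off the beginning of the chromosome.
    return max(1, start - flank), end + flank


start, end = extend_region(70_256_524, 70_284_592, 5000)
print(f"chr5:{start:,}-{end:,}")  # chr5:70,251,524-70,289,592
print(end - start + 1)            # 38069
```

So under that assumption the saved sequence should be roughly 38 kb long, which is a handy sanity check on the downloaded file.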
 310
 311 Click the **get DNA** button.
 312
 313 .. image:: images/ucsc_gb_smn1_human_dna.png
 314    :alt: Genome Browser - SMN1 (human) - DNA
 315    :align: center
 316
 317 Save the DNA sequence to a text file called 'smn1_human_dna.fa' as we
 318 did in step 2 with the annotation file.
 319
 320 **IMPORTANT:** Make sure the file is saved as a text file and not an
 321 HTML file. Open the file with a text editor and remove any HTML markup
 322 you find.
 323
 324
 325 Step 4 - Same/similar/related gene in other species
 326 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 327
 328 What good is a multiple sequence alignment viewer without multiple
 329 sequences? Let's find a similar gene in a few more species.
 330
 331 Use the back button on your web browser until you get to the **genome
 332 browser view** of **SMN1** as shown below.
 333
 334 .. image:: images/ucsc_genome_browser_home.png
 335    :alt: UCSC Genome Browser
 336    :align: center
 337
 338 **Click on SMN1** shown **between** the **two orange arrows** shown
 339 below.
 340
 341 .. image:: images/ucsc_gb_smn1_human_click_smn1.png
 342    :alt: Genome Browser - SMN1 (human) - Orange Arrows
 343    :align: center
 344
 345 You should find yourself at the SMN1 description page.
 346
 347 .. image:: images/ucsc_gb_smn1_description_page.png
 348    :alt: Genome Browser - SMN1 (human) - Description page
 349    :align: center
 350
 351 **Scroll down** until you get to the **Sequence section** and click on
 352 **Protein (262 aa)**.
 353
 354 .. image:: images/ucsc_gb_smn1_human_sequence.png
 355    :alt: Genome Browser - SMN1 (human) - Sequence
 356    :align: center
 357
 358 Copy the SMN1 protein sequence by highlighting it and selecting the **Edit
 359 > Copy** option from the menu.
 360
 361 .. image:: images/smn1_human_protein.png
 362    :alt: Genome Browser - SMN1 (human) - Protein
 363    :align: center
 364
 365 Press the back button on the web browser once and then scroll to the
 366 top of the page and click on the **BLAT** option on the menu bar
 367 (shown below with orange arrows).
 368
 369 .. image:: images/ucsc_gb_smn1_human_blat.png
 370    :alt: Genome Browser - SMN1 (human) - Blat
 371    :align: center
 372
 373 **Paste** in the **protein sequence** and **change** the **genome** to
 374 **mouse** as shown below and then click **submit**.
 375
 376 .. image:: images/ucsc_gb_smn1_human_blat_paste.png
 377    :alt: Genome Browser - SMN1 (human) - Blat paste protein
 378    :align: center
 379
 380 Notice that we have two hits, one of which looks pretty good at 89.9%
 381 match.
 382
 383 .. image:: images/ucsc_gb_smn1_human_blat_hits.png
 384    :alt: Genome Browser - SMN1 (human) - Blat hits
 385    :align: center
 386
 387 **Click** on the **browser** link next to the 89.9% match. Notice in
 388 the genome browser (shown below) that there is an annotated gene
 389 called SMN1 for mouse which matches the line called **your sequence
 390 from blat search**. This means we are fairly confident we found the
 391 right location in the mouse genome.
 392
 393 .. image:: images/ucsc_gb_smn1_human_blat_to_browser.png
 394    :alt: Genome Browser - SMN1 (human) - Blat to browser
 395    :align: center
 396
 397 Follow steps 1 through 3 for mouse and then repeat step 4 with the
 398 human protein sequence to find **SMN1** in the following species (if
 399 you find a match):
 400
 401  1. Rat
 402  2. Rabbit
 403  3. Dog
 404  4. Armadillo
 405  5. Elephant
 406  6. Opossum
 407  7. x_tropicalis
 408
 409 Make sure to save the extended DNA sequence and annotation file for
 410 each one.
 411
 412 Using Mussagl
 413 =============
 414
 415
 416 Launch Mussagl
 417 --------------
 418 Launch Mussagl... It should look similar to the screen shot below.
 419
 420 .. image:: images/opened.png
 421    :alt: Launch Mussa
 422    :align: center
 423
 424
 425
 426 Create/Load Analysis
 427 ----------------------
 428
 429 Currently there are three ways to load a Mussa experiment.
 430
 431  1. `Create a new analysis`_
 432  2. `Load a mussa parameter file`_ (.mupa)
 433  3. `Load an analysis`_
 434
 435 .. _createnew:
 436
 437 Create a new analysis
 438 ~~~~~~~~~~~~~~~~~~~~~
 439
 440 To create a new analysis select 'Define analysis' from the 'File'
 441 menu. You should see a dialog box similar to the one below. For this
 442 demo we will use the example sequences that come with Mussagl.
 443
 444 .. image:: images/define_analysis.png
 445    :alt: Define Analysis
 446    :align: center
 447
 448 Instructions:
 449
 450  1. **Give the experiment a name**, for this demo, we'll use
 451     'demo_w30_t20'. Mussa will create a folder with this name to store
 452     the analysis files in once it has been run.
 453
 454  2. Choose a `window size`_. For this demo **choose 30**.
 455
 456  3. Choose a threshold_... for this demo **choose 20**. See the
 457     Threshold_ section for more detailed information.
 458
 459  4. Choose the number of sequences_ you would like. For this demo
 460     **choose 3**.
 461
 462 .. image:: images/define_analysis_step1a.png
 463    :alt: Steps 1-4
 464    :align: center
 465
 466 Now click on the 'Browse' button next to the sequence input box and
 467 then select the /examples/seq/human_mck_pro.fa file. Do the same in the
 468 next two sequence input boxes selecting mouse_mck_pro.fa and
 469 rabbit_mck_pro.fa as shown below. Note that you can create annotation
 470 files using the mussa `Annotation File Format`_ to add annotations to
 471 your sequence.
 472
 473 .. image:: images/define_analysis_step2.png
 474    :alt: Choose sequences
 475    :align: center
 476
 477 Click the **create** button and in a few moments you should see
 478 something similar to the following screen shot.
 479
 480 .. image:: images/demo.png
 481    :alt: Mussagl Demo
 482    :align: center
 483
 484 This analysis is now saved in a directory called **demo_w30_t20** in
 485 the current working directory. If you close and reopen Mussagl, you
 486 can reload the saved analysis. See `Load an analysis`_ section below
 487 for details.
 488
 489
 490 Load a mussa parameter file
 491 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 492
 493 If you prefer, you can define your Mussa analysis using the Mussa
 494 parameter file. See the `Parameter File Format`_ section for details
 495 on creating a .mupa file.
 496
 497 Once you have a .mupa file created, load Mussagl and select the **File >
 498 Load Mussa Parameters** menu option. Select the .mupa file and click
 499 open.
 500
 501 .. image:: images/load_mupa_menu.png
 502    :alt: Load Mussa Parameters
 503    :align: center
 504
 505 If you would like to see an example, you can load the
 506 **mck3test.mupa** file in the examples directory that comes with
 507 Mussagl.
 508
 509 .. image:: images/load_mupa_dialog.png
 510    :alt: Load Mussa Parameters Dialog
 511    :align: center
 512
 513
 514 Load an analysis
 515 ~~~~~~~~~~~~~~~~
 516
 517 To load a previously run analysis open Mussagl and select the **File >
 518 Load Analysis** menu option. Select an analysis **directory** and
 519 click open.
 520
 521 .. image:: images/load_analysis_menu.png
 522    :alt: Load Analysis Menu
 523    :align: center
 524
 525
 526 Main Window
 527 -----------
 528
 529 Overview
 530 ~~~~~~~~
 531 .. Screen-shot with numbers showing features.
 532
 533 .. image:: images/window_overview.png
 534    :alt: Mussa Window
 535    :align: center
 536
 537 Legend:
 538
 539  1. `DNA Sequence (Black bars)`_
 540
 541  2. Annotation_
 542
 543  3. Motif_
 544
 545  4. `Conservation tracks`_
 546
 547  5. `Motif Toggle`_
 548
 549  6. `Zoom Factor`_ (Base pairs per pixel)
 550
 551  7. `Dynamic Threshold`_
 552
 553  8. `Sequence Information Bar`_
 554
 555  9. `Sequence Scroll Bar`_
 556
 557
 558 DNA Sequence (black bars)
 559 ~~~~~~~~~~~~~~~~~~~~~~~~~
 560
 561 .. image:: images/sequence_bar.png
 562    :alt: Sequence Bar
 563    :align: center
 564
 565 Each of the black bars represents one of the loaded sequences, in this
 566 case the sequence around the gene 'MCK' in human, mouse, and rabbit.
 567
 568 FIXME: Should I mention the repeats here?
 569
 570
 571 Annotation
 572 ~~~~~~~~~~
 573
 574 .. figure:: images/annotation.png
 575    :alt: Annotation
 576    :align: center
 577
 578    Annotation shown in green on sequence bar.
 579
 580
 581 Annotations can be included on any of the sequences using the `Load a
 582 mussa parameter file`_ method of loading your sequences. You can
 583 define annotations by location or using an exact sub-sequence and you
 584 may also choose any color for display of the annotation; see the
 585 `Annotation File Format`_ section for details.
 586
 587 Note: Currently there is no way to add annotations using the GUI (only
 588 via the .mupa file). We plan to add this feature in the future, but it
 589 likely will not make it into the first release.
 590
 591
 592 Motif
 593 ~~~~~
 594
 595 .. figure:: images/motif.png
 596    :alt: Motif
 597    :align: center
 598
 599    Motif shown in light blue on sequence bar.
 600
 601 The only real difference between an annotation and motif in Mussagl is
 602 that you can define motifs from within the GUI. See the `Motifs`_
 603 section for more information.
 604
 605
 606 Conservation tracks
 607 ~~~~~~~~~~~~~~~~~~~
 608
 609 .. figure:: images/conservation_tracks.png
 610    :alt: Conservation Tracks
 611    :align: center
 612
 613    Conservation tracks shown as red and blue lines between sequence
 614    bars.
 615
 616 The **red lines** between the sequence bars represent conservation
 617 between the sequences and **blue lines** represent **reverse
 618 complement** conservation. The amount of sequence conservation shown
 619 will depend on the relatedness of your sequences and the `dynamic
 620 threshold`_ you are using. Sequences with lots of repeats will cause
 621 major slowdowns in calculating the matches.
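
For readers unfamiliar with the term, the reverse complement of a sequence is what you read off the opposite DNA strand. A minimal Python sketch (plain A, C, G and T only; the function name is made up for this example):

```python
# Translation table mapping each base to its complement, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return sequence.translate(COMPLEMENT)[::-1]


print(reverse_complement("GATTACA"))  # TGTAATC
```

A blue line therefore connects a region in one sequence to a region in another that matches only after this transformation.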
 622
 623
 624 Motif Toggle
 625 ~~~~~~~~~~~~
 626
 627 .. image:: images/motif_toggle.png
 628    :alt: Motif Toggle
 629    :align: center
 630
 631 Toggles motifs on and off. This will not turn on and off annotations.
 632
 633 Note: As of the current build (#200), this feature hasn't been
 634 implemented.
 635
 636
 637 Zoom Factor
 638 ~~~~~~~~~~~
 639
 640 .. image:: images/zoom_factor.png
 641    :alt: Zoom Factor
 642    :align: center
 643
 644 The zoom factor is the number of base pairs represented per
 645 pixel. When you zoom in far enough, the display will switch from
 646 a black bar representing the sequence to the actual sequence
 647 (well, an ASCII representation of the sequence).
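
To get a feel for the numbers, the sketch below estimates how many pixels a sequence occupies at a given zoom factor. Rounding up is an assumption, as is the function name; Mussagl's own drawing code may differ.

```python
import math


def bar_width_in_pixels(sequence_length, bp_per_pixel):
    """Pixels needed to draw a sequence at a given zoom factor."""
    return math.ceil(sequence_length / bp_per_pixel)


print(bar_width_in_pixels(38_069, 50))  # 762
print(bar_width_in_pixels(38_069, 1))   # 38069
```

At one base pair per pixel a sequence of this size is far wider than any screen, which is why the scroll bar matters once you zoom in.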
 648
 649
 650 Dynamic Threshold
 651 ~~~~~~~~~~~~~~~~~
 652
 653 .. image:: images/dynamic_threshold.png
 654    :alt: Dynamic Threshold
 655    :align: center
 656
 657 You can dynamically change the threshold for how strong a match you
 658 consider the conservation to be with one of two options:
 659
 660  1. Number of base pair matches out of window size.
 661
 662  2. Percent base pair conservation.
 663
 664 See the Threshold_ section for more information.
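
The two options express the same cutoff in different units. The Python sketch below shows the conversion; the function names are hypothetical, and rounding a percentage up to a whole number of matches is an assumption rather than documented Mussagl behavior.

```python
import math


def matches_to_percent(matches, window_size):
    """Express a match count as percent conservation of the window."""
    return 100.0 * matches / window_size


def percent_to_matches(percent, window_size):
    """Smallest match count that reaches the requested percent."""
    return math.ceil(percent * window_size / 100.0)


print(f"{matches_to_percent(20, 30):.1f}%")  # 66.7%
print(percent_to_matches(80, 30))            # 24
```

With a window size of 30, for example, a threshold of 20 matches corresponds to about 66.7% conservation.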
 665
 666
 667 Sequence Information Bar
 668 ~~~~~~~~~~~~~~~~~~~~~~~~
 669
 670 .. image:: images/seq_info_bar.png
 671    :alt: Sequence Information Bar
 672    :align: center
 673
 674 The sequence information bars can be found to the left and right sides
 675 of Mussagl. Next to each sequence you will find the following
 676 information:
 677
 678  1. Species (If it has been defined)
 679  2. Total Size of Sequence
 680  3. Current base pair position
 681
 682
 683 Sequence Scroll Bar
 684 ~~~~~~~~~~~~~~~~~~~
 685
 686 .. image:: images/scroll_bar.png
 687    :alt: Sequence Scroll Bar
 688    :align: center
 689
 690 The scroll bar allows you to scroll through the sequence which is
 691 useful when you have zoomed in using the `zoom factor`_.
 692
 693
 694 Annotations / Motifs
 695 --------------------
 696
 697 Annotations
 698 ~~~~~~~~~~~
 699
 700 Currently annotations can be added to a sequence using the mussa
 701 `annotation file format`_ and can be loaded by selecting the
 702 annotation file when defining a new analysis (see `Create a new
 703 analysis`_ section) or by defining a .mupa file pointing to your
 704 annotation file (see `Load a mussa parameter file`_ section).
 705
 706 Motifs
 707 ~~~~~~
 708
 709 Load Motifs from File
 710 *********************
 711
 712 It is possible to load motifs from a file which was saved from a
 713 previous run or by defining your own motif file. See the `Motif File
 714 Format`_ section for details.
 715
 716 To load a motif file, select **Load Motif List** item from the
 717 **File** menu and select a motif list file.
 718
 719 .. image:: images/load_motif.png
 720    :alt: Load Motif List
 721    :align: center
 722
 723
 724 Save Motifs to File
 725 *******************
 726
 727 Note: Currently not implemented
 728
 729
 730 Motif Dialog
 731 ************
 732
 733 **New Features:**
 734
 735 Build 276
 736  * Allow for toggling individual motifs on and off.
 737
 738 Build 269
 739  * Field added for naming motifs.
 740
 741 Mussa has the ability to find lab motifs using the `IUPAC Nucleotide
 742 Code`_ for defining a motif. To define a motif, select **Edit > Edit
 743 Motifs** menu item as shown below.
 744
 745 .. image:: images/view_edit_motifs.png
 746    :alt: "View > Edit Motifs" Menu
 747    :align: center
 748
 749 You will see a dialog box appear with a "set motifs" button and 10
 750 rows for defining motifs and the color that will be displayed on the
 751 sequence. By default all 10 motifs start off with white as the
 752 color. In the image below, I changed the color from white to blue to
 753 make it easier to see. The first text box is for the motif and the
 754 second box is for the name of the motif. The check box defines whether
 755 the motif is displayed or not.
 756
 757 .. image:: images/motif_dialog_start.png
 758    :alt: Motif Dialog
 759    :align: center
 760
 761 Now let's make a motif **'AT[C or G]CT'**. Using the `IUPAC Nucleotide
 762 Code`_, type **'ATSCT'** into the first box and 'My Motif' for the
 763 name in the second box as shown below.
 764
 765 .. image:: images/motif_dialog_enter_motif.png
 766    :alt: Enter Motif
 767    :align: center
 768
 769 Now choose a color for your motif by clicking on the colored area to
 770 the left of the motif. In the image above, you would click on the blue
 771 square, but by default the squares will be white. Remember to choose a
 772 color that will show up well with a black bar as the background.
 773
 774 .. image:: images/color_chooser.png
 775    :alt: Color Chooser
 776    :align: center
 777
 778 Once you have selected the color for your motif, click on the 'set
 779 motifs' button. If Mussa finds matches to your motif, they will
 780 now show up in the main Mussagl window.
 781
 782 Before Motif:
 783
 784 .. image:: images/motif_dialog_bar_before.png
 785    :alt: Sequence bar before motif
 786    :align: center
 787
 788 After Motif:
 789
 790 .. image:: images/motif_dialog_bar_after.png
 791    :alt: Sequence bar after motif
 792    :align: center
 793
 794
 795 View Mussa Alignments
 796 ----------------------
 797
 798 Mussagl allows you to zoom in on Mussa alignments by selecting the set
 799 of alignment(s) of interest. To do this, move the mouse near the
 800 alignment you are interested in viewing and then **PRESS** and
 801 **HOLD** the **LEFT mouse button** and **drag the mouse** to the other
 802 side of the conservation track so that you see a bounding box
 803 overlapping the alignment(s) of interest and then **let go** of the
 804 **left mouse button**.
 805
 806 In the example below, I started by left clicking on the area marked by
 807 a red dot (upper left corner of bounding box) and dragging the mouse to
 808 the area marked by a blue dot (lower right corner of the bounding box)
 809 and letting go of the left mouse button.
 810
 811 .. image:: images/select_sequence.png
 812    :alt: Select Sequence
 813    :align: center
 814
 815 All of the lines which were not selected should be washed out as shown
 816 below:
 817
 818 .. image:: images/washed_out.png
 819    :alt: Tracks washed out
 820    :align: center
 821
 822 With a selection made, go to the **View** menu and select **View mussa alignment**.
 823
 824 .. image:: images/view_mussa_alignment.png
 825    :alt: View mussa alignment
 826    :align: center
 827
 828 You should see the alignment at the base-pair level as shown below.
 829
 830 .. image:: images/mussa_alignment.png
 831    :alt: Mussa alignment
 832    :align: center
 833
 834
 835 Sub-analysis
 836 ------------
 837
 838 To run a sub-analysis **highlight** a section of sequence, *right
 839 click* on it and select **Add to subanalysis**. Do the same for the
 840 sequences shown in orange in the screenshot below. Note that you **are
 841 NOT limited** to one subsequence per sequence; you may select more
 842 than one subsequence from the same sequence.
 843
 844 .. image:: images/subanalysis_select_seqs.png
 845    :alt: Subanalysis sequence selection
 846    :align: center
 847
 848 Once you have added your sequences for subanalysis, choose a `window size`_ and `threshold`_ and click **Ok**.
 849
 850 .. image:: images/subanalysis_dialog.png
 851    :alt: Subanalysis Dialog
 852    :align: center
 853
 854 A new Mussa window will appear with the subanalysis of your sequences
 855 once it's done running. This may take a while if you selected large
 856 chunks of sequence with a loose threshold.
 857
 858 .. image:: images/subanalysis_done.png
 859    :alt: Subanalysis complete
 860    :align: center
 861
 862
 863 Copying sequence to clipboard
 864 -----------------------------
 865
 866 To copy a sequence to the clipboard, highlight a section of sequence,
 867 as shown in the screen shot below, and do one of the following:
 868
 869  * Select **Copy as Fasta** from the **Edit** menu.
 870  * **Right Click (Left click + Apple/Command Key on Mac)** on the highlighted sequence and select **Copy as Fasta**.
 871  * Press **Ctrl + C (on PC)** or **Apple/Command Key + C (on Mac)** on the keyboard.
 872
 873 .. image:: images/copy_sequence.png
 874    :alt: Copy sequence
 875    :align: center
 876
 877 Saving to an Image
 878 ---------------------------------
 879
 880 FIXME: Need to write this section
 881
 882
 883 Detailed Information
 884 --------------------
 885
 886 Threshold
 887 ~~~~~~~~~
 888
 889 The threshold of an analysis is the minimum number of base pair
 890 matches a window must have to be kept as a match. Note that you can vary
 891 the threshold from within Mussagl. For example, if you choose a
 892 `window size`_ of **30** and a **threshold** of **20** the mussa nway
 893 transitive algorithm will store all matches that are 20 out of 30 bp
 894 matches or better and pass it on to Mussagl. Mussagl will then allow
 895 you to dynamically choose a threshold from 20 to 30 base pairs. A
 896 threshold of 30 bps would only show 30 out of 30 bp matches. A
 897 threshold of 20 bps would show all matches of 20 out of 30 bps or
 898 better. If you would like to see results for matches lower than 20 out
 899 of 30, you will need to rerun the analysis with a lower threshold.
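
To make the window and threshold idea concrete, here is a deliberately naive Python sketch that compares every window of one sequence against every window of another and keeps the pairs that meet the threshold. It is an illustration under simple assumptions (forward strand only, brute force), not the mussa nway transitive algorithm; all names are made up.

```python
def count_matches(window_a, window_b):
    """Number of positions at which two equal-length windows agree."""
    if len(window_a) != len(window_b):
        raise ValueError("windows must have the same length")
    return sum(a == b for a, b in zip(window_a, window_b))


def window_hits(seq_a, seq_b, window_size, threshold):
    """Yield (pos_a, pos_b, matches) for window pairs meeting the threshold."""
    for i in range(len(seq_a) - window_size + 1):
        for j in range(len(seq_b) - window_size + 1):
            matches = count_matches(
                seq_a[i:i + window_size], seq_b[j:j + window_size]
            )
            if matches >= threshold:
                yield i, j, matches


# Two 10 bp sequences that differ at a single position.
hits = list(window_hits("ACGTACGTAC", "ACGTTCGTAC", window_size=10, threshold=9))
print(hits)  # [(0, 0, 9)]

# Raising the threshold afterwards only needs a filter, not a rerun.
print([hit for hit in hits if hit[2] >= 10])  # []
```

The last line mirrors the dynamic threshold: stored matches can be filtered to a stricter cutoff at any time, but matches below the original threshold were never stored, so seeing them requires rerunning the analysis.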
 900
 901 Window Size
 902 ~~~~~~~~~~~
 903
 904 The typical sizes people tend to choose are between 20 and 30. You
 905 will likely need to experiment with this setting depending on your
 906 needs and input sequence.
 907
 908
 909 Sequences
 910 ~~~~~~~~~
 911
 912 Mussa reads in sequences which are formatted in the fasta_
 913 format. Mussa may take a long time to run (>10 minutes) if the total
 914 bp length is near 280Kb. Once Mussa has run once, you can reload
 915 previously run analyses.
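
If you want to know the total length of your input before starting a run, a short Python sketch like the one below will add it up. The function names are hypothetical, and the 280,000 bp figure is simply the value quoted above, not a hard limit.

```python
from pathlib import Path

LONG_RUN_BP = 280_000  # total length at which the manual reports >10 minute runs


def fasta_length(text):
    """Count sequence characters in fasta text, ignoring headers and blanks."""
    total = 0
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith(">"):
            total += len(line)
    return total


def total_fasta_length(paths):
    """Sum the sequence lengths of several fasta files."""
    return sum(fasta_length(Path(path).read_text()) for path in paths)


example = ">human\nACGT\nAC\n\n>mouse\nGG\n"
print(fasta_length(example))                # 8
print(fasta_length(example) >= LONG_RUN_BP)  # False
```

To check real files, pass their paths to ``total_fasta_length`` and compare the result with ``LONG_RUN_BP``.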
 916
 917 FIXME: We have learned more about how much sequence and how many to
 918 put in Mussagl, this information should be documented here.
 919
 920
 921 Mussa File Formats
 922 ------------------
 923
 924 .. _param:
 925
 926 Parameter File Format
 927 ~~~~~~~~~~~~~~~~~~~~~
 928
 929 **File Format (.mupa):**
 930
 931 ::
 932
 933   # name of analysis directory and stem for associated files
 934   ANA_NAME <analysis_name>
 935
 936   # if APPEND vars true, a _wXX and/or _tYY added to analysis name
 937   # where XX = WINDOW and YY = THRESHOLD
 938   # Highly recommended with use of command line override of WINDOW or THRESHOLD
 939   APPEND_WIN <true/false>
 940   APPEND_THRES <true/false>
 941
 942   # how many sequences are being analyzed
 943   SEQUENCE_NUM <num>
 944
 945   # first sequence info
 946   SEQUENCE <fasta_file_path>
 947   ANNOTATION <annotation_file_path>
 948   SEQ_START <sequence_start>
 949
 950   # the second sequence info
 951   SEQUENCE <fasta_file_path>
 952   # ANNOTATION <annotation_file_path>
 953   SEQ_START <sequence_start>
 954   # SEQ_END <sequence_end>
 955
 956   # third sequence info
 957   SEQUENCE <fasta_file_path>
 958   # ANNOTATION <annotation_file_path>
 959
 960   # analysis parameters: command line args -w -t will override these
 961   WINDOW <num>
 962   THRESHOLD <num>
 963
 964 .. csv-table:: Parameter File Options:
 965    :header: "Option Name", "Value", "Default", "Required", "Description"
 966    :widths: 30 30 30 30 60
 967
 968    "ANA_NAME", "string", "N/A", "true", "Name of analysis (also
 969    name of directory where analysis will be saved)."
 970    "APPEND_WIN", "true/false", "?", "?", "Appends _w## to ANA_NAME"
 971    "APPEND_THRES", "true/false", "?", "?", "Appends _t## to ANA_NAME"
 972    "SEQUENCE_NUM", "integer", "N/A", "true", "The number of sequences
 973    to analyze"
 974    "SEQUENCE", "/fasta/filepath.fa", "N/A", "true", "Must define one
 975    sequence per SEQUENCE_NUM."
 976    "ANNOTATION", "/annotation/filepath.txt", "N/A", "false", "Optional
 977    annotation file. See `annotation file format`_ section for more
 978    information."
 979    "SEQ_START", "integer", "1", "false", "Optional index into fasta file"
 980    "SEQ_END", "integer", "1", "false", "Optional index into fasta file"
 981    "WINDOW", "integer", "N/A", "true", "`Window Size`_"
 982    "THRESHOLD", "integer", "N/A", "true", "`Threshold`_"
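
If you have many analyses to set up, the parameter file can be generated by a script. The Python sketch below writes only the options listed in the table above. The function name ``render_mupa`` is made up, the mouse file name is hypothetical, and the manual does not say whether relative paths are resolved against the .mupa file or the working directory, so adjust the paths to your setup.

```python
def render_mupa(name, sequences, window, threshold, append=True):
    """Build the text of a .mupa file.

    `sequences` is a list of (fasta_path, annotation_path_or_None) pairs.
    """
    flag = "true" if append else "false"
    lines = [
        f"ANA_NAME {name}",
        f"APPEND_WIN {flag}",
        f"APPEND_THRES {flag}",
        f"SEQUENCE_NUM {len(sequences)}",
    ]
    for fasta_path, annotation_path in sequences:
        lines.append(f"SEQUENCE {fasta_path}")
        if annotation_path:  # ANNOTATION is optional
            lines.append(f"ANNOTATION {annotation_path}")
    lines += [f"WINDOW {window}", f"THRESHOLD {threshold}"]
    return "\n".join(lines) + "\n"


text = render_mupa(
    "smn1",
    [
        ("smn1_human_dna.fa", "smn1_human_annot.txt"),
        ("smn1_mouse_dna.fa", None),  # hypothetical file name
    ],
    window=30,
    threshold=20,
)
print(text)
```

Save the returned text to a file ending in .mupa and load it as described in the `Load a mussa parameter file`_ section.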
 983
 984 .. _annot:
 985
 986 Annotation File Format
 987 ~~~~~~~~~~~~~~~~~~~~~~
 988
 989 The first line in the file is the sequence name. Each line thereafter
 990 is a **space** separated annotation.
 991
 992 New as of build 198:
 993
 994  * The annotation format now supports fasta sequences embedded in the
 995    annotation file as shown in the format example below. Mussagl will
 996    take this sequence and look for an exact match of this sequence in
 997    your sequences. If a match is found, it will label it with the name
 998    from the fasta header.
 999
1000 Format:
1001
1002 ::
1003
1004   <species/sequence_name>
1005   <start> <stop> <annotation_name> <annotation_type>
1006   <start> <stop> <annotation_name> <annotation_type>
1007   <start> <stop> <annotation_name> <annotation_type>
1008   <start> <stop> <annotation_name> <annotation_type>
1009   >Fasta Header
1010   ACTGACTGACGTACGTAGCTAGCTAGCTAGCACG
1011   ACGTACGTACGTACGTAGCTGTCATACGCTAGCA
1012   TGCGTAGAGGATCTCGGATGCTAGCGCTATCGAT
1013   ACGTACGGCAGTACGCGGTCAGA
1014   <start> <stop> <annotation_name> <annotation_type>
1015   ...
1016
1017 Example:
1018
1019 ::
1020
1021   Mouse
1022   251 500 Glorp Glorptype
1023   751 1000 Glorp Glorptype
1024   1251 1500 Glorp Glorptype
1025   >My favorite DNA sequence
1026   GATTACA
1027   1751 2000 Glorp Glorptype
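
To check an annotation file before loading it, you could parse it with a short script. The Python sketch below reads the format as described here; it is not Mussa's parser. Telling interval lines from sequence lines by their two leading integers, and treating the annotation type as optional, are assumptions of this sketch.

```python
def parse_annotation_file(text):
    """Parse annotation text into (name, intervals, fasta_records)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("annotation file is empty")
    name = lines[0]
    intervals = []
    records = []
    in_record = False
    for line in lines[1:]:
        fields = line.split()
        if line.startswith(">"):
            # Start of an embedded fasta record: [header, sequence so far].
            records.append([line[1:].strip(), ""])
            in_record = True
        elif len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            # <start> <stop> <annotation_name> [<annotation_type>]
            kind = fields[3] if len(fields) > 3 else ""
            intervals.append((int(fields[0]), int(fields[1]), fields[2], kind))
            in_record = False
        elif in_record:
            records[-1][1] += line
        else:
            raise ValueError(f"unrecognized line: {line!r}")
    return name, intervals, [tuple(record) for record in records]


example = """Mouse
251 500 Glorp Glorptype
751 1000 Glorp Glorptype
1251 1500 Glorp Glorptype
>My favorite DNA sequence
GATTACA
1751 2000 Glorp Glorptype
"""
name, intervals, records = parse_annotation_file(example)
print(name)            # Mouse
print(len(intervals))  # 4
print(records)         # [('My favorite DNA sequence', 'GATTACA')]
```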
1028
1029
1030 .. _motif_file_format:
1031
1032 Motif File Format
1033 ~~~~~~~~~~~~~~~~~
1034
1035 Format::
1036
1037   <motif> <red> <green> <blue>
1038
1039 Example::
1040
1041   GGCC 0.0 1 1
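
A line in this format can be read with a few lines of Python. The sketch below assumes the three color values are floating point numbers, as the example suggests; the valid range is not stated in this manual, so no range check is made. The function name is hypothetical.

```python
def parse_motif_line(line):
    """Split '<motif> <red> <green> <blue>' into (motif, (r, g, b))."""
    motif, red, green, blue = line.split()
    return motif, (float(red), float(green), float(blue))


print(parse_motif_line("GGCC 0.0 1 1"))  # ('GGCC', (0.0, 1.0, 1.0))
```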
1042
1043
1044
1045 IUPAC Nucleotide Code
1046 ~~~~~~~~~~~~~~~~~~~~~~
1047
1048 For your convenience, below is a table of the IUPAC Nucleotide Code.
1049
1050 The following table is table 1 from "Nomenclature for Incompletely
1051 Specified Bases in Nucleic Acid Sequences" which can be found at
1052 http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html.
1053
1054 ======  =================  ===================================
1055 Symbol  Meaning            Origin of designation
1056 ======  =================  ===================================
1057 G       G                  Guanine
1058 A       A                  Adenine
1059 T       T                  Thymine
1060 C       C                  Cytosine
1061 R       G or A             puRine
1062 Y       T or C             pYrimidine
1063 M       A or C             aMino
1064 K       G or T             Keto
1065 S       G or C             Strong interaction (3 H bonds)
1066 W       A or T             Weak interaction (2 H bonds)
1067 H       A or C or T        not-G, H follows G in the alphabet
1068 B       G or T or C        not-A, B follows A
1069 V       G or C or A        not-T (not-U), V follows U
1070 D       G or A or T        not-C, D follows C
1071 N       G or A or T or C   aNy
1072 ======  =================  ===================================
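
The table translates directly into a pattern search. As an illustration (not Mussa's own motif code), the Python sketch below turns an IUPAC motif into a regular expression and lists where it occurs on the forward strand; the function names are made up for this example.

```python
import re

# One entry per row of the table above: symbol -> bases it stands for.
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT", "N": "GATC",
}


def iupac_to_regex(motif):
    """Translate an IUPAC motif such as 'ATSCT' into a regular expression."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_motif(sequence, motif):
    """Return 0-based start positions of the motif, overlaps included."""
    # The lookahead lets the search report overlapping occurrences.
    pattern = re.compile(f"(?=({iupac_to_regex(motif)}))")
    return [match.start() for match in pattern.finditer(sequence.upper())]


print(iupac_to_regex("ATSCT"))                   # AT[GC]CT
print(find_motif("ggATCCTaaATGCTcc", "ATSCT"))  # [2, 9]
```

This sketch searches the sequence as given; finding matches on the opposite strand would require searching the reverse complement as well.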
1073
1074
1075 .. Define links below
1076    ------------------
1077
1078 .. _GPL: http://www.opensource.org/licenses/gpl-license.php
1079 .. _wiki: http://mussa.caltech.edu
1080 .. _build: http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild
1081 .. _fasta: http://en.wikipedia.org/wiki/FASTA_format
1082 .. _wpDnaMotif: http://en.wikipedia.org/wiki/DNA_motif