==============
Mussagl Manual
==============
---------------
Brandon W. King
---------------

Last updated: July 7th, 2006

Updated to Mussagl build: 200 (Update to 286 in progress)


.. contents::

Introduction
============


What is Mussagl?
----------------

Mussa is an N-way version of FamilyRelations, the 2-way comparative
sequence analysis software that is part of the Cartwheel project.
Given DNA sequence from N species, Mussa uses all possible pairwise
comparisons to derive an N-wise comparison. For example, given
sequences 1, 2, 3, and 4, Mussa makes 6 2-way comparisons: 1vs2, 1vs3,
1vs4, 2vs3, 2vs4, and 3vs4. It then compares all the links between
these comparisons, saving those that satisfy a transitivity
requirement. The saved paths are then displayed in an interactive
viewer.

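The pairwise step can be illustrated with a short Python sketch. It only enumerates the N * (N - 1) / 2 comparisons described above; the function name `pairwise_comparisons` is hypothetical, and the transitivity filtering that Mussa applies afterwards is not shown.

```python
from itertools import combinations


def pairwise_comparisons(sequence_ids):
    """Return every unordered pair of sequences: N * (N - 1) / 2 pairs."""
    return list(combinations(sequence_ids, 2))


for a, b in pairwise_comparisons([1, 2, 3, 4]):
    print(f"{a}vs{b}")  # prints 1vs2 through 3vs4, one per line
```

The number of pairs grows quadratically: nine sequences would already need 36 pairwise comparisons.
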

Short History of Mussa
----------------------


Mussa Python/PMW Prototype
~~~~~~~~~~~~~~~~~~~~~~~~~~

First Python/PMW based prototype.

Mussa C++/FLTK
~~~~~~~~~~~~~~

A rewrite for speed using C++ and the FLTK GUI toolkit.

Mussagl C++/Qt/OpenGL
~~~~~~~~~~~~~~~~~~~~~

Refactored version using the more elegant Qt GUI framework and
OpenGL, which provides hardware acceleration for users with better
graphics cards.

Getting Mussagl
===============

License
-------

Mussagl has been released open source under the `GPL v2
license`__.

__ GPL_

Platforms
---------

You have the option of building from source or downloading prebuilt
binaries. Most people will want the prebuilt versions.

Supported Platforms:

 * Mac OS X (binary or source)
 * Windows XP (binary or source)
 * Linux (source)

Download
--------

Binary releases of Mussagl for OS X and Windows, as well as the source
code, can be downloaded from http://mussa.caltech.edu/.

Install
-------

Mac OS X
~~~~~~~~
Once you have downloaded the .dmg file, double click on it and follow
the install instructions.

FIXME: Mention how to launch the program.


Windows XP
~~~~~~~~~~
Once you have downloaded the Mussagl installer, double click on the
installer and follow the install instructions.

To start Mussagl, launch the program from Start > Programs > Mussagl >
Mussagl.


Linux
~~~~~
Currently we do not have a binary installer for Linux. You will have
to build from source. See the `Build from Source`_ section below.


Build from Source
~~~~~~~~~~~~~~~~~

Instructions for building from source can be found on the `build page
<http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild>`_ of the
`Mussa wiki`__.

__ wiki_


Obtaining Input Data
====================

If you already have your data, you can skip ahead to the `Using
Mussagl`_ section.

Let's say you have a gene of interest called 'SMN1' and you want to
know how well the sequence surrounding the gene is conserved across
multiple species. In this section we will retrieve the DNA sequence
for SMN1 and prepare it for use in Mussa.

For more information about SMN1 visit `NCBI's OMIM
<http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=609682>`_.

UCSC Genome Browser Method
--------------------------

There are many methods of retrieving DNA sequence, but for this
example we will retrieve SMN1 through the UCSC Genome Browser located
at http://genome.ucsc.edu/.

.. image:: images/ucsc_genome_browser_home.png
   :alt: UCSC Genome Browser
   :align: center

Step 1 - Find SMN1
~~~~~~~~~~~~~~~~~~

The first step in finding SMN1 is to use the **Gene Sorter** menu
option, which I have highlighted in orange below:

.. image:: images/ucsc_menu_bar_gene_sorter.png
   :alt: Gene Sorter Menu Option
   :align: center

Gene Sorter page:

.. image:: images/ucsc_gene_sorter.png
   :alt: Gene Sorter
   :align: center

We will start by looking for SMN1 in the **Human Genome** and **sorting by name similarity**.

.. image:: images/ucsc_gs_sort_name_sim.png
   :alt: Gene Sorter - Name Similarity
   :align: center

After you have selected **Human Genome** and **sorting by name similarity**, type *SMN1* into the search box.

.. image:: images/ucsc_gs_smn1.png
   :alt: Gene Sorter - SMN1 search
   :align: center

Press **Go!** and you should see the following page:

.. image:: images/ucsc_gs_found.png
   :alt: Found SMN1
   :align: center

Click on **SMN1** and you will be taken to the gene expression atlas
page.

.. image:: images/ucsc_gs_genome_position.png
   :alt: Gene expression atlas
   :align: center

Click on **chr5 70,270,558** found in the **SMN1 row**, **Genome
position column**.

Now we have found the location of SMN1 in human!

.. image:: images/ucsc_gb_smn1_human.png
   :alt: Genome Browser - SMN1 (human)
   :align: center


Step 2 - Download CDS/UTR sequence for annotations
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Now that we have found **SMN1**, this is a convenient time to extract
the DNA sequence for the CDS and UTRs of the gene to use as an
annotation_ in Mussa.

**Click on SMN1**, shown **between** the **two orange arrows**
below.

.. image:: images/ucsc_gb_smn1_human_click_smn1.png
   :alt: Genome Browser - SMN1 (human) - Orange Arrows
   :align: center

You should find yourself at the SMN1 description page.

.. image:: images/ucsc_gb_smn1_description_page.png
   :alt: Genome Browser - SMN1 (human) - Description page
   :align: center

**Scroll down** until you get to the **Sequence section** and click on
**Genomic (chr5:70,256,524-70,284,592)**.

.. image:: images/ucsc_gb_smn1_human_sequence.png
   :alt: Genome Browser - SMN1 (human) - Sequence
   :align: center

You should now be at the **Genomic sequence near gene** page:

.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence.png
   :alt: Genome Browser - SMN1 (human) - Get genomic sequence
   :align: center

Make the following changes (highlighted in orange in the screenshot
below):

 1. Uncheck **introns**.
    (We only want to annotate CDS and UTRs.)
 2. Select **one fasta record** per **region**.
    (Mussa needs each CDS and UTR as its own fasta record.)
 3. Select **CDS in upper case, UTR in lower case.**

.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_diff.png
   :alt: Genome Browser - SMN1 (human) - Get genomic sequence setup
   :align: center

Now click the **submit** button. You will then see a fasta file with
many fasta records representing the CDS and UTRs.

.. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_submit.png
   :alt: Genome Browser - SMN1 (human) - CDS/UTR sequence
   :align: center

Now you need to save the fasta records to a **text file**. If you are
using **Firefox** or **Internet Explorer 6+**, click on the **File >
Save As** menu option.

**IMPORTANT:** Make sure you select **Text Files** and **NOT**, I
repeat **NOT**, **Webpage Complete** (see screenshot below).

Type in **smn1_human_annot.txt** for the file name.

.. image:: images/smn1_human_annot.png
   :alt: Genome Browser - SMN1 (human) - sequence annotation file
   :align: center

**IMPORTANT:** You should open the file with a text editor and make
sure **no HTML** was saved. If you find any HTML markup, delete the
markup and save the file.

Now we are going to **modify the file** you just saved to **add the
name of the species** to the **annotation file**. All you have to do
is **add a new line** at the **top of the file** with the word **'Human'** as
shown below:

.. image:: images/smn1_human_annot_plus_human.png
   :alt: Genome Browser - SMN1 (human) - sequence annotation file
   :align: center

You can add more annotations to this file if you wish. See the
`annotation file format`_ section for details of the file format. For
each fasta record in the annotation_ file, Mussa searches your DNA
sequence for an exact match of that record's sequence. If a match is
found, it will be marked as an annotation_ within Mussa.

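Conceptually, the exact-match lookup described above is a plain substring search. The Python sketch below is a hypothetical illustration, not Mussa's code: it compares case-insensitively because the UCSC export mixes upper-case CDS with lower-case UTRs, and it searches only the forward strand. Both choices are assumptions of the sketch.

```python
def find_exact_matches(sequence, query):
    """Return the 0-based start offset of every exact (possibly
    overlapping) occurrence of query in sequence, ignoring case."""
    seq = sequence.upper()
    q = query.upper()
    hits = []
    start = seq.find(q)
    while start != -1:
        hits.append(start)
        start = seq.find(q, start + 1)  # step by one to allow overlaps
    return hits


print(find_exact_matches("ccGATTACAttgattaca", "GATTACA"))  # [2, 11]
```
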

Step 3 - Download gene and upstream/downstream sequence
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Use the back button in your web browser to get back to the **genome
browser view** of **SMN1** as shown below.

.. image:: images/ucsc_gb_smn1_human.png
   :alt: Genome Browser - SMN1 (human)
   :align: center

There are two options for getting additional sequence around your
gene. The more complex way is to zoom out until the genome browser
shows all of the sequence you want, and then follow the directions for
the method below.

The second option, which we will choose, is to leave the genome
browser zoomed exactly at the location of SMN1 and click on the
**DNA** option on the menu bar (shown with orange arrows in the
screenshot below).

.. image:: images/ucsc_gb_smn1_human_dna_option.png
   :alt: Genome Browser - SMN1 (human) - DNA Option
   :align: center

Now, in the **get dna in window** page, let's add an arbitrary amount of
extra sequence to each end of the gene, say 5000 base pairs.

.. image:: images/ucsc_gb_smn1_human_get_dna.png
   :alt: Genome Browser - SMN1 (human) - Get DNA
   :align: center

Click the **get DNA** button.

.. image:: images/ucsc_gb_smn1_human_dna.png
   :alt: Genome Browser - SMN1 (human) - DNA
   :align: center

Save the DNA sequence to a text file called 'smn1_human_dna.fa' as we
did in step 2 with the annotation file.

**IMPORTANT:** Make sure the file is saved as a text file and not an
HTML file. Open the file with a text editor and remove any HTML markup
you find.


Step 4 - Same/similar/related gene in other species
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

What good is a multiple sequence alignment viewer without multiple
sequences? Let's find a similar gene in a few more species.

Use the back button on your web browser until you get to the **genome
browser view** of **SMN1** as shown below.

.. image:: images/ucsc_gb_smn1_human.png
   :alt: Genome Browser - SMN1 (human)
   :align: center

**Click on SMN1**, shown **between** the **two orange arrows**
below.

.. image:: images/ucsc_gb_smn1_human_click_smn1.png
   :alt: Genome Browser - SMN1 (human) - Orange Arrows
   :align: center

You should find yourself at the SMN1 description page.

.. image:: images/ucsc_gb_smn1_description_page.png
   :alt: Genome Browser - SMN1 (human) - Description page
   :align: center

**Scroll down** until you get to the **Sequence section** and click on
**Protein (262 aa)**.

.. image:: images/ucsc_gb_smn1_human_sequence.png
   :alt: Genome Browser - SMN1 (human) - Sequence
   :align: center

Copy the SMN1 protein sequence by highlighting it and selecting the
**Edit > Copy** option from the menu.

.. image:: images/smn1_human_protein.png
   :alt: Genome Browser - SMN1 (human) - Protein
   :align: center

Press the back button on the web browser once, then scroll to the
top of the page and click on the **BLAT** option on the menu bar
(shown below with orange arrows).

.. image:: images/ucsc_gb_smn1_human_blat.png
   :alt: Genome Browser - SMN1 (human) - Blat
   :align: center

**Paste** in the **protein sequence**, **change** the **genome** to
**mouse** as shown below, and then click **submit**.

.. image:: images/ucsc_gb_smn1_human_blat_paste.png
   :alt: Genome Browser - SMN1 (human) - Blat paste protein
   :align: center

Notice that we have two hits, one of which looks pretty good at 89.9%
match.

.. image:: images/ucsc_gb_smn1_human_blat_hits.png
   :alt: Genome Browser - SMN1 (human) - Blat hits
   :align: center

**Click** on the **browser** link next to the 89.9% match. Notice in
the genome browser (shown below) that there is an annotated gene
called SMN1 for mouse which matches the line called **your sequence
from blat search**. This means we can be fairly confident that we
found the right location in the mouse genome.

.. image:: images/ucsc_gb_smn1_human_blat_to_browser.png
   :alt: Genome Browser - SMN1 (human) - Blat to browser
   :align: center

Follow steps 1 through 3 for mouse and then repeat step 4 with the
human protein sequence to find **SMN1** in the following species (if
you find a match):

 1. Rat
 2. Rabbit
 3. Dog
 4. Armadillo
 5. Elephant
 6. Opossum
 7. *X. tropicalis*

Make sure to save the extended DNA sequence and annotation file for
each one.

Using Mussagl
=============


Launch Mussagl
--------------
Launch Mussagl. It should look similar to the screenshot below.

.. image:: images/opened.png
   :alt: Launch Mussa
   :align: center



Create/Load Analysis
--------------------

Currently there are three ways to load a Mussa analysis:

 1. `Create a new analysis`_
 2. `Load a mussa parameter file`_ (.mupa)
 3. `Load an analysis`_

.. _createnew:

Create a new analysis
~~~~~~~~~~~~~~~~~~~~~

To create a new analysis, select 'Define analysis' from the 'File'
menu. You should see a dialog box similar to the one below. For this
demo we will use the example sequences that come with Mussagl.

.. image:: images/define_analysis.png
   :alt: Define Analysis
   :align: center

Instructions:

 1. **Give the experiment a name**. For this demo, we'll use
    'demo_w30_t20'. Mussa will create a folder with this name to store
    the analysis files in once it has been run.

 2. Choose a `window size`_. For this demo **choose 30**.

 3. Choose a threshold_. For this demo **choose 20**. See the
    Threshold_ section for more detailed information.

 4. Choose the number of sequences_ you would like. For this demo
    **choose 3**.

.. image:: images/define_analysis_step1a.png
   :alt: Steps 1-4
   :align: center

Now click on the 'Browse' button next to the first sequence input box
and then select the /examples/seq/human_mck_pro.fa file. Do the same in
the next two sequence input boxes, selecting mouse_mck_pro.fa and
rabbit_mck_pro.fa as shown below. Note that you can create annotation
files using the Mussa `Annotation File Format`_ to add annotations to
your sequences.

.. image:: images/define_analysis_step2.png
   :alt: Choose sequences
   :align: center

Click the **create** button and in a few moments you should see
something similar to the following screenshot.

.. image:: images/demo.png
   :alt: Mussagl Demo
   :align: center

This analysis is now saved in a directory called **demo_w30_t20** in
the current working directory. If you close and reopen Mussagl, you
can reload the saved analysis. See the `Load an analysis`_ section
below for details.


Load a mussa parameter file
~~~~~~~~~~~~~~~~~~~~~~~~~~~

If you prefer, you can define your Mussa analysis using a Mussa
parameter file. See the `Parameter File Format`_ section for details
on creating a .mupa file.

Once you have created a .mupa file, launch Mussagl and select the **File >
Load Mussa Parameters** menu option. Select the .mupa file and click
open.

.. image:: images/load_mupa_menu.png
   :alt: Load Mussa Parameters
   :align: center

If you would like to see an example, you can load the
**mck3test.mupa** file in the examples directory that comes with
Mussagl.

.. image:: images/load_mupa_dialog.png
   :alt: Load Mussa Parameters Dialog
   :align: center


Load an analysis
~~~~~~~~~~~~~~~~

To load a previously run analysis, open Mussagl and select the **File >
Load Analysis** menu option. Select an analysis **directory** and
click open.

.. image:: images/load_analysis_menu.png
   :alt: Load Analysis Menu
   :align: center


Main Window
-----------

Overview
~~~~~~~~
.. Screen-shot with numbers showing features.

.. image:: images/window_overview.png
   :alt: Mussa Window
   :align: center

Legend:

 1. `DNA Sequence (Black bars)`_

 2. Annotation_

 3. Motif_

 4. `Conservation tracks`_

 5. `Motif Toggle`_

 6. `Zoom Factor`_ (Base pairs per pixel)

 7. `Dynamic Threshold`_

 8. `Sequence Information Bar`_

 9. `Sequence Scroll Bar`_


DNA Sequence (black bars)
~~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: images/sequence_bar.png
   :alt: Sequence Bar
   :align: center

Each of the black bars represents one of the loaded sequences, in this
case the sequence around the gene 'MCK' in human, mouse, and rabbit.

FIXME: Should I mention the repeats here?


Annotation
~~~~~~~~~~

.. figure:: images/annotation.png
   :alt: Annotation
   :align: center

   Annotation shown in green on sequence bar.


Annotations can be included on any of the sequences by supplying an
annotation file, either in the 'Define analysis' dialog (see `Create a
new analysis`_) or in a .mupa file (see `Load a mussa parameter
file`_). You can define annotations by location or by an exact
sub-sequence, and you may also choose any color for display of the
annotation; see the `Annotation File Format`_ section for details.

Note: Currently there is no way to create annotations from within the
GUI; they must be defined in an annotation file. We plan to add this
feature in the future, but it likely will not make it into the first
release.


Motif
~~~~~

.. figure:: images/motif.png
   :alt: Motif
   :align: center

   Motif shown in light blue on sequence bar.

The only real difference between an annotation and a motif in Mussagl
is that you can define motifs from within the GUI. See the `Motifs`_
section for more information.


Conservation tracks
~~~~~~~~~~~~~~~~~~~

.. figure:: images/conservation_tracks.png
   :alt: Conservation Tracks
   :align: center

   Conservation tracks shown as red and blue lines between sequence
   bars.

The **red lines** between the sequence bars represent conservation
between the sequences and **blue lines** represent **reverse
complement** conservation. The amount of sequence conservation shown
will depend on the relatedness of your sequences and the `dynamic
threshold`_ you are using. Sequences with many repeats will
considerably slow down the calculation of matches.

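To make **reverse complement** concrete: roughly speaking, a blue line indicates that a window in one sequence is conserved with the reverse complement of a window in the other. The Python sketch below is illustrative only (the names `reverse_complement` and `is_reverse_complement_match` are not part of Mussa); it reverses a sequence and swaps A with T and C with G.

```python
# Map each base to its Watson-Crick partner, preserving case.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement; characters other than ACGT/acgt
    (for example N) are kept unchanged."""
    return seq.translate(COMPLEMENT)[::-1]


def is_reverse_complement_match(window_a, window_b):
    """True when window_a reads the same as window_b on the opposite strand."""
    return window_a.upper() == reverse_complement(window_b).upper()


print(reverse_complement("GATTACA"))  # TGTAATC
```
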

Motif Toggle
~~~~~~~~~~~~

.. image:: images/motif_toggle.png
   :alt: Motif Toggle
   :align: center

Toggles motifs on and off. This does not toggle annotations.

Note: As of the current build (#200), this feature hasn't been
implemented.


Zoom Factor
~~~~~~~~~~~

.. image:: images/zoom_factor.png
   :alt: Zoom Factor
   :align: center

The zoom factor is the number of base pairs represented per pixel.
When you zoom in far enough, the display switches from a black bar
representing the sequence to the actual sequence (well, an ASCII
representation of the sequence).


Dynamic Threshold
~~~~~~~~~~~~~~~~~

.. image:: images/dynamic_threshold.png
   :alt: Dynamic Threshold
   :align: center

You can dynamically change how strong a match must be before it is
shown as conserved, using one of two options:

 1. Number of base pair matches out of the window size.

 2. Percent base pair conservation.

See the Threshold_ section for more information.


Sequence Information Bar
~~~~~~~~~~~~~~~~~~~~~~~~

.. image:: images/seq_info_bar.png
   :alt: Sequence Information Bar
   :align: center

The sequence information bars can be found on the left and right sides
of Mussagl. Next to each sequence you will find the following
information:

 1. Species (if it has been defined)
 2. Total size of sequence
 3. Current base pair position


Sequence Scroll Bar
~~~~~~~~~~~~~~~~~~~

.. image:: images/scroll_bar.png
   :alt: Sequence Scroll Bar
   :align: center

The scroll bar allows you to scroll through the sequence, which is
useful when you have zoomed in using the `zoom factor`_.


Annotations / Motifs
--------------------

Annotations
~~~~~~~~~~~

Currently annotations can be added to a sequence using the Mussa
`annotation file format`_ and can be loaded by selecting the
annotation file when defining a new analysis (see the `Create a new
analysis`_ section) or by defining a .mupa file pointing to your
annotation file (see the `Load a mussa parameter file`_ section).

Motifs
~~~~~~

Load Motifs from File
*********************

It is possible to load motifs from a file which was saved from a
previous run or from a motif file you write yourself. See the `Motif
File Format`_ section for details.

To load a motif file, select the **Load Motif List** item from the
**File** menu and select a motif list file.

.. image:: images/load_motif.png
   :alt: Load Motif List
   :align: center


Save Motifs to File
*******************

Note: Currently not implemented.


Motif Dialog
************

Mussa has the ability to find motifs defined with the `IUPAC
Nucleotide Code`_. To define a motif, select the **View > Edit
Motifs** menu item as shown below.

.. image:: images/view_edit_motifs.png
   :alt: "View > Edit Motifs" Menu
   :align: center

You will see a dialog box appear with a "set motifs" button and 10
rows for defining motifs and the color that will be displayed on the
sequence. By default all 10 motifs start off with white as the
color. In the image below, I changed the color from white to blue to
make it easier to see.

.. image:: images/motif_dialog_start.png
   :alt: Motif Dialog
   :align: center

Now let's make a motif **'AT[C or G]CT'**. Using the `IUPAC Nucleotide
Code`_, type **'ATSCT'** into the first box as shown below.

.. image:: images/motif_dialog_enter_motif.png
   :alt: Enter Motif
   :align: center

Now choose a color for your motif by clicking on the colored area to
the left of the motif. In the image above, you would click on the blue
square, but by default the squares will be white. Remember to choose a
color that will show up well against the black sequence bar.

.. image:: images/color_chooser.png
   :alt: Color Chooser
   :align: center

Once you have selected the color for your motif, click on the 'set
motifs' button. Any matches Mussa finds for your motif will now show
up in the main Mussagl window.

Before Motif:

.. image:: images/motif_dialog_bar_before.png
   :alt: Sequence bar before motif
   :align: center

After Motif:

.. image:: images/motif_dialog_bar_after.png
   :alt: Sequence bar after motif
   :align: center


View Mussa Alignments
---------------------

Mussagl allows you to zoom in on Mussa alignments by selecting the set
of alignment(s) of interest. To do this, move the mouse near the
alignment you are interested in viewing, then **PRESS** and
**HOLD** the **LEFT mouse button** and **drag the mouse** to the other
side of the conservation track so that you see a bounding box
overlapping the alignment(s) of interest, and then **let go** of the
*left mouse button*.

In the example below, I started by left clicking on the area marked by
a red dot (upper left corner of the bounding box), dragging the mouse
to the area marked by a blue dot (lower right corner of the bounding
box), and letting go of the left mouse button.

.. image:: images/select_sequence.png
   :alt: Select Sequence
   :align: center

All of the lines which were not selected should be washed out as shown
below:

.. image:: images/washed_out.png
   :alt: Tracks washed out
   :align: center

With a selection made, go to the **View** menu and select **View mussa alignment**.

.. image:: images/view_mussa_alignment.png
   :alt: View mussa alignment
   :align: center

You should see the alignment at the base-pair level as shown below.

.. image:: images/mussa_alignment.png
   :alt: Mussa alignment
   :align: center


Sub-analysis
------------

To run a sub-analysis, **highlight** a section of sequence, *right
click* on it and select **Add to subanalysis**. Do the same for the
other sequences shown in orange in the screenshot below. Note that you
**are NOT limited** to one subsequence per sequence; you can add more
than one subsequence from the same sequence.

.. image:: images/subanalysis_select_seqs.png
   :alt: Subanalysis sequence selection
   :align: center

Once you have added your sequences for subanalysis, choose a `window size`_ and `threshold`_ and click **Ok**.

.. image:: images/subanalysis_dialog.png
   :alt: Subanalysis Dialog
   :align: center

A new Mussa window will appear with the subanalysis of your sequences
once it has finished running. This may take a while if you selected
large chunks of sequence with a loose threshold.

.. image:: images/subanalysis_done.png
   :alt: Subanalysis complete
   :align: center


Copying sequence to clipboard
-----------------------------

To copy a sequence to the clipboard, highlight a section of sequence,
as shown in the screenshot below, and do one of the following:

 * Select **Copy as Fasta** from the **Edit** menu.
 * **Right Click (Left click + Apple/Command Key on Mac)** on the highlighted sequence and select **Copy as Fasta**.
 * Press **Ctrl + C (on PC)** or **Apple/Command Key + C (on Mac)** on the keyboard.

.. image:: images/copy_sequence.png
   :alt: Copy sequence
   :align: center

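**Copy as Fasta** places the selection on the clipboard in fasta_ format. As a rough illustration of that format, and not of Mussagl's own clipboard code, the Python sketch below builds a record from a header line and a sequence. The header text and the default 60-character line width are assumptions.

```python
def format_fasta(header, sequence, width=60):
    """Return one fasta record: a '>' header line, then the sequence
    wrapped to at most `width` characters per line."""
    lines = [">" + header]
    for start in range(0, len(sequence), width):
        lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical header naming the source file and selected region.
print(format_fasta("human_mck_pro.fa 101-130", "ACGT" * 7 + "AC", width=10), end="")
```
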

Saving to an Image
------------------

FIXME: Need to write this section


Detailed Information
--------------------

Threshold
~~~~~~~~~

The threshold of an analysis is the minimum number of base pair matches
within a window that is required for a match to be kept. Note that you
can vary the threshold from within Mussagl. For example, if you choose
a `window size`_ of **30** and a **threshold** of **20**, the Mussa
N-way transitive algorithm will store all matches of 20 out of 30 bp
or better and pass them on to Mussagl. Mussagl will then allow you to
dynamically choose a threshold from 20 to 30 base pairs. A threshold
of 30 bp would only show 30 out of 30 bp matches. A threshold of 20 bp
would show all matches of 20 out of 30 bp or better. If you would like
to see results for matches lower than 20 out of 30, you will need to
rerun the analysis with a lower threshold.

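The threshold logic can be sketched as a brute-force comparison of every window pair. This is a simplified, hypothetical Python illustration: the function names are made up, only the forward strand of two sequences is compared, and no transitivity step is applied, so it is not the Mussa N-way transitive algorithm itself. It does show why a lower threshold requires a rerun: hits below the original threshold are never stored, so `filter_dynamic` can only tighten the result.

```python
def window_matches(seq_a, seq_b, window):
    """Yield (offset_a, offset_b, matches) for every pair of windows,
    where matches counts positions holding the same base."""
    a, b = seq_a.upper(), seq_b.upper()
    for i in range(len(a) - window + 1):
        wa = a[i:i + window]
        for j in range(len(b) - window + 1):
            wb = b[j:j + window]
            yield i, j, sum(1 for x, y in zip(wa, wb) if x == y)


def analyze(seq_a, seq_b, window, threshold):
    """Store only window pairs with at least `threshold` matching bases."""
    return [hit for hit in window_matches(seq_a, seq_b, window)
            if hit[2] >= threshold]


def filter_dynamic(stored_hits, dynamic_threshold):
    """Re-filter stored hits; a value below the analysis threshold cannot
    bring back hits that were never stored."""
    return [hit for hit in stored_hits if hit[2] >= dynamic_threshold]


def percent_identity(matches, window):
    """Express a match count as percent identity over the window."""
    return 100.0 * matches / window


print(round(percent_identity(20, 30), 1))  # 66.7
```

With a window of 30 and a threshold of 20, the weakest stored match corresponds to about 66.7% identity; presumably this is how the count-based and percent-based options of the `Dynamic Threshold`_ control relate.
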

Window Size
~~~~~~~~~~~

The window size is the length, in base pairs, of each stretch of
sequence that Mussa compares; the threshold_ is counted out of this
length. Typical choices are between 20 and 30 bp. You will likely need
to experiment with this setting depending on your needs and input
sequence.


Sequences
~~~~~~~~~

Mussa reads in sequences which are formatted in the fasta_
format. Mussa may take a long time to run (more than 10 minutes) if
the total sequence length is near 280 kb. Once Mussa has run once, you
can reload the previously run analysis.

FIXME: We have learned more about how much sequence, and how many
sequences, to put in Mussagl; this information should be documented
here.

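Because a file saved as **Webpage Complete** is a common source of problems (see the **IMPORTANT** notes in the `Obtaining Input Data`_ section), a small check can help before you load files into Mussa. The Python sketch below is hypothetical and is not Mussa's own reader: it parses fasta_ records and rejects any line that starts with ``<``, on the assumption that such a line is leftover HTML markup.

```python
def read_fasta(path):
    """Read a fasta file into a list of (header, sequence) tuples.

    Raises ValueError if a line starts with '<', which usually means the
    file was saved from the browser as a web page. Sequence lines that
    appear before the first '>' header are ignored in this sketch.
    """
    records = []
    header, chunks = None, []
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if not line:
                continue
            if line.startswith("<"):
                raise ValueError(f"HTML markup found in {path}: {line[:40]}")
            if line.startswith(">"):
                if header is not None:
                    records.append((header, "".join(chunks)))
                header, chunks = line[1:].strip(), []
            else:
                chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records
```
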

Mussa File Formats
------------------

.. _param:

Parameter File Format
~~~~~~~~~~~~~~~~~~~~~

**File Format (.mupa):**

::

  # name of analysis directory and stem for associated files
  ANA_NAME <analysis_name>

  # if APPEND vars true, a _wXX and/or _tYY added to analysis name
  # where XX = WINDOW and YY = THRESHOLD
  # Highly recommended with use of command line override of WINDOW or THRESHOLD
  APPEND_WIN <true/false>
  APPEND_THRES <true/false>

  # how many sequences are being analyzed
  SEQUENCE_NUM <num>

  # first sequence info
  SEQUENCE <fasta_file_path>
  ANNOTATION <annotation_file_path>
  SEQ_START <sequence_start>

  # the second sequence info
  SEQUENCE <fasta_file_path>
  # ANNOTATION <annotation_file_path>
  SEQ_START <sequence_start>
  # SEQ_END <sequence_end>

  # third sequence info
  SEQUENCE <fasta_file_path>
  # ANNOTATION <annotation_file_path>

  # analysis parameters: command line args -w -t will override these
  WINDOW <num>
  THRESHOLD <num>

.. csv-table:: Parameter File Options:
   :header: "Option Name", "Value", "Default", "Required", "Description"
   :widths: 30 30 30 30 60

   "ANA_NAME", "string", "N/A", "true", "Name of analysis (also the
   name of the directory where the analysis will be saved)."
   "APPEND_WIN", "true/false", "?", "?", "Appends _w## to ANA_NAME"
   "APPEND_THRES", "true/false", "?", "?", "Appends _t## to ANA_NAME"
   "SEQUENCE_NUM", "integer", "N/A", "true", "The number of sequences
   to analyze"
   "SEQUENCE", "/fasta/filepath.fa", "N/A", "true", "One SEQUENCE entry
   is required for each of the SEQUENCE_NUM sequences."
   "ANNOTATION", "/annotation/filepath.txt", "N/A", "false", "Optional
   annotation file. See `annotation file format`_ section for more
   information."
   "SEQ_START", "integer", "1", "false", "Optional index into fasta file"
   "SEQ_END", "integer", "1", "false", "Optional index into fasta file"
   "WINDOW", "integer", "N/A", "true", "`Window Size`_"
   "THRESHOLD", "integer", "N/A", "true", "`Threshold`_"

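The format above can be read with a small parser. The Python sketch below is an illustration under stated assumptions, not Mussa's own parser: keys and values are separated by whitespace, ANNOTATION, SEQ_START, and SEQ_END attach to the most recent SEQUENCE, and APPEND_WIN and APPEND_THRES default to false because the table does not document their defaults. The function names `parse_mupa` and `analysis_dir_name` are hypothetical.

```python
PER_SEQUENCE_KEYS = {"ANNOTATION", "SEQ_START", "SEQ_END"}


def parse_mupa(text):
    """Parse .mupa text into (settings, sequences).

    settings holds global keys such as ANA_NAME and WINDOW; sequences is a
    list of dicts, one per SEQUENCE line. Blank lines and lines starting
    with '#' are ignored.
    """
    settings, sequences = {}, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) == 2 else ""
        if key == "SEQUENCE":
            sequences.append({"SEQUENCE": value})
        elif key in PER_SEQUENCE_KEYS:
            if not sequences:
                raise ValueError(f"{key} appears before any SEQUENCE")
            sequences[-1][key] = value
        else:
            settings[key] = value
    expected = int(settings.get("SEQUENCE_NUM", len(sequences)))
    if expected != len(sequences):
        raise ValueError(
            f"SEQUENCE_NUM is {expected} but found {len(sequences)} SEQUENCE lines")
    return settings, sequences


def analysis_dir_name(settings):
    """Apply the APPEND_WIN / APPEND_THRES suffixes to ANA_NAME."""
    name = settings["ANA_NAME"]
    if settings.get("APPEND_WIN", "false").lower() == "true":
        name += "_w" + settings["WINDOW"]
    if settings.get("APPEND_THRES", "false").lower() == "true":
        name += "_t" + settings["THRESHOLD"]
    return name
```

With APPEND_WIN and APPEND_THRES set to true, ``ANA_NAME demo``, ``WINDOW 30``, and ``THRESHOLD 20`` give ``demo_w30_t20``, the same directory name used in the `Create a new analysis`_ demo.
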

.. _annot:

Annotation File Format
~~~~~~~~~~~~~~~~~~~~~~

The first line in the file is the sequence name. Each line thereafter
is either a **space**-separated annotation or part of an embedded fasta
record (see below).

New as of build 198:

 * The annotation format now supports fasta sequences embedded in the
   annotation file as shown in the format example below. Mussagl will
   take this sequence and look for an exact match of it in your
   sequences. If a match is found, it will label it with the name from
   the fasta header.

Format:

::

  <species/sequence_name>
  <start> <stop> <annotation_name> <annotation_type>
  <start> <stop> <annotation_name> <annotation_type>
  <start> <stop> <annotation_name> <annotation_type>
  <start> <stop> <annotation_name> <annotation_type>
  >Fasta Header
  ACTGACTGACGTACGTAGCTAGCTAGCTAGCACG
  ACGTACGTACGTACGTAGCTGTCATACGCTAGCA
  TGCGTAGAGGATCTCGGATGCTAGCGCTATCGAT
  ACGTACGGCAGTACGCGGTCAGA
  <start> <stop> <annotation_name> <annotation_type>
  ...

Example:

::

  Mouse
  251 500 Glorp Glorptype
  751 1000 Glorp Glorptype
  1251 1500 Glorp Glorptype
  >My favorite DNA sequence
  GATTACA
  1751 2000 Glorp Glorptype

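A minimal sketch of how the two kinds of entries could be told apart: lines whose first two fields are integers are treated as located annotations, ``>`` starts an embedded fasta record, and any other line is sequence for the current record. This Python parser (`parse_annotations` is a hypothetical name) is not Mussa's implementation; in particular, accepting a missing ``<annotation_type>`` is an assumption, not documented behavior.

```python
def parse_annotations(text):
    """Split an annotation file into (sequence_name, located, fasta).

    located holds (start, stop, name, type) tuples; fasta holds
    (header, sequence) tuples for the embedded fasta records.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    sequence_name, located, fasta = lines[0], [], []
    current = None  # fasta record currently being read, if any
    for line in lines[1:]:
        if line.startswith(">"):
            current = [line[1:].strip(), ""]
            fasta.append(current)
            continue
        fields = line.split()
        if len(fields) >= 3 and fields[0].isdigit() and fields[1].isdigit():
            current = None  # a located annotation ends any fasta record
            kind = fields[3] if len(fields) > 3 else ""  # assumed optional
            located.append((int(fields[0]), int(fields[1]), fields[2], kind))
        elif current is not None:
            current[1] += line  # sequence line of the current fasta record
        else:
            raise ValueError(f"unrecognized annotation line: {line}")
    return sequence_name, located, [tuple(record) for record in fasta]
```
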

.. _motif_file_format:

Motif File Format
~~~~~~~~~~~~~~~~~

Each line gives a motif followed by the red, green, and blue
components of the color used to display it.

Format::

  <motif> <red> <green> <blue>

Example::

  GGCC 0.0 1 1

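The Python sketch below reads a motif list in this format. It is an illustration under assumptions: fields are whitespace-separated and the color components are parsed as floats, as the example ``0.0 1 1`` suggests. The function name `parse_motif_list` is hypothetical.

```python
def parse_motif_list(text):
    """Return a list of (motif, (red, green, blue)) tuples, one per
    non-blank line of a motif list."""
    motifs = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 4:
            raise ValueError(f"expected '<motif> <red> <green> <blue>': {line!r}")
        motif, red, green, blue = fields
        motifs.append((motif, (float(red), float(green), float(blue))))
    return motifs


print(parse_motif_list("GGCC 0.0 1 1\n"))  # [('GGCC', (0.0, 1.0, 1.0))]
```
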


IUPAC Nucleotide Code
~~~~~~~~~~~~~~~~~~~~~~

For your convenience, below is a table of the IUPAC Nucleotide Code.

The following table is table 1 from "Nomenclature for Incompletely
Specified Bases in Nucleic Acid Sequences", which can be found at
http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html.

======  =================  ===================================
Symbol  Meaning            Origin of designation
======  =================  ===================================
G       G                  Guanine
A       A                  Adenine
T       T                  Thymine
C       C                  Cytosine
R       G or A             puRine
Y       T or C             pYrimidine
M       A or C             aMino
K       G or T             Keto
S       G or C             Strong interaction (3 H bonds)
W       A or T             Weak interaction (2 H bonds)
H       A or C or T        not-G, H follows G in the alphabet
B       G or T or C        not-A, B follows A
V       G or C or A        not-T (not-U), V follows U
D       G or A or T        not-C, D follows C
N       G or A or T or C   aNy
======  =================  ===================================

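The table translates directly into regular-expression character classes, which is one way to search for a motif such as **ATSCT** from the `Motif Dialog`_ example. This Python sketch is illustrative only; Mussa's own motif search may differ (for example, this version scans only the forward strand, and an unknown letter raises ``KeyError``).

```python
import re

# IUPAC nucleotide codes from the table above, as regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "M": "[AC]", "K": "[GT]",
    "S": "[CG]", "W": "[AT]",
    "H": "[ACT]", "B": "[CGT]", "V": "[ACG]", "D": "[AGT]",
    "N": "[ACGT]",
}


def motif_pattern(motif):
    """Translate an IUPAC motif such as 'ATSCT' into a regex string."""
    return "".join(IUPAC[base] for base in motif.upper())


def find_motif(sequence, motif):
    """Return 0-based start positions of every match, including
    overlapping ones (the lookahead makes each match zero-width)."""
    lookahead = re.compile("(?=" + motif_pattern(motif) + ")", re.IGNORECASE)
    return [m.start() for m in lookahead.finditer(sequence)]


print(motif_pattern("ATSCT"))                         # AT[CG]CT
print(find_motif("ggATCCTttATGCTaATACT", "ATSCT"))    # [2, 9]
```
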

.. Define links below
   ------------------

.. _GPL: http://www.opensource.org/licenses/gpl-license.php
.. _wiki: http://mussa.caltech.edu
.. _build: http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild
.. _fasta: http://en.wikipedia.org/wiki/FASTA_format
.. _wpDnaMotif: http://en.wikipedia.org/wiki/DNA_motif