doc/manual/mussagl_manual.rst

   1 ==============
   2 Mussagl Manual
   3 ==============
   4 ---------------
   5 Brandon W. King
   6 ---------------
   7
   8 Last updated: May 23rd, 2006
   9
  10 Updated to Mussagl build: 200 (Update to 230 in progress)
  11
  12
  13 .. contents::
  14
  15 Introduction
  16 ============
  17
  18
  19 What is Mussagl?
  20 ----------------
  21
  22
  23 Short History of Mussa
  24 ----------------------
  25
  26
  27 Mussa Python/PMW Prototype
  28 ~~~~~~~~~~~~~~~~~~~~~~~~~~
  29
  30
  31 Mussa C++/FLTK
  32 ~~~~~~~~~~~~~~
  33
  34
  35 Mussagl C++/Qt/OpenGL
  36 ~~~~~~~~~~~~~~~~~~~~~
  37
  38
  39 Getting Mussagl
  40 ===============
  41
  42 License
  43 -------
  44
  45 Mussagl has been released open source under the `GPL v2
  46 license`__.
  47
  48 __ GPL_
  49
  50 Platforms
  51 ---------
  52
  53 You have the option of building from source or downloading prebuilt
  54 binaries. Most people will want the prebuilt versions.
  55
  56 Supported Platforms:
  57
  58  * Mac OS X (binary or source)
  59  * Windows XP (binary or source)
  60  * Linux (source)
  61
  62 Download
  63 --------
  64
  65 Binaries for OS X and Windows, as well as the source code, can be
  66 downloaded from http://mussa.caltech.edu/.
  67
  68 Install
  69 -------
  70
  71 Mac OS X
  72 ~~~~~~~~
  73 Once you have downloaded the .dmg file, double click on it and follow
  74 the install instructions.
  75
  76 FIXME: Mention how to launch the program.
  77
  78
  79 Windows XP
  80 ~~~~~~~~~~
  81 Once you have downloaded the Mussagl installer, double click on the
  82 installer and follow the install instructions.
  83
  84 To start Mussagl, launch the program from Start > Programs > Mussagl >
  85 Mussagl.
  86
  87
  88 Linux
  89 ~~~~~
  90 Currently we do not have a binary installer for Linux. You will have
  91 to build from source. See the `Build from Source`_ section below.
  92
  93
  94 Build from Source
  95 ~~~~~~~~~~~~~~~~~
  96
  97 Instructions for building from source can be found on the `build page
  98 <http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild>`_ of the
  99 `Mussa wiki`__.
 100
 101 __ wiki_
 102
 103
 104 Obtaining Input Data
 105 ====================
 106
 107 If you already have your data, you can skip ahead to the `Using
 108 Mussagl`_ section.
 109
 110 Let's say you have a gene of interest called 'SMN1' and you want to
 111 know how well the sequence surrounding the gene is conserved across
 112 multiple species. That is exactly what we are going to do: retrieve
 113 the DNA sequence for SMN1 and prepare it for use in Mussa.
 114
 115 For more information about SMN1 visit `NCBI's OMIM
 116 <http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=609682>`_.
 117
 118 UCSC Genome Browser Method
 119 --------------------------
 120
 121 There are many methods of retrieving DNA sequence, but for this
 122 example we will retrieve SMN1 through the UCSC genome browser located
 123 at http://genome.ucsc.edu/.
 124
 125 .. image:: images/ucsc_genome_browser_home.png
 126    :alt: UCSC Genome Browser
 127    :align: center
 128
 129 Step 1 - Find SMN1
 130 ~~~~~~~~~~~~~~~~~~
 131
 132 The first step in finding SMN1 is to use the **Gene Sorter** menu
 133 option which I have highlighted in orange below:
 134
 135 .. image:: images/ucsc_menu_bar_gene_sorter.png
 136    :alt: Gene Sorter Menu Option
 137    :align: center
 138
 139 Gene Sorter page:
 140
 141 .. image:: images/ucsc_gene_sorter.png
 142    :alt: Gene Sorter
 143    :align: center
 144
 145 We will start by looking for SMN1 in the **Human Genome** and **sorting by name similarity**.
 146
 147 .. image:: images/ucsc_gs_sort_name_sim.png
 148    :alt: Gene Sorter - Name Similarity
 149    :align: center
 150
 151 After you have selected **Human Genome** and **sorting by name similarity**, type *SMN1* into the search box.
 152
 153 .. image:: images/ucsc_gs_smn1.png
 154    :alt: Gene
 155    :align: center
 156
 157 Press **Go!** and you should see the following page:
 158
 159 .. image:: images/ucsc_gs_found.png
 160    :alt: Found SMN1
 161    :align: center
 162
 163 Click on **SMN1** and you will be taken to the gene expression atlas
 164 page.
 165
 166 .. image:: images/ucsc_gs_genome_position.png
 167    :alt: Gene expression atlas
 168    :align: center
 169
 170 Click on **chr5 70,270,558** found in the **SMN1 row**, **Genome
 171 position column**.
 172
 173 Now we have found the location of SMN1 in the human genome!
 174
 175 .. image:: images/ucsc_gb_smn1_human.png
 176    :alt: Genome Browser - SMN1 (human)
 177    :align: center
 178
 179
 180 Step 2 - Download CDS/UTR sequence for annotations
 181 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 182
 183 Since we have found **SMN1**, this would be a convenient time to extract
 184 the DNA sequence for the CDS and UTRs of the gene to use as an
 185 annotation_ in Mussa.
 186
 187 **Click on SMN1**, located **between** the **two orange arrows** shown
 188 below.
 189
 190 .. image:: images/ucsc_gb_smn1_human_click_smn1.png
 191    :alt: Genome Browser - SMN1 (human) - Orange Arrows
 192    :align: center
 193
 194 You should find yourself at the SMN1 description page.
 195
 196 .. image:: images/ucsc_gb_smn1_description_page.png
 197    :alt: Genome Browser - SMN1 (human) - Description page
 198    :align: center
 199
 200 **Scroll down** until you get to the **Sequence section** and click on
 201 **Genomic (chr5:70,256,524-70,284,592)**.
 202
 203 .. image:: images/ucsc_gb_smn1_human_sequence.png
 204    :alt: Genome Browser - SMN1 (human) - Sequence
 205    :align: center
 206
 207 You should now be at the **Genomic sequence near gene** page:
 208
 209 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence.png
 210    :alt: Genome Browser - SMN1 (human) - Get genomic sequence
 211    :align: center
 212
 213 Make the following changes (highlighted in orange in the screenshot
 214 below):
 215
 216  1. Uncheck **introns**.
 217     (We only want to annotate CDS and UTRs.)
 218  2. Select **one fasta record** per **region**.
 219     (Mussa needs each CDS and UTR to be represented by its own fasta record.)
 220  3. Select **split UTR and CDS parts of an exon into separate FASTA records**.
 221     (Breaks up **exons** into CDSs and UTRs.)
 222
 223 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_diff.png
 224    :alt: Genome Browser - SMN1 (human) - Get genomic sequence setup
 225    :align: center
 226
 227 Now click the **submit** button. You will then see a fasta file with
 228 many fasta records representing the CDS and UTRs.
 229
 230 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_submit.png
 231    :alt: Genome Browser - SMN1 (human) - CDS/UTR sequence
 232    :align: center
 233
 234 Now you need to save the fasta records to a **text file**. If you are
 235 using **Firefox** or **Internet Explorer 6+** click on the **File >
 236 Save As** menu option.
 237
 238 **IMPORTANT:** Make sure you select **Text Files** and **NOT**, I
 239 repeat **NOT Webpage Complete** (see screenshot below).
 240
 241 Type in **smn1_human_annot.txt** for the file name.
 242
 243 .. image:: images/smn1_human_annot.png
 244    :alt: Genome Browser - SMN1 (human) - sequence annotation file
 245    :align: center
 246
 247 **IMPORTANT:** You should open the file with a text editor and make
 248 sure **no HTML** was saved. If you find any HTML markup, delete
 249 the markup and save the file.
 250
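The check for stray HTML can also be scripted. The following is a
minimal Python sketch, not part of Mussagl. The function name
``find_html_lines`` and the sample text are made up for illustration,
and the pattern only looks for tag-like ``<...>`` text.

```python
import re

# Matches anything that looks like an HTML tag, e.g. <html>, </pre>, <a href="...">.
# FASTA header lines start with '>' and are not matched unless they also
# contain a '<...>' pair (assumed not to happen in these files).
HTML_TAG = re.compile(r"</?[A-Za-z][^<>]*>")


def find_html_lines(text):
    """Return (line_number, line) pairs for lines that contain HTML-like tags."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if HTML_TAG.search(line)
    ]


# Hypothetical content of a file that was saved with leftover markup.
saved = "<PRE>\n>example_record\nACGTACGTAC\n</PRE>\n"
for number, line in find_html_lines(saved):
    print(f"line {number}: {line}")
```

Any line that is reported should be removed by hand before the file is
given to Mussa.
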
 251 Now we are going to **modify the file** you just saved to **add the
 252 name of the species** to the **annotation file**. All you have to do
 253 is **add a new line** at the **top of the file** with the word **'Human'** as
 254 shown below:
 255
 256 .. image:: images/smn1_human_annot_plus_human.png
 257    :alt: Genome Browser - SMN1 (human) - sequence annotation file
 258    :align: center
 259
 260 You can add more annotations to this file if you wish. See the
 261 `annotation file format`_ section for details of the file format. When
 262 you include fasta records in the annotation_ file, Mussa searches your
 263 DNA sequence for an exact match of each record's sequence. If a match
 264 is found, it will be marked as an annotation_ within Mussa.
 265
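To illustrate what an exact-match lookup of this kind involves, here is
a small Python sketch. It is not Mussa's code: the helper names are
hypothetical, positions are reported 1-based and inclusive as an
assumption, and only the forward strand is searched.

```python
def read_fasta_records(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_exact_matches(dna, records):
    """Return (start, stop, header) for each exact occurrence of a record in dna."""
    dna = dna.upper()
    hits = []
    for header, seq in records:
        if not seq:
            continue
        pos = dna.find(seq)
        while pos != -1:
            hits.append((pos + 1, pos + len(seq), header))
            pos = dna.find(seq, pos + 1)
    return hits


# Made-up annotation records and DNA, for illustration only.
annotation_text = ">utr5\nGATT\nACA\n>cds1\nTTTT\n"
dna = "ccGATTACAggGATTACA"
hits = find_exact_matches(dna, read_fasta_records(annotation_text))
print(hits)
```

Because the match must be exact, a single base difference between the
annotation record and the DNA sequence means no annotation is found.
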
 266
 267 Step 3 - Download gene and upstream/downstream sequence
 268 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 269
 270 Use the back button in your web browser to get back to the **genome
 271 browser view** of **SMN1** as shown below.
 272
 273 .. image:: images/ucsc_gb_smn1_human.png
 274    :alt: Genome Browser - SMN1 (human)
 275    :align: center
 276
 277 There are two options for getting additional sequence around your
 278 gene. The more complex way is to zoom out until the genome browser
 279 shows the sequence you want and then follow the directions for the
 280 second option.
 281
 282 The second option, which we will choose, is to leave the genome
 283 browser zoomed exactly at the location of SMN1 and click on the
 284 **DNA** option on the menu bar (shown with orange arrows in the
 285 screenshot below).
 286
 287 .. image:: images/ucsc_gb_smn1_human_dna_option.png
 288    :alt: Genome Browser - SMN1 (human) - DNA Option
 289    :align: center
 290
 291 Now, on the **get dna in window** page, let's add an arbitrary amount of
 292 extra sequence onto each end of the gene, say 5000 base pairs.
 293
 294 .. image:: images/ucsc_gb_smn1_human_get_dna.png
 295    :alt: Genome Browser - SMN1 (human) - Get DNA
 296    :align: center
 297
 298 Click the **get DNA** button.
 299
 300 .. image:: images/ucsc_gb_smn1_human_dna.png
 301    :alt: Genome Browser - SMN1 (human) - DNA
 302    :align: center
 303
 304 Save the DNA sequence to a text file called 'smn1_human_dna.fa' as we
 305 did in step 2 with the annotation file.
 306
 307 **IMPORTANT:** Make sure the file is saved as a text file and not an
 308 HTML file. Open the file with a text editor and remove any HTML markup
 309 you find.
 310
 311
 312 Step 4 - Same/similar/related gene in other species
 313 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 314
 315 What good is a multiple sequence alignment viewer without multiple
 316 sequences? Let's find a similar gene in a few more species.
 317
 318 Use the back button on your web browser until you get to the **genome
 319 browser view** of **SMN1** as shown below.
 320
 321 .. image:: images/ucsc_genome_browser_home.png
 322    :alt: UCSC Genome Browser
 323    :align: center
 324
 325 **Click on SMN1**, located **between** the **two orange arrows** shown
 326 below.
 327
 328 .. image:: images/ucsc_gb_smn1_human_click_smn1.png
 329    :alt: Genome Browser - SMN1 (human) - Orange Arrows
 330    :align: center
 331
 332 You should find yourself at the SMN1 description page.
 333
 334 .. image:: images/ucsc_gb_smn1_description_page.png
 335    :alt: Genome Browser - SMN1 (human) - Description page
 336    :align: center
 337
 338 **Scroll down** until you get to the **Sequence section** and click on
 339 **Protein (262 aa)**.
 340
 341 .. image:: images/ucsc_gb_smn1_human_sequence.png
 342    :alt: Genome Browser - SMN1 (human) - Sequence
 343    :align: center
 344
 345 Copy the SMN1 protein sequence by highlighting it and selecting the
 346 **Edit > Copy** option from the menu.
 347
 348 .. image:: images/smn1_human_protein.png
 349    :alt: Genome Browser - SMN1 (human) - Protein
 350    :align: center
 351
 352 Press the back button on the web browser once and then scroll to the
 353 top of the page and click on the **BLAT** option on the menu bar
 354 (shown below with orange arrows).
 355
 356 .. image:: images/ucsc_gb_smn1_human_blat.png
 357    :alt: Genome Browser - SMN1 (human) - Blat
 358    :align: center
 359
 360 **Paste** in the **protein sequence** and **change** the **genome** to
 361 **mouse** as shown below and then click **submit**.
 362
 363 .. image:: images/ucsc_gb_smn1_human_blat_paste.png
 364    :alt: Genome Browser - SMN1 (human) - Blat paste protein
 365    :align: center
 366
 367 Notice that we have two hits, one of which looks pretty good at 89.9%
 368 match.
 369
 370 .. image:: images/ucsc_gb_smn1_human_blat_hits.png
 371    :alt: Genome Browser - SMN1 (human) - Blat hits
 372    :align: center
 373
 374 **Click** on the **browser** link next to the 89.9% match. Notice in
 375 the genome browser (shown below) that there is an annotated gene
 376 called SMN1 for mouse which matches the line called **your sequence
 377 from blat search**. This means we are fairly confident we found the
 378 right location in the mouse genome.
 379
 380 .. image:: images/ucsc_gb_smn1_human_blat_to_browser.png
 381    :alt: Genome Browser - SMN1 (human) - Blat to browser
 382    :align: center
 383
 384 Follow steps 1 through 3 for mouse and then repeat step 4 with the
 385 human protein sequence to find **SMN1** in the following species (if
 386 you find a match):
 387
 388  1. Rat
 389  2. Rabbit
 390  3. Dog
 391  4. Armadillo
 392  5. Elephant
 393  6. Opossum
 394  7. x_tropicalis
 395
 396 Make sure to save the extended DNA sequence and annotation file for
 397 each one.
 398
 399 Using Mussagl
 400 =============
 401
 402
 403 Launch Mussagl
 404 --------------
 405 Launch Mussagl. It should look similar to the screenshot below.
 406
 407 .. image:: images/opened.png
 408    :alt: Launch Mussa
 409    :align: center
 410
 411
 412
 413 Create/Load Analysis
 414 ----------------------
 415
 416 Currently there are three ways to load a Mussa experiment.
 417
 418  1. `Create a new analysis`_
 419  2. `Load a mussa parameter file`_ (.mupa)
 420  3. `Load an analysis`_
 421
 422 .. _createnew:
 423
 424 Create a new analysis
 425 ~~~~~~~~~~~~~~~~~~~~~
 426
 427 To create a new analysis select 'Define analysis' from the 'File'
 428 menu. You should see a dialog box similar to the one below. For this
 429 demo we will use the example sequences that come with Mussagl.
 430
 431 .. image:: images/define_analysis.png
 432    :alt: Define Analysis
 433    :align: center
 434
 435 Instructions:
 436
 437  1. **Give the experiment a name**, for this demo, we'll use
 438     'demo_w30_t20'. Mussa will create a folder with this name to store
 439     the analysis files in once it has been run.
 440
 441  2. Choose a `window size`_. For this demo **choose 30**.
 442
 443  3. Choose a threshold_. For this demo **choose 20**. See the
 444     Threshold_ section for more detailed information.
 445
 446  4. Choose the number of sequences_ you would like. For this demo
 447     **choose 3**.
 448
 449 .. image:: images/define_analysis_step1a.png
 450    :alt: Steps 1-4
 451    :align: center
 452
 453 Now click on the 'Browse' button next to the sequence input box and
 454 then select the /examples/seq/human_mck_pro.fa file. Do the same in the
 455 next two sequence input boxes, selecting mouse_mck_pro.fa and
 456 rabbit_mck_pro.fa as shown below. Note that you can create annotation
 457 files using the Mussa `Annotation File Format`_ to add annotations to
 458 your sequence.
 459
 460 .. image:: images/define_analysis_step2.png
 461    :alt: Choose sequences
 462    :align: center
 463
 464 Click the **create** button and in a few moments you should see
 465 something similar to the following screen shot.
 466
 467 .. image:: images/demo.png
 468    :alt: Mussagl Demo
 469    :align: center
 470
 471 This analysis is now saved in a directory called **demo_w30_t20** in
 472 the current working directory. If you close and reopen Mussagl, you
 473 can reload the saved analysis. See `Load an analysis`_ section below
 474 for details.
 475
 476
 477 Load a mussa parameter file
 478 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 479
 480 If you prefer, you can define your Mussa analysis using the Mussa
 481 parameter file. See the `Parameter File Format`_ section for details
 482 on creating a .mupa file.
 483
 484 Once you have a .mupa file created, load Mussagl and select the **File >
 485 Load Mussa Parameters** menu option. Select the .mupa file and click
 486 open.
 487
 488 .. image:: images/load_mupa_menu.png
 489    :alt: Load Mussa Parameters
 490    :align: center
 491
 492 If you would like to see an example, you can load the
 493 **mck3test.mupa** file in the examples directory that comes with
 494 Mussagl.
 495
 496 .. image:: images/load_mupa_dialog.png
 497    :alt: Load Mussa Parameters Dialog
 498    :align: center
 499
 500
 501 Load an analysis
 502 ~~~~~~~~~~~~~~~~
 503
 504 To load a previously run analysis open Mussagl and select the **File >
 505 Load Analysis** menu option. Select an analysis **directory** and
 506 click open.
 507
 508 .. image:: images/load_analysis_menu.png
 509    :alt: Load Analysis Menu
 510    :align: center
 511
 512
 513 Main Window
 514 -----------
 515
 516 Overview
 517 ~~~~~~~~
 518 .. Screen-shot with numbers showing features.
 519
 520 .. image:: images/window_overview.png
 521    :alt: Mussa Window
 522    :align: center
 523
 524 Legend:
 525
 526  1. `DNA Sequence (Black bars)`_
 527
 528  2. Annotation_
 529
 530  3. Motif_
 531
 532  4. `Conservation tracks`_
 533
 534  5. `Motif Toggle`_
 535
 536  6. `Zoom Factor`_ (Base pairs per pixel)
 537
 538  7. `Dynamic Threshold`_
 539
 540  8. `Sequence Information Bar`_
 541
 542  9. `Sequence Scroll Bar`_
 543
 544
 545 DNA Sequence (black bars)
 546 ~~~~~~~~~~~~~~~~~~~~~~~~~
 547
 548 .. image:: images/sequence_bar.png
 549    :alt: Sequence Bar
 550    :align: center
 551
 552 Each of the black bars represents one of the loaded sequences, in this
 553 case the sequence around the gene 'MCK' in human, mouse, and rabbit.
 554
 555 FIXME: Should I mention the repeats here?
 556
 557
 558 Annotation
 559 ~~~~~~~~~~
 560
 561 .. figure:: images/annotation.png
 562    :alt: Annotation
 563    :align: center
 564
 565    Annotation shown in green on sequence bar.
 566
 567
 568 Annotations can be included on any of the sequences using the `Load a
 569 mussa parameter file`_ method of loading your sequences. You can
 570 define annotations by location or using an exact sub-sequence and you
 571 may also choose any color for display of the annotation; see the
 572 `Annotation File Format`_ section for details.
 573
 574 Note: Currently there is no way to add annotations using the GUI (only
 575 via the .mupa file). We plan to add this feature in the future, but it
 576 likely will not make it into the first release.
 577
 578
 579 Motif
 580 ~~~~~
 581
 582 .. figure:: images/motif.png
 583    :alt: Motif
 584    :align: center
 585
 586    Motif shown in light blue on sequence bar.
 587
 588 The only real difference between an annotation and motif in Mussagl is
 589 that you can define motifs from within the GUI. See the `Motifs`_
 590 section for more information.
 591
 592
 593 Conservation tracks
 594 ~~~~~~~~~~~~~~~~~~~
 595
 596 .. figure:: images/conservation_tracks.png
 597    :alt: Conservation Tracks
 598    :align: center
 599
 600    Conservation tracks shown as red and blue lines between sequence
 601    bars.
 602
 603 The **red lines** between the sequence bars represent conservation
 604 between the sequences and **blue lines** represent **reverse
 605 complement** conservation. The amount of sequence conservation shown
 606 will depend on the relatedness of your sequences and the `dynamic
 607 threshold`_ you are using. Sequences with lots of repeats will cause
 608 major slowdowns in calculating the matches.
 609
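For readers unfamiliar with the term, the reverse complement of a
sequence is obtained by complementing each base (A with T, C with G)
and reversing the result. A minimal Python sketch, independent of
Mussa's own implementation and using a hypothetical function name:

```python
# Translation table that swaps each base for its complement.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Complement every base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]


print(reverse_complement("GATTACA"))  # prints TGTAATC
```

A blue line therefore connects a region in one sequence to a region
that matches when read on the opposite strand of the other sequence.
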
 610
 611 Motif Toggle
 612 ~~~~~~~~~~~~
 613
 614 .. image:: images/motif_toggle.png
 615    :alt: Motif Toggle
 616    :align: center
 617
 618 Toggles motifs on and off. This does not turn annotations on or off.
 619
 620 Note: As of the current build (#200), this feature hasn't been
 621 implemented.
 622
 623
 624 Zoom Factor
 625 ~~~~~~~~~~~
 626
 627 .. image:: images/zoom_factor.png
 628    :alt: Zoom Factor
 629    :align: center
 630
 631 The zoom factor is the number of base pairs represented per
 632 pixel. When you zoom in far enough, the display switches from
 633 a black bar representing the sequence to the actual sequence
 634 (well, an ASCII representation of the sequence).
 635
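As a rough illustration of the arithmetic, the sketch below computes
how many pixels a sequence bar would need at a given zoom factor. The
function name and the sequence length are made up, and rounding up is
an assumption; Mussagl's actual drawing code may round differently.

```python
import math


def pixels_needed(length_bp, bp_per_pixel):
    """Width in pixels of a sequence bar at a given zoom factor (rounded up)."""
    if bp_per_pixel <= 0:
        raise ValueError("zoom factor must be positive")
    return math.ceil(length_bp / bp_per_pixel)


# A hypothetical 38,069 bp sequence at three zoom factors.
for zoom in (100, 10, 1):
    print(zoom, "bp/pixel ->", pixels_needed(38069, zoom), "pixels")
```

At a zoom factor of 1 every base pair gets its own pixel, which is why
the scroll bar becomes necessary for long sequences.
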
 636
 637 Dynamic Threshold
 638 ~~~~~~~~~~~~~~~~~
 639
 640 .. image:: images/dynamic_threshold.png
 641    :alt: Dynamic Threshold
 642    :align: center
 643
 644 You can dynamically change the threshold for how strong a match must
 645 be to count as conservation, using one of two options:
 646
 647  1. Number of base pair matches out of window size.
 648
 649  2. Percent base pair conservation.
 650
 651 See the Threshold_ section for more information.
 652
 653
 654 Sequence Information Bar
 655 ~~~~~~~~~~~~~~~~~~~~~~~~
 656
 657 .. image:: images/seq_info_bar.png
 658    :alt: Sequence Information Bar
 659    :align: center
 660
 661 The sequence information bars can be found to the left and right sides
 662 of Mussagl. Next to each sequence you will find the following
 663 information:
 664
 665  1. Species (If it has been defined)
 666  2. Total Size of Sequence
 667  3. Current base pair position
 668
 669
 670 Sequence Scroll Bar
 671 ~~~~~~~~~~~~~~~~~~~
 672
 673 .. image:: images/scroll_bar.png
 674    :alt: Sequence Scroll Bar
 675    :align: center
 676
 677 The scroll bar allows you to scroll through the sequence which is
 678 useful when you have zoomed in using the `zoom factor`_.
 679
 680
 681 Annotations / Motifs
 682 --------------------
 683
 684 Annotations
 685 ~~~~~~~~~~~
 686
 687 Currently annotations can be added to a sequence using the mussa
 688 `annotation file format`_ and can be loaded by selecting the
 689 annotation file when defining a new analysis (see `Create a new
 690 analysis`_ section) or by defining a .mupa file pointing to your
 691 annotation file (see `Load a mussa parameter file`_ section).
 692
 693 Motifs
 694 ~~~~~~
 695
 696 Load Motifs from File
 697 *********************
 698
 699 It is possible to load motifs from a file which was saved from a
 700 previous run or by defining your own motif file. See the `Motif File
 701 Format`_ section for details.
 702
 703 To load a motif file, select the **Load Motif List** item from the
 704 **File** menu and select a motif list file.
 705
 706 .. image:: images/load_motif.png
 707    :alt: Load Motif List
 708    :align: center
 709
 710
 711 Save Motifs to File
 712 *******************
 713
 714 Note: Currently not implemented
 715
 716
 717 Motif Dialog
 718 ************
 719
 720 Mussa can find lab motifs that are defined using the `IUPAC Nucleotide
 721 Code`_. To define a motif, select the **View > Edit
 722 Motifs** menu item as shown below.
 723
 724 .. image:: images/view_edit_motifs.png
 725    :alt: "View > Edit Motifs" Menu
 726    :align: center
 727
 728 You will see a dialog box appear with a "set motifs" button and 10
 729 rows for defining motifs and the color that will be displayed on the
 730 sequence. By default all 10 motifs start off with white as the
 731 color. In the image below, I changed the color from white to blue to
 732 make it easier to see.
 733
 734 .. image:: images/motif_dialog_start.png
 735    :alt: Motif Dialog
 736    :align: center
 737
 738 Now let's make a motif **'AT[C or G]CT'**. Using the `IUPAC Nucleotide
 739 Code`_, type **'ATSCT'** into the first box as shown below.
 740
 741 .. image:: images/motif_dialog_enter_motif.png
 742    :alt: Enter Motif
 743    :align: center
 744
 745 Now choose a color for your motif by clicking on the colored area to
 746 the left of the motif. In the image above, you would click on the blue
 747 square, but by default the squares will be white. Remember to choose a
 748 color that will show up well with a black bar as the background.
 749
 750 .. image:: images/color_chooser.png
 751    :alt: Color Chooser
 752    :align: center
 753
 754 Once you have selected the color for your motif, click on the 'set
 755 motifs' button. If Mussa finds matches to your motif, they will
 756 now show up in the main Mussagl window.
 757
 758 Before Motif:
 759
 760 .. image:: images/motif_dialog_bar_before.png
 761    :alt: Sequence bar before motif
 762    :align: center
 763
 764 After Motif:
 765
 766 .. image:: images/motif_dialog_bar_after.png
 767    :alt: Sequence bar after motif
 768    :align: center
 769
 770
 771 View Mussa Alignments
 772 ----------------------
 773
 774 Mussagl allows you to zoom in on Mussa alignments by selecting the set
 775 of alignment(s) of interest. To do this, move the mouse near the
 776 alignment you are interested in viewing and then **PRESS** and
 777 **HOLD** the **LEFT mouse button** and **drag the mouse** to the other
 778 side of the conservation track so that you see a bounding box
 779 overlapping the alignment(s) of interest and then **let go** of the
 780 *left mouse button*.
 781
 782 In the example below, I started by left clicking on the area marked by
 783 a red dot (upper left corner of bounding box) and dragging the mouse to
 784 the area marked by a blue dot (lower right corner of the bounding box)
 785 and letting go of the left mouse button.
 786
 787 .. image:: images/select_sequence.png
 788    :alt: Select Sequence
 789    :align: center
 790
 791 All of the lines which were not selected should be washed out as shown
 792 below:
 793
 794 .. image:: images/washed_out.png
 795    :alt: Tracks washed out
 796    :align: center
 797
 798 With a selection made, go to the **View** menu and select **View mussa alignment**.
 799
 800 .. image:: images/view_mussa_alignment.png
 801    :alt: View mussa alignment
 802    :align: center
 803
 804 You should see the alignment at the base-pair level as shown below.
 805
 806 .. image:: images/mussa_alignment.png
 807    :alt: Mussa alignment
 808    :align: center
 809
 810
 811
 812
 813 Saving to an Image
 814 ---------------------------------
 815
 816 FIXME: Need to write this section
 817
 818
 819 Detailed Information
 820 --------------------
 821
 822 Threshold
 823 ~~~~~~~~~
 824
 825 The threshold of an analysis is the minimum number of base pair
 826 matches a window must contain in order to be kept as a match. Note
 827 that you can vary the threshold from within Mussagl. For example, if
 828 you choose a `window size`_ of **30** and a **threshold** of **20**,
 829 the mussa nway transitive algorithm will store all matches that are 20
 830 out of 30 bp or better and pass them on to Mussagl. Mussagl will then
 831 allow you to dynamically choose a threshold from 20 to 30 base pairs.
 832 A threshold of 30 bp would only show 30 out of 30 bp matches. A
 833 threshold of 20 bp would show all matches of 20 out of 30 bp or
 834 better. If you would like to see results for matches lower than 20 out
 835 of 30, you will need to rerun the analysis with a lower threshold.
 836
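The relationship between window size, stored threshold, and dynamic
threshold can be sketched in a few lines of Python. This is only a
pairwise, brute-force illustration with hypothetical function names; it
is not the mussa nway transitive algorithm, and it uses a tiny window
so the example stays readable.

```python
def count_matches(a, b):
    """Number of positions at which two equal-length windows agree."""
    return sum(1 for x, y in zip(a, b) if x == y)


def find_window_matches(seq_a, seq_b, window, threshold):
    """Return (start_a, start_b, score) for window pairs scoring >= threshold."""
    hits = []
    for i in range(len(seq_a) - window + 1):
        win_a = seq_a[i:i + window]
        for j in range(len(seq_b) - window + 1):
            score = count_matches(win_a, seq_b[j:j + window])
            if score >= threshold:
                hits.append((i, j, score))
    return hits


def apply_dynamic_threshold(hits, threshold):
    """Filter stored hits with a stricter threshold, without recomputing them."""
    return [hit for hit in hits if hit[2] >= threshold]


def as_percent(threshold, window):
    """Express a threshold as percent conservation within the window."""
    return 100.0 * threshold / window


# Toy data: window of 5 with a stored threshold of 4 (0-based start positions).
stored = find_window_matches("ACGTACGT", "ACGTTCGT", window=5, threshold=4)
print(stored)
print(apply_dynamic_threshold(stored, 5))
print(round(as_percent(20, 30), 1))
```

Raising the threshold only filters the stored hits, which is why it can
be done interactively. Lowering it below the stored value requires
rerunning the analysis, because those weaker matches were never kept.
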
 837 Window Size
 838 ~~~~~~~~~~~
 839
 840 The typical sizes people tend to choose are between 20 and 30. You
 841 will likely need to experiment with this setting depending on your
 842 needs and input sequence.
 843
 844
 845 Sequences
 846 ~~~~~~~~~
 847
 848 Mussa reads in sequences which are formatted in the fasta_
 849 format. Mussa may take a long time to run (>10 minutes) if the total
 850 bp length nears 280 Kb. Once Mussa has run, you can reload
 851 previously run analyses.
 852
 853 FIXME: We have learned more about how much sequence and how many to
 854 put in Mussagl, this information should be documented here.
 855
 856
 857 Mussa File Formats
 858 ------------------
 859
 860 .. _param:
 861
 862 Parameter File Format
 863 ~~~~~~~~~~~~~~~~~~~~~
 864
 865 **File Format (.mupa):**
 866
 867 ::
 868
 869   # name of analysis directory and stem for associated files
 870   ANA_NAME <analysis_name>
 871
 872   # if APPEND vars true, a _wXX and/or _tYY added to analysis name
 873   # where XX = WINDOW and YY = THRESHOLD
 874   # Highly recommended with use of command line override of WINDOW or THRESHOLD
 875   APPEND_WIN <true/false>
 876   APPEND_THRES <true/false>
 877
 878   # how many sequences are being analyzed
 879   SEQUENCE_NUM <num>
 880
 881   # first sequence info
 882   SEQUENCE <fasta_file_path>
 883   ANNOTATION <annotation_file_path>
 884   SEQ_START <sequence_start>
 885
 886   # the second sequence info
 887   SEQUENCE <fasta_file_path>
 888   # ANNOTATION <annotation_file_path>
 889   SEQ_START <sequence_start>
 890   # SEQ_END <sequence_end>
 891
 892   # third sequence info
 893   SEQUENCE <fasta_file_path>
 894   # ANNOTATION <annotation_file_path>
 895
 896   # analysis parameters: command line args -w -t will override these
 897   WINDOW <num>
 898   THRESHOLD <num>
 899
 900 .. csv-table:: Parameter File Options:
 901    :header: "Option Name", "Value", "Default", "Required", "Description"
 902    :widths: 30 30 30 30 60
 903
 904    "ANA_NAME", "string", "N/A", "true", "Name of analysis (also the
 905    name of the directory where the analysis will be saved)."
 906    "APPEND_WIN", "true/false", "?", "?", "Appends _w## to ANA_NAME"
 907    "APPEND_THRES", "true/false", "?", "?", "Appends _t## to ANA_NAME"
 908    "SEQUENCE_NUM", "integer", "N/A", "true", "The number of sequences
 909    to analyze"
 910    "SEQUENCE", "/fasta/filepath.fa", "N/A", "true", "Must define one
 911    sequence per SEQUENCE_NUM."
 912    "ANNOTATION", "/annotation/filepath.txt", "N/A", "false", "Optional
 913    annotation file. See `annotation file format`_ section for more
 914    information."
 915    "SEQ_START", "integer", "1", "false", "Optional index into fasta file"
 916    "SEQ_END", "integer", "1", "false", "Optional index into fasta file"
 917    "WINDOW", "integer", "N/A", "true", "`Window Size`_"
 918    "THRESHOLD", "integer", "N/A", "true", "`Threshold`_"
 919
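To make the layout of a parameter file more concrete, the following
Python sketch reads .mupa-style text into a dictionary. It is not part
of Mussa. It assumes that ``ANNOTATION``, ``SEQ_START`` and ``SEQ_END``
apply to the most recent ``SEQUENCE`` line (as the layout above
suggests), keeps all values as strings, and uses made-up file names.

```python
PER_SEQUENCE_KEYS = ("ANNOTATION", "SEQ_START", "SEQ_END")


def parse_mupa(text):
    """Parse .mupa-style text into analysis settings plus a list of sequences."""
    params = {"sequences": []}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key == "SEQUENCE":
            params["sequences"].append({"SEQUENCE": value})
        elif key in PER_SEQUENCE_KEYS:
            if not params["sequences"]:
                raise ValueError(f"{key} appears before any SEQUENCE line")
            params["sequences"][-1][key] = value
        else:
            params[key] = value
    return params


# Hypothetical parameter file contents, for illustration only.
example = """\
ANA_NAME demo
APPEND_WIN true
APPEND_THRES true
SEQUENCE_NUM 2
SEQUENCE human.fa
ANNOTATION human_annot.txt
SEQ_START 1
SEQUENCE mouse.fa
WINDOW 30
THRESHOLD 20
"""

params = parse_mupa(example)
print(params["ANA_NAME"], params["WINDOW"], params["THRESHOLD"])
print(params["sequences"])
```

A simple consistency check is to compare ``int(params["SEQUENCE_NUM"])``
with ``len(params["sequences"])`` before running an analysis.
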
 920 .. _annot:
 921
 922 Annotation File Format
 923 ~~~~~~~~~~~~~~~~~~~~~~
 924
 925 The first line in the file is the sequence name. Each line thereafter
 926 is a **space** separated annotation.
 927
 928 New as of build 198:
 929
 930  * The annotation format now supports fasta sequences embedded in the
 931    annotation file as shown in the format example below. Mussagl will
 932    take this sequence and look for an exact match of this sequence in
 933    your sequences. If a match is found, it will label it with the name
 934    from the fasta header.
 935
 936 Format:
 937
 938 ::
 939
 940   <species/sequence_name>
 941   <start> <stop> <annotation_name> <annotation_type>
 942   <start> <stop> <annotation_name> <annotation_type>
 943   <start> <stop> <annotation_name> <annotation_type>
 944   <start> <stop> <annotation_name> <annotation_type>
 945   >Fasta Header
 946   ACTGACTGACGTACGTAGCTAGCTAGCTAGCACG
 947   ACGTACGTACGTACGTAGCTGTCATACGCTAGCA
 948   TGCGTAGAGGATCTCGGATGCTAGCGCTATCGAT
 949   ACGTACGGCAGTACGCGGTCAGA
 950   <start> <stop> <annotation_name> <annotation_type>
 951   ...
 952
 953 Example:
 954
 955 ::
 956
 957   Mouse
 958   251 500 Glorp Glorptype
 959   751 1000 Glorp Glorptype
 960   1251 1500 Glorp Glorptype
 961   >My favorite DNA sequence
 962   GATTACA
 963   1751 2000 Glorp Glorptype
 964
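The example above can be read programmatically with a short Python
sketch such as the one below. It is an illustration, not Mussa's
parser: the function name is hypothetical, and it assumes that every
coordinate line carries all four fields and that sequence lines only
appear directly after a fasta header.

```python
def parse_annotation_file(text):
    """Split annotation text into species, coordinate annotations, and FASTA records."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines:
        raise ValueError("annotation file is empty")

    species = lines[0]
    intervals = []    # (start, stop, name, type) tuples
    fasta = []        # [header, sequence] pairs, sequence built up line by line
    in_fasta = False  # True while reading the sequence lines of a FASTA record

    for line in lines[1:]:
        fields = line.split()
        if line.startswith(">"):
            fasta.append([line[1:].strip(), ""])
            in_fasta = True
        elif len(fields) >= 4 and fields[0].isdigit() and fields[1].isdigit():
            intervals.append((int(fields[0]), int(fields[1]), fields[2], fields[3]))
            in_fasta = False
        elif in_fasta:
            fasta[-1][1] += line.upper()
        else:
            raise ValueError(f"unrecognized annotation line: {line!r}")

    return species, intervals, [tuple(record) for record in fasta]


example = """\
Mouse
251 500 Glorp Glorptype
751 1000 Glorp Glorptype
1251 1500 Glorp Glorptype
>My favorite DNA sequence
GATTACA
1751 2000 Glorp Glorptype
"""

species, intervals, fasta = parse_annotation_file(example)
print(species, len(intervals), fasta)
```
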
 965
 966 .. _motif_file_format:
 967
 968 Motif File Format
 969 ~~~~~~~~~~~~~~~~~
 970
 971 Format::
 972
 973   <motif> <red> <green> <blue>
 974
 975 Example::
 976
 977   GGCC 0.0 1 1
 978
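A motif list line can be split into its parts as sketched below. The
function names are hypothetical, and the interpretation of the color
values as fractions between 0 and 1 is an assumption based on the
example line above.

```python
def parse_motif_line(line):
    """Return (motif, (red, green, blue)) from one motif list line."""
    motif, red, green, blue = line.split()
    return motif.upper(), (float(red), float(green), float(blue))


def to_hex(color):
    """Convert an (r, g, b) tuple of 0..1 fractions to a #rrggbb string."""
    return "#" + "".join(f"{round(channel * 255):02x}" for channel in color)


motif, color = parse_motif_line("GGCC 0.0 1 1")
print(motif, color, to_hex(color))
```
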
 979
 980
 981 IUPAC Nucleotide Code
 982 ~~~~~~~~~~~~~~~~~~~~~~
 983
 984 For your convenience, below is a table of the IUPAC Nucleotide Code.
 985
 986 The following table is table 1 from "Nomenclature for Incompletely
 987 Specified Bases in Nucleic Acid Sequences" which can be found at
 988 http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html.
 989
 990 ======  =================  ===================================
 991 Symbol  Meaning            Origin of designation
 992 ======  =================  ===================================
 993 G       G                  Guanine
 994 A       A                  Adenine
 995 T       T                  Thymine
 996 C       C                  Cytosine
 997 R       G or A             puRine
 998 Y       T or C             pYrimidine
 999 M       A or C             aMino
1000 K       G or T             Keto
1001 S       G or C             Strong interaction (3 H bonds)
1002 W       A or T             Weak interaction (2 H bonds)
1003 H       A or C or T        not-G, H follows G in the alphabet
1004 B       G or T or C        not-A, B follows A
1005 V       G or C or A        not-T (not-U), V follows U
1006 D       G or A or T        not-C, D follows C
1007 N       G or A or T or C   aNy
1008 ======  =================  ===================================
1009
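One way to see how an IUPAC motif such as 'ATSCT' expands is to turn it
into a regular expression. The Python sketch below does this using the
table above. It is an illustration with hypothetical function names,
searches the forward strand only, and is not how Mussa itself is
implemented.

```python
import re

# Symbol -> allowed bases, taken from the IUPAC table above.
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT", "N": "GATC",
}


def motif_to_regex(motif):
    """Expand an IUPAC motif into a regular expression string."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]  # raises KeyError for non-IUPAC symbols
        parts.append(bases if len(bases) == 1 else f"[{bases}]")
    return "".join(parts)


def find_motif(motif, dna):
    """Return (0-based start, matched text) for every hit, including overlaps."""
    # The lookahead lets finditer report overlapping matches.
    pattern = re.compile(f"(?=({motif_to_regex(motif)}))")
    return [(m.start(1), m.group(1)) for m in pattern.finditer(dna.upper())]


print(motif_to_regex("ATSCT"))
print(find_motif("ATSCT", "ggATCCTaaATGCTttATACT"))
```

In the made-up sequence, ``ATCCT`` and ``ATGCT`` match because S stands
for G or C, while ``ATACT`` does not.
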
1010
1011 .. Define links below
1012    ------------------
1013
1014 .. _GPL: http://www.opensource.org/licenses/gpl-license.php
1015 .. _wiki: http://mussa.caltech.edu
1016 .. _build: http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild
1017 .. _fasta: http://en.wikipedia.org/wiki/FASTA_format
1018 .. _wpDnaMotif: http://en.wikipedia.org/wiki/DNA_motif