doc/manual/mussagl_manual.rst

   1 ==============
   2 Mussagl Manual
   3 ==============
   4 ---------------
   5 Brandon W. King
   6 ---------------
   7
   8 Last updated: Oct 27th, 2006
   9
  10 Documentation for Mussagl v1.0
  11
  12
  13 .. Things to add
  14         * New features / change log
  15         * (DONE) Comment out anything that isn't implemented yet.
  16         * (DONE) List of features that will be implemented in the future.
  17         * Look into the homology mapping of UCSC.
  18         * Add toggle to genomes.
  19         * Document why one fast record per region.
  20         * How to deal with the hazards of small utrs vis motif finder. (Add warning)
  21         * Add warning about saving FASTA file.
  22         * Add a general principles section near the top
  23                 * Using comparison algorithm which will pickup all repeats
  24                 * Add info about repeatmasking
  25                 * Checking upstream and downstream genes to make sure you are in the right regions.
  26         * Later on: look into Ensembl
  27         * Look into method of homology instead of blating.
  28         * Mention advantages of using mupa.
  29         * Mention the difference between using arrows and scroll bar
  30         * Document the color for motifs
  31         * Update for Mac user left-click
  32
  33         * Wormbase/Flybase/mirBASE tutorials
  34
  35
  36
  37 .. contents::
  38
  39 Status
  40 ======
  41
  42 ..
  43
  44   Major New Features
  45   ------------------
  46
  47   Change Log
  48   ----------
  49
  50   .. INSERT CHANGE LOG HERE
  51   .. END INSERT CHANGE LOG
  52
  53 Features to be Implemented
  54 --------------------------
  55
  56 For an up-to-date list of features to be implemented visit:
  57 http://woldlab.caltech.edu/cgi-bin/mussa/roadmap
  58
  59 Introduction
  60 ============
  61
  62
  63 What is Mussagl?
  64 ----------------
  65
  66 Mussa is an N-way version of FamilyRelations, the 2-way comparative
  67 sequence analysis software that is part of the Cartwheel project.
  68 Given DNA sequence from N species, Mussa uses all possible
  69 pairwise comparisons to derive an N-wise comparison. For example, given
  70 sequences 1,2,3, and 4, Mussa makes 6 2-way comparisons: 1vs2, 1vs3,
  71 1vs4, 2vs3, 2vs4, and 3vs4. It then compares all the links between
  72 these comparisons, saving those that satisfy a transitivity
  73 requirement. The saved paths are then displayed in an interactive
  74 viewer.
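
The pairwise step described above can be sketched in a few lines of
Python. This is an illustrative sketch only, not Mussa's own code: the
function name ``pairwise_comparisons`` is hypothetical, and the
transitivity filtering that follows the pairwise step is not shown.

```python
from itertools import combinations


def pairwise_comparisons(sequence_ids):
    """Return every unordered pair of sequences, in input order."""
    return list(combinations(sequence_ids, 2))


pairs = pairwise_comparisons([1, 2, 3, 4])
print(len(pairs))                            # 6
print(["%dvs%d" % pair for pair in pairs])   # 1vs2, 1vs3, 1vs4, 2vs3, 2vs4, 3vs4
```

The number of 2-way comparisons grows as N * (N - 1) / 2, so each
additional sequence adds noticeably more work.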
  75
  76 Short History of Mussa
  77 ----------------------
  78
  79 Mussa Python/PMW Prototype
  80 ~~~~~~~~~~~~~~~~~~~~~~~~~~
  81
  82 First Python/PMW based prototype.
  83
  84 Mussa C++/FLTK
  85 ~~~~~~~~~~~~~~
  86
  87 A rewrite for speed purposes using C++ and FLTK GUI toolkit.
  88
  89 Mussagl C++/Qt/OpenGL
  90 ~~~~~~~~~~~~~~~~~~~~~
  91
  92 Refactored version using the more elegant Qt GUI framework and
  93 OpenGL for hardware acceleration for those who have better graphics
  94 cards.
  95
  96 Getting Mussagl
  97 ===============
  98
  99 License
 100 -------
 101
 102 Mussagl has been released as open source under the `GPL v2
 103 license`__.
 104
 105 __ GPL_
 106
 107 Platforms
 108 ---------
 109
 110 You have the option of building from source or downloading prebuilt
 111 binaries. Most people will want the prebuilt versions.
 112
 113 Supported Platforms:
 114
 115  * Mac OS X (binary or source)
 116  * Windows XP (binary or source)
 117  * Linux (source)
 118
 119 Download
 120 --------
 121
 122 Mussagl in binary form for OS X and Windows and/or source can be
 123 downloaded from http://mussa.caltech.edu/.
 124
 125 Install
 126 -------
 127
 128 Mac OS X
 129 ~~~~~~~~
 130
 131  * Download .dmg file.
 132  * Double click on .dmg file.
 133  * Drag Mussa icon to your /Applications folder.
 134  * Double Click on Mussa icon to open program.
 135
 136 Windows XP
 137 ~~~~~~~~~~
 138 Once you have downloaded the Mussagl installer, double click on the
 139 installer and follow the install instructions.
 140
 141 To start Mussagl, launch the program from Start > Programs > Mussagl >
 142 Mussagl.
 143
 144
 145 Linux
 146 ~~~~~
 147 Currently we do not have a binary installer for Linux. You will have
 148 to build from source. See the 'build from source' section below.
 149
 150
 151 Build from Source
 152 ~~~~~~~~~~~~~~~~~
 153
 154 Instructions for building from source can be found on the `build page
 155 <http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild>`_ of the
 156 `Mussa wiki`__.
 157
 158 __ wiki_
 159
 160
 161 Obtaining Input Data
 162 ====================
 163
 164 If you would like help obtaining data for use with Mussagl, you can
 165 skip ahead to the `Obtaining Input Data - Continued`_ section.
 166
 167 If you would like a tour of the software, continue with the `Using
 168 Mussagl`_ section.
 169
 170
 171 Using Mussagl
 172 =============
 173
 174
 175 Launch Mussagl
 176 --------------
 177 Launch Mussagl... It should look similar to the screen shot below.
 178
 179 .. image:: images/opened.png
 180    :alt: Launch Mussa
 181    :align: center
 182
 183
 184
 185 Create/Load Analysis
 186 ----------------------
 187
 188 Currently there are three ways to load a Mussa experiment.
 189
 190  1. `Create a new analysis`_
 191  2. `Load a mussa parameter file`_ (.mupa)
 192  3. `Load an analysis`_
 193
 194 .. _createnew:
 195
 196 Create a new analysis
 197 ~~~~~~~~~~~~~~~~~~~~~
 198
 199 To create a new analysis select 'Define analysis' from the 'File'
 200 menu. You should see a dialog box similar to the one below. For this
 201 demo we will use the example sequences that come with Mussagl.
 202
 203 .. image:: images/define_analysis.png
 204    :alt: Define Analysis
 205    :align: center
 206
 207 Instructions:
 208
 209  1. **Give the experiment a name**. For this demo, we'll use
 210     'demo_w30_t20'. Mussa will create a folder with this name to store
 211     the analysis files in once it has been run.
 212
 213  2. Choose a threshold_... for this demo **choose 20**. See the
 214     Threshold_ section for more detailed information.
 215
 216  3. Choose a `window size`_. For this demo **choose 30**.
 217
 218
 219  4. Choose the number of sequences_ you would like. For this demo
 220     **choose 3**.
 221
 222 .. image:: images/define_analysis_step1a.png
 223    :alt: Steps 1-4
 224    :align: center
 225
 226 First enter the species name "Human" in the first "Species" text
 227 box. Now click on the 'Browse' button next to the sequence input box
 228 and select the /examples/seq/human_mck_pro.fa file. Do the same in
 229 the next two sequence input boxes selecting mouse_mck_pro.fa and
 230 rabbit_mck_pro.fa as shown below. Make sure to give them a species
 231 name as well. Note that you can create annotation files using the
 232 mussa `Annotation File Format`_ to add annotations to your sequence.
 233
 234 .. image:: images/define_analysis_step2.png
 235    :alt: Choose sequences
 236    :align: center
 237
 238 Click the **create** button and in a few moments you should see
 239 something similar to the following screen shot.
 240
 241 .. image:: images/demo.png
 242    :alt: Mussagl Demo
 243    :align: center
 244
 245 By default your analysis is NOT saved. If you try to close an analysis
 246 without saving, you will be prompted with a dialog box asking you if
 247 you would like to save your analysis. See the `Saving`_ section for
 248 details on saving your analysis. When saving, choose a directory and
 249 give the analysis the name **demo_w30_t20**. If you close and reopen
 250 Mussagl, you will then be able to load the saved analysis. See `Load
 251 an analysis`_ section below for details.
 252
 253
 254 Load a mussa parameter file
 255 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 256
 257 If you prefer, you can define your Mussa analysis using the Mussa
 258 parameter file. See the `Parameter File Format`_ section for details
 259 on creating a .mupa file.
 260
 261 Once you have a .mupa file created, load Mussagl and select the **File >
 262 Create Analysis from File** menu option. Select the .mupa file and click
 263 open.
 264
 265 .. image:: images/load_mupa_menu.png
 266    :alt: Load Mussa Parameters
 267    :align: center
 268
 269 If you would like to see an example, you can load the
 270 **mck3test.mupa** file in the examples directory that comes with
 271 Mussagl.
 272
 273 .. image:: images/load_mupa_dialog.png
 274    :alt: Load Mussa Parameters Dialog
 275    :align: center
 276
 277
 278 Load an analysis
 279 ~~~~~~~~~~~~~~~~
 280
 281 To load a previously run analysis open Mussagl and select the **File >
 282 Open Existing Analysis** menu option. Select an analysis **directory** and
 283 click open.
 284
 285 .. image:: images/load_analysis_menu.png
 286    :alt: Load Analysis Menu
 287    :align: center
 288
 289
 290 Main Window
 291 -----------
 292
 293 Overview
 294 ~~~~~~~~
 295 .. Screen-shot with numbers showing features.
 296
 297 .. image:: images/window_overview.png
 298    :alt: Mussa Window
 299    :align: center
 300
 301 Legend:
 302
 303  1. `DNA Sequence (Black bars)`_
 304
 305  2. Annotation_
 306
 307  3. Motif_
 308
 309  4. `Red conservation tracks`_
 310
 311  5. `Blue conservation tracks`_
 312
 313  6. `Zoom Factor`_ (Base pairs per pixel)
 314
 315  7. `Dynamic Threshold`_
 316
 317  8. `Sequence Information Bar`_
 318
 319  9. `Sequence Scroll Bar`_
 320
 321
 322 DNA Sequence (black bars)
 323 ~~~~~~~~~~~~~~~~~~~~~~~~~
 324
 325 .. image:: images/sequence_bar.png
 326    :alt: Sequence Bar
 327    :align: center
 328
 329 Each of the black bars represents one of the loaded sequences, in this
 330 case the sequence around the gene 'MCK' in human, mouse, and rabbit.
 331
 332
 333 Annotation
 334 ~~~~~~~~~~
 335
 336 .. figure:: images/annotation.png
 337    :alt: Annotation
 338    :align: center
 339
 340    Annotation shown in green on sequence bar.
 341
 342
 343 Annotations can be included on any of the sequences using the `Load a
 344 mussa parameter file`_ or `Create a new analysis`_ method of loading
 345 your sequences. You can define annotations by location or using an
 346 exact sub-sequence or a FASTA sequence of the section of DNA you wish
 347 to annotate. See the `Annotation File Format`_ section for details.
 348
 349
 350 Motif
 351 ~~~~~
 352
 353 .. figure:: images/motif.png
 354    :alt: Motif
 355    :align: center
 356
 357    Motif shown in light blue on sequence bar.
 358
 359 The only real difference between an annotation and motif in Mussagl is
 360 that you can define motifs and choose a color from within the GUI. See
 361 the `Motifs`_ section for more information.
 362
 363
 364 Red conservation tracks
 365 ~~~~~~~~~~~~~~~~~~~~~~~
 366
 367 .. figure:: images/conservation_tracks.png
 368    :alt: Conservation Tracks
 369    :align: center
 370
 371    Conservation tracks shown as red and blue lines between sequence
 372    bars.
 373
 374 The **red lines** between the sequence bars represent conservation
 375 between the sequences (i.e. not reverse complement matches).
 376
 377 The amount of sequence conservation shown will depend on how closely
 378 your sequences are related and the `dynamic threshold`_ you are using.
 379
 380
 381 Blue conservation tracks
 382 ~~~~~~~~~~~~~~~~~~~~~~~~
 383
 384 .. figure:: images/conservation_tracks.png
 385    :alt: Conservation Tracks
 386    :align: center
 387
 388    Conservation tracks shown as red and blue lines between sequence
 389    bars.
 390
 391 **Blue lines** represent **reverse complement** conservation relative
 392 to the sequence attached to the top of the blue line.
 393
 394 The amount of sequence conservation shown will depend on how closely
 395 your sequences are related and the `dynamic threshold`_ you are using.
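
For readers unfamiliar with the term, a reverse complement can be
computed as in the following minimal sketch. The helper name
``reverse_complement`` is hypothetical and only plain A/C/G/T bases are
handled; it is not taken from the Mussa source.

```python
# Map each base to its complement; other characters pass through unchanged.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(sequence):
    """Complement every base, then reverse the result."""
    return sequence.translate(COMPLEMENT)[::-1]


print(reverse_complement("GATTACA"))  # TGTAATC
```

A blue line therefore indicates that a window in one sequence matches
the reverse complement of a window in the other.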
 396
 397
 398 Zoom Factor
 399 ~~~~~~~~~~~
 400
 401 .. image:: images/zoom_factor.png
 402    :alt: Zoom Factor
 403    :align: center
 404
 405 The zoom factor is the number of base pairs represented per
 406 pixel. When you zoom in far enough, the display switches from
 407 a black bar representing the sequence to the actual sequence
 408 (well, an ASCII representation of the sequence).
 409
 410
 411 Dynamic Threshold
 412 ~~~~~~~~~~~~~~~~~
 413
 414 .. image:: images/dynamic_threshold.png
 415    :alt: Dynamic Threshold
 416    :align: center
 417
 418 You can dynamically change the threshold for how strong a match you
 419 consider the conservation to be by changing the value in the dynamic
 420 threshold box.
 421
 422 The value you enter is the minimum number of base pairs that have to
 423 match within a window for it to be considered conserved. The second
 424 number, which you cannot change, is the `window size`_ you used when
 425 creating the experiment. The last number is the percent match.
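
As a rough sketch, and assuming the percent match is simply the
threshold divided by the window size (the manual does not spell out the
formula), the third number could be computed like this. The function
name ``percent_match`` is hypothetical.

```python
def percent_match(threshold, window_size):
    """Percent of window positions that must match (assumed formula)."""
    if not 0 < threshold <= window_size:
        raise ValueError("threshold must be between 1 and the window size")
    return 100.0 * threshold / window_size


print(round(percent_match(20, 30), 1))  # 66.7
print(percent_match(30, 30))            # 100.0
```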
 426
 427 Below is an animation of the dynamic threshold being increased over
 428 time.
 429
 430 .. image:: images/threshold_change.gif
 431    :alt: Animated Dynamic Threshold
 432    :align: center
 433
 434 See the Threshold_ section for more information.
 435
 436
 437 Sequence Information Bar
 438 ~~~~~~~~~~~~~~~~~~~~~~~~
 439
 440 .. image:: images/seq_info_bar.png
 441    :alt: Sequence Information Bar
 442    :align: center
 443
 444 The sequence information bars can be found to the left and right sides
 445 of Mussagl. Next to each sequence you will find the following
 446 information:
 447
 448  1. Species (If it has been defined)
 449  2. Total Size of Sequence
 450  3. Current base pair position
 451
 452 Note that you can **update the species** text box. Make sure to **save your
 453 experiment** after making this change by selecting **File > Save
 454 Analysis** from the menu.
 455
 456 Sequence Scroll Bar
 457 ~~~~~~~~~~~~~~~~~~~
 458
 459 .. image:: images/scroll_bar.png
 460    :alt: Sequence Scroll Bar
 461    :align: center
 462
 463 The scroll bar allows you to scroll through the sequence which is
 464 useful when you have zoomed in using the `zoom factor`_.
 465
 466
 467 Saving
 468 ------
 469
 470 Save on Close
 471 ~~~~~~~~~~~~~
 472
 473 Whenever you create a new analysis or make a change such as
 474 adding/editing a motif or changing a species name, an asterisk (*)
 475 will appear in the title of the window showing that there are changes
 476 that have not been saved. If you close a Mussa window without saving
 477 changes, Mussa will ask you if you would like to save the changes that
 478 have been made.
 479
 480 Save Analysis
 481 ~~~~~~~~~~~~~
 482
 483 After making changes, such as updating species names or adding/editing
 484 motifs, you can save these changes by selecting the **File > Save
 485 analysis** menu option or pressing **CTRL + S** (PC) or
 486 **Apple/Command Key + S** (on Mac).
 487
 488 .. image:: images/save_analysis.png
 489    :alt: Save analysis
 490    :align: center
 491
 492 Save Analysis As
 493 ~~~~~~~~~~~~~~~~
 494
 495 To save a copy of your analysis to a new location, select the **File >
 496 Save analysis as** menu option and choose a new location and name for
 497 your analysis.
 498
 499 .. image:: images/save_analysis_as.png
 500    :alt: Save analysis
 501    :align: center
 502
 503 Save Motif List
 504 ~~~~~~~~~~~~~~~
 505
 506 See `Save Motifs to File`_ in the `Motifs`_ section.
 507
 508
 509 Viewing Multiple Analyses
 510 -------------------------
 511
 512 Sometimes it is useful to view more than one analysis at a time. To
 513 accomplish this, Mussa allows you to open a new Mussa window by
 514 selecting the **File > New Mussa Window** menu option.
 515
 516 .. image:: images/new_mussa_window_menu.png
 517    :alt: New Mussa Window Menu Option
 518    :align: center
 519
 520 A new Mussa window will pop up.
 521
 522 .. figure:: images/new_mussa_window.png
 523    :alt: New Mussa Window
 524    :align: center
 525
 526    A new Mussa window on the right, in which a second analysis has
 527    been loaded.
 528
 529 Now you can create a new analysis or load an existing one in this new
 530 window, as described in the `Create/Load Analysis`_ section.
 531
 532 You can view as many analyses as you can fit on your screen or until
 533 you run out of available RAM. If you notice a rapid decrease in
 534 performance and hear lots of noise coming from your hard drive, you
 535 probably ran out of RAM and are now using virtual memory (i.e. much
 536 much slower). If this happens, you may need to avoid opening as many
 537 analyses at one time.
 538
 539
 540 Annotations / Motifs
 541 --------------------
 542
 543 Annotations
 544 ~~~~~~~~~~~
 545
 546 Currently annotations can be added to a sequence using the mussa
 547 `annotation file format`_ and can be loaded by selecting the
 548 annotation file when defining a new analysis (see `Create a new
 549 analysis`_ section) or by defining a .mupa file pointing to your
 550 annotation file (see `Load a mussa parameter file`_ section).
 551
 552 Motifs
 553 ~~~~~~
 554
 555 Load Motifs from File
 556 *********************
 557
 558 It is possible to load motifs from a file which was saved from a
 559 previous run or by defining your own motif file. See the `Motif File
 560 Format`_ section for details.
 561
 562 NOTE: Valid motif list file extensions are:
 563
 564   * .mtl
 565   * .txt
 566
 567 To load a motif file, select **Load Motif List** item from the
 568 **File** menu and select a motif list file.
 569
 570 .. image:: images/load_motif.png
 571    :alt: Load Motif List
 572    :align: center
 573
 574
 575 Save Motifs to File
 576 *******************
 577
 578 Motifs from the `Motif Dialog`_ can be saved to file for use with
 579 other analyses. If you just want your motifs to be saved with your
 580 analysis, see the `save analysis`_ section for details.
 581
 582 To save a motif list, select **File > Save Motifs** menu option. By
 583 default, Mussa will append .mtl if you do not provide a file extension
 584 (valid file extensions: .mtl & .txt).
 585
 586 .. image:: images/save_motifs.png
 587    :alt: Save Motifs
 588    :align: center
 589
 590
 591 Motif Dialog
 592 ************
 593
 594 Mussa has the ability to find lab motifs using the `IUPAC Nucleotide
 595 Code`_ for defining a motif. To define a motif, select **Edit > Edit
 596 Motifs** menu item as shown below.
 597
 598 .. image:: images/view_edit_motifs.png
 599    :alt: "View > Edit Motifs" Menu
 600    :align: center
 601
 602 You will see a dialog box appear with an "apply" button in the bottom
 603 right and one row for defining a motif and the color that will be
 604 displayed on the sequence. When you start adding your first motif, an
 605 additional row will be added. The check box in the first column
 606 defines whether the motif is displayed or not. The second column is
 607 the motif display color. The third column is for the name of your
 608 motif and finally, the fourth column is the motif itself.
 609
 610 .. image:: images/motif_dialog_start.png
 611    :alt: Motif Dialog
 612    :align: center
 613
 614 Now let's make a motif **'AT[C or G]CT'**. Using the `IUPAC Nucleotide
 615 Code`_, type in **'ATSCT'** into the motif field and **'My Motif'** for
 616 the name in the name field as shown below.
 617
 618 Notice how a second row appeared when you started to add the first
 619 motif. Every time you add a new motif, a new row will appear allowing
 620 you to add as many motifs as you need.
 621
 622 .. image:: images/motif_dialog_enter_motif.png
 623    :alt: Enter Motif
 624    :align: center
 625
 626 Now choose a color for your motif by clicking on the colored area to
 627 the left of the name field. Remember to choose a color that will show
 628 up well with a black bar as the background. A good tool for picking a
 629 color is the `Colour Contrast Analyser
 630 <http://juicystudio.com/services/colourcontrast.php>`_ by
 631 `juicystudio.com <http://juicystudio.com/>`_.
 632
 633 .. image:: images/color_chooser.png
 634    :alt: Color Chooser
 635    :align: center
 636
 637 Once you have selected the color for your motif, click on the
 638 **'apply'** button. If Mussa finds matches to your motif, they
 639 will now show up in the main Mussa window.
 640
 641 Before Motif:
 642
 643 .. image:: images/motif_dialog_bar_before.png
 644    :alt: Sequence bar before motif
 645    :align: center
 646
 647 After Motif:
 648
 649 .. image:: images/motif_dialog_bar_after.png
 650    :alt: Sequence bar after motif
 651    :align: center
 652
 653 To save your motifs with your analysis, see the `save analysis`_
 654 section. To save your motifs to a file, see the `save motifs to file`_
 655 section.
 656
 657 Deleting a Motif
 658 ^^^^^^^^^^^^^^^^
 659
 660 To delete a motif, remove all text from the name and sequence columns
 661 and close the motif editor.
 662
 663 View Mussa Alignments
 664 ---------------------
 665
 666 Mussagl allows you to zoom in on Mussa alignments by selecting the set
 667 of alignment(s) of interest. To do this, move the mouse near the
 668 alignment you are interested in viewing and then **PRESS** and
 669 **HOLD** the **LEFT mouse button** and **drag the mouse** to the other
 670 side of the conservation track so that you see a bounding box
 671 overlapping the alignment(s) of interest and then **let go** of the
 672 *left mouse button*.
 673
 674 In the example below, I started by left-clicking on the area marked by
 675 a red dot (upper left corner of bounding box) and dragging the mouse to
 676 the area marked by a blue dot (lower right corner of the bounding box)
 677 and letting go of the left mouse button.
 678
 679 .. image:: images/select_sequence.png
 680    :alt: Select Sequence
 681    :align: center
 682
 683 All of the lines which were not selected should be washed out as shown
 684 below:
 685
 686 .. image:: images/washed_out.png
 687    :alt: Tracks washed out
 688    :align: center
 689
 690 With a selection made, go to the **View** menu and select **View mussa alignment**.
 691
 692 .. image:: images/view_mussa_alignment.png
 693    :alt: View mussa alignment
 694    :align: center
 695
 696 You should see the alignment at the base-pair level as shown below.
 697
 698 .. image:: images/mussa_alignment.png
 699    :alt: Mussa alignment
 700    :align: center
 701
 702
 703 Sub-analysis
 704 ------------
 705
 706 Sub-analysis was created to allow you to analyze a sub-region using
 707 different parameters. This may allow you to find matches which may not
 708 have shown up with your initial settings.
 709
 710 To run a sub-analysis **highlight** a section of sequence and *right
 711 click* on it and select **Add to subanalysis**. Do the same for the
 712 sequences shown in orange in the screenshot below. Note that you **are
 713 NOT limited** to selecting only one subsequence from the same
 714 sequence.
 715
 716 .. image:: images/subanalysis_select_seqs.png
 717    :alt: Subanalysis sequence selection
 718    :align: center
 719
 720 Once you have added your sequences for subanalysis, choose a `window size`_ and `threshold`_ and click **Ok**.
 721
 722 .. image:: images/subanalysis_dialog.png
 723    :alt: Subanalysis Dialog
 724    :align: center
 725
 726 A new Mussa window will appear with the subanalysis of your sequences
 727 once it's done running. This may take a while if you selected large
 728 chunks of sequence with a loose threshold.
 729
 730 .. image:: images/subanalysis_done.png
 731    :alt: Subanalysis complete
 732    :align: center
 733
 734
 735 Copying sequence to clipboard
 736 -----------------------------
 737
 738 To copy a sequence to the clipboard, highlight a section of sequence,
 739 as shown in the screen shot below, and do one of the following:
 740
 741  * Select **Copy as FASTA** from the **Edit** menu.
 742  * **Right-Click (Left-click + Apple/Command Key on Mac)** on the highlighted sequence and select **Copy as FASTA**.
 743  * Press **Ctrl + C (on PC)** or **Apple/Command Key + C (on Mac)** on the keyboard.
 744
 745 .. image:: images/copy_sequence.png
 746    :alt: Copy sequence
 747    :align: center
 748
 749
 750 Saving to an Image
 751 ---------------------------------
 752
 753 To save your current mussa view to an image, select **File > Save to
 754 image...** as shown below.
 755
 756 .. image:: images/save_to_image_menu.png
 757    :alt: File > Save to image...
 758    :align: center
 759
 760 You can define the width and the height of the image to save. By
 761 default it will use the same size as your current view. Since the
 762 Mussa view is implemented using vectors, if you choose a larger size
 763 than your current view, Mussa will redraw at the higher resolution
 764 when saving. In other words, you get higher quality images when saving
 765 at a higher resolution.
 766
 767 If you check the "Lock aspect ratio" check box, which I have circled
 768 in red, then when you change one value, say width, the other, height,
 769 will update automatically to keep the same aspect ratio.
 770
 771 .. image:: images/save_to_image_dialog.png
 772    :alt: Save to image dialog
 773    :align: center
 774
 775 Click save and choose a location and filename for your file.
 776
 777 The valid image formats are:
 778
 779   * .png (default if no extension specified.)
 780   * .jpg
 781
 782
 783 Detailed Information
 784 --------------------
 785
 786 Threshold
 787 ~~~~~~~~~
 788
 789 The threshold of an analysis is the minimum number of base pair
 790 matches that a window must meet in order to be kept as a match. Note
 791 that you can vary the threshold from within Mussagl. For example, if
 792 you choose a `window size`_ of **30** and a **threshold** of **20**,
 793 the mussa nway transitive algorithm will store all matches that are 20
 794 out of 30 bp or better and pass them on to Mussagl. Mussagl will then
 795 allow you to dynamically choose a threshold from 20 to 30 base pairs.
 796 A threshold of 30 bps would only show 30 out of 30 bp matches. A
 797 threshold of 20 bps would show all matches of 20 out of 30 bps or
 798 better. If you would like to see results for matches lower than 20 out
 799 of 30, you will need to rerun the analysis with a lower threshold.
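
The relationship between the stored threshold and the dynamic threshold
can be illustrated with a deliberately naive sketch. It compares every
window of one sequence against every window of another, which is not
how Mussa is implemented and ignores reverse complement matches; the
function names ``window_matches`` and ``apply_dynamic_threshold`` are
hypothetical.

```python
def window_matches(seq_a, seq_b, window, threshold):
    """Return (start_a, start_b, score) for window pairs scoring >= threshold."""
    matches = []
    for i in range(len(seq_a) - window + 1):
        for j in range(len(seq_b) - window + 1):
            # Score = number of positions where the two windows agree.
            score = sum(
                1 for x, y in zip(seq_a[i:i + window], seq_b[j:j + window])
                if x == y
            )
            if score >= threshold:
                matches.append((i, j, score))
    return matches


def apply_dynamic_threshold(matches, threshold):
    """Keep only stored matches that also satisfy a stricter threshold."""
    return [match for match in matches if match[2] >= threshold]


# Run once with a loose threshold, then tighten it without recomputing.
stored = window_matches("ACGTACGT", "ACGAACGT", window=4, threshold=3)
strict = apply_dynamic_threshold(stored, 4)
print(len(stored), len(strict))
```

Raising the threshold only filters matches that were already stored.
Lowering it below the stored value would require matches that were
never kept, which is why the analysis has to be rerun.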
 800
 801 Window Size
 802 ~~~~~~~~~~~
 803
 804 The typical sizes people tend to choose are between 20 and 30. You
 805 will likely need to experiment with this setting depending on your
 806 needs and input sequence.
 807
 808
 809 Sequences
 810 ~~~~~~~~~
 811
 812 Mussa reads in sequences which are formatted in the FASTA_
 813 format. Mussa may take a long time to run (>10 minutes) if the total
 814 length in bp nears 280 Kb. Once Mussa has run, you can reload
 815 previously run analyses.
 816
 817 FIXME: We have learned more about how much sequence and how many to
 818 put in Mussagl, this information should be documented here.
 819
 820
 821 Mussa File Formats
 822 ------------------
 823
 824 .. _param:
 825
 826 Parameter File Format
 827 ~~~~~~~~~~~~~~~~~~~~~
 828
 829 Note that for the comment character '#' to work, it must be followed
 830 by a space (i.e. '# ').
 831
 832 **File Format (.mupa):**
 833
 834 ::
 835
 836   # name of analysis directory and stem for associated files
 837   ANA_NAME <analysis_name>
 838
 839   # if APPEND vars true, a _wXX and/or _tYY added to analysis name
 840   # where XX = WINDOW and YY = THRESHOLD
 841   # Highly recommended with use of command line override of WINDOW or THRESHOLD
 842   APPEND_WIN <true/false>
 843   APPEND_THRES <true/false>
 844
 845   # first sequence info
 846   SEQUENCE <FASTA_file_path>
 847   ANNOTATION <annotation_file_path>
 848   SEQ_START <sequence_start>
 849
 850   # the second sequence info
 851   SEQUENCE <FASTA_file_path>
 852   # ANNOTATION <annotation_file_path>
 853   SEQ_START <sequence_start>
 854   # SEQ_END <sequence_end>
 855
 856   # third sequence info
 857   SEQUENCE <FASTA_file_path>
 858   # ANNOTATION <annotation_file_path>
 859
 860   # analysis parameters: command line args -w -t will override these
 861   WINDOW <num>
 862   THRESHOLD <num>
 863
 864 .. csv-table:: Parameter File Options:
 865    :header: "Option Name", "Value", "Default", "Required", "Description"
 866    :widths: 30 30 30 30 60
 867
 868    "ANA_NAME", "string", "N/A", "true", "Name of analysis (also the
 869    name of the directory where the analysis will be saved)."
 870    "APPEND_WIN", "true/false", "?", "?", "Appends _w## to ANA_NAME"
 871    "APPEND_THRES", "true/false", "?", "?", "Appends _t## to ANA_NAME"
 872    "SEQUENCE", "/FASTA/filepath.fa", "N/A", "true", "Absolute/Relative file
 873    path to sequence."
 874    "ANNOTATION", "/annotation/filepath.txt", "N/A", "false", "Optional
 875    annotation file. See `annotation file format`_ section for more
 876    information."
 877    "SEQ_START", "integer", "1", "false", "Optional index into FASTA file"
 878    "SEQ_END", "integer", "1", "false", "Optional index into FASTA file"
 879    "WINDOW", "integer", "N/A", "true", "`Window Size`_"
 880    "THRESHOLD", "integer", "N/A", "true", "`Threshold`_"
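
As a sketch of how the options fit together, the following Python
snippet writes out a minimal parameter file for the three-sequence demo
used earlier in this manual. The helper name ``build_mupa`` and the
relative ``examples/seq/`` paths are assumptions for illustration;
adjust the paths to wherever your FASTA files live.

```python
def build_mupa(name, fasta_paths, window, threshold):
    """Return the text of a minimal .mupa file using only required options."""
    lines = ["# name of analysis directory", "ANA_NAME %s" % name, ""]
    for path in fasta_paths:
        lines.append("SEQUENCE %s" % path)
    lines += [
        "",
        "# analysis parameters",
        "WINDOW %d" % window,
        "THRESHOLD %d" % threshold,
    ]
    return "\n".join(lines) + "\n"


demo = build_mupa(
    "demo_w30_t20",
    [
        "examples/seq/human_mck_pro.fa",   # assumed relative paths
        "examples/seq/mouse_mck_pro.fa",
        "examples/seq/rabbit_mck_pro.fa",
    ],
    window=30,
    threshold=20,
)
print(demo)
```

Note that every comment line starts with ``#`` followed by a space, as
required by the format.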
 881
 882 .. _annot:
 883
 884 Annotation File Format
 885 ~~~~~~~~~~~~~~~~~~~~~~
 886
 887 The first line in the file is the sequence name. Each line thereafter
 888 is a **space** separated annotation.
 889
 890 Update:
 891
 892  * The annotation format now supports FASTA sequences embedded in the
 893    annotation file as shown in the format example below. Mussagl will
 894    take this sequence and look for an exact match of this sequence in
 895    your sequences. If a match is found, it will label it with the name
 896    from the FASTA header.
 897
 898 Format:
 899
 900 ::
 901
 902   <species/sequence_name>
 903   <start> <stop> <annotation_name> <annotation_type>
 904   <start> <stop> <annotation_name> <annotation_type>
 905   <start> <stop> <annotation_name> <annotation_type>
 906   <start> <stop> <annotation_name> <annotation_type>
 907   >FASTA Header
 908   ACTGACTGACGTACGTAGCTAGCTAGCTAGCACG
 909   ACGTACGTACGTACGTAGCTGTCATACGCTAGCA
 910   TGCGTAGAGGATCTCGGATGCTAGCGCTATCGAT
 911   ACGTACGGCAGTACGCGGTCAGA
 912   <start> <stop> <annotation_name> <annotation_type>
 913   ...
 914
 915 Example:
 916
 917 ::
 918
 919   Mouse
 920   251 500 Glorp Glorptype
 921   751 1000 Glorp Glorptype
 922   1251 1500 Glorp Glorptype
 923   >My favorite DNA sequence
 924   GATTACA
 925   1751 2000 Glorp Glorptype
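
To make the layout concrete, here is a small, unofficial parser sketch
for the format above. It assumes that an embedded FASTA record continues
until the next line that contains a space, and that every positional
annotation has all four fields; Mussagl's own reader may apply different
rules. The name ``parse_annotations`` is hypothetical.

```python
EXAMPLE = """\
Mouse
251 500 Glorp Glorptype
751 1000 Glorp Glorptype
1251 1500 Glorp Glorptype
>My favorite DNA sequence
GATTACA
1751 2000 Glorp Glorptype
"""


def parse_annotations(text):
    """Return (sequence_name, positional_annotations, fasta_annotations)."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    sequence_name = lines[0]
    positional = []   # (start, stop, name, type)
    fasta = []        # [header, sequence]
    current = None    # FASTA record currently being read, if any
    for line in lines[1:]:
        if line.startswith(">"):
            current = [line[1:].strip(), ""]
            fasta.append(current)
        elif current is not None and " " not in line:
            current[1] += line   # continuation of the FASTA sequence
        else:
            current = None
            start, stop, name, kind = line.split(None, 3)
            positional.append((int(start), int(stop), name, kind))
    return sequence_name, positional, [tuple(record) for record in fasta]


name, positional, fasta = parse_annotations(EXAMPLE)
print(name, len(positional), fasta)
```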
 926
 927
 928 .. _motif_file_format:
 929
 930 Motif File Format
 931 ~~~~~~~~~~~~~~~~~
 932
 933 Format::
 934
 935   <motif> <red> <green> <blue>
 936
 937 Example::
 938
 939   GGCC 0.0 1 1
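
A line in this format can be read with a sketch like the one below. It
assumes, based only on the example above, that the three color values
are fractions between 0 and 1; the helper names ``parse_motif_line`` and
``to_hex`` are hypothetical.

```python
def parse_motif_line(line):
    """Split '<motif> <red> <green> <blue>' into a motif and an RGB tuple."""
    motif, red, green, blue = line.split()
    return motif, (float(red), float(green), float(blue))


def to_hex(rgb):
    """Convert 0-1 color fractions to a #rrggbb string."""
    return "#%02x%02x%02x" % tuple(int(round(c * 255)) for c in rgb)


motif, color = parse_motif_line("GGCC 0.0 1 1")
print(motif, color, to_hex(color))  # GGCC (0.0, 1.0, 1.0) #00ffff
```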
 940
 941
 942
 943 IUPAC Nucleotide Code
 944 ~~~~~~~~~~~~~~~~~~~~~~
 945
 946 For your convenience, below is a table of the IUPAC Nucleotide Code.
 947
 948 The following table is Table 1 from "Nomenclature for Incompletely
 949 Specified Bases in Nucleic Acid Sequences" which can be found at
 950 http://www.chem.qmul.ac.uk/iubmb/misc/naseq.html.
 951
 952 ======  =================  ===================================
 953 Symbol  Meaning            Origin of designation
 954 ======  =================  ===================================
 955 G       G                  Guanine
 956 A       A                  Adenine
 957 T       T                  Thymine
 958 C       C                  Cytosine
 959 R       G or A             puRine
 960 Y       T or C             pYrimidine
 961 M       A or C             aMino
 962 K       G or T             Keto
 963 S       G or C             Strong interaction (3 H bonds)
 964 W       A or T             Weak interaction (2 H bonds)
 965 H       A or C or T        not-G, H follows G in the alphabet
 966 B       G or T or C        not-A, B follows A
 967 V       G or C or A        not-T (not-U), V follows U
 968 D       G or A or T        not-C, D follows C
 969 N       G or A or T or C   aNy
 970 ======  =================  ===================================
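
One way to see how these codes expand is to translate a motif into a
regular expression, as in the following sketch. It searches the forward
strand only and is not Mussa's motif finder; the names
``motif_to_regex`` and ``find_motif`` are hypothetical.

```python
import re

# IUPAC symbol -> bases it stands for, following the table above.
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT", "S": "GC", "W": "AT",
    "H": "ACT", "B": "GTC", "V": "GCA", "D": "GAT", "N": "GATC",
}


def motif_to_regex(motif):
    """Expand each ambiguous symbol into a character class."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]
        parts.append(bases if len(bases) == 1 else "[%s]" % bases)
    return "".join(parts)


def find_motif(motif, sequence):
    """Return 0-based start positions of all (possibly overlapping) matches."""
    pattern = re.compile("(?=(%s))" % motif_to_regex(motif))
    return [match.start() for match in pattern.finditer(sequence.upper())]


print(motif_to_regex("ATSCT"))                # AT[GC]CT
print(find_motif("ATSCT", "GGATCCTAATGCTT"))  # [2, 8]
```

The lookahead group lets overlapping occurrences be reported, which a
plain ``re.findall`` on the pattern would skip.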
 971
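To make the table concrete, here is a small Python sketch that expands a motif written with IUPAC codes into a regular expression and reports where it matches. It only illustrates what the codes mean. It is not how Mussa searches for motifs, it looks at the forward strand only, and the function names are invented for this example.

```python
import re

# IUPAC nucleotide codes, copied from the table above.
IUPAC = {
    "G": "G", "A": "A", "T": "T", "C": "C",
    "R": "GA", "Y": "TC", "M": "AC", "K": "GT",
    "S": "GC", "W": "AT", "H": "ACT", "B": "GTC",
    "V": "GCA", "D": "GAT", "N": "GATC",
}


def motif_to_regex(motif):
    """Turn an IUPAC motif such as 'GRCC' into a regex string such as 'G[GA]CC'."""
    parts = []
    for symbol in motif.upper():
        bases = IUPAC[symbol]
        parts.append(bases if len(bases) == 1 else "[" + bases + "]")
    return "".join(parts)


def find_motif(motif, sequence):
    """Return 0-based start positions of every (possibly overlapping) match."""
    # The lookahead lets overlapping matches be reported as well.
    pattern = re.compile("(?=" + motif_to_regex(motif) + ")")
    return [match.start() for match in pattern.finditer(sequence.upper())]


print(motif_to_regex("GRCC"))
print(find_motif("GRCC", "ttGGCCaaGACCgg"))
```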
 972
 973 Obtaining Input Data - Continued
 974 --------------------------------
 975
If you already have your data, you may want to skip ahead to the `Using Mussagl`_
section of the manual.
 978
Let's say you have a gene of interest called 'SMN1' and you want to
know how well the sequence surrounding the gene is conserved across
multiple species. That is exactly what we are going to do: retrieve the
DNA sequence for SMN1 and prepare it for use in Mussa.
 983
 984 For more information about SMN1 visit `NCBI's OMIM
 985 <http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=609682>`_.
 986
 987 The SMN1 data retrieved in this section can be downloaded from the
 988 `Mussa Example Data
 989 <http://woldlab.caltech.edu/cgi-bin/mussa/wiki/ExampleData>`_ page if
 990 you prefer to skip this section of the manual.
 991
 992 UCSC Genome Browser Method
 993 --------------------------
 994
 995 There are many methods of retrieving DNA sequence, but for this
 996 example we will retrieve SMN1 through the UCSC genome browser located
 997 at http://genome.ucsc.edu/.
 998
 999
1000 .. image:: images/ucsc_genome_browser_home.png
1001    :alt: UCSC Genome Browser
1002    :align: center
1003
1004 Step 1 - Find SMN1
1005 ~~~~~~~~~~~~~~~~~~
1006
1007 The first step in finding SMN1 is to use the **Gene Sorter** menu
1008 option which I have highlighted in orange below:
1009
1010 .. image:: images/ucsc_menu_bar_gene_sorter.png
1011    :alt: Gene Sorter Menu Option
1012    :align: center
1013
1014 Gene Sorter page:
1015
1016 .. image:: images/ucsc_gene_sorter.png
1017    :alt: Gene Sorter
1018    :align: center
1019
1020 We will start by looking for SMN1 in the **Human Genome** and **sorting by name similarity**.
1021
1022 .. image:: images/ucsc_gs_sort_name_sim.png
1023    :alt: Gene Sorter - Name Similarity
1024    :align: center
1025
1026 After you have selected **Human Genome** and **sorting by name similarity**, type *SMN1* into the search box.
1027
1028 .. image:: images/ucsc_gs_smn1.png
1029    :alt: Gene
1030    :align: center
1031
1032 Press **Go!** and you should see the following page:
1033
1034 .. image:: images/ucsc_gs_found.png
1035    :alt: Found SMN1
1036    :align: center
1037
Click on **SMN1** and you will be taken to the gene expression atlas
page.
1040
1041 .. image:: images/ucsc_gs_genome_position.png
1042    :alt: Gene expression atlas
1043    :align: center
1044
1045 Click on **chr5 70,270,558** found in the **SMN1 row**, **Genome
1046 position column**.
1047
Now we have found the location of SMN1 in the human genome!
1049
1050 .. image:: images/ucsc_gb_smn1_human.png
1051    :alt: Genome Browser - SMN1 (human)
1052    :align: center
1053
1054
1055 Step 2 - Download CDS/UTR sequence for annotations
1056 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1057
1058 Since we have found **SMN1**, this would be a convenient time to extract
1059 the DNA sequence for the CDS and UTRs of the gene to use it as an
1060 annotation_ in Mussa.
1061
**Click on SMN1**, shown **between** the **two orange arrows** in the
screenshot below.
1064
1065 .. image:: images/ucsc_gb_smn1_human_click_smn1.png
1066    :alt: Genome Browser - SMN1 (human) - Orange Arrows
1067    :align: center
1068
1069 You should find yourself at the SMN1 description page.
1070
1071 .. image:: images/ucsc_gb_smn1_description_page.png
1072    :alt: Genome Browser - SMN1 (human) - Description page
1073    :align: center
1074
1075 **Scroll down** until you get to the **Sequence section** and click on
1076 **Genomic (chr5:70,256,524-70,284,592)**.
1077
1078 .. image:: images/ucsc_gb_smn1_human_sequence.png
1079    :alt: Genome Browser - SMN1 (human) - Sequence
1080    :align: center
1081
1082 You should now be at the **Genomic sequence near gene** page:
1083
1084 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence.png
1085    :alt: Genome Browser - SMN1 (human) - Get genomic sequence
1086    :align: center
1087
1088 Make the following changes (highlighted in orange in the screenshot
1089 below):
1090
1091  1. UNcheck **introns**.
1092     (We only want to annotate CDS and UTRs.)
1093  2. Select **one FASTA record** per **region**.
    (Mussa needs each CDS and each UTR in its own FASTA record.)
1095  3. Select **CDS in upper case, UTR in lower case.**
1096
1097 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_diff.png
1098    :alt: Genome Browser - SMN1 (human) - Get genomic sequence setup
1099    :align: center
1100
Now click the **submit** button. You will then see a FASTA file with
many FASTA records representing the CDS and UTRs.
1103
1104 .. image:: images/ucsc_gb_smn1_human_get_genomic_sequence_submit.png
1105    :alt: Genome Browser - SMN1 (human) - CDS/UTR sequence
1106    :align: center
1107
1108 Now you need to save the FASTA records to a **text file**. If you are
1109 using **Firefox** or **Internet Explorer 6+** click on the **File >
1110 Save As** menu option.
1111
1112 **IMPORTANT:** Make sure you select **Text Files** and **NOT**, I
1113 repeat **NOT Webpage Complete** (see screenshot below.)
1114
1115 Type in **smn1_human_annot.txt** for the file name.
1116
1117 .. image:: images/smn1_human_annot.png
1118    :alt: Genome Browser - SMN1 (human) - sequence annotation file
1119    :align: center
1120
**IMPORTANT:** You should open the file with a text editor and make
sure **no HTML** was saved. If you find any HTML markup, delete
the markup and save the file.
1124
1125 Now we are going to **modify the file** you just saved to **add the
1126 name of the species** to the **annotation file**. All you have to do
1127 is **add a new line** at the **top of the file** with the word **'Human'** as
1128 shown below:
1129
1130 .. image:: images/smn1_human_annot_plus_human.png
1131    :alt: Genome Browser - SMN1 (human) - sequence annotation file
1132    :align: center
1133
1134 You can add more annotations to this file if you wish. See the
1135 `annotation file format`_ section for details of the file format. By
1136 including FASTA records in the annotation_ file, Mussa searches your
1137 DNA sequence for an exact match of the sequence in the annotation_
1138 file. If found, it will be marked as an annotation_ within Mussa.
1139
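If you prefer to script the clean-up described in this step, the following Python sketch checks the saved text for leftover HTML markup and puts the species name on the first line. The function name ``prepare_annotation_text`` and the FASTA headers in the example are hypothetical. The HTML check is a simple pattern match, not a full HTML parser.

```python
import re


def prepare_annotation_text(saved_text, species):
    """Return annotation-file text: species name first, then the FASTA records.

    Raises ValueError if the saved text still appears to contain HTML tags.
    Hypothetical helper for illustration only.
    """
    # FASTA headers start with '>', but FASTA text should never contain a
    # '<' followed by a tag name, so this is treated as leftover markup.
    if re.search(r"</?[A-Za-z][^>]*>", saved_text):
        raise ValueError("HTML markup found; re-save the page as a text file")
    return species + "\n" + saved_text.strip("\n") + "\n"


# Hypothetical FASTA records standing in for the UCSC output.
saved = ">example_utr_record\nacgtacgt\n>example_cds_record\nACGTACGT\n"
print(prepare_annotation_text(saved, "Human"))
```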
1140
1141 Step 3 - Download gene and upstream/downstream sequence
1142 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1143
Use the back button in your web browser to get back to the **genome
browser view** of **SMN1** as shown below.
1146
1147 .. image:: images/ucsc_gb_smn1_human.png
1148    :alt: Genome Browser - SMN1 (human)
1149    :align: center
1150
There are two options for getting additional sequence around your
gene. The more complex way is to zoom out until the genome browser
shows all of the sequence you want, and then follow the same
directions as for the second option.
1155
1156 The second option, which we will choose, is to leave the genome
1157 browser zoomed exactly at the location of SMN1 and click on the
1158 **DNA** option on the menu bar (shown with orange arrows in the
1159 screenshot below.)
1160
1161 .. image:: images/ucsc_gb_smn1_human_dna_option.png
1162    :alt: Genome Browser - SMN1 (human) - DNA Option
1163    :align: center
1164
Now, on the **get dna in window** page, let's add some extra
sequence to each end of the gene, say 5000 base pairs.
1167
1168 .. image:: images/ucsc_gb_smn1_human_get_dna.png
1169    :alt: Genome Browser - SMN1 (human) - Get DNA
1170    :align: center
1171
1172 Click the **get DNA** button.
1173
1174 .. image:: images/ucsc_gb_smn1_human_dna.png
1175    :alt: Genome Browser - SMN1 (human) - DNA
1176    :align: center
1177
1178 Save the DNA sequence to a text file called 'smn1_human_dna.fa' as we
1179 did in step 2 with the annotation file.
1180
1181 **IMPORTANT:** Make sure the file is saved as a text file and not an
1182 HTML file. Open the file with a text editor and remove any HTML markup
1183 you find.
1184
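A quick sanity check on the saved file is to compare its sequence length with the length you expect. The Python sketch below assumes the browser window matches the genomic range shown in step 2 (chr5:70,256,524-70,284,592) and that 5000 bases were added to each end. Both function names are invented for this example.

```python
def padded_range(start, stop, padding):
    """Extend a 1-based inclusive range by `padding` bases on each side.

    Returns (new_start, new_stop, length).
    """
    new_start = max(1, start - padding)
    new_stop = stop + padding
    return new_start, new_stop, new_stop - new_start + 1


def fasta_length(text):
    """Count sequence characters in FASTA text, ignoring header lines."""
    return sum(len(line.strip()) for line in text.splitlines()
               if line.strip() and not line.startswith(">"))


# Assumed window: the SMN1 genomic range from step 2, plus 5000 bases per side.
print(padded_range(70256524, 70284592, 5000))
```

If ``fasta_length`` applied to the contents of your saved file gives a different number than the expected length, the file was probably truncated or still contains extra markup.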
1185
Step 4 - Same/similar/related gene in other species
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1188
What good is a multiple sequence alignment viewer without multiple
sequences? Let's find a similar gene in a few more species.
1191
Use the back button on your web browser until you get back to the **genome
browser view** of **SMN1** as shown below.
1194
.. image:: images/ucsc_gb_smn1_human.png
   :alt: Genome Browser - SMN1 (human)
1197    :align: center
1198
**Click on SMN1**, shown **between** the **two orange arrows** in the
screenshot below.
1201
1202 .. image:: images/ucsc_gb_smn1_human_click_smn1.png
1203    :alt: Genome Browser - SMN1 (human) - Orange Arrows
1204    :align: center
1205
1206 You should find yourself at the SMN1 description page.
1207
1208 .. image:: images/ucsc_gb_smn1_description_page.png
1209    :alt: Genome Browser - SMN1 (human) - Description page
1210    :align: center
1211
1212 **Scroll down** until you get to the **Sequence section** and click on
1213 **Protein (262 aa)**.
1214
1215 .. image:: images/ucsc_gb_smn1_human_sequence.png
1216    :alt: Genome Browser - SMN1 (human) - Sequence
1217    :align: center
1218
Copy the SMN1 protein sequence by highlighting it and selecting the **Edit
> Copy** option from the menu.
1221
1222 .. image:: images/smn1_human_protein.png
1223    :alt: Genome Browser - SMN1 (human) - Protein
1224    :align: center
1225
1226 Press the back button on the web browser once and then scroll to the
1227 top of the page and click on the **BLAT** option on the menu bar
1228 (shown below with orange arrows).
1229
1230 .. image:: images/ucsc_gb_smn1_human_blat.png
1231    :alt: Genome Browser - SMN1 (human) - Blat
1232    :align: center
1233
1234 **Paste** in the **protein sequence** and **change** the **genome** to
1235 **mouse** as shown below and then click **submit**.
1236
1237 .. image:: images/ucsc_gb_smn1_human_blat_paste.png
1238    :alt: Genome Browser - SMN1 (human) - Blat paste protein
1239    :align: center
1240
1241 Notice that we have two hits, one of which looks pretty good at 89.9%
1242 match.
1243
1244 .. image:: images/ucsc_gb_smn1_human_blat_hits.png
1245    :alt: Genome Browser - SMN1 (human) - Blat hits
1246    :align: center
1247
**Click** on the **browser** link next to the 89.9% match. Notice in
the genome browser (shown below) that there is an annotated gene
called SMN1 for mouse which matches the line called **your sequence
from blat search**. This means we can be fairly confident that we found the
right location in the mouse genome.
1253
1254 .. image:: images/ucsc_gb_smn1_human_blat_to_browser.png
1255    :alt: Genome Browser - SMN1 (human) - Blat to browser
1256    :align: center
1257
1258 Follow steps 1 through 3 for mouse and then repeat step 4 with the
1259 human protein sequence to find **SMN1** in the following species (if
1260 you find a match):
1261
1262  1. Rat
1263  2. Rabbit
1264  3. Dog
1265  4. Armadillo
1266  5. Elephant
 6. Opossum
1268  7. x_tropicalis
1269
1270 Make sure to save the extended DNA sequence and annotation file for
1271 each one.
1272
1273
1274 Step 5 - Create Analysis
1275 ~~~~~~~~~~~~~~~~~~~~~~~~
1276
At this point you should have the annotation and FASTA files for each
species. If you skipped the first four steps or are having trouble,
1279 you can download the example data from the `Mussa Example Data
1280 <http://woldlab.caltech.edu/cgi-bin/mussa/wiki/ExampleData>`_ page.
1281
There are two methods for creating an analysis. You can create a MUssa
PArameter (.mupa) file, or you can use the create analysis dialog. To
use the analysis dialog, see the `create a new analysis`_ section.
1285
If you are planning to do many analyses using the same sets of DNA
sequence but with different parameters, annotations, and/or species,
it is often best to set up a `mupa`_ file, so you can:

  * Change parameters and rerun an analysis easily.
  * Use the Mussa command line options to run batch analyses.
  * Define an analysis for someone else to run.
1293
Now we will create a `mupa`_ file for SMN1 for an analysis with
Human, Mouse, and Cow. I'll start by showing you the `mupa`_ file and
then walk you through it line by line.

Start by creating a new text file called *smn1_human_mouse_cow.mupa*
in your smn1 directory. I decided to put the FASTA and annotation
files for each species in their own directory, so I will use
that setup (see screenshot below).
1302
1303 .. image:: images/smn1_dir_structure.png
1304    :alt: SMN1 directory structure
1305    :align: center
1306
1307 smn1_human_mouse_cow.mupa:
1308 ::
1309
1310   # Analysis name
1311   ANA_NAME smn1_human_mouse_cow
1312
1313   # Appending to analysis name
1314   APPEND_WIN true
1315   APPEND_THRES true
1316
1317   # Human sequence
1318   SEQUENCE human/smn1_human_dna.fasta
1319   ANNOTATION human/smn1_human_annotations.txt
1320
1321   SEQUENCE mouse/smn1_mouse_dna.fasta
1322   ANNOTATION mouse/smn1_mouse_annotations.txt
1323
1324   SEQUENCE cow/smn1_cow_dna.fasta
1325   ANNOTATION cow/smn1_cow_annotations.txt
1326
1327   # Window size / Threshold
1328   WINDOW 30
1329   THRESHOLD 24
1330
1331 The first line is the analysis name. This will be the name of the
1332 directory the results will be saved in when using the Mussa `command
1333 line`_ option --no-gui to run an analysis. If you are using the Mussa
1334 GUI, then you will be prompted for a directory name as mentioned in
1335 the `saving`_ section.
1336
1337 ::
1338
1339   # Analysis name
1340   ANA_NAME smn1_human_mouse_cow
1341
If you provide APPEND_WIN and/or APPEND_THRES and set them to
true, the window size and threshold will be appended to the analysis
name. In this example, using the --no-gui `command line`_ option, our
directory name would be *smn1_human_mouse_cow_w30_t24*.
1346
1347 ::
1348
1349   # Appending to analysis name
1350   APPEND_WIN true
1351   APPEND_THRES true
1352
The following six lines provide Mussa with the location of the
sequence files and annotation files. The files can be provided with
paths relative to the .mupa file. In other words, this .mupa file
will provide the proper path to the human sequence only if there
exists a directory called *human* in the same directory as this .mupa
file.
1359
1360 To provide the species name for each species, you have to put the
1361 species name in the annotation files. See the `annotation file
1362 format`_ section for more details.
1363
1364 ::
1365
1366   # Human sequence
1367   SEQUENCE human/smn1_human_dna.fasta
1368   ANNOTATION human/smn1_human_annotations.txt
1369
1370   SEQUENCE mouse/smn1_mouse_dna.fasta
1371   ANNOTATION mouse/smn1_mouse_annotations.txt
1372
1373   SEQUENCE cow/smn1_cow_dna.fasta
1374   ANNOTATION cow/smn1_cow_annotations.txt
1375
1376 And finally, the `window size`_ and `threshold`_ parameters.
1377
1378 ::
1379
1380   # Window size / Threshold
1381   WINDOW 30
1382   THRESHOLD 24
1383
Next, open Mussagl and select the **File > Create Analysis from File**
menu option. Mussagl should run your analysis if everything was set up
properly.
1387
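If you want several .mupa files that differ only in their parameters, they can be generated rather than typed by hand. The Python sketch below writes the same keywords used in the example above. The helper names ``make_mupa`` and ``results_dir_name`` are hypothetical, and the file naming pattern is simply copied from this example's directory layout.

```python
def make_mupa(name, species_dirs, window, threshold, gene="smn1"):
    """Build the text of a .mupa file laid out like the example above.

    Hypothetical helper; it assumes files are named
    <species>/<gene>_<species>_dna.fasta and
    <species>/<gene>_<species>_annotations.txt.
    """
    lines = ["# Analysis name", "ANA_NAME " + name, "",
             "# Appending to analysis name",
             "APPEND_WIN true", "APPEND_THRES true", ""]
    for species in species_dirs:
        lines.append("# %s sequence" % species.capitalize())
        lines.append("SEQUENCE %s/%s_%s_dna.fasta" % (species, gene, species))
        lines.append("ANNOTATION %s/%s_%s_annotations.txt"
                     % (species, gene, species))
        lines.append("")
    lines += ["# Window size / Threshold",
              "WINDOW %d" % window, "THRESHOLD %d" % threshold]
    return "\n".join(lines) + "\n"


def results_dir_name(name, window, threshold):
    """Directory name when APPEND_WIN and APPEND_THRES are both true."""
    return "%s_w%d_t%d" % (name, window, threshold)


text = make_mupa("smn1_human_mouse_cow", ["human", "mouse", "cow"], 30, 24)
print(text)
print(results_dir_name("smn1_human_mouse_cow", 30, 24))
```

Writing ``text`` to *smn1_human_mouse_cow.mupa* in the smn1 directory should give a file equivalent to the one shown above, apart from the extra comment lines for mouse and cow.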
1388
1389
1390 Understanding Mussa
1391 ===================
1392
1393 Command Line
1394 ------------
1395
1396 Mussa has some very useful command line options that allow for
1397 loading an existing analysis or running a new analysis with or without
1398 launching the GUI.
1399
1400 Mussa options:
1401   --help                     help message
1402   -p, --run-analysis arg     run an analysis defined by the mussa parameter file
1403   --view-analysis arg        load a previously run analysis
1404   --motifs arg               annotate analysis with motifs from this file
  --no-gui                   run the analysis and exit without launching the GUI
1406   --python                   launch as a `python interpreter`_
1407
1408 Running an analysis using the --no-gui option is useful when you want
1409 to run many analyses on a compute server and save the results for
1410 viewing in the future.
1411
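A batch run on a compute server could be scripted along the following lines. This is only a sketch: the executable name ``mussagl`` is an assumption (use whatever your installation provides), and the helper names are invented. The ``--run-analysis`` and ``--no-gui`` options are the ones listed above.

```python
import subprocess


def batch_commands(mupa_files, executable="mussagl"):
    """Build one command line per .mupa file.

    "mussagl" is an assumed executable name; adjust it for your installation.
    """
    return [[executable, "--run-analysis", path, "--no-gui"]
            for path in mupa_files]


def run_batch(mupa_files, executable="mussagl"):
    """Run each analysis in turn, stopping at the first failure."""
    for command in batch_commands(mupa_files, executable):
        subprocess.run(command, check=True)


# Only print the commands here; call run_batch(...) to actually run them.
print(batch_commands(["smn1_human_mouse_cow.mupa"]))
```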
1412
1413 Performance
1414 -----------
1415
1416 Algorithm Behavior
1417 ~~~~~~~~~~~~~~~~~~
1418
1419 FIXME: Include seqcomp algorithm info.
1420
1421 FIXME: Include transitivity info.
1422
1423 Repeats
1424 ~~~~~~~
1425
1426 Repeat masking of all input sequences, or at least of the "reference"
1427 genome, can be important for reducing compute time and for simplifying
1428 subsequent visual interpretation. Larger loci generally contain more
1429 repeat elements, and as their number grows so will the number of Mussa
1430 connections among them. If not repeat filtered, connectivity between
1431 shared repeat elements can obscure important relationships between
1432 single copy features.
1433
1434 The formula for the number of connections, C, that will be made for R
1435 instances of a single repeat (meaning R copies of one repeat in each
1436 sequence) and S sequences is:
1437
1438 C = (R^2)[S(S-1)/2]
1439
1440 Table of example situations:
1441
1442 =====  =====  =====
1443   C      R      S
1444 =====  =====  =====
1445    16     4     2
1446    48     4     3
1447    96     4     4
1448   160     4     5
1449   240     4     6
1450   336     4     7
1451   448     4     8
1452    24     2     4
1453    54     3     4
1454    96     4     4
1455   150     5     4
1456   216     6     4
1457   294     7     4
1458   384     8     4
1459  2500    50     2
1460  7500    50     3
1461 15000    50     4
1462 10000   100     2
1463 30000   100     3
1464 60000   100     4
1465 =====  =====  =====
1466
After the connections, C, are found, they are passed on to the
transitivity filter, which is a C^2 algorithm (FIXME: confirm
algorithm is C^2). This means that 50 repeats in each of 2 sequences,
which gives a C of 2,500, results in a C^2 of 6,250,000.
1471
1472 **Conclusion: repeats cause the processing time of Mussa to skyrocket.**
1473
To deal with a situation where you have many repeats in your sequences,
do any of the following:
1476
1477  * Use shorter sequence lengths.
1478  * Repeat mask one or more of your sequences.
1479  * Increase the threshold.
1480
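The connection formula and the example table in this section can be reproduced with a few lines of Python, which is handy for estimating the effect of repeats before starting a large analysis. The function name ``connections`` is made up for this sketch, and the C^2 figure inherits the unconfirmed assumption noted above about the transitivity filter.

```python
def connections(repeats, sequences):
    """C = R^2 * S(S-1)/2 for R copies of one repeat in each of S sequences."""
    return repeats ** 2 * (sequences * (sequences - 1) // 2)


for r, s in [(4, 2), (4, 8), (50, 2), (100, 4)]:
    c = connections(r, s)
    # c * c is the rough cost of the transitivity filter if it is C^2.
    print("R=%d S=%d C=%d C^2=%d" % (r, s, c, c * c))
```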
1481
1482 Details
1483 -------
1484
1485 Case: Conservation track suddenly stops
1486 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1487
1488 Details about this potentially confusing case can be found `here
1489 <http://woldlab.caltech.edu/cgi-bin/mussa/wiki/OverlappingWindows>`_.
1490
1491 Python Interpreter
1492 ------------------
1493
Mussagl has some functionality for running a Python interpreter for
interacting with the internals of Mussagl and/or executing Python
code. This feature is mostly experimental at this point. If
you are interested in this feature or would like to know more about it,
contact us using the contact information found at
http://mussa.caltech.edu/.
1500
1501 .. Define links below
1502    ------------------
1503
1504 .. _GPL: http://www.opensource.org/licenses/gpl-license.php
1505 .. _wiki: http://mussa.caltech.edu
1506 .. _build: http://woldlab.caltech.edu/cgi-bin/mussa/wiki/MussaglBuild
1507 .. _FASTA: http://en.wikipedia.org/wiki/fasta_format
1508 .. _wpDnaMotif: http://en.wikipedia.org/wiki/DNA_motif
1509 .. _mupa: `Parameter File Format`_