From: Brandon King
Date: Wed, 18 Oct 2006 23:38:07 +0000 (+0000)
Subject: Mussa Manual: Screenshots + Performance section
X-Git-Url: http://woldlab.caltech.edu/gitweb/?p=mussa.git;a=commitdiff_plain;h=25dce31047c1b8f285bd7219a02551565adde731;ds=sidebyside

Mussa Manual: Screenshots + Performance section

 * Updated screenshots in the "view alignment" & "sub-analysis" sections.
 * Removed reference to build numbers (since this is for release 1.0)
 * Added an Understand Mussa section.
 * Added placeholder for algorithm performance.
 * Added documentation on the effects of repeats in DNA sequence on Mussa.
 * Added a placeholder for Diane's overlapping windows "interesting" case.
---

diff --git a/doc/manual/images/copy_sequence.png b/doc/manual/images/copy_sequence.png
index 6784e53..27e428b 100644
Binary files a/doc/manual/images/copy_sequence.png and b/doc/manual/images/copy_sequence.png differ
diff --git a/doc/manual/images/mussa_alignment.png b/doc/manual/images/mussa_alignment.png
index 80ca850..6afcb57 100644
Binary files a/doc/manual/images/mussa_alignment.png and b/doc/manual/images/mussa_alignment.png differ
diff --git a/doc/manual/images/select_sequence.png b/doc/manual/images/select_sequence.png
index 53fcff0..233532e 100644
Binary files a/doc/manual/images/select_sequence.png and b/doc/manual/images/select_sequence.png differ
diff --git a/doc/manual/images/subanalysis_dialog.png b/doc/manual/images/subanalysis_dialog.png
index cacad04..cf70059 100644
Binary files a/doc/manual/images/subanalysis_dialog.png and b/doc/manual/images/subanalysis_dialog.png differ
diff --git a/doc/manual/images/subanalysis_select_seqs.png b/doc/manual/images/subanalysis_select_seqs.png
index 24a0b65..6fe242e 100644
Binary files a/doc/manual/images/subanalysis_select_seqs.png and b/doc/manual/images/subanalysis_select_seqs.png differ
diff --git a/doc/manual/images/view_mussa_alignment.png b/doc/manual/images/view_mussa_alignment.png
index ed7b638..011c53a 100644
Binary files a/doc/manual/images/view_mussa_alignment.png and b/doc/manual/images/view_mussa_alignment.png differ
diff --git a/doc/manual/images/washed_out.png b/doc/manual/images/washed_out.png
index babf319..bca9a66 100644
Binary files a/doc/manual/images/washed_out.png and b/doc/manual/images/washed_out.png differ
diff --git a/doc/manual/mussagl_manual.rst b/doc/manual/mussagl_manual.rst
index 45df62b..18a8a1a 100644
--- a/doc/manual/mussagl_manual.rst
+++ b/doc/manual/mussagl_manual.rst
@@ -808,8 +808,8 @@ A new Mussa window will pop up.
    :alt: New Mussa Window
    :align: center
 
-   A new Mussa window on the right, in which I have loaded a second
-   experiment.
+   A new Mussa window on the right, in which a second analysis has
+   been loaded.
 
 Now you can create or load an existing analysis, in this new window,
 as described in the `Create/Load Analysis`_ section.
@@ -991,7 +991,7 @@ Sub-analysis
 To run a sub-analysis **highlight** a section of sequence and *right
 click* on it and select **Add to subanalysis**. To the same for the
 sequences shown in orange in the screenshot below. Note that you **are
-NOT limited** to selecting more than one subsequence from the same
+NOT limited** to selecting only one subsequence from the same
 sequence.
 
 .. image:: images/subanalysis_select_seqs.png
@@ -1031,8 +1031,6 @@ as shown in the screen shot below, and do one of the following:
 Saving to an Image
 ---------------------------------
 
- * Updated to build 419.
-
 To save your current mussa view to an image, select **File > Save to
 image...** as shown below.
 
@@ -1255,6 +1253,81 @@ N G or A or T or C aNy
 ====== ================= ===================================
 
 
+
+Understanding Mussa
+===================
+
+
+Performance
+-----------
+
+Algorithm Behavior
+~~~~~~~~~~~~~~~~~~
+
+FIXME: Include seqcomp algorithm info.
+
+FIXME: Include transitivity info.
+
+Repeats
+~~~~~~~
+
+The algorithm Mussa uses to find conserved sequences is sensitive to
+repeated DNA segments, which are naturally a part of many genomes. The
+problem with repeats is that one repeat from one sequence can show up
+many times in another sequence. Every connection Mussa makes takes up
+memory, and it also takes time to store and process the results.
+
+The formula for the number of connections, C, that will be made for R
+instances of a single repeat (meaning R copies of one repeat in each
+sequence) and S sequences is:
+
+C = (R^2)[S(S-1)/2]
+
+Table of example situations:
+
+===== ===== =====
+  C     R     S
+===== ===== =====
+   16     4     2
+   48     4     3
+   96     4     4
+  160     4     5
+  240     4     6
+  336     4     7
+  448     4     8
+   24     2     4
+   54     3     4
+   96     4     4
+  150     5     4
+  216     6     4
+  294     7     4
+  384     8     4
+ 2500    50     2
+ 7500    50     3
+15000    50     4
+10000   100     2
+30000   100     3
+60000   100     4
+===== ===== =====
+
+After the connections, C, are found, they are passed on to the
+transitivity filter, which is a C^2 algorithm (FIXME: confirm
+algorithm is C^2). This means that 50 repeats in 2 sequences, giving
+a C of 2500, end up with a C^2 of 6,250,000.
+
+**Conclusion: repeats cause the processing time of Mussa to skyrocket.**
+
+One way to deal with a situation where you have lots of repeats in
+your sequences is to use shorter sequence lengths and/or repeat mask
+at least one of your sequences.
+
+Details
+-------
+
+Case: Conservation track suddenly stops
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+
 
 .. Define links below
    ------------------
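
The connection counts in the Repeats table can be reproduced with a short script. This is only an illustrative sketch of the formula C = (R^2)[S(S-1)/2] from the manual text; the function names `connections` and `transitivity_cost` are hypothetical and are not part of Mussa, and the C^2 cost is the manual's own unconfirmed estimate (it is still marked FIXME there).

```python
def connections(repeats: int, sequences: int) -> int:
    """Number of connections C for R copies of one repeat in each of S sequences.

    Implements C = R^2 * S(S-1)/2: every pair of sequences contributes
    R * R connections, and there are S(S-1)/2 distinct pairs.
    """
    pairs = sequences * (sequences - 1) // 2
    return repeats ** 2 * pairs


def transitivity_cost(repeats: int, sequences: int) -> int:
    """Rough work estimate for the transitivity filter.

    Assumes the filter is quadratic in C, which the manual itself
    flags as unconfirmed.
    """
    return connections(repeats, sequences) ** 2


if __name__ == "__main__":
    # Print a few rows in the same column order as the manual's table.
    print(f"{'C':>5} {'R':>5} {'S':>5}")
    for r, s in [(4, 2), (4, 8), (8, 4), (50, 2), (100, 4)]:
        print(f"{connections(r, s):>5} {r:>5} {s:>5}")
```

Doubling R quadruples C, while adding a sequence only adds S more pairs, which is why repeat masking even one sequence can reduce the load considerably.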