Brandon W. King
---------------
-Last updated: Oct 18th, 2006
+Last updated: Oct 20th, 2006
Updated to Mussagl build: (In process to 424)
Repeats
~~~~~~~
-The algorithm Mussa uses to find conserved sequences is sensative to
-repeated DNA segments, which are naturally apart of many genomes. The
-problem with repeats, is that one repeat from one sequence can show up
-many times in another sequence. Every connection Mussa makes takes up
-memory, and it also takes time to store and process the results.
+The algorithm Mussa uses to find conserved sequences is sensitive to
+repeated DNA segments, which occur frequently in most genomes. The
+problem with repeats is that one repeat from one sequence can show up
+many times in another sequence. Every connection Mussa makes consumes
+memory and CPU time.
The formula for the number of connections, C, that will be made for R
instances of a single repeat (meaning R copies of one repeat in each
**Conclusion: repeats cause the processing time of Mussa to skyrocket.**
-One, way to deal with a situation where you have lots of repeats in
-your sequences is to use shorter sequences lengths and/or repeat mask
-at least one of your sequences.
+One way to deal with a situation where you have many repeats in your
+sequences is to do any of the following: use shorter sequence lengths;
+repeat mask one or more of your sequences; or increase the threshold.
Details
-------