From: Diane Trout
Date: Mon, 10 Jul 2006 22:32:49 +0000 (+0000)
Subject: Presentation for bioinformatics club.
X-Git-Url: http://woldlab.caltech.edu/gitweb/?p=mussa.git;a=commitdiff_plain;h=3e4ed2d24297ff900124d2bde4bfea713bc515a4

Presentation for bioinformatics club.
---

diff --git a/doc/bioinfo_jc/4way_trans.png b/doc/bioinfo_jc/4way_trans.png
new file mode 100644
index 0000000..04c7ee7
Binary files /dev/null and b/doc/bioinfo_jc/4way_trans.png differ
diff --git a/doc/bioinfo_jc/bioinfo-presentation.rst b/doc/bioinfo_jc/bioinfo-presentation.rst
index 3437935..79fba63 100644
--- a/doc/bioinfo_jc/bioinfo-presentation.rst
+++ b/doc/bioinfo_jc/bioinfo-presentation.rst
@@ -41,16 +41,63 @@ Family Tree

 .. _`Family Relations`: http://cartwheel.caltech.edu/

+Motivation
+----------
+
+.. class:: small
+
+  The hope is that conservation will highlight elements that are important.
+  However, it (by definition) only shows elements in common.
+
+  For instance, though a two-sequence comparison between a Human and Fugu
+  muscle gene might show important elements of muscle, it would lose any
+  mammal-specific elements.
+
+  But a two-sequence comparison between Mouse and Human might have too
+  much in common to be useful.
+
+
+Motivation: Human vs. Fugu
+--------------------------
+
+.. class:: small
+
+  .. image:: HuFu.png
+
+Motivation: Human vs. Mouse
+---------------------------
+
+.. class:: small
+
+  .. image:: HuMo.png
+
+Motivation
+----------
+
+.. class:: small
+
+  The hope is that by requiring conservation in multiple more closely related
+  species one can achieve the purification of the long distance comparison
+  while still allowing elements that are important to those more closely
+  related species to remain.
+
+Motivation: Mammals
+-------------------
+
+.. class:: small
+
+  .. image:: HuCoDoMoRa.png
+
 Algorithm
 ---------

 .. class:: small

-  To compute a result Mussa conceptually uses these modules
-
-  * Seqcomp
-  * Test Transitivity
-  * "Refinement"
+  To compute a result Mussa uses these algorithms to perform the N-way
+  filtering.
+
+  * Seqcomp (determines the pairwise list of "matches")
+  * Transitivity Test (filters the matches)

 Seqcomp
 -------
@@ -67,11 +114,11 @@ Seqcomp
         match = 0
         for i in range(W):
           if S[0][x+i] == S[1][y+i]:
-            increment match
-        if match > threshold:
-          save indicies
+            match = match + 1
+        if match >= threshold:
+          save_indicies(x,y)

-  The actual algorithm only needs to compare the base that
+  The algorithm actually being used only needs to compare the base that
   "slid in" into window, and account for the base that "slid out"

 Seqcomp
@@ -83,7 +130,7 @@ Seqcomp

   .. image:: 4bp_window_no_match.png

-  In this case there are none.
+  In this case there is only one.

 Seqcomp
 -------
@@ -94,7 +141,7 @@ Seqcomp

   .. image:: 4bp_window_match.png

-  However, now that we slid over one position there are now 4
+  However, now that we slid over one position there are now 3
   and so we would record 0, 5

 Seqcomp
@@ -110,28 +157,174 @@ Seqcomp

   When extending to more than two sequences, mussa needs to compare

-  (N * (N-1)) / 2 sequences
+  (N * (N-1)) sequences
+
+Transitivity Test
+-----------------
+
+.. class:: small
+
+  There are several algorithms for comparing multiple sequences.
+
+  * Require transitivity, e.g. if A = B, and B = C, then A = C
+  * "Radial" only tests matches between any number of query sequences
+    and a single reference sequence. A = B, A = C, but B ?= C
+  * "Entropy" (an experimental comparison that Tristan was working on)

 Test Transitivity
 -----------------

-Refinement
-----------
+.. class:: small

+  .. image:: 4way_trans.png
+
+
 Limits
 ------

-  describe the difference between a long distance comparison
-  and multiple closer comparisons. (should use some pictures for that)
+.. class:: small

-  paircomp/seqcomp
+  One of the weaknesses with the current implementation is that the
+  transitivity filtering step involves a combinatorial explosion as it
+  compares every possible path.

-  transitivity filter
+  The parameters that influence the number of matches found are:
+  repeat masking of the sequence, how closely related the two sequences
+  are, the length of the sequence, and the stringency of the seqcomp
+  threshold.

-How To Use
-----------
+Limits
+------

-  Should this include pulling things from the tutorial?
-  cover sucking things out of UCSC?
+.. class:: small
+
+  Additionally the types of elements found are influenced by the
+  window size and base-pair threshold.
+
+  For instance, a 6 base pair binding site won't be detected when using
+  a 30 base pair window size.
+
+Usage
+-----
+
+.. class:: small
+
+  Currently I have two classes of target user for mussa.
+
+  * Computationally savvy user (AKA me)
+  * The "typical" biologist (AKA my PI)
+
+Tutorial
+--------
+
+  Brandon has been working on a tutorial for the GUI
+  which includes a section on how we extract sequence out of UCSC.
+
+
+Command-Line Features
+---------------------
+
+.. class:: small
+
+  * Command line::
+
+      $ mussagl --help
+      --run-analysis arg      run an analysis
+                              defined by the mussa
+                              parameter file
+      --view-analysis arg     load a previously run
+                              analysis
+      --no-gui                terminate without viewing
+                              an analysis
+
+Command-Line Features
+---------------------
+
+.. class:: small
+
+  * Parameter file::
+
+      ANA_NAME mck3test
+      APPEND_WIN true
+      APPEND_THRES true
+
+      SEQUENCE seq/mouse_mck_pro.fa
+      ANNOTATION mm_mck3test.annot
+
+Command-Line Features
+---------------------
+
+.. class:: small
+
+  * Annotation File::
+
+      [Seq name]
+      start stop name type
+      >name
+      AGCGAAA
+
+  * [Seq name] is an optional name specifier.
+  * The "alignment" algorithm used for sequence specified annotations
+    is currently just using the motif search, so it only accepts
+    IUPAC codes and doesn't handle in-dels.
+
+GUI Features
+------------
+
+.. class:: small
+
+  * The Create Analysis menu option provides the same options
+    as the parameter file.
+
+  .. image:: ../manual/images/define_analysis.png
+
+GUI Features
+------------
+
+.. class:: small
+
+  However, there isn't a GUI for describing large annotations.
+  (The motif editor can be used this way, but there are issues.)
+
+
+GUI Features
+------------
+
+.. class:: small
+
+  The Mussa GUI can:
+
+  * Display sequence with highlighted annotation regions
+  * Search for motifs in these sequences
+  * Show a base-pair alignment of a seqcomp "match"
+  * Copy sequence regions
+  * Create a new analysis using a subselection of one analysis
+    and different parameters.
+
+GUI
+---
+
+.. class:: small
+
+
+
+Finish
+------
+
+.. class:: small
+
+Mussa has been developed by:
+
+  * Tristan DeBuysscher
+  * Diane Trout
+  * Brandon King
+  * Nora Mullaney
+And has been influenced by:
+
+  * C. Titus Brown
+  * Erich Schwars
+  * and Barbara Wold
+  :tiny:`and as I stepped in fairly late in Mussa's life, there could easily
+  be others.`
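
The Seqcomp slides in this patch describe a windowed comparison: count the matching bases in a window of width W and record the start positions when the count reaches the threshold, with the remark that only the base that "slid in" and the base that "slid out" need to be examined. A minimal Python sketch of both forms follows. It is not Mussa's code; the function names `seqcomp_naive` and `seqcomp_sliding` and their signatures are hypothetical, and reverse-complement matching is left out.

```python
def seqcomp_naive(a, b, window, threshold):
    """Recount every window from scratch, as in the slide pseudocode.

    Hypothetical helper, not Mussa's implementation.
    Returns (x, y) start positions whose windows share at least
    `threshold` identical bases.
    """
    matches = []
    for x in range(len(a) - window + 1):
        for y in range(len(b) - window + 1):
            count = sum(1 for i in range(window) if a[x + i] == b[y + i])
            if count >= threshold:
                matches.append((x, y))
    return matches


def seqcomp_sliding(a, b, window, threshold):
    """Same result, but each diagonal (constant y - x) is scanned once.

    After the first window on a diagonal, the count is updated with the
    base pair that slid in and the base pair that slid out.
    """
    matches = []
    for d in range(-(len(a) - window), len(b) - window + 1):
        x = max(0, -d)          # first position on this diagonal in a
        y = x + d               # matching position in b
        length = min(len(a) - x, len(b) - y)
        if length < window:
            continue
        count = sum(1 for i in range(window) if a[x + i] == b[y + i])
        if count >= threshold:
            matches.append((x, y))
        for k in range(1, length - window + 1):
            # base pair entering the window on the right
            count += a[x + k + window - 1] == b[y + k + window - 1]
            # base pair leaving the window on the left
            count -= a[x + k - 1] == b[y + k - 1]
            if count >= threshold:
                matches.append((x + k, y + k))
    return sorted(matches)
```

Under these assumptions the two functions return the same list; the sliding form does a constant amount of work per window position instead of W comparisons.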