We're trying to determine whether there are repeats near genomic probes.
The first pass file had a large number of repeats in it.
The file high_genomic_signal_fewer_repeats.txt has those stripped out.
I created a FASTA file with the following Python code (originally crammed onto two lines, which I shouldn't have done):

    with open('high_genomic_signal_fewer_repeats.txt') as infile:
        data = [x.strip().split('\t') for x in infile]
    with open('high_genomic_signal_fewer_repeats.fa', 'w') as outfile:
        # header is ">column 3|column 1", sequence comes from column 4
        outfile.writelines('>%s|%s\n%s\n' % (x[2], x[0], x[3]) for x in data)
Hopefully the following command will generate a new, smaller XML results file (-m 7 asks blastall for XML output, and tee writes it to the file while also echoing it to the terminal):

    blastall -p blastn -e 0.000001 -d mouse_34.0.fa -i high_genomic_signal_fewer_repeats.fa -m 7 | tee high_genomic_signal_fewer_repeats.blast.xml
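To get from the XML file back to the question of which probes sit near repeats, one option is to count hits per query. The sketch below is only an illustration: it assumes the output is a single well-formed document using the standard NCBI BLAST XML element names (Iteration, Iteration_query-def, Hit, Hsp_evalue), and the function name hits_per_probe and its max_evalue parameter are hypothetical. Because the FASTA headers were written as "column 3|column 1", the query definition comes back in that same form.

```python
import xml.etree.ElementTree as ET


def hits_per_probe(xml_text, max_evalue=1e-6):
    """Count BLAST hits per query in one NCBI BLAST XML document.

    Hypothetical helper. Assumes the standard element names
    (Iteration, Iteration_query-def, Hit, Hsp_evalue) and a single
    well-formed document rather than several concatenated ones.
    """
    root = ET.fromstring(xml_text)
    counts = {}
    for iteration in root.iter('Iteration'):
        # Query definition is the FASTA header minus '>', i.e. "col3|col1"
        query = iteration.findtext('Iteration_query-def', default='')
        # Record every query, so probes with no hits show up as 0
        counts.setdefault(query, 0)
        for hit in iteration.iter('Hit'):
            # A hit counts if its best HSP passes the e-value cutoff
            evalues = [float(e.text) for e in hit.iter('Hsp_evalue')]
            if evalues and min(evalues) <= max_evalue:
                counts[query] += 1
    return counts
```

Usage, assuming the tee'd file parses as one document: `hits_per_probe(open('high_genomic_signal_fewer_repeats.blast.xml').read())`. The default cutoff matches the `-e 0.000001` already passed to blastall, so it only matters if you want to tighten the threshold afterward.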