Course Material
>NP_416557.1 GDP-mannose 4,6-dehydratase [Escherichia coli str. K-12 substr. MG1655]
MSKVALITGVTGQDGSYLAEFLLEKGYEVHGIKRRASSFNTERVDHIYQDPHTCNPKFHLHYGDLSDTSN
LTRILREVQPDEVYNLGAMSHVAVSFESPEYTADVDAMGTLRLLEAIRFLGLEKKTRFYQASTSELYGLV
QEIPQKETTPFYPRSPYAVAKLYAYWITVNYRESYGMYACNGILFNHESPRRGETFVTRKITRAIANIAQ
GLESCLYLGNMDSLRDWGHAKDYVKMQWMMLQQEQPEDFVIATGVQYSVRQFVEMAAAQLGIKLRFEGTG
VEEKGIVVSVTGHDAPGVKPGDVIIAVDPRYFRPAEVETLLGDPTKAHEKLGWKPEITLREMVSEMVAND
LEAAKKHSLLKSHGYDVAIALES
Questions:
The input for your program should be the match and mismatch scores and two sequences. Allow for an option to calculate the scores for each combination of the sequences and their reverse complements, i.e.:
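One possible reading of that option, sketched in Python: score the four pairings of each sequence with the other sequence and with its reverse complement. The function names (`reverse_complement`, `ungapped_score`, `score_all_orientations`) are hypothetical, and the scorer is only a placeholder that compares the sequences position by position without gaps; the exercise does not say which alignment method to use, so substitute your own. Reverse complements are only meaningful for nucleotide sequences, so the sketch assumes unambiguous DNA input.

```python
from itertools import product

# Complement table for unambiguous DNA bases (assumption: inputs are DNA).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def ungapped_score(a, b, match, mismatch):
    """Placeholder scorer (assumption): compare position by position, no gaps.

    Only the positions both sequences share are scored; any overhang of the
    longer sequence is ignored.
    """
    return sum(match if x == y else mismatch for x, y in zip(a, b))


def score_all_orientations(seq1, seq2, match, mismatch):
    """Score every pairing of {seq1, rc(seq1)} with {seq2, rc(seq2)}."""
    variants1 = {"seq1": seq1, "seq1_rc": reverse_complement(seq1)}
    variants2 = {"seq2": seq2, "seq2_rc": reverse_complement(seq2)}
    return {
        (name1, name2): ungapped_score(s1, s2, match, mismatch)
        for (name1, s1), (name2, s2) in product(variants1.items(), variants2.items())
    }


if __name__ == "__main__":
    for pair, score in score_all_orientations("AAC", "AAG", match=1, mismatch=-1).items():
        print(pair, score)
```

Returning a dictionary keyed by orientation names keeps the four results labeled, which makes it easy to print them or pick the best-scoring orientation later.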
The input for your program should be this file
The output should include: Hit Name, Percent Query Coverage (alignment length/Query Length), Bit Score, Alignment Length, Query Length
Query: @
1>>>sp|P45796.1|XYND_PAEPO RecName: Full=Arabinoxylan arabinofuranohydrolase; Short=AXH; AltName: Full=AXH-m2,3; Short=AXH-m23; AltName: Full=Alpha-L-arabinofuranosidase; Short=AF; Flags: Precursor - 635 aa
Library: UniProtKB/Swiss-Prot
200544181 residues in 558590 sequences
>>SP:XYNA2_CLOSR P33558 Endo-1,4-beta-xylanase A
OS=Clostridium stercorarium OX=1510 GN=xynA PE=1 SV=2 (512 aa)
s-w opt: 230 Z-score: 447.3 bits: 92.7 E(558590): 2.3e-17
Smith-Waterman score: 230; 32.9% identity (61.4% similar) in 249 aa overlap (398-618:267-497)
Here is an example: SP:XYNA2_CLOSR 39% 92.7 249 635
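The example row can be reproduced from the sample record with a few regular expressions. The following is a minimal sketch, not a reference solution: the pattern names and `parse_fasta_output` are hypothetical, the patterns are written against the single record shown here, and real FASTA output may contain line variants they do not cover. Percent Query Coverage is computed as alignment length / query length and rounded to a whole percent, which is an assumption based on the example value.

```python
import re

# Sample text copied from the record shown in this handout.
SAMPLE = """\
Query: @
1>>>sp|P45796.1|XYND_PAEPO RecName: Full=Arabinoxylan arabinofuranohydrolase; Short=AXH; AltName: Full=AXH-m2,3; Short=AXH-m23; AltName: Full=Alpha-L-arabinofuranosidase; Short=AF; Flags: Precursor - 635 aa
Library: UniProtKB/Swiss-Prot
200544181 residues in 558590 sequences
>>SP:XYNA2_CLOSR P33558 Endo-1,4-beta-xylanase A
OS=Clostridium stercorarium OX=1510 GN=xynA PE=1 SV=2 (512 aa)
s-w opt: 230 Z-score: 447.3 bits: 92.7 E(558590): 2.3e-17
Smith-Waterman score: 230; 32.9% identity (61.4% similar) in 249 aa overlap (398-618:267-497)
"""

# Query header, e.g. "1>>>sp|... - 635 aa": capture the query length.
QUERY_RE = re.compile(r"\s*\d+>>>.*-\s*(\d+)\s+aa\s*$")
# Hit header, e.g. ">>SP:XYNA2_CLOSR ...": capture the hit name.
HIT_RE = re.compile(r">>(?!>)(\S+)")
# Bit score, e.g. "bits: 92.7".
BITS_RE = re.compile(r"bits:\s*([\d.]+)")
# Alignment length, e.g. "in 249 aa overlap".
OVERLAP_RE = re.compile(r"in\s+(\d+)\s+aa overlap")


def parse_fasta_output(text):
    """Return rows of (hit name, coverage, bit score, alignment length, query length)."""
    rows = []
    query_len = None
    hit = None
    for line in text.splitlines():
        m = QUERY_RE.match(line)
        if m:
            query_len = int(m.group(1))
            continue
        m = HIT_RE.match(line)
        if m:
            hit = {"name": m.group(1)}
            continue
        if hit is None:
            continue  # ignore lines that precede the first hit
        m = BITS_RE.search(line)
        if m:
            hit["bits"] = float(m.group(1))
        m = OVERLAP_RE.search(line)
        if m and query_len:
            aln_len = int(m.group(1))
            coverage = f"{100 * aln_len / query_len:.0f}%"
            rows.append((hit["name"], coverage, hit.get("bits"), aln_len, query_len))
            hit = None  # the record is complete once the overlap line is seen
    return rows


if __name__ == "__main__":
    for row in parse_fasta_output(SAMPLE):
        print(" ".join(str(field) for field in row))
```

Reading the file line by line and keeping a small amount of state (the current query length and the current hit) lets the same loop handle a file with many hits, provided each hit follows the layout of the sample record.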
The objective of this exercise is to practice regular expressions using the original FASTA program output.
Parse this file and create output with the following columns