Creating a Custom Database Using Standalone NCBI BLAST+
The Basic Local Alignment Search Tool (BLAST) is a collection of programs, written in C++, that use a heuristic algorithm to compare DNA, RNA, and protein sequences. The standalone command-line version of BLAST is named BLAST+. The latest version of NCBI BLAST+ can be downloaded from the NCBI FTP server (ftp://ftp.ncbi.nih.gov/blast/executables/blast+/LATEST). This is a simple tutorial on creating a custom database, accessing the database, and performing a sequence search using BLAST+.
1. Creating a Custom Database
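A nucleotide (nucl) or protein (prot) database can be created by setting the -dbtype parameter of the makeblastdb program. Two types of database are created below: a non-indexed database, and an indexed database built with the additional -parse_seqids option, which parses the sequence identifiers from the FASTA definition lines.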
Non-indexed Database: ./makeblastdb -in DBX.fasta -out DBX -dbtype prot
Building a new DB, current time: 12/04/2020 10:10:06
New DB name: C:\NCBI\blast-2.6.0+\bin\DBX
New DB title: DBX.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 20 sequences in 0.0041614 seconds.
Indexed Database: ./makeblastdb -in DB.fasta -out DB -dbtype prot -parse_seqids
Building a new DB, current time: 12/03/2020 17:29:38
New DB name: C:\NCBI\blast-2.11.0+\bin\DB
New DB title: DB.fasta
Sequence type: Protein
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 20 sequences in 0.0277056 seconds.
The sequence files DB.fasta and DBX.fasta are given at the end of this page.
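For scripted use, the two makeblastdb calls above can be assembled programmatically. The following is a minimal Python sketch, not part of BLAST+ itself: the helper names makeblastdb_command and build_database are hypothetical, and it assumes makeblastdb is on the PATH (the tutorial instead runs it as ./makeblastdb from the BLAST+ bin directory).

```python
import shutil
import subprocess


def makeblastdb_command(fasta, out, dbtype="prot", parse_seqids=False):
    """Build the makeblastdb argument list used in this tutorial."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    cmd = ["makeblastdb", "-in", fasta, "-out", out, "-dbtype", dbtype]
    if parse_seqids:
        # -parse_seqids yields what this tutorial calls an indexed database.
        cmd.append("-parse_seqids")
    return cmd


def build_database(fasta, out, dbtype="prot", parse_seqids=False):
    """Run makeblastdb if it is installed; return True when it was run."""
    cmd = makeblastdb_command(fasta, out, dbtype, parse_seqids)
    if shutil.which(cmd[0]) is None:
        # Assumption: makeblastdb is looked up on PATH, not in the current directory.
        print("makeblastdb not found; would run:", " ".join(cmd))
        return False
    subprocess.run(cmd, check=True)
    return True


if __name__ == "__main__":
    # Only print the commands here; call build_database() to actually run them.
    print(" ".join(makeblastdb_command("DBX.fasta", "DBX")))
    print(" ".join(makeblastdb_command("DB.fasta", "DB", parse_seqids=True)))
```

Keeping command construction separate from execution makes it easy to inspect the exact command line before any database files are written.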
2. List of Records in the Database
The entries in a database can be listed using the -entry all parameter of the blastdbcmd program. The command lines below list the sequence identifiers assigned in the non-indexed and the indexed database:
Non-indexed Database: ./blastdbcmd -db DBX -entry all -outfmt "OID: %o GI: %g ACC: %a IDENTIFIER: %i"
OID: 0 GI: N/A ACC: BL_ORD_ID:0 IDENTIFIER: gnl|BL_ORD_ID|0
OID: 1 GI: N/A ACC: BL_ORD_ID:1 IDENTIFIER: gnl|BL_ORD_ID|1
OID: 2 GI: N/A ACC: BL_ORD_ID:2 IDENTIFIER: gnl|BL_ORD_ID|2
OID: 3 GI: N/A ACC: BL_ORD_ID:3 IDENTIFIER: gnl|BL_ORD_ID|3
OID: 4 GI: N/A ACC: BL_ORD_ID:4 IDENTIFIER: gnl|BL_ORD_ID|4
OID: 5 GI: N/A ACC: BL_ORD_ID:5 IDENTIFIER: gnl|BL_ORD_ID|5
OID: 6 GI: N/A ACC: BL_ORD_ID:6 IDENTIFIER: gnl|BL_ORD_ID|6
OID: 7 GI: N/A ACC: BL_ORD_ID:7 IDENTIFIER: gnl|BL_ORD_ID|7
OID: 8 GI: N/A ACC: BL_ORD_ID:8 IDENTIFIER: gnl|BL_ORD_ID|8
OID: 9 GI: N/A ACC: BL_ORD_ID:9 IDENTIFIER: gnl|BL_ORD_ID|9
OID: 10 GI: N/A ACC: BL_ORD_ID:10 IDENTIFIER: gnl|BL_ORD_ID|10
OID: 11 GI: N/A ACC: BL_ORD_ID:11 IDENTIFIER: gnl|BL_ORD_ID|11
OID: 12 GI: N/A ACC: BL_ORD_ID:12 IDENTIFIER: gnl|BL_ORD_ID|12
OID: 13 GI: N/A ACC: BL_ORD_ID:13 IDENTIFIER: gnl|BL_ORD_ID|13
OID: 14 GI: N/A ACC: BL_ORD_ID:14 IDENTIFIER: gnl|BL_ORD_ID|14
OID: 15 GI: N/A ACC: BL_ORD_ID:15 IDENTIFIER: gnl|BL_ORD_ID|15
OID: 16 GI: N/A ACC: BL_ORD_ID:16 IDENTIFIER: gnl|BL_ORD_ID|16
OID: 17 GI: N/A ACC: BL_ORD_ID:17 IDENTIFIER: gnl|BL_ORD_ID|17
OID: 18 GI: N/A ACC: BL_ORD_ID:18 IDENTIFIER: gnl|BL_ORD_ID|18
OID: 19 GI: N/A ACC: BL_ORD_ID:19 IDENTIFIER: gnl|BL_ORD_ID|19
Indexed Database: ./blastdbcmd -db DB -entry all -outfmt "OID: %o GI: %g ACC: %a IDENTIFIER: %i"
OID: 0 GI: N/A ACC: Sequence1 IDENTIFIER: lcl|Sequence1
OID: 1 GI: N/A ACC: Sequence2 IDENTIFIER: lcl|Sequence2
OID: 2 GI: N/A ACC: Sequence3 IDENTIFIER: lcl|Sequence3
OID: 3 GI: N/A ACC: Sequence4 IDENTIFIER: lcl|Sequence4
OID: 4 GI: N/A ACC: Sequence5 IDENTIFIER: lcl|Sequence5
OID: 5 GI: N/A ACC: Sequence6 IDENTIFIER: lcl|Sequence6
OID: 6 GI: N/A ACC: Sequence7 IDENTIFIER: lcl|Sequence7
OID: 7 GI: N/A ACC: Sequence8 IDENTIFIER: lcl|Sequence8
OID: 8 GI: N/A ACC: Sequence9 IDENTIFIER: lcl|Sequence9
OID: 9 GI: N/A ACC: Sequence10 IDENTIFIER: lcl|Sequence10
OID: 10 GI: N/A ACC: Sequence11 IDENTIFIER: lcl|Sequence11
OID: 11 GI: N/A ACC: Sequence12 IDENTIFIER: lcl|Sequence12
OID: 12 GI: N/A ACC: Sequence13 IDENTIFIER: lcl|Sequence13
OID: 13 GI: N/A ACC: Sequence14 IDENTIFIER: lcl|Sequence14
OID: 14 GI: N/A ACC: Sequence15 IDENTIFIER: lcl|Sequence15
OID: 15 GI: N/A ACC: Sequence16 IDENTIFIER: lcl|Sequence16
OID: 16 GI: N/A ACC: Sequence17 IDENTIFIER: lcl|Sequence17
OID: 17 GI: N/A ACC: Sequence18 IDENTIFIER: lcl|Sequence18
OID: 18 GI: N/A ACC: Sequence19 IDENTIFIER: lcl|Sequence19
OID: 19 GI: N/A ACC: Sequence20 IDENTIFIER: lcl|Sequence20
In the non-indexed database, each entry receives a general (gnl) identifier built from its BLAST ordinal ID (BL_ORD_ID). In the indexed database, each entry keeps a local (lcl) identifier parsed from its FASTA definition line.
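When the listing has to be processed further, it can be parsed into records. The sketch below is a hypothetical Python helper (parse_listing is not a BLAST+ function) that assumes the exact -outfmt string used above, "OID: %o GI: %g ACC: %a IDENTIFIER: %i", and flags an entry as coming from an indexed database when its identifier is not of the gnl|BL_ORD_ID form.

```python
import re

# Matches one record printed with the -outfmt string used in this tutorial.
RECORD = re.compile(
    r"OID: (?P<oid>\d+) GI: (?P<gi>\S+) "
    r"ACC: (?P<acc>\S+) IDENTIFIER: (?P<identifier>\S+)"
)


def parse_listing(text):
    """Parse blastdbcmd listing text into a list of dictionaries."""
    records = []
    for match in RECORD.finditer(text):
        record = match.groupdict()
        record["oid"] = int(record["oid"])
        # Ordinal identifiers indicate a database built without -parse_seqids.
        record["indexed"] = not record["identifier"].startswith("gnl|BL_ORD_ID|")
        records.append(record)
    return records


if __name__ == "__main__":
    sample = (
        "OID: 0 GI: N/A ACC: BL_ORD_ID:0 IDENTIFIER: gnl|BL_ORD_ID|0\n"
        "OID: 1 GI: N/A ACC: BL_ORD_ID:1 IDENTIFIER: gnl|BL_ORD_ID|1\n"
    )
    for record in parse_listing(sample):
        print(record["oid"], record["identifier"], record["indexed"])
```

Because the pattern is applied with finditer, it works whether the records arrive one per line or run together on a single line.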
3. Searching Sequence from the Database
A sequence in the database can be accessed by its entry ID using the -entry parameter of the blastdbcmd program. The command lines to access an entry in the non-indexed and the indexed database are below:
Non-indexed Database: ./blastdbcmd -db DBX -entry 'gnl|BL_ORD_ID|1'
>AAA40590.1 insulin [Octodon degus]
MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRPHDRRELEDLQVEQAELGLEAGGLQPSALE
MILQKRGIVDQCCNNICTFNQLQNYCNVP
The latest BLAST+ does not permit access to the first entry (ordinal ID 0) of the non-indexed database with this syntax: lookups by BL_ORD_ID appear to start at 1, and the request for ID 0 fails with the error shown below. Moreover, the latest version does not recognize entries of a non-indexed database, so I used BLAST+ version 2.6.0 to construct the non-indexed database.
./blastdbcmd -db DBX -entry 'gnl|BL_ORD_ID|0'
Error: [blastdbcmd] CObject_id::GetId(): Invalid choice selection: NCBI-General::Object-id.str
Indexed Database: ./blastdbcmd -db DB -entry Sequence1
>Sequence1 NP_001191615.1 insulin precursor [Aplysia californica]
MSKFLLQSHSANACLLTLLLTLASNLDISLANFEHSCNGYMRPHPRGLCGEDLHVIISNLCSSLGGNRRFLAKYMVKRDT
ENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRIFELAQYCRLPDHFFSRISRTGRSNSGHAQLEDNFS
4. Retrieving Sequence from the Database
A sequence can be retrieved from the database and written to a file by combining the -entry parameter of the blastdbcmd program with the -out parameter. The command lines to save a sequence from the non-indexed and the indexed database are below:
Non-indexed Database: ./blastdbcmd -db DBX -entry 'gnl|BL_ORD_ID|1' -out Sequence2.fasta
Indexed Database: ./blastdbcmd -db DB -entry Sequence1 -out Sequence1.fasta
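A retrieved file can be checked with a few lines of code. The following is a minimal Python sketch of a FASTA reader; read_fasta is a hypothetical helper written for this tutorial, and it assumes plain FASTA text in which every record starts with a ">" definition line.

```python
def read_fasta(text):
    """Return a list of (header, sequence) tuples parsed from FASTA text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new definition line closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    # Inline sample taken from DB.fasta; replace with the contents of
    # Sequence1.fasta or Sequence2.fasta written by blastdbcmd.
    sample = (
        ">Sequence7 pir||INHY insulin - hamster\n"
        "FVNQHLCGSHLVEALYLVCGERGFFYTPKSGIVDQCCTSICSLYQLENYCN\n"
    )
    for header, sequence in read_fasta(sample):
        print(header, len(sequence))
```

Comparing the header and sequence length with the database listing is a quick way to confirm that the expected record was written.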
5. Performing Pairwise Alignment
Pairwise sequence alignment can be performed by passing the query either as an input file in FASTA format through the -query parameter, or as a raw sequence piped in through the command line (not supported in old versions of BLAST+). The command lines to perform pairwise alignment are below:
./blastp -db DB -query sequence.fasta, OR
echo ALLALLALGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE | ./blastp -db DB
The sequence alignment output is below,
BLASTP 2.11.0+
Reference: Stephen F. Altschul, Thomas L. Madden, Alejandro A.
Schaffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J.
Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of
protein database search programs", Nucleic Acids Res. 25:3389-3402.
Reference for composition-based statistics: Alejandro A. Schaffer,
L. Aravind, Thomas L. Madden, Sergei Shavirin, John L. Spouge, Yuri
I. Wolf, Eugene V. Koonin, and Stephen F. Altschul (2001),
"Improving the accuracy of PSI-BLAST protein database searches with
composition-based statistics and other refinements", Nucleic Acids
Res. 29:2994-3005.
Database: DB.fasta
20 sequences; 2,178 total letters
Query=
Length=59
Score E
Sequences producing significant alignments: (Bits) Value
Sequence3 KAB1251309.1 Insulin [Camelus dromedarius] 120 3e-41
Sequence16 AAA59172.1 insulin [Homo sapiens] 91.3 6e-30
Sequence10 AAA19033.1 insulin [Oryctolagus cuniculus] 89.7 3e-29
Sequence4 NP_001035835.1 insulin, isoform 2 precursor [Homo sapiens] 85.1 2e-26
Sequence8 AAB60625.1 insulin [Ovis aries] 82.0 2e-26
Sequence20 ELK28555.1 Insulin [Myotis davidii] 77.0 1e-23
Sequence7 pir||INHY insulin - hamster 65.5 3e-20
Sequence13 pir||INEL insulin - elephant 64.7 6e-20
Sequence15 pir||INTK insulin - turkey (tentative sequence) 63.5 1e-19
Sequence12 pir||INOS insulin - ostrich 63.5 1e-19
Sequence11 pir||INMKSQ insulin - common squirrel monkey 60.1 3e-18
Sequence2 AAA40590.1 insulin [Octodon degus] 60.8 6e-18
Sequence6 pir||INCD insulin - cod (Gadus sp.) 55.8 2e-16
Sequence5 NP_571131.1 insulin preproprotein [Danio rerio] 53.9 3e-15
Sequence9 XP_014388588.1 PREDICTED: insulin [Myotis brandtii] 48.9 1e-12
Sequence19 BAS32722.1 insulin, partial [Varanus exanthematicus] 38.5 2e-09
Sequence17 QBX89050.1 insulin, partial [Nephrops norvegicus] 19.6 0.042
>Sequence3 KAB1251309.1 Insulin [Camelus dromedarius]
Length=110
Score = 120 bits (300), Expect = 3e-41, Method: Compositional matrix adjust.
Identities = 59/59 (100%), Positives = 59/59 (100%), Gaps = 0/59 (0%)
Query 1 ALLALLALGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59
ALLALLALGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE
Sbjct 9 ALLALLALGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 67
>Sequence16 AAA59172.1 insulin [Homo sapiens]
Length=110
Score = 91.3 bits (225), Expect = 6e-30, Method: Compositional matrix adjust.
Identities = 42/50 (84%), Positives = 42/50 (84%), Gaps = 0/50 (0%)
Query 10 APTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59
P PA AF NQHLCGSHLVEALYLVCGERGFFYTPK RRE ED QVG VE
Sbjct 18 GPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVE 67
>Sequence10 AAA19033.1 insulin [Oryctolagus cuniculus]
Length=110
Score = 89.7 bits (221), Expect = 3e-29, Method: Compositional matrix adjust.
Identities = 40/47 (85%), Positives = 43/47 (91%), Gaps = 0/47 (0%)
Query 13 PARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59
PA+AF NQHLCGSHLVEALYLVCGERGFFYTPK+RREVE+ QVG E
Sbjct 21 PAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKSRREVEELQVGQAE 67
>Sequence4 NP_001035835.1 insulin, isoform 2 precursor [Homo sapiens]
Length=200
Score = 85.1 bits (209), Expect = 2e-26, Method: Compositional matrix adjust.
Identities = 38/44 (86%), Positives = 38/44 (86%), Gaps = 0/44 (0%)
Query 11 PTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQ 54
P PA AF NQHLCGSHLVEALYLVCGERGFFYTPK RRE ED Q
Sbjct 19 PDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQ 62
>Sequence8 AAB60625.1 insulin [Ovis aries]
Length=105
Score = 82.0 bits (201), Expect = 2e-26, Method: Compositional matrix adjust.
Identities = 38/43 (88%), Positives = 39/43 (91%), Gaps = 0/43 (0%)
Query 16 AFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGV 58
AF NQHLCGSHLVEALYLVCGERGFFYTPKARREVE QVG +
Sbjct 24 AFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEGPQVGAL 66
>Sequence20 ELK28555.1 Insulin [Myotis davidii]
Length=168
Score = 77.0 bits (188), Expect = 1e-23, Method: Compositional matrix adjust.
Identities = 34/40 (85%), Positives = 36/40 (90%), Gaps = 0/40 (0%)
Query 15 RAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQ 54
+AF NQHLCGSHLVEALYLVCGERGFFYTPK RRE+ D Q
Sbjct 23 QAFVNQHLCGSHLVEALYLVCGERGFFYTPKDRRELPDPQ 62
Score = 44.3 bits (103), Expect = 5e-11, Method: Compositional matrix adjust.
Identities = 24/52 (46%), Positives = 32/52 (62%), Gaps = 2/52 (4%)
Query 8 LGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59
L + P+++ +QHLCG LV AL + CG+RG FY P A E +D Q VE
Sbjct 80 LASVDPSQS-QDQHLCGDELVNALTITCGDRG-FYNPMAPLEQDDLQEEEVE 129
>Sequence7 pir||INHY insulin - hamster
Length=51
Score = 65.5 bits (158), Expect = 3e-20, Method: Compositional matrix adjust.
Identities = 28/30 (93%), Positives = 29/30 (97%), Gaps = 0/30 (0%)
Query 17 FANQHLCGSHLVEALYLVCGERGFFYTPKA 46
F NQHLCGSHLVEALYLVCGERGFFYTPK+
Sbjct 1 FVNQHLCGSHLVEALYLVCGERGFFYTPKS 30
>Sequence13 pir||INEL insulin - elephant
Length=51
Score = 64.7 bits (156), Expect = 6e-20, Method: Compositional matrix adjust.
Identities = 28/30 (93%), Positives = 28/30 (93%), Gaps = 0/30 (0%)
Query 17 FANQHLCGSHLVEALYLVCGERGFFYTPKA 46
F NQHLCGSHLVEALYLVCGERGFFYTPK
Sbjct 1 FVNQHLCGSHLVEALYLVCGERGFFYTPKT 30
>Sequence15 pir||INTK insulin - turkey (tentative sequence)
Length=51
Score = 63.5 bits (153), Expect = 1e-19, Method: Compositional matrix adjust.
Identities = 28/29 (97%), Positives = 29/29 (100%), Gaps = 0/29 (0%)
Query 18 ANQHLCGSHLVEALYLVCGERGFFYTPKA 46
ANQHLCGSHLVEALYLVCGERGFFY+PKA
Sbjct 2 ANQHLCGSHLVEALYLVCGERGFFYSPKA 30
>Sequence12 pir||INOS insulin - ostrich
Length=51
Score = 63.5 bits (153), Expect = 1e-19, Method: Compositional matrix adjust.
Identities = 28/29 (97%), Positives = 29/29 (100%), Gaps = 0/29 (0%)
Query 18 ANQHLCGSHLVEALYLVCGERGFFYTPKA 46
ANQHLCGSHLVEALYLVCGERGFFY+PKA
Sbjct 2 ANQHLCGSHLVEALYLVCGERGFFYSPKA 30
>Sequence11 pir||INMKSQ insulin - common squirrel monkey
Length=51
Score = 60.1 bits (144), Expect = 3e-18, Method: Compositional matrix adjust.
Identities = 26/30 (87%), Positives = 26/30 (87%), Gaps = 0/30 (0%)
Query 17 FANQHLCGSHLVEALYLVCGERGFFYTPKA 46
F NQHLCG HLVEALYLVCGERGFFY PK
Sbjct 1 FVNQHLCGPHLVEALYLVCGERGFFYAPKT 30
>Sequence2 AAA40590.1 insulin [Octodon degus]
Length=109
Score = 60.8 bits (146), Expect = 6e-18, Method: Compositional matrix adjust.
Identities = 28/49 (57%), Positives = 35/49 (71%), Gaps = 1/49 (2%)
Query 11 PTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59
P +A+++QHLCGS+LVEALY+ CG G FY P RRE+ED QV E
Sbjct 19 PNSVQAYSSQHLCGSNLVEALYMTCGRSG-FYRPHDRRELEDLQVEQAE 66
>Sequence6 pir||INCD insulin - cod (Gadus sp.)
Length=51
Score = 55.8 bits (133), Expect = 2e-16, Method: Compositional matrix adjust.
Identities = 23/27 (85%), Positives = 25/27 (93%), Gaps = 0/27 (0%)
Query 20 QHLCGSHLVEALYLVCGERGFFYTPKA 46
QHLCGSHLV+ALYLVCG+RGFFY PK
Sbjct 5 QHLCGSHLVDALYLVCGDRGFFYNPKG 31
>Sequence5 NP_571131.1 insulin preproprotein [Danio rerio]
Length=108
Score = 53.9 bits (128), Expect = 3e-15, Method: Compositional matrix adjust.
Identities = 25/32 (78%), Positives = 27/32 (84%), Gaps = 2/32 (6%)
Query 20 QHLCGSHLVEALYLVCGERGFFYTPKARREVE 51
QHLCGSHLV+ALYLVCG GFFY PK R+VE
Sbjct 27 QHLCGSHLVDALYLVCGPTGFFYNPK--RDVE 56
>Sequence9 XP_014388588.1 PREDICTED: insulin [Myotis brandtii]
Length=183
Score = 48.9 bits (115), Expect = 1e-12, Method: Compositional matrix adjust.
Identities = 26/51 (51%), Positives = 34/51 (67%), Gaps = 1/51 (2%)
Query 9 GAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59
APTPA+AF +HLC L E L ++CG++G F PKA RE+ D Q G V+
Sbjct 17 WAPTPAQAFYFEHLCDEDLAEMLTIICGDQG-FRNPKATRELPDPQEGEVD 66
Score = 46.6 bits (109), Expect = 7e-12, Method: Compositional matrix adjust.
Identities = 22/40 (55%), Positives = 28/40 (70%), Gaps = 1/40 (3%)
Query 20 QHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVE 59
Q LCG LV+ L +VCG+RG FY+P A RE+ D Q G V+
Sbjct 106 QRLCGEDLVDTLTMVCGDRG-FYSPTALRELPDPQEGEVD 144
>Sequence19 BAS32722.1 insulin, partial [Varanus exanthematicus]
Length=88
Score = 38.5 bits (88), Expect = 2e-09, Method: Compositional matrix adjust.
Identities = 22/49 (45%), Positives = 29/49 (59%), Gaps = 2/49 (4%)
Query 3 LALLALGAPTPARAFA--NQHLCGSHLVEALYLVCGERGFFYTPKARRE 49
L LLA+ APT A + ++HLCGS LVEAL CG+ G + K +
Sbjct 1 LVLLAVLAPTAIYATSENDEHLCGSALVEALVSACGKEGIYSFTKRNEQ 49
>Sequence17 QBX89050.1 insulin, partial [Nephrops norvegicus]
Length=178
Score = 19.6 bits (39), Expect = 0.042, Method: Compositional matrix adjust.
Identities = 9/25 (36%), Positives = 12/25 (48%), Gaps = 2/25 (8%)
Query 20 QHLCGSHLVEALYLVCGERGFFYTP 44
+ LCG L L VC +G + P
Sbjct 25 RRLCGWRLANKLNRVC--KGVYNNP 47
Score = 13.1 bits (22), Expect = 9.6, Method: Compositional matrix adjust.
Identities = 6/19 (32%), Positives = 7/19 (37%), Gaps = 0/19 (0%)
Query 32 YLVCGERGFFYTPKARREV 50
YL +R TP E
Sbjct 90 YLTFSQRASEDTPSEENEA 108
Lambda K H a alpha
0.324 0.139 0.423 0.792 4.96
Gapped
Lambda K H a alpha sigma
0.267 0.0410 0.140 1.90 42.6 43.6
Effective search space used: 57052
Database: DB.fasta
Posted date: Dec 3, 2020 5:29 PM
Number of letters in database: 2,178
Number of sequences in database: 20
Matrix: BLOSUM62
Gap Penalties: Existence: 11, Extension: 1
Neighboring words threshold: 11
Window for multiple hits: 40
6. Storing Pairwise Alignment Result
The output of a pairwise alignment can be saved to the local disk with the -out parameter. In the command line below, -outfmt 0 selects the pairwise format and -html produces HTML output:
./blastp -db DB -query sequence.fasta -outfmt 0 -out output.html -html
The available sequence alignment output formats (-outfmt) are:
0 = Pairwise,
1 = Query-anchored showing identities,
2 = Query-anchored no identities,
3 = Flat query-anchored showing identities,
4 = Flat query-anchored no identities,
5 = BLAST XML,
6 = Tabular,
7 = Tabular with comment lines,
8 = Seqalign (Text ASN.1),
9 = Seqalign (Binary ASN.1),
10 = Comma-separated values,
11 = BLAST archive (ASN.1),
12 = Seqalign (JSON),
13 = Multiple-file BLAST JSON,
14 = Multiple-file BLAST XML2,
15 = Single-file BLAST JSON,
16 = Single-file BLAST XML2,
17 = Sequence Alignment/Map (SAM), and
18 = Organism Report
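Of these formats, the tabular ones (6 and 7) are the easiest to process in a script. The sketch below is a hypothetical Python helper, parse_tabular, that assumes the default twelve tabular columns (qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore) and skips the comment lines that format 7 adds. The sample line is illustrative only: it is assembled from the values of the Sequence3 hit in the pairwise report above, and the query ID Query_1 is an assumed placeholder.

```python
# Default column order of -outfmt 6 and -outfmt 7.
DEFAULT_FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_tabular(text):
    """Parse tab-separated BLAST hits into a list of dictionaries."""
    hits = []
    for line in text.splitlines():
        # Skip blank lines and the '#' comment lines written by -outfmt 7.
        if not line.strip() or line.startswith("#"):
            continue
        values = line.split("\t")
        if len(values) != len(DEFAULT_FIELDS):
            raise ValueError("expected %d columns, got %d" % (len(DEFAULT_FIELDS), len(values)))
        hit = dict(zip(DEFAULT_FIELDS, values))
        for name in INT_FIELDS:
            hit[name] = int(hit[name])
        for name in FLOAT_FIELDS:
            hit[name] = float(hit[name])
        hits.append(hit)
    return hits


if __name__ == "__main__":
    # Illustrative line, not captured from a real run.
    sample = "Query_1\tSequence3\t100.000\t59\t0\t0\t1\t59\t9\t67\t3e-41\t120\n"
    for hit in parse_tabular(sample):
        print(hit["sseqid"], hit["pident"], hit["evalue"])
```

If custom columns are requested in the -outfmt string, DEFAULT_FIELDS would need to be changed to match.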
Sequence Files used for Database Creation
The multi-sequence FASTA file DB.fasta is given below:
>Sequence1 NP_001191615.1 insulin precursor [Aplysia californica]
MSKFLLQSHSANACLLTLLLTLASNLDISLANFEHSCNGYMRPHPRGLCGEDLHVIISNLCSSLGGNRRF
LAKYMVKRDTENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRIFELAQYCRLPDHFFSRISR
TGRSNSGHAQLEDNFS
>Sequence2 AAA40590.1 insulin [Octodon degus]
MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRPHDRRELEDLQVEQAELGLE
AGGLQPSALEMILQKRGIVDQCCNNICTFNQLQNYCNVP
>Sequence3 KAB1251309.1 Insulin [Camelus dromedarius]
MALWTRLLALLALLALGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVELGG
GPGAGGLQPLGPEGRPQKRGIVEQCCASVCSLYQLENYCN
>Sequence4 NP_001035835.1 insulin, isoform 2 precursor [Homo sapiens]
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQASALSLSS
STSTWPEGLDATARAPPALVVTANIGQAGGSSSRQFRQRALGTSDSPVLFIHCPGAAGTAQGLEYRGRRV
TTELVWEEVDSSPQPQGSESLPAQPPAQPAPQPEPQQAREPSPEVSCCGLWPRRPQRSQN
>Sequence5 NP_571131.1 insulin preproprotein [Danio rerio]
MAVWLQAGALLVLLVVSSVSTNPGTPQHLCGSHLVDALYLVCGPTGFFYNPKRDVEPLLGFLPPKSAQET
EVADFAFKDHAELIRKRGIVEQCCHKPCSIFELQNYCN
>Sequence6 pir||INCD insulin - cod (Gadus sp.)
MAPPQHLCGSHLVDALYLVCGDRGFFYNPKGIVDQCCHRPCDIFDLQNYCN
>Sequence7 pir||INHY insulin - hamster
FVNQHLCGSHLVEALYLVCGERGFFYTPKSGIVDQCCTSICSLYQLENYCN
>Sequence8 AAB60625.1 insulin [Ovis aries]
MALWTRLVPLLALLALWAPAPAHAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEGPQVGALELAG
GPGAGGLEGPPQKRGIVEQCCAGVCSLYQLENYCN
>Sequence9 XP_014388588.1 PREDICTED: insulin [Myotis brandtii]
MALWTRLLPLLALLALWAPTPAQAFYFEHLCDEDLAEMLTIICGDQGFRNPKATRELPDPQEGEVDMGAG
GQKALTLEQLLQNSDIPARLLALWAPAPAPAQSGEQRLCGEDLVDTLTMVCGDRGFYSPTALRELPDPQE
GEVDMGAGGQKALTLEQLLQNSDIVDMCCNNFCSFYQLEYYCN
>Sequence10 AAA19033.1 insulin [Oryctolagus cuniculus]
MASLAALLPLLALLVLCRLDPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKSRREVEELQVGQAELGG
GPGAGGLQPSALELALQKRGIVEQCCTSICSLYQLENYCN
>Sequence11 pir||INMKSQ insulin - common squirrel monkey
FVNQHLCGPHLVEALYLVCGERGFFYAPKTGVVDQCCTSICSLYQLQNYCN
>Sequence12 pir||INOS insulin - ostrich
AANQHLCGSHLVEALYLVCGERGFFYSPKAGIVEQCCHNTCSLYQLENYCN
>Sequence13 pir||INEL insulin - elephant
FVNQHLCGSHLVEALYLVCGERGFFYTPKTGIVEQCCTGVCSLYQLENYCN
>Sequence14 AAF80383.1 insulin precursor [Aplysia californica]
MSKFLLQSHSANACLLTLLLTLASNLDISLANFEHSCNGYMRPHPRGLCGEDLHVIISNLCSSLGGNRRF
LAKYMVKRDTENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRIFELAQYCRLPDHFFSRISR
TGRSNSGHAQLEDNFS
>Sequence15 pir||INTK insulin - turkey (tentative sequence)
AANQHLCGSHLVEALYLVCGERGFFYSPKAGIVEQCCHNTCSLYQLENYCN
>Sequence16 AAA59172.1 insulin [Homo sapiens]
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG
GPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
>Sequence17 QBX89050.1 insulin, partial [Nephrops norvegicus]
VVVVVVGSSRASRRTYPTSEEEPRRRLCGWRLANKLNRVCKGVYNNPGSTGNYLFYRSRRDGESEPGLPP
EEYLDLLPDPEEERGLRHHYLTFSQRASEDTPSEENEAPGSFFGSLSPQDSPHQSAVQEDEASSVQFPFL
TEEEASQMVRVRPRSKRGLSAECCRKVCTVSELVGYCY
>Sequence18 ACQ91106.1 insulin, partial [Haliotis corrugata]
DLHVIISNLCSSLGGNRRFLAKYMVKRDTENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRI
FELAQYCRLPDHFFSRISRTG
>Sequence19 BAS32722.1 insulin, partial [Varanus exanthematicus]
LVLLAVLAPTAIYATSENDEHLCGSALVEALVSACGKEGIYSFTKRNEQSLGHGLLDNEVPFHLGKRGIV
EDCCENICPWSVLQSYCR
>Sequence20 ELK28555.1 Insulin [Myotis davidii]
MALWTRLLPLLALLALWAPAPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKDRRELPDPQGESSPLTP
RSHPKGTGYLASVDPSQSQDQHLCGDELVNALTITCGDRGFYNPMAPLEQDDLQEEEVEMDEGGLQALTL
EGLLQKRGIVEECCTNVCSLYQLERYCN
The multi-sequence FASTA file DBX.fasta is given below:
>NP_001191615.1 insulin precursor [Aplysia californica]
MSKFLLQSHSANACLLTLLLTLASNLDISLANFEHSCNGYMRPHPRGLCGEDLHVIISNLCSSLGGNRRF
LAKYMVKRDTENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRIFELAQYCRLPDHFFSRISR
TGRSNSGHAQLEDNFS
>AAA40590.1 insulin [Octodon degus]
MAPWMHLLTVLALLALWGPNSVQAYSSQHLCGSNLVEALYMTCGRSGFYRPHDRRELEDLQVEQAELGLE
AGGLQPSALEMILQKRGIVDQCCNNICTFNQLQNYCNVP
>KAB1251309.1 Insulin [Camelus dromedarius]
MALWTRLLALLALLALGAPTPARAFANQHLCGSHLVEALYLVCGERGFFYTPKARREVEDTQVGGVELGG
GPGAGGLQPLGPEGRPQKRGIVEQCCASVCSLYQLENYCN
>NP_001035835.1 insulin, isoform 2 precursor [Homo sapiens]
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQASALSLSS
STSTWPEGLDATARAPPALVVTANIGQAGGSSSRQFRQRALGTSDSPVLFIHCPGAAGTAQGLEYRGRRV
TTELVWEEVDSSPQPQGSESLPAQPPAQPAPQPEPQQAREPSPEVSCCGLWPRRPQRSQN
>NP_571131.1 insulin preproprotein [Danio rerio]
MAVWLQAGALLVLLVVSSVSTNPGTPQHLCGSHLVDALYLVCGPTGFFYNPKRDVEPLLGFLPPKSAQET
EVADFAFKDHAELIRKRGIVEQCCHKPCSIFELQNYCN
>pir||INCD insulin - cod (Gadus sp.)
MAPPQHLCGSHLVDALYLVCGDRGFFYNPKGIVDQCCHRPCDIFDLQNYCN
>pir||INHY insulin - hamster
FVNQHLCGSHLVEALYLVCGERGFFYTPKSGIVDQCCTSICSLYQLENYCN
>AAB60625.1 insulin [Ovis aries]
MALWTRLVPLLALLALWAPAPAHAFVNQHLCGSHLVEALYLVCGERGFFYTPKARREVEGPQVGALELAG
GPGAGGLEGPPQKRGIVEQCCAGVCSLYQLENYCN
>XP_014388588.1 PREDICTED: insulin [Myotis brandtii]
MALWTRLLPLLALLALWAPTPAQAFYFEHLCDEDLAEMLTIICGDQGFRNPKATRELPDPQEGEVDMGAG
GQKALTLEQLLQNSDIPARLLALWAPAPAPAQSGEQRLCGEDLVDTLTMVCGDRGFYSPTALRELPDPQE
GEVDMGAGGQKALTLEQLLQNSDIVDMCCNNFCSFYQLEYYCN
>AAA19033.1 insulin [Oryctolagus cuniculus]
MASLAALLPLLALLVLCRLDPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKSRREVEELQVGQAELGG
GPGAGGLQPSALELALQKRGIVEQCCTSICSLYQLENYCN
>pir||INMKSQ insulin - common squirrel monkey
FVNQHLCGPHLVEALYLVCGERGFFYAPKTGVVDQCCTSICSLYQLQNYCN
>pir||INOS insulin - ostrich
AANQHLCGSHLVEALYLVCGERGFFYSPKAGIVEQCCHNTCSLYQLENYCN
>pir||INEL insulin - elephant
FVNQHLCGSHLVEALYLVCGERGFFYTPKTGIVEQCCTGVCSLYQLENYCN
>AAF80383.1 insulin precursor [Aplysia californica]
MSKFLLQSHSANACLLTLLLTLASNLDISLANFEHSCNGYMRPHPRGLCGEDLHVIISNLCSSLGGNRRF
LAKYMVKRDTENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRIFELAQYCRLPDHFFSRISR
TGRSNSGHAQLEDNFS
>pir||INTK insulin - turkey (tentative sequence)
AANQHLCGSHLVEALYLVCGERGFFYSPKAGIVEQCCHNTCSLYQLENYCN
>AAA59172.1 insulin [Homo sapiens]
MALWMRLLPLLALLALWGPDPAAAFVNQHLCGSHLVEALYLVCGERGFFYTPKTRREAEDLQVGQVELGG
GPGAGSLQPLALEGSLQKRGIVEQCCTSICSLYQLENYCN
>QBX89050.1 insulin, partial [Nephrops norvegicus]
VVVVVVGSSRASRRTYPTSEEEPRRRLCGWRLANKLNRVCKGVYNNPGSTGNYLFYRSRRDGESEPGLPP
EEYLDLLPDPEEERGLRHHYLTFSQRASEDTPSEENEAPGSFFGSLSPQDSPHQSAVQEDEASSVQFPFL
TEEEASQMVRVRPRSKRGLSAECCRKVCTVSELVGYCY
>ACQ91106.1 insulin, partial [Haliotis corrugata]
DLHVIISNLCSSLGGNRRFLAKYMVKRDTENVNDKLRGILLNKKEAFSYLTKREASGSITCECCFNQCRI
FELAQYCRLPDHFFSRISRTG
>BAS32722.1 insulin, partial [Varanus exanthematicus]
LVLLAVLAPTAIYATSENDEHLCGSALVEALVSACGKEGIYSFTKRNEQSLGHGLLDNEVPFHLGKRGIV
EDCCENICPWSVLQSYCR
>ELK28555.1 Insulin [Myotis davidii]
MALWTRLLPLLALLALWAPAPAQAFVNQHLCGSHLVEALYLVCGERGFFYTPKDRRELPDPQGESSPLTP
RSHPKGTGYLASVDPSQSQDQHLCGDELVNALTITCGDRGFYNPMAPLEQDDLQEEEVEMDEGGLQALTL
EGLLQKRGIVEECCTNVCSLYQLERYCN