getblast

Retrieve BLAST report from NCBI website

Description

blastdata = getblast(RID) retrieves the NCBI BLAST report identified by the Request ID RID and returns the report data in blastdata, a MATLAB® structure. RID must be recent, because NCBI purges reports after 36 hours.

blastdata = getblast(RID,Name,Value) uses additional options specified by one or more name-value pair arguments.

Examples

Perform a BLAST search on a protein sequence and save the results to an XML file.

Get a sequence from the Protein Data Bank and create a MATLAB structure.

S = getpdb('1CIV');

Use the structure as input for the BLAST search with a significance threshold of 1e-10. The first output is the request ID, and the second output is the estimated time (in minutes) until the search is completed.

[RID1,ROTE] = blastncbi(S,'blastp','expect',1e-10);

Get the search results from the report. You can save the XML-formatted report to a file for offline access. Use ROTE as the wait time to retrieve the results.

report1 = getblast(RID1,'WaitTime',ROTE,'ToFile','1CIV_report.xml')
Blast results are not available yet. Please wait ...

report1 = 

  struct with fields:

                RID: 'R49TJMCF014'
          Algorithm: 'BLASTP 2.6.1+'
           Database: 'nr'
            QueryID: 'Query_224139'
    QueryDefinition: 'unnamed protein product'
               Hits: [1×100 struct]
         Parameters: [1×1 struct]
         Statistics: [1×1 struct]

Use blastread to read BLAST data from the XML-formatted BLAST report file.

blastdata = blastread('1CIV_report.xml')
blastdata = 

  struct with fields:

                RID: ''
          Algorithm: 'BLASTP 2.6.1+'
           Database: 'nr'
            QueryID: 'Query_224139'
    QueryDefinition: 'unnamed protein product'
               Hits: [1×100 struct]
         Parameters: [1×1 struct]
         Statistics: [1×1 struct]

Alternatively, run the BLAST search with an NCBI accession number.

RID2 = blastncbi('AAA59174','blastp','expect',1e-10)
RID2 =

    'R49WAPMH014'

Get the search results from the report.

report2 = getblast(RID2)
Blast results are not available yet. Please wait ...

report2 = 

  struct with fields:

                RID: 'R49WAPMH014'
          Algorithm: 'BLASTP 2.6.1+'
           Database: 'nr'
            QueryID: 'AAA59174.1'
    QueryDefinition: 'insulin receptor precursor [Homo sapiens]'
               Hits: [1×100 struct]
         Parameters: [1×1 struct]
         Statistics: [1×1 struct]

Input Arguments

RID

Request ID for retrieving results from a specific NCBI BLAST search, specified as a character vector or string.

Example: 'GTF033EZ015'

Name-Value Pair Arguments

Specify optional comma-separated pairs of Name,Value arguments. Name is the argument name and Value is the corresponding value. Name must appear inside quotes. You can specify several name and value pair arguments in any order as Name1,Value1,...,NameN,ValueN.

Example: 'ToFile','report.xml' saves the results to a file named report.xml.

ToFile

Name of the file to save the report data to, specified as the comma-separated pair consisting of 'ToFile' and a character vector or string. The file is XML-formatted by default.

Example: 'ToFile','Report.xml'

WaitTime

Time (in minutes) to wait for the report from NCBI to be ready, specified as the comma-separated pair consisting of 'WaitTime' and a nonnegative integer. If the report is still not ready after the specified time, getblast issues an error.

The default value is 0, meaning getblast tries to retrieve the report immediately, without waiting.

Tip

Use RTOE, the estimated request time of execution returned as the second output of the blastncbi function, as the wait time.

Example: 'WaitTime',2

TimeOut

Connection timeout (in seconds) for each request, specified as the comma-separated pair consisting of 'TimeOut' and a positive scalar.

Example: 'TimeOut',10

Output Arguments

blastdata

BLAST report data, returned as a structure that contains the following fields:

Field | Description
RID | Request ID for retrieving results from a specific NCBI BLAST search
Algorithm | NCBI algorithm used to perform the BLAST search
Database | All databases searched
QueryID | Identifier of the query sequence
QueryDefinition | Definition of the query sequence
Hits | Structure containing information on the hit sequences, such as IDs, accession numbers, lengths, and HSPs (high-scoring segment pairs)
Parameters | Structure containing information on the input parameters used to perform the search
Statistics | Summary of statistical details about the performed search, such as lambda, kappa, and entropy values
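The top-level fields can be read with ordinary structure indexing. The following is a minimal sketch that runs offline: it builds a small hypothetical stand-in for blastdata with a subset of the fields above, and every value in it is an invented placeholder, not real NCBI output. With a real report, skip the construction step and use the structure returned by getblast or blastread.

```matlab
% Hypothetical stand-in for a getblast report. The field names follow the
% table above; the values are made-up placeholders, not NCBI output.
% Parameters and Statistics are omitted for brevity.
hits = struct('ID', {'hit1', 'hit2'}, ...
              'Accession', {'ACC001', 'ACC002'}, ...
              'Length', {1200, 950});
blastdata = struct('RID', 'R49WAPMH014', ...
                   'Algorithm', 'BLASTP 2.6.1+', ...
                   'Database', 'nr', ...
                   'QueryID', 'AAA59174.1', ...
                   'QueryDefinition', 'insulin receptor precursor [Homo sapiens]', ...
                   'Hits', hits);

% Algorithm, QueryID, and Database are character vectors. Hits is a
% struct array, so numel gives the number of matching subject sequences.
nHits = numel(blastdata.Hits);
fprintf('%s search of %s against %s returned %d hits\n', ...
        blastdata.Algorithm, blastdata.QueryID, blastdata.Database, nHits);
% prints: BLASTP 2.6.1+ search of AAA59174.1 against nr returned 2 hits
```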

More About

Hits

This table lists each field of blastdata.Hits.

Field | Description
ID | ID of the subject sequence that matched the query sequence
Definition | Description of the subject sequence
Accession | Accession number of the subject sequence
Length | Length of the subject sequence
Hsps | Structure containing information on the high-scoring segment pairs (HSPs)
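Because blastdata.Hits is a struct array, a comma-separated list such as {hits.Accession} collects one field from every hit at once. The sketch below assumes a small hypothetical Hits array with invented values and omits the Hsps field for brevity; the 200-residue cutoff is likewise an arbitrary illustration, not a getblast option.

```matlab
% Hypothetical Hits array using fields from the table above (values invented;
% the Hsps field is omitted to keep the example short).
hits = struct('ID', {'hit1', 'hit2', 'hit3'}, ...
              'Definition', {'protein A', 'protein B', 'protein C'}, ...
              'Accession', {'ACC001', 'ACC002', 'ACC003'}, ...
              'Length', {120, 450, 300});

% Comma-separated list expansion gathers a field from every element.
accessions = {hits.Accession};   % 1-by-3 cell array of character vectors
lengths    = [hits.Length];      % 1-by-3 numeric row vector

% Keep only subject sequences with at least 200 residues (arbitrary cutoff).
longHits = hits(lengths >= 200);
disp({longHits.Accession})
```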

Hits.Hsps

This table summarizes the fields of Hits.Hsps.

Field | Description
Score | Pairwise alignment score for a high-scoring segment pair between the query sequence and a subject sequence
BitScore | Bit score for a high-scoring segment pair
Expect | Expectation value (E-value) for a high-scoring segment pair
Identities | Number of identical residues in a high-scoring segment pair between the query sequence and a subject sequence
Positives | Number of identical or similar residues in a high-scoring segment pair between the query sequence and a subject amino acid sequence. This field applies only to translated nucleotide or amino acid query sequences and databases.
Gaps | Nonaligned residues (gaps) in a high-scoring segment pair
AlignmentLength | Length of the alignment for a high-scoring segment pair
QueryIndices | Indices of the query sequence residue positions for a high-scoring segment pair
SubjectIndices | Indices of the subject sequence residue positions for a high-scoring segment pair
Frame | Reading frame of the translated nucleotide sequence for a high-scoring segment pair
Alignment | 3-by-N character array showing the alignment for a high-scoring segment pair between the query sequence and a subject sequence. The first row is the query sequence, the second row is the alignment, and the third row is the subject sequence.
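Each hit can have several HSPs, so a common post-processing step is to keep HSPs below an E-value cutoff and compute percent identity. The following is a minimal sketch, assuming percent identity is defined as 100 × Identities / AlignmentLength (a common BLAST convention) and using a hypothetical Hits array whose values are invented. The cutoff of 1e-10 simply mirrors the 'expect' value in the blastncbi example; it is an assumption, not a getblast default.

```matlab
% Hypothetical Hits array with nested Hsps (values invented for illustration).
hsps1 = struct('Expect', {1e-50, 1e-3}, ...
               'Identities', {180, 20}, ...
               'AlignmentLength', {200, 50});
hsps2 = struct('Expect', {1e-12}, ...
               'Identities', {90}, ...
               'AlignmentLength', {150});
hits = struct('Accession', {'ACC001', 'ACC002'}, 'Hsps', {hsps1, hsps2});

cutoff = 1e-10;        % assumed E-value threshold
acc = {};              % accession for each retained HSP
pctIdentity = [];      % percent identity for each retained HSP
for i = 1:numel(hits)
    h = hits(i).Hsps;
    keep = [h.Expect] <= cutoff;                 % logical mask over HSPs
    pid = 100 * [h(keep).Identities] ./ [h(keep).AlignmentLength];
    acc = [acc, repmat({hits(i).Accession}, 1, numel(pid))];
    pctIdentity = [pctIdentity, pid];
end

for k = 1:numel(pctIdentity)
    fprintf('%s: %.1f%% identity\n', acc{k}, pctIdentity(k));
end
```

With these placeholder values, the second HSP of the first hit (E-value 1e-3) is discarded, leaving one HSP per hit.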

Compatibility Considerations

Errors starting in R2017b


Introduced before R2006a