name	blast
description	Use BLAST (Basic Local Alignment Search Tool) for sequence similarity searches against databases.

BLAST

Name: blast
Author: benchflow-ai

Use BLAST for finding regions of similarity between biological sequences. This skill covers running BLAST searches and parsing results.

Running BLAST Locally

from Bio.Blast.Applications import NcbiblastpCommandline, NcbiblastnCommandline

# Protein BLAST (blastp)
blastp_cline = NcbiblastpCommandline(
    query="query.fasta",
    db="nr",
    evalue=0.001,
    outfmt=5,  # XML format
    out="blast_results.xml"
)
stdout, stderr = blastp_cline()

# Nucleotide BLAST (blastn)
blastn_cline = NcbiblastnCommandline(
    query="query.fasta",
    db="nt",
    evalue=0.001,
    outfmt=5,
    out="blastn_results.xml"
)
stdout, stderr = blastn_cline()

NCBI BLAST Web Service

from Bio.Blast import NCBIWWW, NCBIXML

# Run BLAST search online
sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRVGDGTQDNLSGAEKAVQVKVKALPDAQFEVVHSLAKWKRQQIAAALEHHHHHH"

result_handle = NCBIWWW.qblast(
    "blastp",           # Program
    "nr",               # Database
    sequence,           # Query sequence
    hitlist_size=50,    # Max hits
    expect=0.001        # E-value threshold
)

# Save results
with open("blast_results.xml", "w") as f:
    f.write(result_handle.read())

Parsing BLAST Results

from Bio.Blast import NCBIXML

# Parse XML results
with open("blast_results.xml") as result_handle:
    blast_records = NCBIXML.parse(result_handle)

    for blast_record in blast_records:
        print(f"Query: {blast_record.query}")
        print(f"Database: {blast_record.database}")

        for alignment in blast_record.alignments[:10]:
            print(f"\n  Hit: {alignment.hit_def[:60]}")
            print(f"  Accession: {alignment.accession}")

            for hsp in alignment.hsps:
                print(f"    E-value: {hsp.expect}")
                print(f"    Score: {hsp.score}")
                print(f"    Identities: {hsp.identities}/{hsp.align_length}")
                print(f"    Query: {hsp.query[:50]}...")
                print(f"    Match: {hsp.match[:50]}...")
                print(f"    Subject: {hsp.sbjct[:50]}...")

Tabular Output Parsing

import pandas as pd

# Run BLAST with tabular output (format 6)
blastp_cline = NcbiblastpCommandline(
    query="query.fasta",
    db="nr",
    evalue=0.001,
    outfmt="6 qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore",
    out="blast_results.tsv"
)
stdout, stderr = blastp_cline()

# Parse tabular results
columns = ['qseqid', 'sseqid', 'pident', 'length', 'mismatch',
           'gapopen', 'qstart', 'qend', 'sstart', 'send', 'evalue', 'bitscore']
df = pd.read_csv("blast_results.tsv", sep='\t', names=columns)

# Filter high-quality hits
significant_hits = df[(df['evalue'] < 1e-10) & (df['pident'] > 50)]
print(significant_hits)

Creating Local BLAST Database

# Create protein database
makeblastdb -in proteins.fasta -dbtype prot -out my_protein_db

# Create nucleotide database
makeblastdb -in sequences.fasta -dbtype nucl -out my_nucl_db

from Bio.Blast.Applications import NcbimakeblastdbCommandline

# Create database programmatically
makedb_cline = NcbimakeblastdbCommandline(
    input_file="proteins.fasta",
    dbtype="prot",
    out="my_protein_db"
)
stdout, stderr = makedb_cline()

BLAST Program Selection

Program	Query	Database	Use Case
blastp	Protein	Protein	Protein similarity
blastn	Nucleotide	Nucleotide	DNA/RNA similarity
blastx	Nucleotide	Protein	Translated query
tblastn	Protein	Nucleotide	Search translated DB
tblastx	Nucleotide	Nucleotide	Both translated

Common Parameters

evalue: Expectation value threshold (default: 10)
word_size: Word size for initial match
gapopen: Gap opening penalty
gapextend: Gap extension penalty
matrix: Scoring matrix (BLOSUM62, PAM250, etc.)
num_threads: Number of CPU threads

blast

Install Skill

SKILL.md

BLAST

Running BLAST Locally

NCBI BLAST Web Service

Parsing BLAST Results

Tabular Output Parsing

Creating Local BLAST Database

BLAST Program Selection

Common Parameters