BioServices: Complete Services Reference

This document provides a comprehensive reference for all major services available in BioServices, including key methods, parameters, and use cases.

Protein & Gene Resources

UniProt

Protein sequence and functional information database.

Initialization:

from bioservices import UniProt
u = UniProt(verbose=False)

Key Methods:

  • search(query, frmt="tab", columns=None, limit=None, sort=None, compress=False, include=False, **kwargs)
      • Search UniProt with flexible query syntax
      • frmt: "tab", "fasta", "xml", "rdf", "gff", "txt"
      • columns: Comma-separated list (e.g., "id,genes,organism,length")
      • Returns: String in requested format

  • retrieve(uniprot_id, frmt="txt")
      • Retrieve specific UniProt entry
      • frmt: "txt", "fasta", "xml", "rdf", "gff"
      • Returns: Entry data in requested format

  • mapping(fr="UniProtKB_AC-ID", to="KEGG", query="P43403")
      • Convert identifiers between databases
      • fr/to: Database identifiers (see identifier_mapping.md)
      • query: Single ID or comma-separated list
      • Returns: Dictionary mapping input to output IDs

  • searchUniProtId(pattern, columns="entry name,length,organism", limit=100)
      • Convenience method for ID-based searches
      • Returns: Tab-separated values
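
Because the query argument of mapping accepts a comma-separated list, a long ID list can be split into smaller batches before it is sent. The sketch below is a hypothetical helper, not part of BioServices: the names id_batches and map_ids and the batch size of 100 are assumptions made for illustration, and the mapping call reuses the signature listed above.

```python
def id_batches(ids, batch_size=100):
    """Yield comma-separated query strings of at most batch_size IDs each.

    Hypothetical helper (not part of BioServices). batch_size is an
    arbitrary choice, not a documented service limit.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # Drop blanks and duplicates while keeping the original order.
    unique = list(dict.fromkeys(i.strip() for i in ids if i and i.strip()))
    for start in range(0, len(unique), batch_size):
        yield ",".join(unique[start:start + batch_size])


def map_ids(uniprot, ids, fr="UniProtKB_AC-ID", to="KEGG", batch_size=100):
    """Call uniprot.mapping once per batch and collect the raw results."""
    return [uniprot.mapping(fr=fr, to=to, query=q)
            for q in id_batches(ids, batch_size)]


# Usage (needs network access and the bioservices package):
#   from bioservices import UniProt
#   results = map_ids(UniProt(verbose=False), ["P43403", "P29317"])
```

Passing the service object in as an argument keeps the batching logic testable without a network connection.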

Common columns: id, entry name, genes, organism, protein names, length, sequence, go-id, ec, pathway, interactor

Use cases:

  • Protein sequence retrieval for BLAST
  • Functional annotation lookup
  • Cross-database identifier mapping
  • Batch protein information retrieval
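
Since search returns tab-separated text as one string, a small parser makes the rows easier to work with. This is a minimal sketch that assumes the first line of the output is a header row; parse_tab is a hypothetical helper name, and the commented search call simply reuses the parameters listed above.

```python
import csv
import io


def parse_tab(text):
    """Convert tab-separated text with a header row into a list of dicts."""
    if not text or not text.strip():
        return []
    # QUOTE_NONE keeps quote characters inside fields untouched.
    reader = csv.DictReader(io.StringIO(text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    return [dict(row) for row in reader]


# Usage (needs network access and the bioservices package):
#   from bioservices import UniProt
#   u = UniProt(verbose=False)
#   raw = u.search("ZAP70", frmt="tab", columns="id,genes,organism,length", limit=5)
#   for row in parse_tab(raw):
#       print(row)
```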


KEGG (Kyoto Encyclopedia of Genes and Genomes)

Metabolic pathways, genes, and organisms database.

Initialization:

from bioservices import KEGG
k = KEGG()
k.organism = "hsa"  # Set default organism

Key Methods:

  • list(database)
      • List entries in KEGG database
      • database: "organism", "pathway", "module", "disease", "drug", "compound"
      • Returns: Multi-line string with entries

  • find(database, query)
      • Search database by keywords
      • Returns: List of matching entries with IDs

  • get(entry_id)
      • Retrieve entry by ID
      • Supports genes, pathways, compounds, etc.
      • Returns: Raw entry text

  • parse(data)
      • Parse KEGG entry into dictionary
      • Returns: Dict with structured data

  • lookfor_organism(name)
      • Search organisms by name pattern
      • Returns: List of matching organism codes

  • lookfor_pathway(name)
      • Search pathways by name
      • Returns: List of pathway IDs

  • get_pathway_by_gene(gene_id, organism)
      • Find pathways containing gene
      • Returns: List of pathway IDs

  • parse_kgml_pathway(pathway_id)
      • Parse pathway KGML for interactions
      • Returns: Dict with "entries" and "relations"

  • pathway2sif(pathway_id)
      • Extract Simple Interaction Format data
      • Filters for activation/inhibition
      • Returns: List of interaction tuples

Organism codes:

  • hsa: Homo sapiens
  • mmu: Mus musculus
  • dme: Drosophila melanogaster
  • sce: Saccharomyces cerevisiae
  • eco: Escherichia coli

Use cases:

  • Pathway analysis and visualization
  • Gene function annotation
  • Metabolic network reconstruction
  • Protein-protein interaction extraction
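
The multi-line string returned by list is easier to use once it is split into records. The sketch below assumes each line holds an entry ID followed by a tab and a description, and it splits on the first tab only, so any further columns stay in the description. parse_kegg_list is a hypothetical helper, not a BioServices method.

```python
def parse_kegg_list(text):
    """Split KEGG list-style output into (entry_id, description) pairs.

    Assumes one record per line with a tab after the entry ID.
    Lines without a tab yield an empty description.
    """
    pairs = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        entry_id, _, description = line.partition("\t")
        pairs.append((entry_id.strip(), description.strip()))
    return pairs


# Usage (needs network access and the bioservices package):
#   from bioservices import KEGG
#   k = KEGG()
#   k.organism = "hsa"
#   pathways = parse_kegg_list(k.list("pathway"))
```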


HGNC (HUGO Gene Nomenclature Committee)

Official human gene naming authority.

Initialization:

from bioservices import HGNC
h = HGNC()

Key Methods:

  • search(query): Search gene symbols/names
  • fetch(format, query): Retrieve gene information

Use cases:

  • Standardizing human gene names
  • Looking up official gene symbols


MyGeneInfo

Gene annotation and query service.

Initialization:

from bioservices import MyGeneInfo
m = MyGeneInfo()

Key Methods:

  • querymany(ids, scopes, fields, species): Batch gene queries
  • getgene(geneid): Get gene annotation

Use cases:

  • Batch gene annotation retrieval
  • Gene ID conversion


Chemical Compound Resources

ChEBI (Chemical Entities of Biological Interest)

Dictionary of molecular entities.

Initialization:

from bioservices import ChEBI
c = ChEBI()

Key Methods:

  • getCompleteEntity(chebi_id): Full compound information
  • getLiteEntity(chebi_id): Basic information
  • getCompleteEntityByList(chebi_ids): Batch retrieval

Use cases:

  • Small molecule information
  • Chemical structure data
  • Compound property lookup


ChEMBL

Bioactive drug-like compound database.

Initialization:

from bioservices import ChEMBL
c = ChEMBL()

Key Methods:

  • get_compound_by_chemblId(chembl_id): Compound details
  • get_target_by_chemblId(chembl_id): Target information
  • get_assays(): Bioassay data

Use cases:

  • Drug discovery data
  • Bioactivity information
  • Target-compound relationships


UniChem

Chemical identifier mapping service.

Initialization:

from bioservices import UniChem
u = UniChem()

Key Methods:

  • get_compound_id_from_kegg(kegg_id): KEGG → ChEMBL
  • get_all_compound_ids(src_compound_id, src_id): Get all IDs
  • get_src_compound_ids(src_compound_id, from_src_id, to_src_id): Convert IDs

Source IDs:

  • 1: ChEMBL
  • 2: DrugBank
  • 3: PDB
  • 6: KEGG
  • 7: ChEBI
  • 22: PubChem

Use cases:

  • Cross-database compound ID mapping
  • Linking chemical databases
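
The numeric source IDs are easy to mix up, so a small lookup table can make calls more readable. The sketch below only encodes the six IDs listed above; UNICHEM_SOURCES and source_id are hypothetical names, not part of BioServices.

```python
# Source IDs copied from the list above; other sources are not covered.
UNICHEM_SOURCES = {
    "chembl": 1,
    "drugbank": 2,
    "pdb": 3,
    "kegg": 6,
    "chebi": 7,
    "pubchem": 22,
}


def source_id(name):
    """Return the UniChem source ID for a database name (case-insensitive)."""
    key = name.strip().lower()
    if key not in UNICHEM_SOURCES:
        raise ValueError(f"Unknown UniChem source: {name!r}")
    return UNICHEM_SOURCES[key]


# Usage with the signature listed above (needs network access):
#   from bioservices import UniChem
#   u = UniChem()
#   u.get_src_compound_ids("CHEMBL12", source_id("ChEMBL"), source_id("ChEBI"))
```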


PubChem

Chemical compound database from NIH.

Initialization:

from bioservices import PubChem
p = PubChem()

Key Methods:

  • get_compounds(identifier, namespace): Retrieve compounds
  • get_properties(properties, identifier, namespace): Get properties

Use cases:

  • Chemical structure retrieval
  • Compound property information


Sequence Analysis Tools

NCBIblast

Sequence similarity searching.

Initialization:

from bioservices import NCBIblast
s = NCBIblast(verbose=False)

Key Methods:

  • run(program, sequence, stype, database, email, **params)
      • Submit BLAST job
      • program: "blastp", "blastn", "blastx", "tblastn", "tblastx"
      • stype: "protein" or "dna"
      • database: "uniprotkb", "pdb", "refseq_protein", etc.
      • email: Required by the EBI-hosted BLAST service that this class wraps
      • Returns: Job ID

  • getStatus(jobid)
      • Check job status
      • Returns: "RUNNING", "FINISHED", "ERROR"

  • getResult(jobid, result_type)
      • Retrieve results
      • result_type: "out" (default), "ids", "xml"

Important: BLAST jobs are asynchronous. Always check status before retrieving results.
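
The check-then-retrieve pattern can be sketched as a polling loop. This is a minimal illustration, not BioServices code: wait_for_job and fetch_result are hypothetical helpers, the polling interval and attempt limit are arbitrary defaults, and only the three status strings listed above are assumed.

```python
import time

# Terminal states taken from the getStatus description above.
TERMINAL_STATES = {"FINISHED", "ERROR"}


def wait_for_job(service, jobid, poll_seconds=5, max_polls=60, sleep=time.sleep):
    """Poll service.getStatus(jobid) until it reports a terminal state.

    Returns the final status string. Raises TimeoutError if no terminal
    state is seen within max_polls checks.
    """
    for attempt in range(max_polls):
        status = service.getStatus(jobid)
        if status in TERMINAL_STATES:
            return status
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"Job {jobid} not finished after {max_polls} checks")


def fetch_result(service, jobid, result_type="out", **wait_kwargs):
    """Wait for the job, then return its result or raise if it failed."""
    status = wait_for_job(service, jobid, **wait_kwargs)
    if status != "FINISHED":
        raise RuntimeError(f"Job {jobid} ended with status {status}")
    return service.getResult(jobid, result_type)


# Usage (needs network access; sequence and email are placeholders):
#   from bioservices import NCBIblast
#   s = NCBIblast(verbose=False)
#   jobid = s.run(program="blastp", sequence=sequence, stype="protein",
#                 database="uniprotkb", email=email)
#   print(fetch_result(s, jobid))
```

Injecting the sleep function makes the loop testable without real waiting.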

Use cases:

  • Protein homology searches
  • Sequence similarity analysis
  • Functional annotation by homology


Pathway & Interaction Resources

Reactome

Pathway database.

Initialization:

from bioservices import Reactome
r = Reactome()

Key Methods:

  • get_pathway_by_id(pathway_id): Pathway details
  • search_pathway(query): Search pathways

Use cases:

  • Human pathway analysis
  • Biological process annotation


PSICQUIC

Protein interaction query service (federates 30+ databases).

Initialization:

from bioservices import PSICQUIC
s = PSICQUIC()

Key Methods:

  • query(database, query_string)
      • Query specific interaction database
      • Returns: PSI-MI TAB format

  • activeDBs
      • Property listing available databases
      • Returns: List of database names

Available databases: MINT, IntAct, BioGRID, DIP, InnateDB, MatrixDB, MPIDB, UniProt, and others (use activeDBs for the current list)

Query syntax: Supports AND, OR, and species filters

  • Example: "ZAP70 AND species:9606"
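
Query strings of this shape can be assembled programmatically. The helper below is a hypothetical sketch (build_query is not a BioServices function). It wraps OR groups in parentheses before adding the species filter, on the assumption that the service accepts parenthesised groups.

```python
def build_query(terms, taxid=None, operator="AND"):
    """Join search terms and an optional species filter into one query string."""
    operator = operator.upper()
    if operator not in ("AND", "OR"):
        raise ValueError("operator must be 'AND' or 'OR'")
    cleaned = [t.strip() for t in terms if t and t.strip()]
    if not cleaned:
        raise ValueError("at least one search term is required")
    query = f" {operator} ".join(cleaned)
    if taxid is not None:
        # Assumption: parentheses group the OR terms so the species
        # filter applies to all of them.
        if operator == "OR" and len(cleaned) > 1:
            query = f"({query})"
        query = f"{query} AND species:{taxid}"
    return query


# Usage with the query signature listed above (needs network access):
#   from bioservices import PSICQUIC
#   s = PSICQUIC()
#   hits = s.query("intact", build_query(["ZAP70"], taxid=9606))
```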

Use cases:

  • Protein-protein interaction discovery
  • Network analysis
  • Interactome mapping


IntactComplex

Protein complex database.

Initialization:

from bioservices import IntactComplex
i = IntactComplex()

Key Methods:

  • search(query): Search complexes
  • details(complex_ac): Complex details

Use cases:

  • Protein complex composition
  • Multi-protein assembly analysis


OmniPath

Integrated signaling pathway database.

Initialization:

from bioservices import OmniPath
o = OmniPath()

Key Methods:

  • interactions(datasets, organisms): Get interactions
  • ptms(datasets, organisms): Post-translational modifications

Use cases:

  • Cell signaling analysis
  • Regulatory network mapping


Gene Ontology

QuickGO

Gene Ontology annotation service.

Initialization:

from bioservices import QuickGO
g = QuickGO()

Key Methods:

  • Term(go_id, frmt="obo")
      • Retrieve GO term information
      • Returns: Term definition and metadata

  • Annotation(protein=None, goid=None, format="tsv")
      • Get GO annotations
      • Returns: Annotations in requested format

GO categories:

  • Biological Process (BP)
  • Molecular Function (MF)
  • Cellular Component (CC)

Use cases:

  • Functional annotation
  • Enrichment analysis
  • GO term lookup


Genomic Resources

BioMart

Data mining tool for genomic data.

Initialization:

from bioservices import BioMart
b = BioMart()

Key Methods:

  • datasets(mart): List the datasets available in a mart
  • attributes(dataset): List attributes
  • query(query_xml): Execute BioMart query

Use cases:

  • Bulk genomic data retrieval
  • Custom genome annotations
  • SNP information
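
The query method takes an XML document, which can be built with the standard library. The sketch below follows the general BioMart XML query layout (a Query element containing a Dataset with Filter and Attribute children). build_biomart_query is a hypothetical helper, and the dataset, attribute, and filter names in the usage comment are example Ensembl values that should be checked against attributes(dataset).

```python
import xml.etree.ElementTree as ET


def build_biomart_query(dataset, attributes, filters=None):
    """Return a BioMart XML query string for a single dataset."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",       # tab-separated output
        "header": "0",            # no header row
        "uniqueRows": "1",        # drop duplicate rows
        "datasetConfigVersion": "0.6",
    })
    ds = ET.SubElement(query, "Dataset", {"name": dataset, "interface": "default"})
    for name, value in (filters or {}).items():
        ET.SubElement(ds, "Filter", {"name": name, "value": str(value)})
    for name in attributes:
        ET.SubElement(ds, "Attribute", {"name": name})
    prolog = '<?xml version="1.0" encoding="UTF-8"?><!DOCTYPE Query>'
    return prolog + ET.tostring(query, encoding="unicode")


# Usage (needs network access; names below are example values):
#   from bioservices import BioMart
#   b = BioMart()
#   xml = build_biomart_query("hsapiens_gene_ensembl",
#                             ["ensembl_gene_id", "external_gene_name"],
#                             {"chromosome_name": "21"})
#   result = b.query(xml)
```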


ArrayExpress

Gene expression database.

Initialization:

from bioservices import ArrayExpress
a = ArrayExpress()

Key Methods:

  • queryExperiments(keywords): Search experiments
  • retrieveExperiment(accession): Get experiment data

Use cases:

  • Gene expression data
  • Microarray analysis
  • RNA-seq data retrieval


ENA (European Nucleotide Archive)

Nucleotide sequence database.

Initialization:

from bioservices import ENA
e = ENA()

Key Methods:

  • search_data(query): Search sequences
  • retrieve_data(accession): Retrieve sequences

Use cases:

  • Nucleotide sequence retrieval
  • Genome assembly access


Structural Biology

PDB (Protein Data Bank)

3D protein structure database.

Initialization:

from bioservices import PDB
p = PDB()

Key Methods:

  • get_file(pdb_id, file_format): Download structure files
  • search(query): Search structures

File formats: pdb, cif, xml

Use cases:

  • 3D structure retrieval
  • Structure-based analysis
  • PyMOL visualization


Pfam

Protein family database.

Initialization:

from bioservices import Pfam
p = Pfam()

Key Methods:

  • searchSequence(sequence): Find domains in sequence
  • getPfamEntry(pfam_id): Domain information

Use cases:

  • Protein domain identification
  • Family classification
  • Functional motif discovery


Specialized Resources

BioModels

Systems biology model repository.

Initialization:

from bioservices import BioModels
b = BioModels()

Key Methods:

  • get_model_by_id(model_id): Retrieve SBML model

Use cases:

  • Systems biology modeling
  • SBML model retrieval


COG (Clusters of Orthologous Genes)

Orthologous gene classification.

Initialization:

from bioservices import COG
c = COG()

Use cases:

  • Orthology analysis
  • Functional classification


BiGG Models

Metabolic network models.

Initialization:

from bioservices import BiGG
b = BiGG()

Key Methods:

  • list_models(): Available models
  • get_model(model_id): Model details

Use cases:

  • Metabolic network analysis
  • Flux balance analysis


General Patterns

Error Handling

Any service call can raise an exception. Wrap calls in try/except:

try:
    result = service.method(params)
    if result:
        # Process result
        pass
except Exception as e:
    print(f"Error: {e}")
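
For transient failures, the same try/except pattern can be extended with retries. The following is a minimal sketch, not a BioServices feature: call_with_retries is a hypothetical helper, and the retry count and exponential backoff delays are arbitrary illustrative defaults.

```python
import time


def call_with_retries(func, *args, retries=3, delay=1.0, sleep=time.sleep, **kwargs):
    """Call func(*args, **kwargs), retrying when it raises an exception.

    Waits delay, 2*delay, 4*delay, ... seconds between attempts and
    re-raises the last exception once all attempts have failed.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    for attempt in range(retries):
        try:
            return func(*args, **kwargs)
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the original error
            sleep(delay * 2 ** attempt)


# Usage (u is a service object such as UniProt(verbose=False)):
#   entry = call_with_retries(u.retrieve, "P43403", frmt="fasta")
```

Catching the broad Exception class mirrors the pattern above; narrowing it to the errors a given service actually raises is usually preferable once those are known.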

Verbosity Control

Most services accept a verbose parameter:

service = Service(verbose=False)  # Suppress HTTP logs

Rate Limiting

Services have timeouts and rate limits:

service.TIMEOUT = 30  # Adjust timeout
service.DELAY = 1     # Delay between requests (if supported)
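
Where a service offers no built-in delay, requests can be spaced out on the client side. The class below is a hypothetical sketch, not part of BioServices, and the one-second interval in the usage comment is an arbitrary example rather than a documented limit.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Usage (u is a service object, ids a list of identifiers):
#   throttle = Throttle(1.0)
#   for uid in ids:
#       throttle.wait()
#       u.retrieve(uid)
```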

Output Formats

Common format parameters:

  • frmt: "xml", "json", "tab", "txt", "fasta"
  • format: Service-specific variants

Caching

Some services cache results:

service.CACHING = True  # Enable caching
service.clear_cache()  # Clear cache

Additional Resources

For detailed API documentation:

  • Official docs: https://bioservices.readthedocs.io/
  • Individual service docs linked from main page
  • Source code: https://github.com/cokelaer/bioservices