STRING Database API Reference
Overview
STRING (Search Tool for the Retrieval of Interacting Genes/Proteins) is a comprehensive database of known and predicted protein-protein interactions integrating data from over 40 sources.
Database Statistics (v12.0+): - Coverage: 5000+ genomes - Proteins: ~59.3 million - Interactions: 20+ billion - Data types: Physical interactions, functional associations, co-expression, co-occurrence, text-mining, databases
Core Data Resource: Designated by Global Biodata Coalition and ELIXIR
API Base URLs
- Current version: https://string-db.org/api
- Version-specific: https://version-12-0.string-db.org/api (for reproducibility)
- API documentation: https://string-db.org/help/api/
Best Practices
- Identifier Mapping: Always map identifiers first using
get_string_idsfor faster subsequent queries - Use STRING IDs: Prefer STRING identifiers (e.g.,
9606.ENSP00000269305) over gene names - Specify Species: For networks with >10 proteins, always specify species NCBI taxon ID
- Rate Limiting: Wait 1 second between API calls to avoid server overload
- Versioned URLs: Use version-specific URLs for reproducible research
- POST over GET: Use POST requests for large protein lists
- Caller Identity: Include
caller_identityparameter for tracking (e.g., your application name)
API Methods
1. Identifier Mapping (get_string_ids)
Purpose: Maps common protein names, gene symbols, UniProt IDs, and other identifiers to STRING identifiers.
Endpoint: /api/tsv/get_string_ids
Parameters:
- identifiers (required): Protein names/IDs separated by newlines (%0d)
- species (required): NCBI taxon ID
- limit: Number of matches per identifier (default: 1)
- echo_query: Include query term in output (1 or 0)
- caller_identity: Application identifier
Output Format: TSV with columns:
- queryItem: Original query
- queryIndex: Query position
- stringId: STRING identifier
- ncbiTaxonId: Species taxon ID
- taxonName: Species name
- preferredName: Preferred gene name
- annotation: Protein description
Example:
identifiers=TP53%0dBRCA1&species=9606&limit=1
Use cases: - Converting gene symbols to STRING IDs - Validating protein identifiers - Finding canonical protein names
2. Network Data (network)
Purpose: Retrieves protein-protein interaction network data in tabular format.
Endpoint: /api/tsv/network
Parameters:
- identifiers (required): Protein IDs separated by %0d
- species: NCBI taxon ID
- required_score: Confidence threshold 0-1000 (default: 400)
- 150: low confidence
- 400: medium confidence
- 700: high confidence
- 900: highest confidence
- network_type: functional (default) or physical
- add_nodes: Add N interacting proteins (0-10)
- caller_identity: Application identifier
Output Format: TSV with columns:
- stringId_A, stringId_B: Interacting proteins
- preferredName_A, preferredName_B: Gene names
- ncbiTaxonId: Species
- score: Combined interaction score (0-1000)
- nscore: Neighborhood score
- fscore: Fusion score
- pscore: Phylogenetic profile score
- ascore: Coexpression score
- escore: Experimental score
- dscore: Database score
- tscore: Text-mining score
Network Types: - Functional: All interaction evidence types (recommended for most analyses) - Physical: Only direct physical binding evidence
Example:
identifiers=9606.ENSP00000269305%0d9606.ENSP00000275493&required_score=700
3. Network Image (image/network)
Purpose: Generates visual network representation as PNG image.
Endpoint: /api/image/network
Parameters:
- identifiers (required): Protein IDs separated by %0d
- species: NCBI taxon ID
- required_score: Confidence threshold 0-1000
- network_flavor: Visualization style
- evidence: Show evidence types as colored lines
- confidence: Show confidence as line thickness
- actions: Show activating/inhibiting interactions
- add_nodes: Add N interacting proteins (0-10)
- caller_identity: Application identifier
Output: PNG image (binary data)
Image Specifications:
- Format: PNG
- Size: Automatically scaled based on network size
- High-resolution option available (add ?highres=1)
Example:
identifiers=TP53%0dMDM2&species=9606&network_flavor=evidence
4. Interaction Partners (interaction_partners)
Purpose: Retrieves all STRING interaction partners for given protein(s).
Endpoint: /api/tsv/interaction_partners
Parameters:
- identifiers (required): Protein IDs
- species: NCBI taxon ID
- required_score: Confidence threshold 0-1000
- limit: Maximum number of partners (default: 10)
- caller_identity: Application identifier
Output Format: TSV with same columns as network method
Use cases: - Finding hub proteins - Expanding networks - Discovery of novel interactions
Example:
identifiers=TP53&species=9606&limit=20&required_score=700
5. Functional Enrichment (enrichment)
Purpose: Performs functional enrichment analysis for a set of proteins across multiple annotation databases.
Endpoint: /api/tsv/enrichment
Parameters:
- identifiers (required): List of protein IDs
- species (required): NCBI taxon ID
- caller_identity: Application identifier
Enrichment Categories: - Gene Ontology: Biological Process, Molecular Function, Cellular Component - KEGG Pathways: Metabolic and signaling pathways - Pfam: Protein domains - InterPro: Protein families and domains - SMART: Domain architecture - UniProt Keywords: Curated functional keywords
Output Format: TSV with columns:
- category: Annotation category
- term: Term ID
- description: Term description
- number_of_genes: Genes in input with this term
- number_of_genes_in_background: Total genes with this term
- ncbiTaxonId: Species
- inputGenes: Comma-separated gene list
- preferredNames: Comma-separated gene names
- p_value: Enrichment p-value (uncorrected)
- fdr: False discovery rate (corrected p-value)
Statistical Method: Fisher's exact test with Benjamini-Hochberg FDR correction
Example:
identifiers=TP53%0dMDM2%0dATM%0dCHEK2&species=9606
6. PPI Enrichment (ppi_enrichment)
Purpose: Tests if a network has significantly more interactions than expected by chance.
Endpoint: /api/json/ppi_enrichment
Parameters:
- identifiers (required): List of protein IDs
- species: NCBI taxon ID
- required_score: Confidence threshold
- caller_identity: Application identifier
Output Format: JSON with fields:
- number_of_nodes: Proteins in network
- number_of_edges: Interactions observed
- expected_number_of_edges: Expected interactions (random)
- p_value: Statistical significance
Interpretation: - p-value < 0.05: Network is significantly enriched - Low p-value indicates proteins form functional module
Example:
identifiers=TP53%0dMDM2%0dATM%0dCHEK2&species=9606
7. Homology Scores (homology)
Purpose: Retrieves protein similarity/homology scores.
Endpoint: /api/tsv/homology
Parameters:
- identifiers (required): Protein IDs
- species: NCBI taxon ID
- caller_identity: Application identifier
Output Format: TSV with homology scores between proteins
Use cases: - Identifying protein families - Paralog analysis - Cross-species comparisons
8. Version Information (version)
Purpose: Returns current STRING database version.
Endpoint: /api/tsv/version
Output: Version string (e.g., "12.0")
Common Species NCBI Taxon IDs
| Organism | Common Name | Taxon ID |
|---|---|---|
| Homo sapiens | Human | 9606 |
| Mus musculus | Mouse | 10090 |
| Rattus norvegicus | Rat | 10116 |
| Drosophila melanogaster | Fruit fly | 7227 |
| Caenorhabditis elegans | C. elegans | 6239 |
| Saccharomyces cerevisiae | Yeast | 4932 |
| Arabidopsis thaliana | Thale cress | 3702 |
| Escherichia coli K-12 | E. coli | 511145 |
| Danio rerio | Zebrafish | 7955 |
| Gallus gallus | Chicken | 9031 |
Full list: https://string-db.org/cgi/input?input_page_active_form=organisms
STRING Identifier Format
STRING uses Ensembl protein IDs with taxon prefix:
- Format: {taxonId}.{ensemblProteinId}
- Example: 9606.ENSP00000269305 (human TP53)
ID Components: - Taxon ID: NCBI taxonomy identifier - Protein ID: Usually Ensembl protein ID (ENSP...)
Interaction Confidence Scores
STRING provides combined confidence scores (0-1000) based on multiple evidence channels:
Evidence Channels
- Neighborhood (nscore): Gene fusion and conserved genomic neighborhood
- Fusion (fscore): Gene fusion events across species
- Phylogenetic Profile (pscore): Co-occurrence across species
- Coexpression (ascore): RNA expression correlation
- Experimental (escore): Biochemical/genetic experiments
- Database (dscore): Curated pathway/complex databases
- Text-mining (tscore): Literature co-occurrence
Recommended Thresholds
- 150: Low confidence (exploratory analysis)
- 400: Medium confidence (standard analysis)
- 700: High confidence (conservative analysis)
- 900: Highest confidence (very stringent)
Output Formats
Available Formats
- TSV: Tab-separated values (default, best for data processing)
- JSON: JavaScript Object Notation (structured data)
- XML: Extensible Markup Language
- PSI-MI: Proteomics Standards Initiative format
- PSI-MITAB: Tab-delimited PSI-MI format
- PNG: Image format (for network visualizations)
- SVG: Scalable vector graphics (for network visualizations)
Format Selection
Replace /tsv/ in URL with desired format:
- /json/network - JSON format
- /xml/network - XML format
- /image/network - PNG image
Error Handling
HTTP Status Codes
- 200 OK: Successful request
- 400 Bad Request: Invalid parameters or syntax
- 404 Not Found: Protein/species not found
- 500 Internal Server Error: Server error
Common Errors
- "No proteins found": Invalid identifiers or species mismatch
- "Species required": Missing species parameter for large networks
- Empty results: No interactions above score threshold
- Timeout: Network too large, reduce protein count
Advanced Features
Bulk Network Upload
For complete proteome analysis: 1. Navigate to https://string-db.org/ 2. Select "Upload proteome" option 3. Upload FASTA file 4. STRING generates complete interaction network and predicts functions
Values/Ranks Enrichment API
For differential expression/proteomics data:
- Get API Key:
/api/json/get_api_key
-
Submit Data: Tab-separated protein ID and value pairs
-
Check Status:
/api/json/valuesranks_enrichment_status?job_id={id}
- Retrieve Results: Access enrichment tables and figures
Requirements: - Complete protein set (no filtering) - Numeric values for each protein - Proper species identifier
Network Customization
Network Size Control:
- add_nodes=N: Adds N most connected proteins
- limit: Controls partner retrieval
Confidence Filtering:
- Adjust required_score based on analysis goals
- Higher scores = fewer false positives, more false negatives
Network Type Selection:
- functional: All evidence (recommended for pathway analysis)
- physical: Direct binding only (recommended for structural studies)
Integration with Other Tools
Python Libraries
requests (recommended):
import requests
url = "https://string-db.org/api/tsv/network"
params = {"identifiers": "TP53", "species": 9606}
response = requests.get(url, params=params)
urllib (standard library):
import urllib.request
url = "https://string-db.org/api/tsv/network?identifiers=TP53&species=9606"
response = urllib.request.urlopen(url)
R Integration
STRINGdb Bioconductor package:
library(STRINGdb)
string_db <- STRINGdb$new(version="12", species=9606)
Cytoscape
STRING networks can be imported into Cytoscape for visualization and analysis: 1. Use stringApp plugin 2. Import TSV network data 3. Apply layouts and styling
Data License
STRING data is freely available under Creative Commons BY 4.0 license: - ✓ Free to use for academic and commercial purposes - ✓ Attribution required - ✓ Modifications allowed - ✓ Redistribution allowed
Citation: Szklarczyk et al. (latest publication)
Rate Limits and Usage
- Rate limiting: No strict limit, but avoid rapid-fire requests
- Recommendation: Wait 1 second between calls
- Large datasets: Use bulk download from https://string-db.org/cgi/download
- Proteome-scale: Use web upload feature instead of API
Related Resources
- STRING website: https://string-db.org
- Download page: https://string-db.org/cgi/download
- Help center: https://string-db.org/help/
- API documentation: https://string-db.org/help/api/
- Publications: https://string-db.org/cgi/about
Troubleshooting
No results returned: - Verify species parameter matches identifiers - Check identifier format - Lower confidence threshold - Use identifier mapping first
Timeout errors: - Reduce number of input proteins - Split large queries into batches - Use bulk download for proteome-scale analyses
Version inconsistencies:
- Use version-specific URLs
- Check STRING version with /version endpoint
- Update identifiers if using old IDs