Tool Discovery in ToolUniverse
Overview
ToolUniverse provides multiple methods to discover and search through 600+ scientific tools using natural language, keywords, or embeddings.
Discovery Methods
1. Tool_Finder (Embedding-Based Search)
Uses semantic embeddings to find relevant tools. Requires GPU for optimal performance.
from tooluniverse import ToolUniverse
tu = ToolUniverse()
tu.load_tools()
# Search by natural language description
tools = tu.run({
"name": "Tool_Finder",
"arguments": {
"description": "protein structure prediction",
"limit": 10
}
})
print(tools)
When to use: - Natural language queries - Semantic similarity search - When GPU is available
2. Tool_Finder_LLM (LLM-Based Search)
Alternative to embedding-based search that uses LLM reasoning. No GPU required.
tools = tu.run({
"name": "Tool_Finder_LLM",
"arguments": {
"description": "Find tools for analyzing gene expression data",
"limit": 10
}
})
When to use: - When GPU is not available - Complex queries requiring reasoning - Semantic understanding needed
3. Tool_Finder_Keyword (Keyword Search)
Fast keyword-based search through tool names and descriptions.
tools = tu.run({
"name": "Tool_Finder_Keyword",
"arguments": {
"description": "disease target associations",
"limit": 10
}
})
When to use: - Fast searches - Known keywords - Exact term matching
Listing Available Tools
List All Tools
all_tools = tu.list_tools()
print(f"Total tools available: {len(all_tools)}")
List Tools with Limit
tools = tu.list_tools(limit=20)
for tool in tools:
print(f"{tool['name']}: {tool['description']}")
Tool Information
Get Tool Details
# After finding a tool, inspect its details
tool_info = tu.get_tool_info("OpenTargets_get_associated_targets_by_disease_efoId")
print(tool_info)
Search Strategies
By Domain
Use domain-specific keywords: - Bioinformatics: "sequence alignment", "genomics", "RNA-seq" - Cheminformatics: "molecular dynamics", "drug design", "SMILES" - Machine Learning: "classification", "prediction", "neural network" - Structural Biology: "protein structure", "PDB", "crystallography"
By Functionality
Search by what you want to accomplish: - "Find disease-gene associations" - "Predict protein interactions" - "Analyze clinical trial data" - "Generate molecular descriptors"
By Data Source
Search for specific databases or APIs: - "OpenTargets", "PubChem", "UniProt" - "AlphaFold", "ChEMBL", "PDB" - "KEGG", "Reactome", "STRING"
Best Practices
- Start Broad: Begin with general terms, then refine
- Use Multiple Methods: Try different discovery methods if results aren't satisfactory
- Set Appropriate Limits: Use
limitparameter to control result size (default: 10) - Check Tool Descriptions: Review returned tool descriptions to verify relevance
- Iterate: Refine search terms based on initial results