FDA Other Databases - Substances and NSDE
This reference covers FDA substance-related and other specialized API endpoints accessible through openFDA.
Overview
The FDA maintains additional databases for substance-level information that is precise to the molecular level. These databases support regulatory activities across drugs, biologics, devices, foods, and cosmetics.
Available Endpoints
1. Substance Data
Endpoint: https://api.fda.gov/other/substance.json
Purpose: Access substance information that is precise to the molecular level for internal and external use. This includes information about active pharmaceutical ingredients, excipients, and other substances used in FDA-regulated products.
Data Source: FDA Global Substance Registration System (GSRS)
Key Fields:
- uuid - Unique substance identifier (UUID)
- approvalID - FDA Unique Ingredient Identifier (UNII)
- approved - Approval date
- substanceClass - Type of substance (chemical, protein, nucleic acid, polymer, etc.)
- names - Array of substance names
- names.name - Name text
- names.type - Name type (systematic, brand, common, etc.)
- names.preferred - Whether preferred name
- codes - Array of substance codes
- codes.code - Code value
- codes.codeSystem - Code system (CAS, ECHA, EINECS, etc.)
- codes.type - Code type
- relationships - Array of substance relationships
- relationships.type - Relationship type (ACTIVE MOIETY, METABOLITE, IMPURITY, etc.)
- relationships.relatedSubstance - Related substance reference
- moieties - Molecular moieties
- properties - Array of physicochemical properties
- properties.name - Property name
- properties.value - Property value
- properties.propertyType - Property type
- structure - Chemical structure information
- structure.smiles - SMILES notation
- structure.inchi - InChI string
- structure.inchiKey - InChI key
- structure.formula - Molecular formula
- structure.molecularWeight - Molecular weight
- modifications - Structural modifications (for proteins, etc.)
- protein - Protein-specific information
- protein.subunits - Protein subunits
- protein.sequenceType - Sequence type
- nucleicAcid - Nucleic acid information
- nucleicAcid.subunits - Sequence subunits
- polymer - Polymer information
- mixture - Mixture components
- mixture.components - Component substances
- tags - Substance tags
- references - Literature references
Substance Classes:
- Chemical - Small molecules with defined chemical structure
- Protein - Proteins and peptides
- Nucleic Acid - DNA, RNA, oligonucleotides
- Polymer - Polymeric substances
- Structurally Diverse - Complex mixtures, botanicals
- Mixture - Defined mixtures
- Concept - Abstract concepts (e.g., groups)
Common Use Cases:
- Active ingredient identification
- Molecular structure lookup
- UNII code resolution
- Chemical identifier mapping (CAS to UNII, etc.)
- Substance relationship analysis
- Excipient identification
- Botanical substance information
- Protein and biologic characterization
Example Queries:
import requests
api_key = "YOUR_API_KEY"
url = "https://api.fda.gov/other/substance.json"
# Look up substance by UNII code
params = {
    "api_key": api_key,
    "search": "approvalID:R16CO5Y76E",  # Aspirin UNII
    "limit": 1
}
response = requests.get(url, params=params)
data = response.json()
# Search by substance name
params = {
    "api_key": api_key,
    "search": "names.name:acetaminophen",
    "limit": 5
}
# Find substances by CAS number
params = {
    "api_key": api_key,
    "search": "codes.code:50-78-2",  # Aspirin CAS
    "limit": 1
}
# Get chemical substances only
params = {
    "api_key": api_key,
    "search": "substanceClass:chemical",
    "limit": 100
}
# Search by molecular formula
params = {
    "api_key": api_key,
    "search": "structure.formula:C8H9NO2",  # Acetaminophen
    "limit": 10
}
# Find protein substances
params = {
    "api_key": api_key,
    "search": "substanceClass:protein",
    "limit": 50
}
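Each query above filters on a single term. The following is a minimal sketch of combining several field filters into one search string. It assumes openFDA's `field:value` search syntax, with `AND` between terms and double quotes around multi-word phrases. `build_search` is a hypothetical helper name, not part of any FDA library.

```python
def build_search(filters):
    """Join field/value pairs into one openFDA-style search string.

    Hypothetical helper: values containing whitespace are wrapped in
    double quotes so they are treated as a phrase.
    """
    terms = []
    for field, value in filters.items():
        text = str(value)
        if any(ch.isspace() for ch in text):
            text = f'"{text}"'
        terms.append(f"{field}:{text}")
    return " AND ".join(terms)


search = build_search({
    "names.name": "acetylsalicylic acid",
    "substanceClass": "chemical",
})
print(search)
```

The resulting string can be passed as the "search" value in any of the params dictionaries; requests takes care of URL encoding.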
2. NSDE (NDC SPL Data Elements)
Endpoint: https://api.fda.gov/other/nsde.json
Purpose: Access historical substance data from legacy National Drug Code (NDC) directory entries. This endpoint provides substance information as it appears in historical drug product listings.
Note: This database is primarily for historical reference. For current substance information, use the Substance Data endpoint.
Key Fields:
- proprietary_name - Product proprietary name
- nonproprietary_name - Nonproprietary name
- dosage_form - Dosage form
- route - Route of administration
- company_name - Company name
- substance_name - Substance name
- active_numerator_strength - Active ingredient strength (numerator)
- active_ingred_unit - Active ingredient unit
- pharm_classes - Pharmacological classes
- dea_schedule - DEA controlled substance schedule
Common Use Cases:
- Historical drug formulation research
- Legacy system integration
- Historical substance name mapping
- Pharmaceutical history research
Example Queries:
# Search by substance name
params = {
    "api_key": api_key,
    "search": "substance_name:ibuprofen",
    "limit": 20
}
response = requests.get("https://api.fda.gov/other/nsde.json", params=params)
# Find controlled substances by DEA schedule
params = {
    "api_key": api_key,
    "search": "dea_schedule:CII",
    "limit": 50
}
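The queries above return a single page of results. The sketch below pages through a larger result set, assuming the endpoint accepts a `skip` parameter alongside `limit` and answers a search with no matches with HTTP 404. Both are assumptions to check against the API basics reference. `fetch_nsde_page` and `iter_results` are illustrative names; the `fetch` argument is injectable so the paging logic can be tried without network access.

```python
def fetch_nsde_page(params):
    """Fetch one page from the NSDE endpoint (performs a network call)."""
    import requests  # imported lazily so iter_results has no hard dependency

    response = requests.get(
        "https://api.fda.gov/other/nsde.json", params=params, timeout=30
    )
    if response.status_code == 404:  # assumption: 404 means "no matches"
        return []
    response.raise_for_status()
    return response.json().get("results", [])


def iter_results(search, fetch=fetch_nsde_page, page_size=100,
                 max_records=500, api_key=None):
    """Yield records page by page until a short page or max_records is reached."""
    skip = 0
    yielded = 0
    while yielded < max_records:
        params = {"search": search, "limit": page_size, "skip": skip}
        if api_key:
            params["api_key"] = api_key
        page = fetch(params)
        for record in page:
            if yielded >= max_records:
                return
            yield record
            yielded += 1
        if len(page) < page_size:
            return  # a short page means there is nothing left to fetch
        skip += page_size
```

A call such as `for record in iter_results("substance_name:ibuprofen", api_key=api_key): ...` would then walk the results lazily, stopping at `max_records`.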
Integration Tips
UNII to CAS Mapping
def get_substance_identifiers(unii, api_key):
    """
    Get all identifiers for a substance given its UNII code.
    Args:
        unii: FDA Unique Ingredient Identifier
        api_key: FDA API key
    Returns:
        Dictionary with substance identifiers
    """
    import requests
    url = "https://api.fda.gov/other/substance.json"
    params = {
        "api_key": api_key,
        "search": f"approvalID:{unii}",
        "limit": 1
    }
    response = requests.get(url, params=params)
    data = response.json()
    if "results" not in data or len(data["results"]) == 0:
        return None
    substance = data["results"][0]
    identifiers = {
        "unii": substance.get("approvalID"),
        "uuid": substance.get("uuid"),
        "preferred_name": None,
        "cas_numbers": [],
        "other_codes": {}
    }
    # Extract names: take the preferred name, else fall back to the first name
    if "names" in substance:
        for name in substance["names"]:
            if name.get("preferred"):
                identifiers["preferred_name"] = name.get("name")
                break
        if not identifiers["preferred_name"] and len(substance["names"]) > 0:
            identifiers["preferred_name"] = substance["names"][0].get("name")
    # Extract codes: CAS numbers go in their own list, everything else by code system
    if "codes" in substance:
        for code in substance["codes"]:
            code_system = code.get("codeSystem", "").upper()
            code_value = code.get("code")
            if "CAS" in code_system:
                identifiers["cas_numbers"].append(code_value)
            else:
                if code_system not in identifiers["other_codes"]:
                    identifiers["other_codes"][code_system] = []
                identifiers["other_codes"][code_system].append(code_value)
    return identifiers
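The function above collects every code whose code system contains "CAS" without checking the values. A small sketch of a sanity check for the collected `cas_numbers`, using the standard CAS check-digit rule: weight the digits before the check digit 1, 2, 3, ... from right to left, sum them, and compare the sum modulo 10 with the final digit. `is_valid_cas` is a hypothetical helper name.

```python
import re

CAS_PATTERN = re.compile(r"^(\d{2,7})-(\d{2})-(\d)$")


def is_valid_cas(cas_number):
    """Return True if the string has CAS format and a correct check digit."""
    match = CAS_PATTERN.match(cas_number.strip())
    if not match:
        return False
    digits = match.group(1) + match.group(2)
    # Weight digits 1, 2, 3, ... starting from the rightmost digit.
    total = sum(
        int(digit) * weight
        for weight, digit in enumerate(reversed(digits), start=1)
    )
    return total % 10 == int(match.group(3))


print(is_valid_cas("50-78-2"))  # aspirin: 8*1 + 7*2 + 0*3 + 5*4 = 42 -> True
```

This only confirms that a value is well formed; it does not confirm that the number is assigned to the substance in question.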
Chemical Structure Lookup
def get_chemical_structure(substance_name, api_key):
    """
    Get chemical structure information for a substance.
    Args:
        substance_name: Name of the substance
        api_key: FDA API key
    Returns:
        Dictionary with structure information, or None if the substance
        is not found or has no structure block
    """
    import requests
    url = "https://api.fda.gov/other/substance.json"
    params = {
        "api_key": api_key,
        # Quote the name so multi-word names are searched as one phrase
        "search": f'names.name:"{substance_name}"',
        "limit": 1
    }
    response = requests.get(url, params=params)
    data = response.json()
    if "results" not in data or len(data["results"]) == 0:
        return None
    substance = data["results"][0]
    if "structure" not in substance:
        return None
    structure = substance["structure"]
    return {
        "smiles": structure.get("smiles"),
        "inchi": structure.get("inchi"),
        "inchi_key": structure.get("inchiKey"),
        "formula": structure.get("formula"),
        "molecular_weight": structure.get("molecularWeight"),
        "substance_class": substance.get("substanceClass")
    }
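The dictionary returned above carries both a formula and a molecular weight, so one can be used to cross-check the other. A rough sketch, assuming simple formulas such as "C8H9NO2" with no dots, charges, or brackets. `parse_formula`, `approximate_weight`, and the small `ATOMIC_MASS` table are illustrative and cover only four elements.

```python
import re

# Approximate standard atomic masses for a few common elements (illustrative subset).
ATOMIC_MASS = {"C": 12.011, "H": 1.008, "N": 14.007, "O": 15.999}
FORMULA_TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula):
    """Count atoms per element in a simple formula such as 'C8H9NO2'."""
    counts = {}
    position = 0
    for match in FORMULA_TOKEN.finditer(formula):
        if match.start() != position:
            break  # unsupported character between tokens
        element, number = match.groups()
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
        position = match.end()
    if not counts or position != len(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    return counts


def approximate_weight(formula):
    """Sum atomic masses; raises KeyError for elements outside ATOMIC_MASS."""
    return sum(ATOMIC_MASS[el] * n for el, n in parse_formula(formula).items())


print(parse_formula("C8H9NO2"))
print(approximate_weight("C8H9NO2"))
```

A large gap between `approximate_weight(result["formula"])` and `result["molecular_weight"]` would suggest that the record deserves a closer look. Salt or hydrate formulas written with dots raise `ValueError` in this sketch rather than being guessed at.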
Substance Relationship Mapping
def get_substance_relationships(unii, api_key):
    """
    Get all related substances (metabolites, active moieties, etc.).
    Args:
        unii: FDA Unique Ingredient Identifier
        api_key: FDA API key
    Returns:
        Dictionary organizing relationships by type
    """
    import requests
    url = "https://api.fda.gov/other/substance.json"
    params = {
        "api_key": api_key,
        "search": f"approvalID:{unii}",
        "limit": 1
    }
    response = requests.get(url, params=params)
    data = response.json()
    if "results" not in data or len(data["results"]) == 0:
        return None
    substance = data["results"][0]
    relationships = {}
    if "relationships" in substance:
        for rel in substance["relationships"]:
            rel_type = rel.get("type")
            if rel_type not in relationships:
                relationships[rel_type] = []
            related = {
                "uuid": rel.get("relatedSubstance", {}).get("uuid"),
                "unii": rel.get("relatedSubstance", {}).get("approvalID"),
                "name": rel.get("relatedSubstance", {}).get("refPname")
            }
            relationships[rel_type].append(related)
    return relationships
Active Ingredient Extraction
def find_active_ingredients_by_product(product_name, api_key):
    """
    Find active ingredients in a drug product.
    Args:
        product_name: Drug product name
        api_key: FDA API key
    Returns:
        List of active ingredient UNIIs and names, or None if no label matches
    """
    import requests
    # First search drug label database
    label_url = "https://api.fda.gov/drug/label.json"
    label_params = {
        "api_key": api_key,
        # Quote the name so multi-word brand names are searched as one phrase
        "search": f'openfda.brand_name:"{product_name}"',
        "limit": 1
    }
    response = requests.get(label_url, params=label_params)
    data = response.json()
    if "results" not in data or len(data["results"]) == 0:
        return None
    label = data["results"][0]
    # Extract UNIIs from openfda section
    active_ingredients = []
    if "openfda" in label:
        openfda = label["openfda"]
        # Get UNIIs
        unii_list = openfda.get("unii", [])
        generic_names = openfda.get("generic_name", [])
        for i, unii in enumerate(unii_list):
            ingredient = {"unii": unii}
            # The two lists are not guaranteed to align by index,
            # so treat this name as a hint only
            if i < len(generic_names):
                ingredient["name"] = generic_names[i]
            # Get additional substance info (requires get_substance_identifiers above)
            substance_info = get_substance_identifiers(unii, api_key)
            if substance_info:
                ingredient.update(substance_info)
            active_ingredients.append(ingredient)
    return active_ingredients
Best Practices
- Use UNII as primary identifier - Most consistent across FDA databases
- Map between identifier systems - CAS, UNII, InChI Key for cross-referencing
- Handle substance variations - Different salt forms, hydrates have different UNIIs
- Check substance class - Different classes have different data structures
- Validate chemical structures - SMILES and InChI should be verified
- Consider substance relationships - Active moiety vs. salt form matters
- Use preferred names - More consistent than trade names
- Cache substance data - Substance information changes infrequently
- Cross-reference with other endpoints - Link substances to drugs/products
- Handle mixture components - Complex products have multiple components
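The caching advice above can be sketched as a small in-memory cache with a time-to-live. This is an illustration under stated assumptions, not a recommended production design: `SubstanceCache` is a hypothetical class, the one-day default TTL is arbitrary, and the `fetch` callable stands in for whatever lookup function is in use.

```python
import time


class SubstanceCache:
    """Tiny in-memory cache keyed by UNII with a time-to-live (illustrative only)."""

    def __init__(self, fetch, ttl_seconds=24 * 3600, clock=time.monotonic):
        self._fetch = fetch      # callable taking a UNII and returning a record
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested
        self._entries = {}       # unii -> (stored_at, value)

    def get(self, unii):
        now = self._clock()
        entry = self._entries.get(unii)
        if entry is not None and now - entry[0] < self._ttl:
            return entry[1]
        value = self._fetch(unii)
        self._entries[unii] = (now, value)
        return value
```

For example, `cache = SubstanceCache(lambda unii: get_substance_identifiers(unii, api_key))` would reuse results across repeated `cache.get("R16CO5Y76E")` calls. Note that a `None` result (substance not found) is cached as well.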
UNII System
The FDA Unique Ingredient Identifier (UNII) system provides:
- Unique identifiers - Each substance gets one UNII
- Substance specificity - Different forms (salts, hydrates) get different UNIIs
- Global recognition - Used internationally
- Stability - UNIIs don't change once assigned
- Free access - No licensing required
UNII Format: 10-character alphanumeric code (e.g., R16CO5Y76E)
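Given the format stated above, a cheap shape check can reject obviously malformed input before a request is sent. The sketch below tests only for ten uppercase letters or digits; it does not verify that the code is actually assigned to a substance. `looks_like_unii` is a hypothetical helper name.

```python
import re

UNII_PATTERN = re.compile(r"[A-Z0-9]{10}")


def looks_like_unii(value):
    """Return True if value has the 10-character alphanumeric UNII shape."""
    return UNII_PATTERN.fullmatch(value.strip().upper()) is not None


print(looks_like_unii("R16CO5Y76E"))  # True
print(looks_like_unii("50-78-2"))     # False (a CAS number, not a UNII)
```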
Substance Classes Explained
Chemical
- Traditional small molecule drugs
- Have defined molecular structure
- Include organic and inorganic compounds
- SMILES, InChI, molecular formula available
Protein
- Polypeptides and proteins
- Sequence information available
- May have post-translational modifications
- Includes antibodies, enzymes, hormones
Nucleic Acid
- DNA and RNA sequences
- Oligonucleotides
- Antisense, siRNA, mRNA
- Sequence data available
Polymer
- Synthetic and natural polymers
- Structural repeat units
- Molecular weight distributions
- Used as excipients and active ingredients
Structurally Diverse
- Complex natural products
- Botanical extracts
- Materials without single molecular structure
- Characterized by source and composition
Mixture
- Defined combinations of substances
- Fixed or variable composition
- Each component trackable
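Because each class keeps its detail in a different part of the record, code that handles mixed results usually branches on the class first. A minimal sketch, assuming the class values are spelled as in the map below and that the class-specific blocks use the key names from the Key Fields list (`structure`, `protein`, `nucleicAcid`, `polymer`, `mixture`); both should be checked against real responses. `class_details` is a hypothetical helper name.

```python
# Assumed mapping from substanceClass value to the key holding class-specific data.
CLASS_DETAIL_KEY = {
    "chemical": "structure",
    "protein": "protein",
    "nucleicAcid": "nucleicAcid",
    "polymer": "polymer",
    "mixture": "mixture",
}


def class_details(substance):
    """Return (substance_class, detail_block) for one substance record.

    detail_block is None when the class has no mapped key or the
    record does not carry that block.
    """
    substance_class = substance.get("substanceClass")
    key = CLASS_DETAIL_KEY.get(substance_class)
    details = substance.get(key) if key else None
    return substance_class, details


record = {"substanceClass": "chemical", "structure": {"formula": "C9H8O4"}}
print(class_details(record))
```

Classes without a mapped block, such as structurally diverse substances and concepts, come back with `None` details so callers can handle them explicitly.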
Additional Resources
- FDA Substance Registration System: https://fdasis.nlm.nih.gov/srs/
- UNII Search: https://precision.fda.gov/uniisearch
- OpenFDA Other APIs: https://open.fda.gov/apis/other/
- API Basics: See api_basics.md in this references directory
- Python examples: See scripts/fda_substance_query.py