
FDA Other Databases - Substances and NSDE

This reference covers FDA substance-related and other specialized API endpoints accessible through openFDA.

Overview

The FDA maintains additional databases that describe substances at the molecular level. These databases support regulatory activities across drugs, biologics, devices, foods, and cosmetics.

Available Endpoints

1. Substance Data

Endpoint: https://api.fda.gov/other/substance.json

Purpose: Access substance information that is precise to the molecular level for internal and external use. This includes information about active pharmaceutical ingredients, excipients, and other substances used in FDA-regulated products.

Data Source: FDA Global Substance Registration System (GSRS)

Key Fields:
  • uuid - Unique substance identifier (UUID)
  • approvalID - FDA Unique Ingredient Identifier (UNII)
  • approved - Approval date
  • substanceClass - Type of substance (chemical, protein, nucleic acid, polymer, etc.)
  • names - Array of substance names
  • names.name - Name text
  • names.type - Name type (systematic, brand, common, etc.)
  • names.preferred - Whether preferred name
  • codes - Array of substance codes
  • codes.code - Code value
  • codes.codeSystem - Code system (CAS, ECHA, EINECS, etc.)
  • codes.type - Code type
  • relationships - Array of substance relationships
  • relationships.type - Relationship type (ACTIVE MOIETY, METABOLITE, IMPURITY, etc.)
  • relationships.relatedSubstance - Related substance reference
  • moieties - Molecular moieties
  • properties - Array of physicochemical properties
  • properties.name - Property name
  • properties.value - Property value
  • properties.propertyType - Property type
  • structure - Chemical structure information
  • structure.smiles - SMILES notation
  • structure.inchi - InChI string
  • structure.inchiKey - InChI key
  • structure.formula - Molecular formula
  • structure.molecularWeight - Molecular weight
  • modifications - Structural modifications (for proteins, etc.)
  • protein - Protein-specific information
  • protein.subunits - Protein subunits
  • protein.sequenceType - Sequence type
  • nucleicAcid - Nucleic acid information
  • nucleicAcid.subunits - Sequence subunits
  • polymer - Polymer information
  • mixture - Mixture components
  • mixture.components - Component substances
  • tags - Substance tags
  • references - Literature references
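Dotted field names such as names.name refer to a key inside an array of objects. The small sketch below shows how that maps onto a parsed JSON result. collect_field is a hypothetical helper and is not part of openFDA or requests. The toy record is hand-made to mirror the Key Fields list and is not real API output.

```python
from typing import Any, List


def collect_field(record: Any, path: str) -> List[Any]:
    """Return every value found at a dotted path such as "names.name".

    Lists met along the way are walked element by element, mirroring how a
    dotted field name addresses a key inside an array of objects.
    """
    current = [record]
    for key in path.split("."):
        next_level = []
        for item in current:
            # Treat a single object like a one-element array
            elements = item if isinstance(item, list) else [item]
            for element in elements:
                if isinstance(element, dict) and key in element:
                    next_level.append(element[key])
        current = next_level
    # Flatten a trailing array (a path that ends on a list-valued field)
    result = []
    for value in current:
        if isinstance(value, list):
            result.extend(value)
        else:
            result.append(value)
    return result


# Hand-made toy record shaped like the Key Fields list (not real API output)
toy = {
    "names": [
        {"name": "ASPIRIN", "preferred": True},
        {"name": "ACETYLSALICYLIC ACID", "preferred": False},
    ],
    "structure": {"formula": "C9H8O4"},
}
print(collect_field(toy, "names.name"))         # ['ASPIRIN', 'ACETYLSALICYLIC ACID']
print(collect_field(toy, "structure.formula"))  # ['C9H8O4']
```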

Substance Classes:
  • Chemical - Small molecules with defined chemical structure
  • Protein - Proteins and peptides
  • Nucleic Acid - DNA, RNA, oligonucleotides
  • Polymer - Polymeric substances
  • Structurally Diverse - Complex mixtures, botanicals
  • Mixture - Defined mixtures
  • Concept - Abstract concepts (e.g., groups)
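Each class keeps its defining data in a different section of the record, so reading code usually branches on substanceClass first. This is a minimal sketch under two assumptions. First, it uses the section names from the Key Fields list (structure, protein, nucleicAcid, polymer, mixture). Second, it assumes class values may differ in spacing or casing between sources (for example "nucleic acid" vs "nucleicAcid"). defining_section and SECTION_BY_CLASS are hypothetical names.

```python
from typing import Optional

# Normalized class name -> record section holding its definition.
# Section names follow the Key Fields list; classes without a dedicated
# section in that list (structurally diverse, concept) map to None.
SECTION_BY_CLASS = {
    "chemical": "structure",
    "protein": "protein",
    "nucleicacid": "nucleicAcid",
    "polymer": "polymer",
    "mixture": "mixture",
    "structurallydiverse": None,
    "concept": None,
}


def normalize_class(value: str) -> str:
    """Lowercase and drop spaces/underscores so spelling variants compare equal."""
    return value.replace(" ", "").replace("_", "").lower()


def defining_section(substance: dict) -> Optional[dict]:
    """Return the part of a substance record that defines it, if any."""
    cls = normalize_class(substance.get("substanceClass") or "")
    key = SECTION_BY_CLASS.get(cls)
    if key is None:
        return None
    return substance.get(key)


# Hand-made toy records, not real API output
print(defining_section({"substanceClass": "chemical", "structure": {"formula": "C9H8O4"}}))
# {'formula': 'C9H8O4'}
print(defining_section({"substanceClass": "Nucleic Acid", "nucleicAcid": {"subunits": []}}))
# {'subunits': []}
print(defining_section({"substanceClass": "concept"}))
# None
```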

Common Use Cases:
  • Active ingredient identification
  • Molecular structure lookup
  • UNII code resolution
  • Chemical identifier mapping (CAS to UNII, etc.)
  • Substance relationship analysis
  • Excipient identification
  • Botanical substance information
  • Protein and biologic characterization

Example Queries:

```python
import requests

api_key = "YOUR_API_KEY"
url = "https://api.fda.gov/other/substance.json"

# Look up substance by UNII code
params = {
    "api_key": api_key,
    "search": "approvalID:R16CO5Y76E",  # Aspirin UNII
    "limit": 1
}

response = requests.get(url, params=params, timeout=30)
# When nothing matches, openFDA returns an "error" object instead of "results"
data = response.json()

# Each params dict below is sent the same way:
# response = requests.get(url, params=params, timeout=30)

# Search by substance name
params = {
    "api_key": api_key,
    "search": "names.name:acetaminophen",
    "limit": 5
}
# Find substances by CAS number (quoted so the hyphenated code is matched as a whole)
params = {
    "api_key": api_key,
    "search": 'codes.code:"50-78-2"',  # Aspirin CAS
    "limit": 1
}
# Get chemical substances only
params = {
    "api_key": api_key,
    "search": "substanceClass:chemical",
    "limit": 100
}
# Search by molecular formula
params = {
    "api_key": api_key,
    "search": "structure.formula:C8H9NO2",  # Acetaminophen
    "limit": 10
}
# Find protein substances
params = {
    "api_key": api_key,
    "search": "substanceClass:protein",
    "limit": 50
}
```
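Multi-word names and hyphenated codes are safer to send as quoted phrases, and several conditions can be combined with AND. The sketch below builds such a query string under those assumptions. build_search is a hypothetical helper. It rejects embedded double quotes rather than guessing at an escaping rule.

```python
from typing import Dict


def build_search(terms: Dict[str, str]) -> str:
    """Build an openFDA `search` string from {field: value} pairs.

    Each value is wrapped in double quotes so multi-word names and
    hyphenated codes (such as CAS numbers) are matched as a phrase, and
    the clauses are joined with AND.
    """
    if not terms:
        raise ValueError("at least one field is required")
    clauses = []
    for field, value in terms.items():
        if '"' in value:
            # Reject rather than guess at escaping rules
            raise ValueError(f"value for {field!r} contains a double quote")
        clauses.append(f'{field}:"{value}"')
    return " AND ".join(clauses)


print(build_search({"codes.code": "50-78-2"}))
# codes.code:"50-78-2"
print(build_search({"substanceClass": "protein", "names.name": "insulin"}))
# substanceClass:"protein" AND names.name:"insulin"
```

Pass the result as params["search"]. requests encodes the spaces around AND as +, which matches the +AND+ form used in openFDA's query examples.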

2. NSDE (Comprehensive NDC SPL Data Elements File)

Endpoint: https://api.fda.gov/other/nsde.json

Purpose: Access key data elements from the Structured Product Labeling (SPL) drug listings submitted to FDA, organized by package-level National Drug Code (NDC). Entries carry marketing and inactivation dates, so the endpoint covers discontinued packages as well as currently marketed ones.

Note: NSDE describes drug products and packages rather than substances. For ingredient-level information, use the Substance Data endpoint.

Key Fields:
  • package_ndc - Package NDC (10-digit, hyphenated)
  • package_ndc11 - Package NDC in 11-digit form
  • proprietary_name - Product proprietary name
  • dosage_form - Dosage form
  • product_type - Product type
  • marketing_category - Marketing category (e.g., NDA, ANDA, BLA)
  • application_number_or_citation - Application number or monograph citation
  • marketing_start_date - Marketing start date
  • marketing_end_date - Marketing end date
  • inactivation_date - Date the listing was inactivated
  • reactivation_date - Date the listing was reactivated
  • billing_unit - Billing unit (EA, ML, or GM)

Common Use Cases:
  • Converting and matching 10- and 11-digit NDCs
  • Looking up billing units for packages
  • Checking whether a package is still marketed (marketing end and inactivation dates)
  • Linking packages to application numbers

Example Queries:

```python
# Search by proprietary name
params = {
    "api_key": api_key,
    "search": "proprietary_name:ibuprofen",
    "limit": 20
}

response = requests.get("https://api.fda.gov/other/nsde.json", params=params, timeout=30)
# Find packages by marketing category (sent the same way)
params = {
    "api_key": api_key,
    "search": 'marketing_category:"ANDA"',
    "limit": 50
}
```
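NSDE entries are tied to National Drug Codes. NDCs circulate both as hyphenated 10-digit codes and as 11-digit codes in a fixed 5-4-2 layout, so matching records across sources usually means normalizing to one form. The sketch below implements the usual zero-padding conversion. ndc10_to_ndc11 is a hypothetical name, and the sample codes are made-up digit strings, not real products.

```python
# Labeler-product-package segment lengths allowed for 10-digit NDCs
VALID_LAYOUTS = {(4, 4, 2), (5, 3, 2), (5, 4, 1)}


def ndc10_to_ndc11(ndc: str) -> str:
    """Convert a hyphenated 10-digit NDC to the 11-digit 5-4-2 form.

    Zero-padding the short segment of a 4-4-2, 5-3-2 or 5-4-1 code
    yields the 5-4-2 layout (11 digits, returned without hyphens).
    """
    parts = ndc.strip().split("-")
    if len(parts) != 3 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected a hyphenated NDC, got {ndc!r}")
    layout = tuple(len(part) for part in parts)
    if layout not in VALID_LAYOUTS:
        raise ValueError(f"unrecognized NDC layout {layout} in {ndc!r}")
    labeler, product, package = parts
    return labeler.zfill(5) + product.zfill(4) + package.zfill(2)


# Made-up digit strings, one per layout (not real products)
for code in ["1234-5678-90", "12345-678-90", "12345-6789-0"]:
    print(code, "->", ndc10_to_ndc11(code))
# 1234-5678-90 -> 01234567890
# 12345-678-90 -> 12345067890
# 12345-6789-0 -> 12345678900
```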

Integration Tips

UNII to CAS Mapping

```python
import requests


def get_substance_identifiers(unii, api_key):
    """
    Get all identifiers for a substance given its UNII code.

    Args:
        unii: FDA Unique Ingredient Identifier
        api_key: FDA API key

    Returns:
        Dictionary with substance identifiers, or None if nothing matches
    """
    url = "https://api.fda.gov/other/substance.json"
    params = {
        "api_key": api_key,
        "search": f"approvalID:{unii}",
        "limit": 1
    }

    response = requests.get(url, params=params, timeout=30)
    data = response.json()

    # openFDA answers "no matches" with an error object and no "results" key
    if not data.get("results"):
        return None

    substance = data["results"][0]

    identifiers = {
        "unii": substance.get("approvalID"),
        "uuid": substance.get("uuid"),
        "preferred_name": None,
        "cas_numbers": [],
        "other_codes": {}
    }

    # Extract names: the one flagged as preferred, else the first listed
    names = substance.get("names", [])
    for name in names:
        if name.get("preferred"):
            identifiers["preferred_name"] = name.get("name")
            break
    if not identifiers["preferred_name"] and names:
        identifiers["preferred_name"] = names[0].get("name")

    # Extract codes, grouping everything other than CAS by code system
    for code in substance.get("codes", []):
        # `or ""` also guards against an explicit null codeSystem
        code_system = (code.get("codeSystem") or "").upper()
        code_value = code.get("code")

        if "CAS" in code_system:
            identifiers["cas_numbers"].append(code_value)
        else:
            identifiers["other_codes"].setdefault(code_system, []).append(code_value)

    return identifiers
```
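The last digit of a CAS Registry Number is a check digit. The values collected in cas_numbers can therefore be screened for transcription errors before they are used as join keys. This sketch implements the published checksum rule; is_valid_cas is a hypothetical name. Passing the check shows only that the number is well formed, not that it belongs to the substance.

```python
import re

# Two to seven digits, two digits, one check digit
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas) -> bool:
    """Check the format and check digit of a CAS Registry Number.

    The check digit is the sum of the other digits, each multiplied by its
    position counted from the right (1, 2, 3, ...), taken modulo 10.
    """
    if not isinstance(cas, str):
        return False
    match = CAS_PATTERN.fullmatch(cas.strip())
    if not match:
        return False
    body = match.group(1) + match.group(2)
    total = sum(int(digit) * weight
                for weight, digit in enumerate(reversed(body), start=1))
    return total % 10 == int(match.group(3))


print(is_valid_cas("50-78-2"))  # True  (aspirin CAS from the example queries)
print(is_valid_cas("50-78-3"))  # False (check digit altered)
```

For example, [c for c in identifiers["cas_numbers"] if is_valid_cas(c)] keeps only well-formed numbers from a get_substance_identifiers() result.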

Chemical Structure Lookup

```python
import requests


def get_chemical_structure(substance_name, api_key):
    """
    Get chemical structure information for a substance.

    Args:
        substance_name: Name of the substance
        api_key: FDA API key

    Returns:
        Dictionary with structure information, or None if nothing matches
        or the substance has no structure section
    """
    url = "https://api.fda.gov/other/substance.json"
    params = {
        "api_key": api_key,
        # Quote the name so multi-word names are searched as a phrase
        "search": f'names.name:"{substance_name}"',
        "limit": 1
    }

    response = requests.get(url, params=params, timeout=30)
    data = response.json()

    if not data.get("results"):
        return None

    substance = data["results"][0]

    # Only some classes (typically chemicals) carry a structure section
    structure = substance.get("structure")
    if not structure:
        return None

    return {
        "smiles": structure.get("smiles"),
        "inchi": structure.get("inchi"),
        "inchi_key": structure.get("inchiKey"),
        "formula": structure.get("formula"),
        "molecular_weight": structure.get("molecularWeight"),
        "substance_class": substance.get("substanceClass")
    }
```
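One cheap way to act on the "validate chemical structures" advice is to recompute an approximate mass from structure.formula and compare it with structure.molecularWeight. This sketch parses simple flat formulas and sums rounded standard atomic weights. The helper names, the small weight table, and any comparison tolerance are assumptions. Small differences from the registry value are expected, because it may use other atomic-weight conventions.

```python
import re
from collections import Counter

# Rounded standard atomic weights for a few common elements (extend as needed)
ATOMIC_WEIGHTS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "F": 18.998,
    "Na": 22.990, "P": 30.974, "S": 32.06, "Cl": 35.45,
}

# One element symbol followed by an optional count
TOKEN = re.compile(r"([A-Z][a-z]?)(\d*)")


def parse_formula(formula: str) -> Counter:
    """Parse a flat formula such as "C8H9NO2" into element counts.

    Only simple formulas are supported: parentheses, charges and
    dot-separated components (e.g. hydrates) raise ValueError.
    """
    counts: Counter = Counter()
    position = 0
    for match in TOKEN.finditer(formula):
        if match.start() != position:
            break  # an unsupported character sits between tokens
        element, number = match.groups()
        counts[element] += int(number) if number else 1
        position = match.end()
    if not formula or position != len(formula):
        raise ValueError(f"unsupported formula: {formula!r}")
    return counts


def approximate_mass(formula: str) -> float:
    """Sum rounded atomic weights; raises KeyError for elements not in the table."""
    counts = parse_formula(formula)
    missing = set(counts) - set(ATOMIC_WEIGHTS)
    if missing:
        raise KeyError(f"no atomic weight for {sorted(missing)}")
    return sum(ATOMIC_WEIGHTS[element] * n for element, n in counts.items())


print(dict(parse_formula("C8H9NO2")))        # {'C': 8, 'H': 9, 'N': 1, 'O': 2}
print(f"{approximate_mass('C8H9NO2'):.1f}")  # 151.2
```

To compare, convert molecularWeight with float() first, in case it arrives as a string. Then flag records that differ from the computed mass by more than a tolerance you choose, such as 0.1.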

Substance Relationship Mapping

```python
import requests


def get_substance_relationships(unii, api_key):
    """
    Get all related substances (metabolites, active moieties, etc.).

    Args:
        unii: FDA Unique Ingredient Identifier
        api_key: FDA API key

    Returns:
        Dictionary organizing relationships by type, or None if nothing matches
    """
    url = "https://api.fda.gov/other/substance.json"
    params = {
        "api_key": api_key,
        "search": f"approvalID:{unii}",
        "limit": 1
    }

    response = requests.get(url, params=params, timeout=30)
    data = response.json()

    if not data.get("results"):
        return None

    substance = data["results"][0]

    relationships = {}

    for rel in substance.get("relationships", []):
        rel_type = rel.get("type")
        # `or {}` also covers an explicit null relatedSubstance
        related_substance = rel.get("relatedSubstance") or {}
        relationships.setdefault(rel_type, []).append({
            "uuid": related_substance.get("uuid"),
            "unii": related_substance.get("approvalID"),
            "name": related_substance.get("refPname")
        })

    return relationships
```
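Best practice 6 below notes that a salt form and its active moiety are different substances with different UNIIs. Assume a salt's record lists its parent under an ACTIVE MOIETY relationship, one of the relationship types in the Key Fields list. Under that assumption, the output of get_substance_relationships() can be reduced to the moiety UNIIs, which make a convenient grouping key across salt forms. active_moiety_uniis is a hypothetical helper, and the toy input uses placeholder identifiers.

```python
from typing import Dict, List, Optional


def active_moiety_uniis(relationships: Optional[Dict[str, List[dict]]]) -> List[str]:
    """Collect UNIIs listed under ACTIVE MOIETY relationships.

    `relationships` is the dict returned by get_substance_relationships():
    relationship type -> list of {"uuid", "unii", "name"} entries.
    Type labels are matched case-insensitively on the substring
    "ACTIVE MOIETY"; duplicates are dropped, keeping the first occurrence.
    """
    found: List[str] = []
    for rel_type, related in (relationships or {}).items():
        if rel_type and "ACTIVE MOIETY" in rel_type.upper():
            for entry in related:
                unii = entry.get("unii")
                if unii and unii not in found:
                    found.append(unii)
    return found


# Hand-made toy input with placeholder identifiers (not real UNIIs)
toy_relationships = {
    "ACTIVE MOIETY": [{"uuid": None, "unii": "TOYMOIETY1", "name": "TOY BASE"}],
    "METABOLITE": [{"uuid": None, "unii": "TOYMETAB01", "name": "TOY METABOLITE"}],
}
print(active_moiety_uniis(toy_relationships))  # ['TOYMOIETY1']
print(active_moiety_uniis(None))               # []
```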

Active Ingredient Extraction

```python
import requests

# Relies on get_substance_identifiers() defined above


def find_active_ingredients_by_product(product_name, api_key):
    """
    Find active ingredients in a drug product.

    Args:
        product_name: Drug product name
        api_key: FDA API key

    Returns:
        List of active ingredient UNIIs and names, or None if no label matches
    """
    # First search drug label database (quoted so multi-word brand names match as a phrase)
    label_url = "https://api.fda.gov/drug/label.json"
    label_params = {
        "api_key": api_key,
        "search": f'openfda.brand_name:"{product_name}"',
        "limit": 1
    }

    response = requests.get(label_url, params=label_params, timeout=30)
    data = response.json()

    if not data.get("results"):
        return None

    label = data["results"][0]

    # Extract UNIIs from openfda section
    active_ingredients = []
    openfda = label.get("openfda", {})

    for unii in openfda.get("unii", []):
        ingredient = {"unii": unii, "name": None}

        # openfda.unii and openfda.generic_name are not index-aligned
        # (generic_name is often one combined string for combination products),
        # so take the name from the substance record instead of pairing by position.
        substance_info = get_substance_identifiers(unii, api_key)
        if substance_info:
            ingredient.update(substance_info)
            ingredient["name"] = substance_info.get("preferred_name")

        active_ingredients.append(ingredient)

    return active_ingredients
```
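find_active_ingredients_by_product issues one label request plus one substance request per UNII, so running it over many products multiplies the request count quickly. Below is a sketch of a hypothetical Throttle class that enforces a minimum gap between calls. The interval is an assumption; set it from the rate limits openFDA publishes for your key. The clock and sleep functions are injectable, which lets the demo simulate time instead of waiting.

```python
import time
from typing import Callable, Optional


class Throttle:
    """Enforce a minimum interval between successive calls to wait()."""

    def __init__(self, min_interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self.sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        """Sleep just long enough that calls are at least min_interval apart."""
        now = self.clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self._last = now


# Demo with a simulated clock so nothing actually sleeps
simulated = {"now": 0.0}
sleeps = []


def fake_clock() -> float:
    return simulated["now"]


def fake_sleep(seconds: float) -> None:
    sleeps.append(seconds)
    simulated["now"] += seconds


throttle = Throttle(0.5, clock=fake_clock, sleep=fake_sleep)
throttle.wait()             # first call: no delay
simulated["now"] += 0.2
throttle.wait()             # only 0.2 s elapsed, so it sleeps about 0.3 s
print(round(sleeps[0], 3))  # 0.3
```

In practice, create one Throttle and call throttle.wait() before each requests.get(...) or get_substance_identifiers(...) call.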

Best Practices

  1. Use UNII as primary identifier - Most consistent across FDA databases
  2. Map between identifier systems - CAS, UNII, InChI Key for cross-referencing
  3. Handle substance variations - Different salt forms and hydrates have different UNIIs
  4. Check substance class - Different classes have different data structures
  5. Validate chemical structures - Check SMILES and InChI strings with a cheminformatics toolkit before relying on them
  6. Consider substance relationships - Active moiety vs. salt form matters
  7. Use preferred names - More consistent than trade names
  8. Cache substance data - Substance information changes infrequently
  9. Cross-reference with other endpoints - Link substances to drugs/products
  10. Handle mixture components - Complex products have multiple components
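Best practice 8 can be as simple as memoizing the lookup function. The sketch below uses functools.lru_cache; make_cached_lookup is a hypothetical wrapper. The demo swaps in a stand-in fetch function so it runs without network access.

```python
from functools import lru_cache
from typing import Callable, Optional


def make_cached_lookup(fetch: Callable[[str], Optional[dict]], maxsize: int = 1024):
    """Wrap a one-argument lookup (for example UNII -> record) in an LRU cache.

    Every cache hit returns the same object, so callers should not mutate it.
    """
    @lru_cache(maxsize=maxsize)
    def cached(key: str) -> Optional[dict]:
        return fetch(key)

    return cached


# Demo with a stand-in fetch function (no network); it records each call
calls = []


def fake_fetch(unii: str) -> dict:
    calls.append(unii)
    return {"unii": unii}


lookup = make_cached_lookup(fake_fetch)
lookup("R16CO5Y76E")
lookup("R16CO5Y76E")  # served from the cache; fake_fetch is not called again
print(len(calls), lookup.cache_info().hits)  # 1 1
```

With the helpers above, lookup = make_cached_lookup(lambda unii: get_substance_identifiers(unii, api_key)) avoids repeat requests for the same UNII within one process. A None result (no match) is cached too, and the cache is lost when the process exits.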

UNII System

The FDA Unique Ingredient Identifier (UNII) system provides:
  • Unique identifiers - Each substance gets one UNII
  • Substance specificity - Different forms (salts, hydrates) get different UNIIs
  • Global recognition - Used internationally
  • Stability - UNIIs don't change once assigned
  • Free access - No licensing required

UNII Format: 10-character alphanumeric code (e.g., R16CO5Y76E)
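Because the format is fixed, a shape check is a cheap guard before a user-supplied value is interpolated into a search string such as approvalID:{unii} in the helpers above. This sketch assumes UNIIs contain only uppercase letters and digits, as in the example. normalize_unii is a hypothetical name, and it checks format only.

```python
import re
from typing import Optional

# Ten uppercase letters or digits (assumed character set)
UNII_PATTERN = re.compile(r"[A-Z0-9]{10}")


def normalize_unii(value: object) -> Optional[str]:
    """Return an uppercased, trimmed UNII if it has the 10-character shape, else None.

    A string result does not prove the code is actually assigned.
    """
    if not isinstance(value, str):
        return None
    candidate = value.strip().upper()
    return candidate if UNII_PATTERN.fullmatch(candidate) else None


print(normalize_unii(" r16co5y76e "))                # R16CO5Y76E
print(normalize_unii("R16CO5Y76E OR approvalID:*"))  # None
```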

Substance Classes Explained

Chemical

  • Traditional small molecule drugs
  • Have defined molecular structure
  • Include organic and inorganic compounds
  • SMILES, InChI, molecular formula available

Protein

  • Polypeptides and proteins
  • Sequence information available
  • May have post-translational modifications
  • Includes antibodies, enzymes, hormones

Nucleic Acid

  • DNA and RNA sequences
  • Oligonucleotides
  • Antisense, siRNA, mRNA
  • Sequence data available

Polymer

  • Synthetic and natural polymers
  • Structural repeat units
  • Molecular weight distributions
  • Used as excipients and active ingredients

Structurally Diverse

  • Complex natural products
  • Botanical extracts
  • Materials without single molecular structure
  • Characterized by source and composition

Mixture

  • Defined combinations of substances
  • Fixed or variable composition
  • Each component trackable

Additional Resources

  • FDA Substance Registration System: https://fdasis.nlm.nih.gov/srs/
  • UNII Search: https://precision.fda.gov/uniisearch
  • OpenFDA Other APIs: https://open.fda.gov/apis/other/
  • API Basics: See api_basics.md in this references directory
  • Python examples: See scripts/fda_substance_query.py