
Medchem Rules and Filters Catalog

Comprehensive catalog of all available medicinal chemistry rules, structural alerts, and filters in medchem.

Table of Contents

  1. Drug-Likeness Rules
  2. Lead-Likeness Rules
  3. Fragment Rules
  4. CNS Rules
  5. Structural Alert Filters
  6. Chemical Group Patterns
  7. Filter Selection Guidelines
  8. Important Considerations
  9. References

Drug-Likeness Rules

Rule of Five (Lipinski)

Reference: Lipinski et al., Adv Drug Deliv Rev (1997) 23:3-25

Purpose: Predict oral bioavailability

Criteria:

  • Molecular Weight ≤ 500 Da
  • LogP ≤ 5
  • Hydrogen Bond Donors ≤ 5
  • Hydrogen Bond Acceptors ≤ 10

Usage:

mc.rules.basic_rules.rule_of_five(mol)

Notes:

  • One of the most widely used filters in drug discovery
  • About 90% of orally active drugs comply with these rules
  • Exceptions exist, especially for natural products and antibiotics
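The four thresholds above can be checked directly once the descriptors are known. The following is a minimal sketch that does not depend on medchem: the `Descriptors` container and the `rule_of_five_violations` helper are hypothetical names, and the descriptor values are assumed to be precomputed by a cheminformatics toolkit. It reports which criteria fail instead of a single boolean, since the rule is commonly applied by counting violations.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Precomputed physicochemical descriptors (hypothetical container)."""
    mw: float    # molecular weight, Da
    logp: float  # calculated octanol-water partition coefficient
    hbd: int     # hydrogen bond donors
    hba: int     # hydrogen bond acceptors


def rule_of_five_violations(d: Descriptors) -> list[str]:
    """Return the names of the Rule of Five criteria that `d` fails."""
    checks = {
        "mw": d.mw <= 500,
        "logp": d.logp <= 5,
        "hbd": d.hbd <= 5,
        "hba": d.hba <= 10,
    }
    return [name for name, ok in checks.items() if not ok]


# Illustrative values only, not measured data.
small = Descriptors(mw=180.2, logp=1.3, hbd=1, hba=3)
large = Descriptors(mw=1202.6, logp=3.6, hbd=5, hba=12)
print(rule_of_five_violations(small))  # []
print(rule_of_five_violations(large))  # ['mw', 'hba']
```

An empty list means the molecule satisfies all four criteria; values exactly at a threshold count as passing.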


Rule of Veber

Reference: Veber et al., J Med Chem (2002) 45:2615-2623

Purpose: Additional criteria for oral bioavailability

Criteria:

  • Rotatable Bonds ≤ 10
  • Topological Polar Surface Area (TPSA) ≤ 140 Ų

Usage:

mc.rules.basic_rules.rule_of_veber(mol)

Notes:

  • Complements Rule of Five
  • TPSA correlates with cell permeability
  • Rotatable bonds affect molecular flexibility


Rule of Drug

Purpose: Combined drug-likeness assessment

Criteria:

  • Passes Rule of Five
  • Passes Veber rules
  • Does not contain PAINS substructures

Usage:

mc.rules.basic_rules.rule_of_drug(mol)
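
The combined rule is a conjunction of the three criteria listed. A minimal sketch of that logic, not medchem's implementation: the function name and its arguments are hypothetical, descriptors are assumed to be precomputed, and the PAINS result is assumed to come from a separate substructure search.

```python
def passes_rule_of_drug(mw, logp, hbd, hba, rotatable_bonds, tpsa, has_pains):
    """Combine Rule of Five, Veber, and a PAINS flag as described above."""
    rule_of_five = mw <= 500 and logp <= 5 and hbd <= 5 and hba <= 10
    veber = rotatable_bonds <= 10 and tpsa <= 140
    return rule_of_five and veber and not has_pains


# Illustrative values only.
print(passes_rule_of_drug(350.4, 2.8, 2, 5, 4, 75.0, has_pains=False))  # True
print(passes_rule_of_drug(350.4, 2.8, 2, 5, 4, 75.0, has_pains=True))   # False
print(passes_rule_of_drug(350.4, 2.8, 2, 5, 12, 75.0, has_pains=False)) # False
```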

REOS (Rapid Elimination Of Swill)

Reference: Walters & Murcko, Adv Drug Deliv Rev (2002) 54:255-271

Purpose: Filter out compounds unlikely to be drugs

Criteria:

  • Molecular Weight: 200-500 Da
  • LogP: -5 to 5
  • Hydrogen Bond Donors: 0-5
  • Hydrogen Bond Acceptors: 0-10

Usage:

mc.rules.basic_rules.rule_of_reos(mol)

Golden Triangle

Reference: Johnson et al., Bioorg Med Chem Lett (2009) 19:5560-5564

Purpose: Balance lipophilicity and molecular weight

Criteria:

  • 200 ≤ MW ≤ 50 × LogP + 400
  • LogP: -2 to 5

Usage:

mc.rules.basic_rules.golden_triangle(mol)

Notes:

  • Defines optimal physicochemical space
  • Visual representation resembles a triangle on MW vs LogP plot
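As a worked example of the arithmetic, the sketch below applies the two inequalities exactly as they are listed under Criteria. The function name is hypothetical, and the bounds are taken from this catalog as written; they may differ from how medchem or the original publication defines the region.

```python
def in_golden_triangle(mw: float, logp: float) -> bool:
    """Apply the two inequalities listed under Criteria, as written there."""
    if not -2 <= logp <= 5:
        return False
    return 200 <= mw <= 50 * logp + 400


print(in_golden_triangle(mw=350, logp=2.0))  # True: upper bound is 500
print(in_golden_triangle(mw=450, logp=0.5))  # False: upper bound is 425
print(in_golden_triangle(mw=150, logp=2.0))  # False: below 200 Da
```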


Lead-Likeness Rules

Rule of Oprea

Reference: Oprea et al., J Chem Inf Comput Sci (2001) 41:1308-1315

Purpose: Identify lead-like compounds for optimization

Criteria:

  • Molecular Weight: 200-350 Da
  • LogP: -2 to 4
  • Rotatable Bonds ≤ 7
  • Number of Rings ≤ 4

Usage:

mc.rules.basic_rules.rule_of_oprea(mol)

Rationale: Lead compounds should have "room to grow" during optimization
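
Range-based rules such as this one lend themselves to a table-driven check. The sketch below is an illustration under stated assumptions, not medchem code: `OPREA_BOUNDS` and `failed_bounds` are hypothetical names, the bounds are copied from the criteria above, and descriptor values are assumed to be precomputed.

```python
# Bounds from the Rule of Oprea criteria above; None means "no limit".
OPREA_BOUNDS = {
    "mw": (200, 350),
    "logp": (-2, 4),
    "rotatable_bonds": (None, 7),
    "rings": (None, 4),
}


def failed_bounds(values: dict, bounds: dict) -> dict:
    """Map each out-of-range descriptor to its value and allowed range."""
    failures = {}
    for name, (low, high) in bounds.items():
        value = values[name]
        too_low = low is not None and value < low
        too_high = high is not None and value > high
        if too_low or too_high:
            failures[name] = (value, (low, high))
    return failures


# Illustrative values only.
candidate = {"mw": 410.5, "logp": 3.1, "rotatable_bonds": 6, "rings": 3}
print(failed_bounds(candidate, OPREA_BOUNDS))
# {'mw': (410.5, (200, 350))}
```

Reporting the failing descriptor together with its allowed range shows how much "room to grow" a candidate has already used up. Swapping in a different bounds table covers the other range-based rules in this catalog.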


Rule of Leadlike (Soft)

Purpose: Permissive lead-like criteria

Criteria:

  • Molecular Weight: 250-450 Da
  • LogP: -3 to 4
  • Rotatable Bonds ≤ 10

Usage:

mc.rules.basic_rules.rule_of_leadlike_soft(mol)

Rule of Leadlike (Strict)

Purpose: Restrictive lead-like criteria

Criteria:

  • Molecular Weight: 200-350 Da
  • LogP: -2 to 3.5
  • Rotatable Bonds ≤ 7
  • Number of Rings: 1-3

Usage:

mc.rules.basic_rules.rule_of_leadlike_strict(mol)

Fragment Rules

Rule of Three

Reference: Congreve et al., Drug Discov Today (2003) 8:876-877

Purpose: Screen fragment libraries for fragment-based drug discovery

Criteria:

  • Molecular Weight ≤ 300 Da
  • LogP ≤ 3
  • Hydrogen Bond Donors ≤ 3
  • Hydrogen Bond Acceptors ≤ 3
  • Rotatable Bonds ≤ 3
  • Polar Surface Area ≤ 60 Ų

Usage:

mc.rules.basic_rules.rule_of_three(mol)

Notes:

  • Fragments are grown into leads during optimization
  • Lower complexity allows more starting points


CNS Rules

Rule of CNS

Purpose: Central nervous system drug-likeness

Criteria:

  • Molecular Weight ≤ 450 Da
  • LogP: -1 to 5
  • Hydrogen Bond Donors ≤ 2
  • TPSA ≤ 90 Ų

Usage:

mc.rules.basic_rules.rule_of_cns(mol)

Rationale:

  • Blood-brain barrier penetration requires specific properties
  • Lower TPSA and HBD count improve BBB permeability
  • Tight constraints reflect CNS challenges


Structural Alert Filters

PAINS (Pan Assay INterference compoundS)

Reference: Baell & Holloway, J Med Chem (2010) 53:2719-2740

Purpose: Identify compounds that interfere with assays

Categories:

  • Catechols
  • Quinones
  • Rhodanines
  • Hydroxyphenylhydrazones
  • Alkyl/aryl aldehydes
  • Michael acceptors (specific patterns)

Usage:

mc.rules.basic_rules.pains_filter(mol)
# Returns True if NO PAINS found

Notes:

  • PAINS compounds show activity in multiple assays through non-specific mechanisms
  • Common false positives in screening campaigns
  • Should be deprioritized in lead selection


Common Alerts Filters

Source: Derived from ChEMBL curation and medicinal chemistry literature

Purpose: Flag common problematic structural patterns

Alert Categories:

  1. Reactive Groups
     • Epoxides
     • Aziridines
     • Acid halides
     • Isocyanates

  2. Metabolic Liabilities
     • Hydrazines
     • Thioureas
     • Anilines (certain patterns)

  3. Aggregators
     • Polyaromatic systems
     • Long aliphatic chains

  4. Toxicophores
     • Nitro aromatics
     • Aromatic N-oxides
     • Certain heterocycles

Usage:

alert_filter = mc.structural.CommonAlertsFilters()
has_alerts, details = alert_filter.check_mol(mol)

Return Format:

{
    "has_alerts": True,
    "alert_details": ["reactive_epoxide", "metabolic_hydrazine"],
    "num_alerts": 2
}
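
When many molecules are screened, results in this shape can be summarized by alert category. The sketch below is a hedged illustration: `tally_alert_categories` is a hypothetical helper, it assumes each alert name begins with its category followed by an underscore (as in the two names shown above), and `reactive_aziridine` is an invented name used only for the example.

```python
from collections import Counter


def tally_alert_categories(results):
    """Count alerts by the prefix before the first underscore."""
    counts = Counter()
    for result in results:
        for alert in result["alert_details"]:
            category = alert.split("_", 1)[0]
            counts[category] += 1
    return counts


# Example records in the return format shown above (illustrative data).
results = [
    {"has_alerts": True,
     "alert_details": ["reactive_epoxide", "metabolic_hydrazine"],
     "num_alerts": 2},
    {"has_alerts": True, "alert_details": ["reactive_aziridine"], "num_alerts": 1},
    {"has_alerts": False, "alert_details": [], "num_alerts": 0},
]
print(tally_alert_categories(results).most_common())
# [('reactive', 2), ('metabolic', 1)]
```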

NIBR Filters

Source: Novartis Institutes for BioMedical Research

Purpose: Industrial medicinal chemistry filtering rules

Features:

  • Proprietary filter set developed from Novartis experience
  • Balances drug-likeness with practical medicinal chemistry
  • Includes both structural alerts and property filters

Usage:

nibr_filter = mc.structural.NIBRFilters()
results = nibr_filter(mols=mol_list, n_jobs=-1)

Return Format: Boolean list (True = passes)


Lilly Demerits Filter

Reference: Based on Eli Lilly medicinal chemistry rules

Source: 275 structural patterns accumulated over 18 years

Purpose: Identify assay interference and problematic functionalities

Mechanism:

  • Each matched pattern adds demerits
  • Molecules with >100 demerits are rejected
  • Some patterns add 10-50 demerits, others add 100+ (instant rejection)

Demerit Categories:

  1. High Demerits (>50):
     • Known toxic groups
     • Highly reactive functionalities
     • Strong metal chelators

  2. Medium Demerits (20-50):
     • Metabolic liabilities
     • Aggregation-prone structures
     • Frequent hitters

  3. Low Demerits (5-20):
     • Minor concerns
     • Context-dependent issues

Usage:

lilly_filter = mc.structural.LillyDemeritsFilters()
results = lilly_filter(mols=mol_list, n_jobs=-1)

Return Format:

{
    "demerits": 35,
    "passes": True,  # (demerits ≤ 100)
    "matched_patterns": [
        {"pattern": "phenolic_ester", "demerits": 20},
        {"pattern": "aniline_derivative", "demerits": 15}
    ]
}
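
The mechanism described above amounts to summing per-pattern demerits and comparing the total with the cutoff. A minimal sketch of that bookkeeping, not the Lilly or medchem implementation: `score_demerits` and the pattern name `hypothetical_toxic_group` are made up for illustration, the output mirrors the return format shown above, and the cutoff follows this catalog's statement that molecules with more than 100 demerits are rejected.

```python
REJECTION_THRESHOLD = 100  # molecules with more demerits than this are rejected


def score_demerits(matched_patterns, threshold=REJECTION_THRESHOLD):
    """Sum per-pattern demerits and apply the rejection threshold."""
    total = sum(p["demerits"] for p in matched_patterns)
    return {
        "demerits": total,
        "passes": total <= threshold,
        "matched_patterns": matched_patterns,
    }


mild = [
    {"pattern": "phenolic_ester", "demerits": 20},
    {"pattern": "aniline_derivative", "demerits": 15},
]
# A single high-demerit pattern is enough to push the total over the cutoff.
severe = mild + [{"pattern": "hypothetical_toxic_group", "demerits": 120}]

print(score_demerits(mild)["demerits"], score_demerits(mild)["passes"])      # 35 True
print(score_demerits(severe)["demerits"], score_demerits(severe)["passes"])  # 155 False
```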

Chemical Group Patterns

Hinge Binders

Purpose: Identify kinase hinge-binding motifs

Common Patterns:

  • Aminopyridines
  • Aminopyrimidines
  • Indazoles
  • Benzimidazoles

Usage:

group = mc.groups.ChemicalGroup(groups=["hinge_binders"])
has_hinge = group.has_match(mol_list)

Application: Kinase inhibitor design


Phosphate Binders

Purpose: Identify phosphate-binding groups

Common Patterns:

  • Basic amines in specific geometries
  • Guanidinium groups
  • Arginine mimetics

Usage:

group = mc.groups.ChemicalGroup(groups=["phosphate_binders"])

Application: Kinase inhibitors, phosphatase inhibitors


Michael Acceptors

Purpose: Identify electrophilic Michael acceptor groups

Common Patterns:

  • α,β-Unsaturated carbonyls
  • α,β-Unsaturated nitriles
  • Vinyl sulfones
  • Acrylamides

Usage:

group = mc.groups.ChemicalGroup(groups=["michael_acceptors"])

Notes:

  • Can be desirable for covalent inhibitors
  • Often flagged as reactive alerts in screening


Reactive Groups

Purpose: Identify generally reactive functionalities

Common Patterns:

  • Epoxides
  • Aziridines
  • Acyl halides
  • Isocyanates
  • Sulfonyl chlorides

Usage:

group = mc.groups.ChemicalGroup(groups=["reactive_groups"])

Custom SMARTS Patterns

Define custom structural patterns using SMARTS:

custom_patterns = {
    "my_warhead": "[C;H0](=O)C(F)(F)F",  # Trifluoromethyl ketone
    "my_scaffold": "c1ccc2c(c1)ncc(n2)N",  # Quinoxalin-2-amine (benzene fused to a pyrazine ring)
}

group = mc.groups.ChemicalGroup(
    groups=["hinge_binders"],
    custom_smarts=custom_patterns
)

Filter Selection Guidelines

Initial Screening (High-Throughput)

Recommended filters:

  • Rule of Five
  • PAINS filter
  • Common Alerts (permissive settings)

rfilter = mc.rules.RuleFilters(rule_list=["rule_of_five", "pains_filter"])
alert_filter = mc.structural.CommonAlertsFilters()

Hit-to-Lead

Recommended filters:

  • Rule of Oprea or Leadlike (soft)
  • NIBR filters
  • Lilly Demerits

rfilter = mc.rules.RuleFilters(rule_list=["rule_of_oprea"])
nibr_filter = mc.structural.NIBRFilters()
lilly_filter = mc.structural.LillyDemeritsFilters()

Lead Optimization

Recommended filters:

  • Rule of Drug
  • Leadlike (strict)
  • Full structural alert analysis
  • Complexity filters

rfilter = mc.rules.RuleFilters(rule_list=["rule_of_drug", "rule_of_leadlike_strict"])
alert_filter = mc.structural.CommonAlertsFilters()
complexity_filter = mc.complexity.ComplexityFilter(max_complexity=400)

CNS Targets

Recommended filters:

  • Rule of CNS
  • Reduced PAINS criteria (CNS-focused)
  • BBB permeability constraints

rfilter = mc.rules.RuleFilters(rule_list=["rule_of_cns"])
constraints = mc.constraints.Constraints(
    tpsa_max=90,
    hbd_max=2,
    mw_range=(300, 450)
)

Fragment-Based Drug Discovery

Recommended filters:

  • Rule of Three
  • Minimal complexity
  • Basic reactive group check

rfilter = mc.rules.RuleFilters(rule_list=["rule_of_three"])
complexity_filter = mc.complexity.ComplexityFilter(max_complexity=250)
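
The stage-specific recommendations above can be collected into a single lookup so that a pipeline selects its rule list by project stage. This is a sketch under stated assumptions: `STAGE_RULES` and `rules_for_stage` are hypothetical names, the stage keys are invented labels, and the rule names are copied from the snippets in this section without checking them against a particular medchem release.

```python
# Rule names are taken from the snippets above; stage keys are illustrative labels.
STAGE_RULES = {
    "screening": ["rule_of_five", "pains_filter"],
    "hit_to_lead": ["rule_of_oprea"],
    "lead_optimization": ["rule_of_drug", "rule_of_leadlike_strict"],
    "cns": ["rule_of_cns"],
    "fragment": ["rule_of_three"],
}


def rules_for_stage(stage: str) -> list[str]:
    """Return a copy of the rule names recommended for a project stage."""
    try:
        return list(STAGE_RULES[stage])
    except KeyError:
        known = ", ".join(sorted(STAGE_RULES))
        raise ValueError(f"unknown stage {stage!r}; expected one of: {known}") from None


print(rules_for_stage("hit_to_lead"))  # ['rule_of_oprea']
```

The returned list could then be passed as `rule_list` to `mc.rules.RuleFilters`, as in the snippets above.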

Important Considerations

False Positives and False Negatives

Filters are guidelines, not absolutes:

  1. False Positives (good drugs flagged):
     • ~10% of marketed drugs fail Rule of Five
     • Natural products often violate standard rules
     • Prodrugs intentionally break rules
     • Antibiotics and antivirals frequently non-compliant

  2. False Negatives (bad compounds passing):
     • Passing filters doesn't guarantee success
     • Target-specific issues not captured
     • In vivo properties not fully predicted

Context-Specific Application

Different contexts require different criteria:

  • Target Class: Kinases vs GPCRs vs ion channels have different optimal spaces
  • Modality: Small molecules vs PROTACs vs molecular glues
  • Administration Route: Oral vs IV vs topical
  • Disease Area: CNS vs oncology vs infectious disease
  • Stage: Screening vs hit-to-lead vs lead optimization

Complementing with Machine Learning

Modern approaches combine rules with ML:

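import medchem as mc

# `mols`, `ml_model`, and `threshold` are placeholders that must be defined elsewhere.

# Rule-based pre-filtering
# (assumes each per-molecule result exposes a "passes" flag)
rule_results = mc.rules.RuleFilters(rule_list=["rule_of_five"])(mols)
filtered_mols = [mol for mol, r in zip(mols, rule_results) if r["passes"]]

# ML model scoring on filtered set
ml_scores = ml_model.predict(filtered_mols)

# Combined decision: keep molecules whose score exceeds the threshold
final_candidates = [
    mol for mol, score in zip(filtered_mols, ml_scores)
    if score > threshold
]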
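
Because the snippet above relies on names defined elsewhere, a self-contained toy version may help show the control flow. Everything in it is a stand-in: `passes_rules` applies the Rule of Five to precomputed descriptors instead of calling medchem, `toy_score` is an arbitrary function and not a trained model, and the library entries are invented.

```python
def passes_rules(mol: dict) -> bool:
    """Stand-in rule filter: Rule of Five on precomputed descriptors."""
    return (mol["mw"] <= 500 and mol["logp"] <= 5
            and mol["hbd"] <= 5 and mol["hba"] <= 10)


def toy_score(mol: dict) -> float:
    """Stand-in for a trained model; NOT a real predictor."""
    return 1.0 - mol["logp"] / 10.0


def select_candidates(mols, score_fn, threshold):
    """Filter by rules first, then keep molecules scoring above the threshold."""
    filtered = [m for m in mols if passes_rules(m)]
    scores = [score_fn(m) for m in filtered]
    return [m["name"] for m, s in zip(filtered, scores) if s > threshold]


library = [
    {"name": "A", "mw": 320.0, "logp": 2.0, "hbd": 1, "hba": 4},
    {"name": "B", "mw": 610.0, "logp": 1.0, "hbd": 2, "hba": 6},  # fails MW
    {"name": "C", "mw": 410.0, "logp": 4.8, "hbd": 0, "hba": 5},  # low toy score
]
print(select_candidates(library, toy_score, threshold=0.6))  # ['A']
```

Running the inexpensive rule filter first means the scoring function only sees molecules that already satisfy the property constraints.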

References

  1. Lipinski CA et al. Adv Drug Deliv Rev (1997) 23:3-25
  2. Veber DF et al. J Med Chem (2002) 45:2615-2623
  3. Oprea TI et al. J Chem Inf Comput Sci (2001) 41:1308-1315
  4. Congreve M et al. Drug Discov Today (2003) 8:876-877
  5. Baell JB & Holloway GA. J Med Chem (2010) 53:2719-2740
  6. Johnson TW et al. Bioorg Med Chem Lett (2009) 19:5560-5564
  7. Walters WP & Murcko MA. Adv Drug Deliv Rev (2002) 54:255-271
  8. Hann MM & Oprea TI. Curr Opin Chem Biol (2004) 8:255-263
  9. Rishton GM. Drug Discov Today (1997) 2:382-384