Hypothesis Quality Criteria
Framework for Evaluating Scientific Hypotheses
Use these criteria to assess the quality and rigor of generated hypotheses. A robust hypothesis should score well across multiple dimensions.
Core Criteria
1. Testability
Definition: The hypothesis can be empirically tested through observation or experimentation.
Evaluation questions:
- Can specific experiments or observations test this hypothesis?
- Are the predicted outcomes measurable?
- Can the hypothesis be tested with current or near-future methods?
- Are there multiple independent ways to test it?
Strong testability examples:
- "Increased expression of protein X will reduce cell proliferation rate by >30%"
- "Patients receiving treatment Y will show 50% reduction in symptom Z within 4 weeks"
Weak testability examples:
- "This process is influenced by complex interactions" (vague, no specific prediction)
- "The mechanism involves quantum effects" (if no method to test quantum effects exists)
2. Falsifiability
Definition: Clear conditions or observations would disprove the hypothesis (Popperian criterion).
Evaluation questions:
- What specific observations would prove this hypothesis wrong?
- Are the falsifying conditions realistic to observe?
- Is the hypothesis stated clearly enough to be disproven?
- Can null results meaningfully falsify the hypothesis?
Strong falsifiability examples:
- "If we knock out gene X, phenotype Y will disappear" (can be falsified if phenotype persists)
- "Drug A will outperform placebo in 80% of patients" (clear falsification threshold)
Weak falsifiability examples:
- "Multiple factors contribute to the outcome" (too vague to falsify)
- "The effect may vary depending on context" (built-in escape clauses)
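One way to make a falsification condition explicit is to write the prediction as a threshold check before any data are collected. The following is a minimal sketch in Python based on the protein X example; the `Prediction` class, its fields, and the sample measurements are hypothetical illustrations, not part of the framework itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Prediction:
    """A quantitative prediction with an explicit threshold (hypothetical helper)."""

    description: str
    min_relative_reduction: float  # 0.30 means "reduced by more than 30%"

    def is_falsified_by(self, control: float, treated: float) -> bool:
        """Return True if the observed reduction does not exceed the threshold."""
        if control <= 0:
            raise ValueError("control measurement must be positive")
        observed_reduction = (control - treated) / control
        return observed_reduction <= self.min_relative_reduction


prediction = Prediction(
    description="Increased expression of protein X reduces proliferation rate by >30%",
    min_relative_reduction=0.30,
)

# Illustrative numbers only, not real data.
falsified_small_effect = prediction.is_falsified_by(control=100.0, treated=85.0)
falsified_large_effect = prediction.is_falsified_by(control=100.0, treated=60.0)
```

A real analysis would also need to account for measurement noise and sample size; the sketch only shows the threshold being fixed in advance so that a result can count against the hypothesis.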
3. Parsimony (Occam's Razor)
Definition: Among competing hypotheses with equal explanatory power, prefer the simpler explanation.
Evaluation questions:
- Does the hypothesis invoke the minimum number of entities/mechanisms needed?
- Are all proposed elements necessary to explain the phenomenon?
- Could a simpler mechanism account for the observations?
- Does it avoid unnecessary assumptions?
Parsimony considerations:
- Simple ≠ simplistic; complexity is justified when evidence demands it
- Established mechanisms are "simpler" than novel, unproven ones
- Direct mechanisms are simpler than elaborate multi-step pathways
- One well-supported mechanism beats multiple speculative ones
4. Explanatory Power
Definition: The hypothesis accounts for a substantial portion of the observed phenomenon.
Evaluation questions:
- How much of the observed data does this hypothesis explain?
- Does it account for both typical and atypical observations?
- Can it explain related phenomena beyond the immediate observation?
- Does it resolve apparent contradictions in existing data?
Strong explanatory power indicators:
- Explains multiple independent observations
- Accounts for quantitative relationships, not just qualitative patterns
- Resolves previously puzzling findings
- Makes sense of seemingly contradictory results
Limited explanatory power indicators:
- Only explains part of the phenomenon
- Requires additional hypotheses for complete explanation
- Leaves major observations unexplained
5. Scope
Definition: The range of phenomena and contexts the hypothesis can address.
Evaluation questions:
- Does it apply only to the specific case or to broader situations?
- Can it generalize across conditions, species, or systems?
- Does it connect to larger theoretical frameworks?
- What are its boundaries and limitations?
Broader scope (generally preferable):
- Applies across multiple experimental conditions
- Generalizes to related systems or species
- Connects phenomenon to established principles
Narrower scope (acceptable if explicitly defined):
- Limited to specific conditions or contexts
- Requires different mechanisms in different settings
- Context-dependent with clear boundaries
6. Consistency with Established Knowledge
Definition: Alignment with well-supported theories, principles, and empirical findings.
Evaluation questions:
- Is it consistent with established physical, chemical, or biological principles?
- Does it align with or reasonably extend current theories?
- If contradicting established knowledge, is there strong justification?
- Does it require violating well-supported laws or findings?
Levels of consistency:
- Fully consistent: Applies established mechanisms in new context
- Mostly consistent: Extends current understanding in plausible ways
- Partially inconsistent: Contradicts some findings but has explanatory value
- Highly inconsistent: Requires rejecting well-established principles (requires exceptional evidence)
7. Novelty and Insight
Definition: The hypothesis offers new understanding beyond merely restating known facts.
Evaluation questions:
- Does it provide new mechanistic insight?
- Does it challenge assumptions or conventional wisdom?
- Does it suggest unexpected connections or relationships?
- Does it open new research directions?
Novel contributions:
- Proposes previously unconsidered mechanisms
- Reframes the problem in a productive way
- Connects disparate observations
- Suggests non-obvious testable predictions
Note: Novelty alone doesn't make a hypothesis valuable; it must also be testable, parsimonious, and explanatory.
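The seven criteria can be recorded as a simple rubric so that no single dimension is judged in isolation. The sketch below is one possible representation; the 1 to 5 scale, the `HypothesisAssessment` class, and the example scores are assumptions made for illustration and are not prescribed by the framework.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# The seven core criteria, in the order they are presented.
CRITERIA = (
    "testability",
    "falsifiability",
    "parsimony",
    "explanatory_power",
    "scope",
    "consistency",
    "novelty",
)


@dataclass
class HypothesisAssessment:
    """Scores for one hypothesis on an assumed 1-5 scale (hypothetical helper)."""

    statement: str
    scores: Dict[str, int] = field(default_factory=dict)

    def validate(self) -> None:
        """Raise ValueError unless every criterion has a score between 1 and 5."""
        missing = [c for c in CRITERIA if c not in self.scores]
        unknown = [c for c in self.scores if c not in CRITERIA]
        if missing or unknown:
            raise ValueError(f"missing: {missing}, unknown: {unknown}")
        out_of_range = [c for c, s in self.scores.items() if not 1 <= s <= 5]
        if out_of_range:
            raise ValueError(f"scores outside 1-5: {out_of_range}")

    def weaknesses(self, threshold: int = 2) -> List[str]:
        """Criteria scored at or below the threshold, in rubric order."""
        self.validate()
        return [c for c in CRITERIA if self.scores[c] <= threshold]

    def strengths(self, threshold: int = 4) -> List[str]:
        """Criteria scored at or above the threshold, in rubric order."""
        self.validate()
        return [c for c in CRITERIA if self.scores[c] >= threshold]


# Illustrative scores only; they are not derived from any real evaluation.
assessment = HypothesisAssessment(
    statement="If we knock out gene X, phenotype Y will disappear",
    scores={
        "testability": 5,
        "falsifiability": 5,
        "parsimony": 4,
        "explanatory_power": 3,
        "scope": 2,
        "consistency": 4,
        "novelty": 2,
    },
)
weak = assessment.weaknesses()
strong = assessment.strengths()
```

Listing strengths and weaknesses separately, instead of collapsing them into a single number, keeps the profile of each hypothesis visible.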
Comparative Evaluation
When evaluating multiple competing hypotheses, weigh the trade-offs among criteria and check whether experiments can tell the hypotheses apart.
Trade-offs and Balancing
Hypotheses often involve trade-offs:
- More parsimonious but less explanatory power
- Broader scope but less testable with current methods
- Novel insights but less consistent with current knowledge
Evaluation approach:
- No hypothesis needs to be perfect on all dimensions
- Identify each hypothesis's strengths and weaknesses
- Consider which criteria are most important for the specific phenomenon
- Note which hypotheses are most immediately testable
- Identify which would be most informative if supported
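Because the most important criteria depend on the phenomenon, the same scores can rank hypotheses differently under different weightings. The sketch below illustrates this with a weighted average; the function names, weights, and scores are hypothetical, and a weighted average is only one of several reasonable ways to combine criteria.

```python
from typing import Dict, List


def weighted_score(scores: Dict[str, int], weights: Dict[str, int]) -> float:
    """Weighted average of the criterion scores named in `weights`."""
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * w for name, w in weights.items()) / total_weight


def rank(hypotheses: Dict[str, Dict[str, int]], weights: Dict[str, int]) -> List[str]:
    """Hypothesis names ordered from highest to lowest weighted score."""
    return sorted(
        hypotheses,
        key=lambda name: weighted_score(hypotheses[name], weights),
        reverse=True,
    )


# Illustrative scores on an assumed 1-5 scale.
hypotheses = {
    "H1": {"testability": 5, "explanatory_power": 2, "parsimony": 4},
    "H2": {"testability": 2, "explanatory_power": 5, "parsimony": 3},
}

# Emphasis on what can be tested soon.
testability_first = {"testability": 3, "explanatory_power": 2, "parsimony": 1}
# Emphasis on how much of the data is explained.
explanation_first = {"testability": 1, "explanatory_power": 3, "parsimony": 1}

ranking_testability = rank(hypotheses, testability_first)
ranking_explanation = rank(hypotheses, explanation_first)
```

With these numbers, H1 ranks first when testability is weighted most heavily and H2 ranks first when explanatory power is, which is why the weighting should be chosen and stated before comparing hypotheses.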
Distinguishability
Key question: Can experiments distinguish between competing hypotheses?
- Identify predictions that differ between hypotheses
- Prioritize hypotheses that make distinct predictions
- Note which experiments would most efficiently narrow the field
- Consider whether hypotheses could all be partially correct
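The search for predictions that differ between hypotheses can be sketched as a small table lookup: for each candidate experiment, collect the outcome each hypothesis predicts and keep the experiments where those outcomes disagree. The function name, hypothesis labels, experiment names, and outcomes below are all hypothetical.

```python
from typing import Dict, List, Tuple


def discriminating_experiments(
    predictions: Dict[str, Dict[str, str]],
) -> List[Tuple[str, int]]:
    """Rank experiments by how many distinct outcomes the hypotheses predict.

    `predictions` maps hypothesis name -> {experiment name -> predicted outcome}.
    Only experiments for which every hypothesis makes a prediction are considered,
    and experiments where all hypotheses agree are dropped.
    """
    if not predictions:
        return []
    shared = set.intersection(*(set(p) for p in predictions.values()))
    ranked = []
    for experiment in shared:
        outcomes = {p[experiment] for p in predictions.values()}
        if len(outcomes) > 1:
            ranked.append((experiment, len(outcomes)))
    # Most discriminating first; ties broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


predictions = {
    "H1": {
        "knockout_gene_X": "phenotype lost",
        "overexpress_gene_X": "phenotype enhanced",
        "baseline_measurement": "phenotype present",
    },
    "H2": {
        "knockout_gene_X": "phenotype persists",
        "overexpress_gene_X": "no change",
        "baseline_measurement": "phenotype present",
    },
    "H3": {
        "knockout_gene_X": "phenotype reduced",
        "overexpress_gene_X": "phenotype enhanced",
        "baseline_measurement": "phenotype present",
    },
}

ranked_experiments = discriminating_experiments(predictions)
```

In this example the knockout separates all three hypotheses, overexpression separates only H2 from the other two, and the baseline measurement is dropped because every hypothesis predicts the same result.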
Common Pitfalls
Untestable Hypotheses
- Too vague to generate specific predictions
- Invoke unobservable or unmeasurable entities
- Require technology that doesn't exist
Unfalsifiable Hypotheses
- Built-in escape clauses ("may or may not occur")
- Post-hoc explanations that fit any outcome
- No specification of what would disprove them
Overly Complex Hypotheses
- Invoke multiple unproven mechanisms
- Add unnecessary steps or entities
- Complexity not justified by explanatory gains
Just-So Stories
- Plausible narratives without testable predictions
- Explain observations but don't predict new ones
- Impossible to distinguish from alternative stories
Practical Application
When generating hypotheses:
- Draft initial hypotheses focusing on mechanistic explanations
- Apply quality criteria to identify weaknesses
- Refine hypotheses to improve testability and clarity
- Develop specific predictions to enhance testability and falsifiability
- Compare systematically across all criteria
- Prioritize for testing based on distinguishability and feasibility
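The final prioritization step can be sketched as a sort over two ratings. Multiplying distinguishability by feasibility is an assumption made here for illustration, as are the `Candidate` type, the 1 to 5 scale, and the example ratings; other combination rules may suit a given project better.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothesis with two ratings on an assumed 1-5 scale (hypothetical type)."""

    name: str
    distinguishability: int  # how well experiments separate it from rivals
    feasibility: int  # how practical those experiments are today


def prioritize(candidates: List[Candidate]) -> List[Candidate]:
    """Order candidates by distinguishability x feasibility, highest first.

    Ties are broken in favor of the more distinguishable hypothesis.
    """
    return sorted(
        candidates,
        key=lambda c: (c.distinguishability * c.feasibility, c.distinguishability),
        reverse=True,
    )


# Illustrative ratings only.
candidates = [
    Candidate("A", distinguishability=5, feasibility=2),
    Candidate("B", distinguishability=3, feasibility=4),
    Candidate("C", distinguishability=2, feasibility=5),
]
testing_order = [c.name for c in prioritize(candidates)]
```

Here B comes first because it balances both ratings, while A is placed ahead of C on the tie because its experiments would separate it more cleanly from competing hypotheses.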
Remember: The goal is not a perfect hypothesis, but a set of testable, falsifiable, informative hypotheses that advance understanding of the phenomenon.