references/experimental_design.md

Experimental Design Checklist

Research Question Formulation

Is the Question Well-Formed?

  • [ ] Specific: Clearly defined variables and relationships
  • [ ] Answerable: Can be addressed with available methods
  • [ ] Relevant: Addresses a gap in knowledge or practical need
  • [ ] Feasible: Resources, time, and ethical considerations allow it
  • [ ] Falsifiable: Can be proven wrong if incorrect

Have You Reviewed the Literature?

  • [ ] Identified what's already known
  • [ ] Found gaps or contradictions to address
  • [ ] Learned from methodological successes and failures
  • [ ] Identified appropriate outcome measures
  • [ ] Determined typical effect sizes in the field

Hypothesis Development

Is Your Hypothesis Testable?

  • [ ] Makes specific, quantifiable predictions
  • [ ] Variables are operationally defined
  • [ ] Specifies direction/nature of expected relationships
  • [ ] Can be falsified by potential observations

Types of Hypotheses

  • [ ] Null hypothesis (H₀): No effect/relationship exists
  • [ ] Alternative hypothesis (H₁): Effect/relationship exists
  • [ ] Directional vs. non-directional: One-tailed vs. two-tailed tests

Study Design Selection

What Type of Study is Appropriate?

Experimental (Intervention) Studies:

  • [ ] Randomized Controlled Trial (RCT): Gold standard for causal inference
  • [ ] Quasi-experimental: Non-random assignment, but the independent variable is still manipulated
  • [ ] Within-subjects: Same participants in all conditions
  • [ ] Between-subjects: Different participants per condition
  • [ ] Factorial: Multiple independent variables
  • [ ] Crossover: Participants receive multiple interventions sequentially

Observational Studies:

  • [ ] Cohort: Follow groups over time
  • [ ] Case-control: Compare those with/without outcome
  • [ ] Cross-sectional: Snapshot at one time point
  • [ ] Ecological: Population-level data

Consider:

  • [ ] Can you randomly assign participants?
  • [ ] Can you manipulate the independent variable?
  • [ ] Is the outcome rare (favor case-control) or common?
  • [ ] Do you need to establish temporal sequence?
  • [ ] What's feasible given ethical and practical constraints?

Variables

Independent Variables (Manipulated/Predictor)

  • [ ] Clearly defined and operationalized
  • [ ] Appropriate levels/categories chosen
  • [ ] Manipulation is sufficient to test hypothesis
  • [ ] Manipulation check planned (if applicable)

Dependent Variables (Outcome/Response)

  • [ ] Directly measures the construct of interest
  • [ ] Validated and reliable measurement
  • [ ] Sensitive enough to detect expected effects
  • [ ] Appropriate for statistical analysis planned
  • [ ] Primary outcome clearly designated

Control Variables

  • [ ] Confounding variables identified:
  • Variables that affect both IV and DV
  • Alternative explanations for findings
  • [ ] Strategy for control:
  • Randomization
  • Matching
  • Stratification
  • Statistical adjustment
  • Restriction (inclusion/exclusion criteria)
  • Blinding

Extraneous Variables

  • [ ] Potential sources of noise identified
  • [ ] Standardized procedures in place to minimize noise
  • [ ] Environmental factors controlled
  • [ ] Time of day, setting, equipment standardized

Sampling

Population Definition

  • [ ] Target population: Who you want to generalize to
  • [ ] Accessible population: Who you can actually sample from
  • [ ] Sample: Who actually participates
  • [ ] Differences among these three groups documented

Sampling Method

  • [ ] Probability sampling (preferred for generalizability):
  • Simple random sampling
  • Stratified sampling
  • Cluster sampling
  • Systematic sampling
  • [ ] Non-probability sampling (common but limits generalizability):
  • Convenience sampling
  • Purposive sampling
  • Snowball sampling
  • Quota sampling
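As an illustration of the stratified option above, here is a minimal sketch of proportionate stratified random sampling using only the Python standard library. The function name `stratified_sample`, the `stratum_of` callback, and the sampling frame in the usage lines are hypothetical, chosen for this example only.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, fraction, seed=None):
    """Draw the same fraction of units at random from every stratum.

    units      : sequence of sampling units (the accessible population)
    stratum_of : function mapping a unit to its stratum label (assumed helper)
    fraction   : sampling fraction applied within each stratum
    seed       : optional seed so the draw can be reproduced and audited
    """
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Take at least one unit per stratum so no stratum is left out.
        k = max(1, round(fraction * len(members)))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical frame: 80 units in stratum "A", 20 in stratum "B".
frame = [(f"u{i}", "A" if i < 80 else "B") for i in range(100)]
drawn = stratified_sample(frame, stratum_of=lambda u: u[1], fraction=0.10, seed=1)
```

Fixing the seed is a design choice that makes the draw reproducible for auditing; it does not make the sample any less random with respect to the outcome.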

Sample Size

  • [ ] A priori power analysis conducted
  • Expected effect size (from literature or pilot)
  • Desired power (typically .80 or .90)
  • Significance level (typically .05)
  • Statistical test to be used
  • [ ] Accounts for expected attrition/dropout
  • [ ] Sufficient for planned subgroup analyses
  • [ ] Practical constraints acknowledged
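The power-analysis inputs listed above can be combined in a rough calculation. The sketch below assumes a two-sided comparison of two independent group means and uses the normal approximation n = 2((z₁₋α/₂ + z_power) / d)² per group, which slightly underestimates what an exact t-test calculation would require. The function names are hypothetical; dedicated tools (G*Power, or a statistics package) should be used for the final figure.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size_d, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided, two-group comparison.

    effect_size_d : expected standardized mean difference (Cohen's d)
    alpha         : significance level
    power         : desired power (1 - beta)
    """
    if effect_size_d <= 0:
        raise ValueError("effect_size_d must be positive")
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size_d) ** 2)

def inflate_for_attrition(n, expected_dropout):
    """Number to recruit so that about n remain after the expected dropout rate."""
    if not 0 <= expected_dropout < 1:
        raise ValueError("expected_dropout must be in [0, 1)")
    return math.ceil(n / (1 - expected_dropout))

needed = n_per_group(0.5)                        # 63 per group under this approximation
to_recruit = inflate_for_attrition(needed, 0.20) # 79 per group with 20% dropout
```

Because required n scales with 1/d², halving the expected effect size roughly quadruples the sample, which is why the effect-size assumption deserves the most scrutiny.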

Inclusion/Exclusion Criteria

  • [ ] Clearly defined and justified
  • [ ] Not overly restrictive (narrow criteria limit generalizability)
  • [ ] Based on theoretical or practical considerations
  • [ ] Ethical considerations addressed
  • [ ] Documented and applied consistently

Blinding and Randomization

Randomization

  • [ ] What is randomized:
  • Participant assignment to conditions
  • Order of conditions (within-subjects)
  • Stimuli/items presented
  • [ ] Method of randomization:
  • Computer-generated random numbers
  • Random number tables
  • Coin flips (for very small studies)
  • [ ] Allocation concealment:
  • Sequence generated before recruitment
  • Allocation hidden until after enrollment
  • Sequentially numbered, sealed envelopes (if needed)
  • [ ] Stratified randomization:
  • Balance important variables across groups
  • Block randomization to ensure equal group sizes
  • [ ] Check randomization:
  • Compare groups at baseline
  • Report any meaningful baseline imbalances
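A computer-generated allocation sequence with block randomization, optionally run separately within each stratum, might look like the sketch below. It uses only Python's `random` module; the function names, arm labels, and stratum labels are hypothetical. In practice the sequence would be generated before recruitment by someone not involved in enrollment, so that allocation stays concealed.

```python
import random

def permuted_block_sequence(n_participants, arms=("treatment", "control"),
                            block_size=4, seed=None):
    """Allocation sequence in which every complete block is balanced across arms."""
    if block_size % len(arms) != 0:
        raise ValueError("block_size must be a multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    sequence = []
    while len(sequence) < n_participants:
        block = list(arms) * per_arm   # equal count of each arm in the block
        rng.shuffle(block)             # random order within the block
        sequence.extend(block)
    return sequence[:n_participants]

def stratified_sequences(stratum_sizes, seed=None, **kwargs):
    """One independent permuted-block sequence per stratum."""
    rng = random.Random(seed)
    return {
        stratum: permuted_block_sequence(n, seed=rng.getrandbits(32), **kwargs)
        for stratum, n in stratum_sizes.items()
    }

# Hypothetical strata defined by a single baseline variable.
allocation = stratified_sequences({"site_A": 12, "site_B": 8}, seed=2024)
```

Small fixed blocks make the last assignment in each block predictable when the study is unblinded, so randomly varying the block size is a common refinement not shown here.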

Blinding

  • [ ] Single-blind: Participants don't know group assignment
  • [ ] Double-blind: Participants and researchers don't know
  • [ ] Triple-blind: Participants, researchers, and data analysts don't know
  • [ ] Blinding feasibility:
  • Is true blinding possible?
  • Placebo/sham controls needed?
  • Identical appearance of interventions?
  • [ ] Blinding check:
  • Assess whether blinding maintained
  • Ask participants/researchers to guess assignments

Control Groups and Conditions

What Type of Control?

  • [ ] No treatment control: Natural course of condition
  • [ ] Placebo control: Inert treatment for comparison
  • [ ] Active control: Standard treatment comparison
  • [ ] Wait-list control: Delayed treatment
  • [ ] Attention control: Matches contact time without active ingredient

Multiple Conditions

  • [ ] Factorial designs for multiple factors
  • [ ] Dose-response relationship assessment
  • [ ] Mechanism testing with component analyses

Procedures

Protocol Development

  • [ ] Detailed, written protocol:
  • Step-by-step procedures
  • Scripts for standardized instructions
  • Decision rules for handling issues
  • Data collection forms
  • [ ] Pilot tested before main study
  • [ ] Staff trained to criterion
  • [ ] Compliance monitoring planned

Standardization

  • [ ] Same instructions for all participants
  • [ ] Same equipment and materials
  • [ ] Same environment/setting when possible
  • [ ] Same assessment timing
  • [ ] Deviations from protocol documented

Data Collection

  • [ ] When collected:
  • Baseline measurements
  • Post-intervention
  • Follow-up timepoints
  • [ ] Who collects:
  • Trained researchers
  • Blinded when possible
  • Inter-rater reliability established
  • [ ] How collected:
  • Valid, reliable instruments
  • Standardized administration
  • Multiple methods if possible (triangulation)

Measurement

Validity

  • [ ] Face validity: Appears to measure construct
  • [ ] Content validity: Covers all aspects of construct
  • [ ] Criterion validity: Correlates with gold standard
  • Concurrent validity
  • Predictive validity
  • [ ] Construct validity: Measures theoretical construct
  • Convergent validity (correlates with related measures)
  • Discriminant validity (doesn't correlate with unrelated measures)

Reliability

  • [ ] Test-retest: Consistent over time
  • [ ] Internal consistency: Items measure same construct (Cronbach's α)
  • [ ] Inter-rater reliability: Agreement between raters (Cohen's κ, ICC)
  • [ ] Parallel forms: Alternative versions consistent
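Two of the reliability statistics named above can be computed directly from their textbook definitions. The following is a minimal sketch using the Python standard library; the function names and the toy data are illustrative assumptions, and a statistics package would normally be used for real analyses (including ICC, which is not shown).

```python
from collections import Counter
from statistics import variance

def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of respondent rows, each holding k item scores."""
    k = len(item_scores[0])
    columns = list(zip(*item_scores))                  # one tuple per item
    item_var_sum = sum(variance(col) for col in columns)
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - item_var_sum / total_var)

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical labels to the same cases."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance from each rater's marginal label frequencies.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / n ** 2
    return (observed - expected) / (1 - expected)

# Toy ratings: 50 cases, raters agree on 35 of them.
a = ["yes"] * 25 + ["no"] * 25
b = ["yes"] * 20 + ["no"] * 5 + ["yes"] * 10 + ["no"] * 15
kappa = cohen_kappa(a, b)   # 0.4: observed agreement 0.70, chance agreement 0.50
```

The toy example shows why raw percent agreement overstates reliability: 70% agreement corresponds to a kappa of only 0.4 once chance agreement is removed.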

Measurement Considerations

  • [ ] Objective measures preferred when possible
  • [ ] Validated instruments used when available
  • [ ] Multiple measures of key constructs
  • [ ] Sensitivity to change considered
  • [ ] Floor/ceiling effects avoided
  • [ ] Response formats appropriate
  • [ ] Recall periods appropriate
  • [ ] Cultural appropriateness considered

Bias Minimization

Selection Bias

  • [ ] Random sampling when possible
  • [ ] Clearly defined eligibility criteria
  • [ ] Document who declines and why
  • [ ] Minimize self-selection

Performance Bias

  • [ ] Standardized protocols
  • [ ] Blinding of providers
  • [ ] Monitor protocol adherence
  • [ ] Document deviations

Detection Bias

  • [ ] Blinding of outcome assessors
  • [ ] Objective measures when possible
  • [ ] Standardized assessment procedures
  • [ ] Multiple raters with reliability checks

Attrition Bias

  • [ ] Strategies to minimize dropout
  • [ ] Track reasons for dropout
  • [ ] Compare dropouts to completers
  • [ ] Intention-to-treat analysis planned

Reporting Bias

  • [ ] Preregister study and analysis plan
  • [ ] Designate primary vs. secondary outcomes
  • [ ] Commit to reporting all outcomes
  • [ ] Distinguish planned from exploratory analyses

Data Management

Data Collection

  • [ ] Data collection forms designed and tested
  • [ ] Electronic data capture platform selected (REDCap, Qualtrics, or similar)
  • [ ] Range checks and validation rules
  • [ ] Regular backups
  • [ ] Secure storage (HIPAA/GDPR compliant if needed)

Data Quality

  • [ ] Real-time data validation
  • [ ] Regular quality checks
  • [ ] Missing data patterns monitored
  • [ ] Outliers identified and investigated
  • [ ] Protocol deviations documented
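Range checks and validation rules can be expressed as a small table of allowed values applied to each incoming record. The sketch below is one possible shape for such a check; the field names and limits are hypothetical, and platforms such as REDCap provide their own built-in validation that would normally be used instead.

```python
def validate_record(record, rules):
    """Return a list of problems found in one data record.

    record : dict of field name -> value (None or absent means missing)
    rules  : dict of field name -> (lowest allowed, highest allowed)
    """
    problems = []
    for field, (low, high) in rules.items():
        value = record.get(field)
        if value is None:
            problems.append(f"{field}: missing")
        elif not (low <= value <= high):
            problems.append(f"{field}: {value} outside [{low}, {high}]")
    return problems

def missingness_by_field(records, fields):
    """Proportion of records with a missing value, per field."""
    return {
        f: sum(r.get(f) is None for r in records) / len(records)
        for f in fields
    }

# Hypothetical rules for two numeric fields.
RULES = {"age": (18, 90), "systolic_bp": (70, 250)}
issues = validate_record({"age": 340, "systolic_bp": None}, RULES)
```

Flagged values are meant to be investigated against source documents, not silently corrected or dropped.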

Data Security

  • [ ] De-identification procedures
  • [ ] Access controls
  • [ ] Audit trails
  • [ ] Compliance with regulations (IRB, HIPAA, GDPR)
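One common building block of a de-identification procedure is replacing direct identifiers with keyed pseudonyms. The sketch below uses HMAC-SHA256 from the Python standard library; the function name, the example identifier, and the key handling are assumptions for illustration. Note that this is pseudonymization only: anyone holding the key can re-link records, and quasi-identifiers (dates, locations, rare attributes) need separate treatment before data can be considered de-identified under a given regulation.

```python
import hashlib
import hmac

def pseudonymize(participant_id, secret_key):
    """Map an identifier to a stable pseudonym using a keyed hash.

    participant_id : original identifier as a string
    secret_key     : bytes, stored separately from the data with restricted access
    """
    digest = hmac.new(secret_key, participant_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]   # shortened for readability

# Hypothetical key; in practice, generate it securely and never commit it to code.
key = b"replace-with-a-securely-stored-key"
study_id = pseudonymize("MRN-0012345", key)
```

A keyed hash is used instead of a plain hash because plain hashes of short identifiers can be reversed by enumerating all possible inputs.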

Statistical Analysis Planning

Analysis Plan (Prespecify Before Data Collection)

  • [ ] Primary analysis:
  • Statistical test(s) specified
  • Hypothesis clearly stated
  • Significance level set (usually α = .05)
  • One-tailed or two-tailed
  • [ ] Secondary analyses:
  • Clearly designated as secondary
  • Exploratory analyses labeled as such
  • [ ] Multiple comparisons:
  • Adjustment method specified (if needed)
  • Single prespecified primary outcome limits Type I error inflation
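When several outcomes or contrasts are tested, the adjustment method should be named in the plan. As one example, the sketch below implements Bonferroni and Holm step-down adjusted p-values in plain Python; these are two standard choices among several, and the function names are hypothetical.

```python
def bonferroni_adjust(p_values):
    """Bonferroni-adjusted p-values: multiply each by the number of tests, cap at 1."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]

def holm_adjust(p_values):
    """Holm step-down adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])   # indices, smallest p first
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        candidate = min(1.0, (m - rank) * p_values[idx])
        running_max = max(running_max, candidate)         # enforce monotonicity
        adjusted[idx] = running_max
    return adjusted

raw = [0.01, 0.04, 0.03]
holm = holm_adjust(raw)   # approximately [0.03, 0.06, 0.06]
```

Holm controls the family-wise error rate like Bonferroni but is never less powerful, so it is often the preferred default when no structure among the tests is assumed.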

Assumptions

  • [ ] Assumptions of statistical tests identified
  • [ ] Plan to check assumptions
  • [ ] Non-parametric alternatives identified as backup
  • [ ] Transformation options considered

Missing Data

  • [ ] Anticipated amount of missingness
  • [ ] Missing data mechanism (MCAR, MAR, MNAR)
  • [ ] Handling strategy:
  • Complete case analysis
  • Multiple imputation
  • Maximum likelihood
  • [ ] Sensitivity analyses planned

Effect Sizes

  • [ ] Appropriate effect size measures identified
  • [ ] Effect sizes will be reported alongside p-values
  • [ ] Confidence intervals planned
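For a two-group comparison of means, one widely used effect size is Cohen's d. The sketch below computes d with a pooled standard deviation and an approximate confidence interval based on the large-sample standard error sqrt((n₁+n₂)/(n₁n₂) + d²/(2(n₁+n₂))). The function names are hypothetical, and the interval is an approximation; exact intervals use the noncentral t distribution.

```python
import math
from statistics import NormalDist, mean, variance

def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    pooled_var = ((n_a - 1) * variance(group_a) +
                  (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / math.sqrt(pooled_var)

def d_confidence_interval(d, n_a, n_b, level=0.95):
    """Approximate (large-sample) confidence interval for Cohen's d."""
    se = math.sqrt((n_a + n_b) / (n_a * n_b) + d ** 2 / (2 * (n_a + n_b)))
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return d - z * se, d + z * se

d = cohens_d([2, 4, 6, 8], [1, 3, 5, 7])
low, high = d_confidence_interval(d, 4, 4)
```

With groups this small the interval is very wide and spans zero, which illustrates why the interval, not just the point estimate, belongs in the report.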

Statistical Software

  • [ ] Software selected (R, SPSS, Stata, Python, etc.)
  • [ ] Version documented
  • [ ] Analysis scripts prepared in advance
  • [ ] Scripts will be made available (Open Science)

Ethical Considerations

Ethical Approval

  • [ ] IRB/Ethics committee approval obtained
  • [ ] Study registered (ClinicalTrials.gov, etc.) if applicable
  • [ ] Protocol follows Declaration of Helsinki or equivalent

Informed Consent

  • [ ] Voluntary participation
  • [ ] Comprehensible explanation
  • [ ] Risks and benefits disclosed
  • [ ] Right to withdraw without penalty
  • [ ] Privacy protections explained
  • [ ] Compensation disclosed

Risk-Benefit Analysis

  • [ ] Potential benefits outweigh risks
  • [ ] Risks minimized
  • [ ] Vulnerable populations protected
  • [ ] Data safety monitoring (if high risk)

Confidentiality

  • [ ] Data de-identified
  • [ ] Secure storage
  • [ ] Limited access
  • [ ] Reporting doesn't allow re-identification

Validity Threats

Internal Validity (Causation)

  • [ ] History: External events between measurements
  • [ ] Maturation: Changes in participants over time
  • [ ] Testing: Effects of repeated measurement
  • [ ] Instrumentation: Changes in measurement over time
  • [ ] Regression to the mean: Extreme scores becoming less extreme on remeasurement
  • [ ] Selection: Groups differ at baseline
  • [ ] Attrition: Differential dropout
  • [ ] Diffusion: Control group receives treatment elements

External Validity (Generalizability)

  • [ ] Sample representative of population
  • [ ] Setting realistic/natural
  • [ ] Treatment typical of real-world implementation
  • [ ] Outcome measures ecologically valid
  • [ ] Time frame appropriate

Construct Validity (Measurement)

  • [ ] Measures actually tap intended constructs
  • [ ] Operations match theoretical definitions
  • [ ] No confounding of constructs
  • [ ] Adequate coverage of construct

Statistical Conclusion Validity

  • [ ] Adequate statistical power
  • [ ] Assumptions met
  • [ ] Appropriate tests used
  • [ ] Alpha level appropriate
  • [ ] Multiple comparisons addressed

Reporting and Transparency

Preregistration

  • [ ] Study preregistered (OSF, ClinicalTrials.gov, AsPredicted)
  • [ ] Hypotheses stated a priori
  • [ ] Analysis plan documented
  • [ ] Distinguishes confirmatory from exploratory

Reporting Guidelines

  • [ ] RCTs: CONSORT checklist
  • [ ] Observational studies: STROBE checklist
  • [ ] Systematic reviews: PRISMA checklist
  • [ ] Diagnostic studies: STARD checklist
  • [ ] Qualitative research: COREQ checklist
  • [ ] Case reports: CARE guidelines

Transparency

  • [ ] All measures reported
  • [ ] All manipulations disclosed
  • [ ] Sample size determination explained
  • [ ] Exclusion criteria and numbers reported
  • [ ] Attrition documented
  • [ ] Deviations from protocol noted
  • [ ] Conflicts of interest disclosed

Open Science

  • [ ] Data sharing planned (when ethical)
  • [ ] Analysis code shared
  • [ ] Materials available
  • [ ] Preprint posted
  • [ ] Open access publication when possible

Post-Study Considerations

Data Analysis

  • [ ] Follow preregistered plan
  • [ ] Clearly label deviations and exploratory analyses
  • [ ] Check assumptions
  • [ ] Report all outcomes
  • [ ] Report effect sizes and CIs, not just p-values

Interpretation

  • [ ] Conclusions supported by data
  • [ ] Limitations acknowledged
  • [ ] Alternative explanations considered
  • [ ] Generalizability discussed
  • [ ] Clinical/practical significance addressed

Dissemination

  • [ ] Publish regardless of results (reduce publication bias)
  • [ ] Present at conferences
  • [ ] Share findings with participants (when appropriate)
  • [ ] Communicate to relevant stakeholders
  • [ ] Plain language summaries

Next Steps

  • [ ] Replication needed?
  • [ ] Follow-up studies identified
  • [ ] Mechanism studies planned
  • [ ] Clinical applications considered

Common Pitfalls to Avoid

  • [ ] No power analysis → underpowered study
  • [ ] Hypothesis formed after seeing data (HARKing)
  • [ ] No blinding when feasible → bias
  • [ ] P-hacking (data fishing, optional stopping)
  • [ ] Multiple testing without correction → false positives
  • [ ] Inadequate control group
  • [ ] Confounding not addressed
  • [ ] Instruments not validated
  • [ ] High attrition not addressed
  • [ ] Cherry-picking results to report
  • [ ] Causal language from correlational data
  • [ ] Ignoring assumptions of statistical tests
  • [ ] Not preregistering → selective reporting and a biased literature
  • [ ] Conflicts of interest not disclosed

Final Checklist Before Starting

  • [ ] Research question is clear and important
  • [ ] Hypothesis is testable and specific
  • [ ] Study design is appropriate
  • [ ] Sample size is adequate (power analysis)
  • [ ] Measures are valid and reliable
  • [ ] Confounds are controlled
  • [ ] Randomization and blinding implemented
  • [ ] Data collection is standardized
  • [ ] Analysis plan is prespecified
  • [ ] Ethical approval obtained
  • [ ] Study is preregistered
  • [ ] Resources are sufficient
  • [ ] Team is trained
  • [ ] Protocol is documented
  • [ ] Backup plans exist for problems

Remember

Good experimental design is about:

  • Asking clear questions
  • Minimizing bias
  • Maximizing validity
  • Appropriate inference
  • Transparency
  • Reproducibility

The best time to think about these issues is before collecting data, not after.
