name: scientific-critical-thinking
description: "Evaluate research rigor. Assess methodology, experimental design, statistical validity, biases, confounding, and evidence quality (GRADE, Cochrane ROB) for critical analysis of scientific claims."


Scientific Critical Thinking

Overview

Scientific critical thinking is a systematic process for evaluating research rigor. This skill assesses methodology, experimental design, statistical validity, biases, confounding, and evidence quality using frameworks such as GRADE and Cochrane ROB. Apply it for critical analysis of scientific claims.

When to Use This Skill

This skill should be used when:

  • Evaluating research methodology and experimental design
  • Assessing statistical validity and evidence quality
  • Identifying biases and confounding in studies
  • Reviewing scientific claims and conclusions
  • Conducting systematic reviews or meta-analyses
  • Applying GRADE or Cochrane risk of bias assessments
  • Providing critical analysis of research papers

Core Capabilities

1. Methodology Critique

Evaluate research methodology for rigor, validity, and potential flaws.

Apply when:

  • Reviewing research papers
  • Assessing experimental designs
  • Evaluating study protocols
  • Planning new research

Evaluation framework:

  1. Study Design Assessment
    • Is the design appropriate for the research question?
    • Can the design support causal claims being made?
    • Are comparison groups appropriate and adequate?
    • Consider whether experimental, quasi-experimental, or observational design is justified

  2. Validity Analysis
    • Internal validity: Can we trust the causal inference?
      - Check randomization quality
      - Evaluate confounding control
      - Assess selection bias
      - Review attrition/dropout patterns
    • External validity: Do results generalize?
      - Evaluate sample representativeness
      - Consider ecological validity of setting
      - Assess whether conditions match target application
    • Construct validity: Do measures capture intended constructs?
      - Review measurement validation
      - Check operational definitions
      - Assess whether measures are direct or proxy
    • Statistical conclusion validity: Are statistical inferences sound?
      - Verify adequate power/sample size
      - Check assumption compliance
      - Evaluate test appropriateness

  3. Control and Blinding
    • Was randomization properly implemented (sequence generation, allocation concealment)?
    • Was blinding feasible and implemented (participants, providers, assessors)?
    • Are control conditions appropriate (placebo, active control, no treatment)?
    • Could performance or detection bias affect results?

  4. Measurement Quality
    • Are instruments validated and reliable?
    • Are measures objective when possible, or subjective with acknowledged limitations?
    • Is outcome assessment standardized?
    • Are multiple measures used to triangulate findings?

Reference: See references/scientific_method.md for detailed principles and references/experimental_design.md for comprehensive design checklist.

2. Bias Detection

Identify and evaluate potential sources of bias that could distort findings.

Apply when:

  • Reviewing published research
  • Designing new studies
  • Interpreting conflicting evidence
  • Assessing research quality

Systematic bias review:

  1. Cognitive Biases (Researcher)
    • Confirmation bias: Are only supporting findings highlighted?
    • HARKing: Were hypotheses stated a priori or formed after seeing results?
    • Publication bias: Are negative results missing from literature?
    • Cherry-picking: Is evidence selectively reported?
    • Check for preregistration and analysis plan transparency

  2. Selection Biases
    • Sampling bias: Is sample representative of target population?
    • Volunteer bias: Do participants self-select in systematic ways?
    • Attrition bias: Is dropout differential between groups?
    • Survivorship bias: Are only "survivors" visible in sample?
    • Examine participant flow diagrams and compare baseline characteristics (a minimal attrition check is sketched after this list)

  3. Measurement Biases
    • Observer bias: Could expectations influence observations?
    • Recall bias: Are retrospective reports systematically inaccurate?
    • Social desirability: Are responses biased toward acceptability?
    • Instrument bias: Do measurement tools systematically err?
    • Evaluate blinding, validation, and measurement objectivity

  4. Analysis Biases
    • P-hacking: Were multiple analyses conducted until significance emerged?
    • Outcome switching: Were non-significant outcomes replaced with significant ones?
    • Selective reporting: Are all planned analyses reported?
    • Subgroup fishing: Were subgroup analyses conducted without correction?
    • Check for study registration and compare to published outcomes

  5. Confounding
    • What variables could affect both exposure and outcome?
    • Were confounders measured and controlled (statistically or by design)?
    • Could unmeasured confounding explain findings?
    • Are there plausible alternative explanations?
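
To make the attrition-bias check concrete, here is a minimal sketch with made-up dropout counts (not data from any real study); it uses statsmodels' two-proportion z-test as one simple way to flag differential dropout between arms.

```python
# Sketch: flag differential attrition between two study arms.
# Dropout counts are hypothetical; a real review would take them from the participant flow diagram.
from statsmodels.stats.proportion import proportions_ztest

dropouts = [28, 12]        # dropouts in treatment vs. control
randomized = [150, 150]    # participants randomized to each arm

z_stat, p_value = proportions_ztest(count=dropouts, nobs=randomized)
rates = [d / n for d, n in zip(dropouts, randomized)]

print(f"Dropout: {rates[0]:.0%} vs {rates[1]:.0%} (z = {z_stat:.2f}, p = {p_value:.3f})")
# A large or significant difference suggests possible attrition bias;
# also compare the characteristics of who dropped out, not just how many.
```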

Reference: See references/common_biases.md for comprehensive bias taxonomy with detection and mitigation strategies.

3. Statistical Analysis Evaluation

Critically assess statistical methods, interpretation, and reporting.

Apply when:

  • Reviewing quantitative research
  • Evaluating data-driven claims
  • Assessing clinical trial results
  • Reviewing meta-analyses

Statistical review checklist:

  1. Sample Size and Power (a power-analysis sketch follows this checklist)
    • Was an a priori power analysis conducted?
    • Is the sample adequate for detecting meaningful effects?
    • Is the study underpowered (a common problem)?
    • Do significant results from small samples raise flags for inflated effect sizes?

  2. Statistical Tests
    • Are tests appropriate for the data type and distribution?
    • Were test assumptions checked and met?
    • Are parametric tests justified, or should non-parametric alternatives be used?
    • Is the analysis matched to the study design (e.g., paired vs. independent)?

  3. Multiple Comparisons (a correction sketch follows this checklist)
    • Were multiple hypotheses tested?
    • Was correction applied (Bonferroni, FDR, other)?
    • Are primary outcomes distinguished from secondary/exploratory?
    • Could findings be false positives from multiple testing?

  4. P-Value Interpretation
    • Are p-values interpreted correctly (the probability of data at least this extreme if the null hypothesis is true)?
    • Is non-significance incorrectly interpreted as "no effect"?
    • Is statistical significance conflated with practical importance?
    • Are exact p-values reported, or only "p < .05"?
    • Is there suspicious clustering just below .05?

  5. Effect Sizes and Confidence Intervals (an effect-size sketch follows this checklist)
    • Are effect sizes reported alongside significance?
    • Are confidence intervals provided to show precision?
    • Is the effect size meaningful in practical terms?
    • Are standardized effect sizes interpreted with field-specific context?

  6. Missing Data
    • How much data is missing?
    • Is the missing data mechanism considered (MCAR, MAR, MNAR)?
    • How is missing data handled (deletion, imputation, maximum likelihood)?
    • Could missing data bias results?

  7. Regression and Modeling
    • Is the model overfitted (too many predictors, no cross-validation)?
    • Are predictions made outside the data range (extrapolation)?
    • Are multicollinearity issues addressed?
    • Are model assumptions checked?

  8. Common Pitfalls
    • Correlation treated as causation
    • Ignoring regression to the mean
    • Base rate neglect
    • Texas sharpshooter fallacy (pattern finding in noise)
    • Simpson's paradox (confounding by subgroups)
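
As a rough illustration of the power questions above, the sketch below checks what a hypothetical two-group design (25 participants per group, hypothesized Cohen's d = 0.5; both numbers are made up) can actually detect, using statsmodels.

```python
# Sketch: is a two-group design adequately powered? All numbers are hypothetical.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()

# Power to detect a hypothesized standardized effect of d = 0.5 with 25 per group
achieved_power = analysis.power(effect_size=0.5, nobs1=25, alpha=0.05, ratio=1.0)

# Smallest effect detectable with 80% power at this sample size
min_detectable_d = analysis.solve_power(nobs1=25, alpha=0.05, power=0.80, ratio=1.0)

print(f"Power for d = 0.5 at n = 25/group: {achieved_power:.2f}")    # roughly 0.4 -> underpowered
print(f"Minimum detectable d at 80% power: {min_detectable_d:.2f}")  # roughly 0.8
```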

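To make the multiple-comparisons item concrete, this sketch applies Bonferroni and Benjamini-Hochberg (FDR) corrections to a set of hypothetical p-values and shows how many results survive each correction.

```python
# Sketch: multiple-comparison corrections. The p-values are hypothetical illustrations.
from statsmodels.stats.multitest import multipletests

p_values = [0.001, 0.012, 0.031, 0.048, 0.21, 0.39]   # 4 of 6 are "significant" uncorrected

for method in ("bonferroni", "fdr_bh"):
    reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method=method)
    kept = [p for p, r in zip(p_values, reject) if r]
    print(f"{method}: {len(kept)} of {len(p_values)} survive (raw p = {kept})")
```
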
Reference: See references/statistical_pitfalls.md for detailed pitfalls and correct practices.
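
A companion sketch for the effect-size item in the checklist above: report a standardized effect size with a confidence interval rather than a bare p-value. The two groups are simulated placeholders, and the CI uses a standard normal-approximation formula for the standard error of Cohen's d.

```python
# Sketch: report an effect size and its confidence interval, not just significance.
# The data are simulated placeholders, not results from any study.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
treatment = rng.normal(loc=10.5, scale=2.0, size=40)
control = rng.normal(loc=10.0, scale=2.0, size=40)

t_stat, p_value = stats.ttest_ind(treatment, control)

n1, n2 = len(treatment), len(control)
pooled_sd = np.sqrt(((n1 - 1) * treatment.var(ddof=1) +
                     (n2 - 1) * control.var(ddof=1)) / (n1 + n2 - 2))
d = (treatment.mean() - control.mean()) / pooled_sd
se_d = np.sqrt((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))   # approximate SE of d
ci_low, ci_high = d - 1.96 * se_d, d + 1.96 * se_d

print(f"p = {p_value:.3f}, Cohen's d = {d:.2f}, 95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```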

4. Evidence Quality Assessment

Evaluate the strength and quality of evidence systematically.

Apply when:

  • Weighing evidence for decisions
  • Conducting literature reviews
  • Comparing conflicting findings
  • Determining confidence in conclusions

Evidence evaluation framework:

  1. Study Design Hierarchy
    • Systematic reviews/meta-analyses (highest for intervention effects)
    • Randomized controlled trials
    • Cohort studies
    • Case-control studies
    • Cross-sectional studies
    • Case series/reports
    • Expert opinion (lowest)

Important: Higher-level designs aren't always better quality. A well-designed observational study can be stronger than a poorly-conducted RCT.

  2. Quality Within Design Type
    • Risk of bias assessment (use an appropriate tool: Cochrane ROB, Newcastle-Ottawa, etc.)
    • Methodological rigor
    • Transparency and reporting completeness
    • Conflicts of interest

  3. GRADE Considerations (if applicable; a simplified sketch follows this list)
    • Start with design type (RCT = high, observational = low)
    • Downgrade for:
      - Risk of bias
      - Inconsistency across studies
      - Indirectness (wrong population/intervention/outcome)
      - Imprecision (wide confidence intervals, small samples)
      - Publication bias
    • Upgrade for:
      - Large effect sizes
      - Dose-response relationships
      - Plausible confounding that would reduce (not increase) the observed effect

  4. Convergence of Evidence
    • Stronger when:
      - Multiple independent replications
      - Different research groups and settings
      - Different methodologies converge on the same conclusion
      - Mechanistic and empirical evidence align
    • Weaker when:
      - Single study or research group
      - Contradictory findings in the literature
      - Publication bias evident
      - No replication attempts

  5. Contextual Factors
    • Biological/theoretical plausibility
    • Consistency with established knowledge
    • Temporality (cause precedes effect)
    • Specificity of relationship
    • Strength of association
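
The sketch below is a deliberately simplified illustration of the GRADE bookkeeping described above (start from the design, downgrade per serious concern, upgrade for special factors, clamp to the four levels). Real GRADE judgments weigh the severity of each domain and are not reducible to simple counts; treat this only as a mnemonic.

```python
# Simplified, illustrative GRADE-style bookkeeping (not a substitute for full GRADE guidance).
LEVELS = ["very low", "low", "moderate", "high"]

def grade_certainty(design: str, downgrades: int, upgrades: int) -> str:
    """Start high for RCTs, low for observational; move down/up one level per factor."""
    start = 3 if design == "rct" else 1
    score = start - downgrades + upgrades
    return LEVELS[max(0, min(score, 3))]   # clamp to the four GRADE levels

# RCT evidence with serious risk of bias and serious imprecision (two downgrades)
print(grade_certainty("rct", downgrades=2, upgrades=0))            # -> "low"
# Observational evidence with a large effect and a dose-response gradient (two upgrades)
print(grade_certainty("observational", downgrades=0, upgrades=2))  # -> "high"
```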

Reference: See references/evidence_hierarchy.md for detailed hierarchy, GRADE system, and quality assessment tools.

5. Logical Fallacy Identification

Detect and name logical errors in scientific arguments and claims.

Apply when:

  • Evaluating scientific claims
  • Reviewing discussion/conclusion sections
  • Assessing popular science communication
  • Identifying flawed reasoning

Common fallacies in science:

  1. Causation Fallacies
    • Post hoc ergo propter hoc: "B followed A, so A caused B"
    • Correlation = causation: Confusing association with causality
    • Reverse causation: Mistaking cause for effect
    • Single cause fallacy: Attributing complex outcomes to one factor

  2. Generalization Fallacies
    • Hasty generalization: Broad conclusions from small samples
    • Anecdotal fallacy: Personal stories as proof
    • Cherry-picking: Selecting only supporting evidence
    • Ecological fallacy: Group patterns applied to individuals

  3. Authority and Source Fallacies
    • Appeal to authority: "Expert said it, so it's true" (without evidence)
    • Ad hominem: Attacking person, not argument
    • Genetic fallacy: Judging by origin, not merits
    • Appeal to nature: "Natural = good/safe"

  4. Statistical Fallacies
    • Base rate neglect: Ignoring prior probability
    • Texas sharpshooter: Finding patterns in random data
    • Multiple comparisons: Not correcting for multiple tests
    • Prosecutor's fallacy: Confusing P(E|H) with P(H|E) (see the worked example after this list)

  5. Structural Fallacies
    • False dichotomy: "Either A or B" when more options exist
    • Moving goalposts: Changing evidence standards after they're met
    • Begging the question: Circular reasoning
    • Straw man: Misrepresenting arguments to attack them

  6. Science-Specific Fallacies
    • Galileo gambit: "They laughed at Galileo, so my fringe idea is correct"
    • Argument from ignorance: "Not proven false, so true"
    • Nirvana fallacy: Rejecting imperfect solutions
    • Unfalsifiability: Making untestable claims

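To see why confusing P(E|H) with P(H|E), and neglecting base rates, misleads, here is a small worked example with made-up numbers: a test that is 99% sensitive and 95% specific, applied where the condition has 1% prevalence.

```python
# Worked example (hypothetical numbers): P(evidence | hypothesis) is not P(hypothesis | evidence).
prevalence = 0.01      # base rate of the condition
sensitivity = 0.99     # P(positive test | condition)  -> this is P(E | H)
specificity = 0.95     # P(negative test | no condition)

p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
posterior = sensitivity * prevalence / p_positive        # Bayes' rule gives P(H | E)

print(f"P(positive | condition) = {sensitivity:.2f}")
print(f"P(condition | positive) = {posterior:.2f}")      # about 0.17, far below 0.99
```
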
When identifying fallacies:

  • Name the specific fallacy
  • Explain why the reasoning is flawed
  • Identify what evidence would be needed for valid inference
  • Note that fallacious reasoning doesn't prove the conclusion false, just that this argument doesn't support it

Reference: See references/logical_fallacies.md for comprehensive fallacy catalog with examples and detection strategies.

6. Research Design Guidance

Provide constructive guidance for planning rigorous studies.

Apply when:

  • Helping design new experiments
  • Planning research projects
  • Reviewing research proposals
  • Improving study protocols

Design process:

  1. Research Question Refinement
    • Ensure question is specific, answerable, and falsifiable
    • Verify it addresses a gap or contradiction in literature
    • Confirm feasibility (resources, ethics, time)
    • Define variables operationally

  2. Design Selection
    • Match design to question (causal → experimental; associational → observational)
    • Consider feasibility and ethical constraints
    • Choose between-subjects, within-subjects, or mixed designs
    • Plan factorial designs if testing multiple factors

  3. Bias Minimization Strategy
    • Implement randomization when possible
    • Plan blinding at all feasible levels (participants, providers, assessors)
    • Identify and plan to control confounds (randomization, matching, stratification, statistical adjustment)
    • Standardize all procedures
    • Plan to minimize attrition

  4. Sample Planning (a sample-size sketch follows this list)
    • Conduct a priori power analysis (specify expected effect, desired power, alpha)
    • Account for attrition in sample size
    • Define clear inclusion/exclusion criteria
    • Consider recruitment strategy and feasibility
    • Plan for sample representativeness

  5. Measurement Strategy
    • Select validated, reliable instruments
    • Use objective measures when possible
    • Plan multiple measures of key constructs (triangulation)
    • Ensure measures are sensitive to expected changes
    • Establish inter-rater reliability procedures

  6. Analysis Planning
    • Prespecify all hypotheses and analyses
    • Designate primary outcome clearly
    • Plan statistical tests with assumption checks
    • Specify how missing data will be handled
    • Plan to report effect sizes and confidence intervals
    • Consider multiple comparison corrections

  7. Transparency and Rigor
    • Preregister study and analysis plan
    • Use reporting guidelines (CONSORT, STROBE, PRISMA)
    • Plan to report all outcomes, not just significant ones
    • Distinguish confirmatory from exploratory analyses
    • Commit to data/code sharing
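
As a sketch of the sample-planning step, the snippet below solves for the per-group sample size needed to detect a hypothesized effect and then inflates it for expected attrition; the effect size, power, alpha, and dropout rate are placeholder planning assumptions, not recommendations.

```python
# Sketch: a priori sample-size planning with an attrition buffer. All inputs are hypothetical.
import math
from statsmodels.stats.power import TTestIndPower

target_d = 0.40            # smallest effect considered practically meaningful
power, alpha = 0.80, 0.05
expected_attrition = 0.15  # anticipate ~15% dropout

n_per_group = TTestIndPower().solve_power(effect_size=target_d, alpha=alpha,
                                          power=power, ratio=1.0)
n_to_recruit = math.ceil(n_per_group / (1 - expected_attrition))

print(f"Analyzable participants per group: {math.ceil(n_per_group)}")   # roughly 100
print(f"Recruit per group with attrition buffer: {n_to_recruit}")
```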

Reference: See references/experimental_design.md for comprehensive design checklist covering all stages from question to dissemination.

7. Claim Evaluation

Systematically evaluate scientific claims for validity and support.

Apply when:

  • Assessing conclusions in papers
  • Evaluating media reports of research
  • Reviewing abstract or introduction claims
  • Checking if data support conclusions

Claim evaluation process:

  1. Identify the Claim
    • What exactly is being claimed?
    • Is it a causal claim, associational claim, or descriptive claim?
    • How strong is the claim (proven, likely, suggested, possible)?

  2. Assess the Evidence
    • What evidence is provided?
    • Is evidence direct or indirect?
    • Is evidence sufficient for the strength of claim?
    • Are alternative explanations ruled out?

  3. Check Logical Connection
    • Do conclusions follow from the data?
    • Are there logical leaps?
    • Is correlational data used to support causal claims?
    • Are limitations acknowledged?

  4. Evaluate Proportionality
    • Is confidence proportional to evidence strength?
    • Are hedging words used appropriately?
    • Are limitations downplayed?
    • Is speculation clearly labeled?

  5. Check for Overgeneralization
    • Do claims extend beyond the sample studied?
    • Are population restrictions acknowledged?
    • Is context-dependence recognized?
    • Are caveats about generalization included?

  6. Red Flags
    • Causal language from correlational studies
    • "Proves" or absolute certainty
    • Cherry-picked citations
    • Ignoring contradictory evidence
    • Dismissing limitations
    • Extrapolation beyond data

Provide specific feedback:

  • Quote the problematic claim
  • Explain what evidence would be needed to support it
  • Suggest appropriate hedging language if warranted
  • Distinguish between data (what was found) and interpretation (what it means)

Application Guidelines

General Approach

  1. Be Constructive
    • Identify strengths as well as weaknesses
    • Suggest improvements rather than just criticizing
    • Distinguish between fatal flaws and minor limitations
    • Recognize that all research has limitations

  2. Be Specific
    • Point to specific instances (e.g., "Table 2 shows..." or "In the Methods section...")
    • Quote problematic statements
    • Provide concrete examples of issues
    • Reference specific principles or standards violated

  3. Be Proportionate
    • Match criticism severity to issue importance
    • Distinguish between major threats to validity and minor concerns
    • Consider whether issues affect primary conclusions
    • Acknowledge uncertainty in your own assessments

  4. Apply Consistent Standards
    • Use the same criteria across all studies
    • Don't apply stricter standards to findings you dislike
    • Acknowledge your own potential biases
    • Base judgments on methodology, not results

  5. Consider Context
    • Acknowledge practical and ethical constraints
    • Consider field-specific norms for effect sizes and methods
    • Recognize exploratory vs. confirmatory contexts
    • Account for resource limitations in evaluating studies

When Providing Critique

Structure feedback as:

  1. Summary: Brief overview of what was evaluated
  2. Strengths: What was done well (important for credibility and learning)
  3. Concerns: Issues organized by severity
    • Critical issues (threaten validity of main conclusions)
    • Important issues (affect interpretation but not fatally)
    • Minor issues (worth noting but don't change conclusions)
  4. Specific Recommendations: Actionable suggestions for improvement
  5. Overall Assessment: Balanced conclusion about evidence quality and what can be concluded

Use precise terminology:

  • Name specific biases, fallacies, and methodological issues
  • Reference established standards and guidelines
  • Cite principles from scientific methodology
  • Use technical terms accurately

When Uncertain

  • Acknowledge uncertainty: "This could be X or Y; additional information needed is Z"
  • Ask clarifying questions: "Was [methodological detail] done? This affects interpretation."
  • Provide conditional assessments: "If X was done, then Y follows; if not, then Z is a concern"
  • Note what additional information would resolve uncertainty

Reference Materials

This skill includes comprehensive reference materials that provide detailed frameworks for critical evaluation:

  • references/scientific_method.md - Core principles of scientific methodology, the scientific process, critical evaluation criteria, red flags in scientific claims, causal inference standards, peer review, and open science principles

  • references/common_biases.md - Comprehensive taxonomy of cognitive, experimental, methodological, statistical, and analysis biases with detection and mitigation strategies

  • references/statistical_pitfalls.md - Common statistical errors and misinterpretations including p-value misunderstandings, multiple comparisons problems, sample size issues, effect size mistakes, correlation/causation confusion, regression pitfalls, and meta-analysis issues

  • references/evidence_hierarchy.md - Traditional evidence hierarchy, GRADE system, study quality assessment criteria, domain-specific considerations, evidence synthesis principles, and practical decision frameworks

  • references/logical_fallacies.md - Logical fallacies common in scientific discourse organized by type (causation, generalization, authority, relevance, structure, statistical) with examples and detection strategies

  • references/experimental_design.md - Comprehensive experimental design checklist covering research questions, hypotheses, study design selection, variables, sampling, blinding, randomization, control groups, procedures, measurement, bias minimization, data management, statistical planning, ethical considerations, validity threats, and reporting standards

When to consult references:

  • Load references into context when detailed frameworks are needed
  • Use grep to search references for specific topics: grep -r "pattern" references/
  • References provide depth; SKILL.md provides procedural guidance
  • Consult references for comprehensive lists, detailed criteria, and specific examples

Remember

Scientific critical thinking is about:

  • Systematic evaluation using established principles
  • Constructive critique that improves science
  • Confidence proportional to evidence strength
  • Transparency about uncertainty and limitations
  • Consistent application of standards
  • Recognition that all research has limitations
  • Balance between skepticism and openness to evidence

Always distinguish between:

  • Data (what was observed) and interpretation (what it means)
  • Correlation and causation
  • Statistical significance and practical importance
  • Exploratory and confirmatory findings
  • What is known and what is uncertain
  • Evidence against a claim and evidence for the null

Goals of critical thinking:

  1. Identify strengths and weaknesses accurately
  2. Determine what conclusions are supported
  3. Recognize limitations and uncertainties
  4. Suggest improvements for future work
  5. Advance scientific understanding
