18. AI for Science
Course Positioning
This course teaches how AI can support the scientific workflow: literature review, hypothesis generation, data analysis, simulation, lab/field experiment planning, surrogate modeling, Bayesian optimization, scientific agents, reproducibility, and responsible research. It is designed for AI-for-science builders and domain scientists, not just general AI users.
Learning outcomes
- Map scientific workflows into AI-assistable components: literature, data, models, simulations, experiments, and manuscripts.
- Use LLMs for structured literature review, hypothesis generation, protocol drafting, and scientific critique with verification.
- Apply AI/ML concepts such as embeddings, surrogate models, active learning, Bayesian optimization, and simulation-in-the-loop workflows.
- Design evaluation metrics and oracle/feedback loops for scientific discovery systems.
- Build a small AI-for-science project proposal or prototype with reproducibility and ethics plan.
Expanded Topic-by-Topic Coverage
Module 1. The AI-for-science landscape
Module focus: Foundation models for science, lab automation, climate, biology, materials, medicine, mathematics, scientific agents. Primary live activity or lab: Map one research area to AI opportunity types. Expected take-home output: AI-for-science opportunity map.
Topics and coverage
Foundation models for science
- What it means: explain how foundation models for science change the interaction between human intent, model behavior, external information, and final output.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
Lab automation
- What it means: define lab automation clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Climate
- What it means: define AI for climate science clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Biology
- What it means: define AI for biology clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Materials
- What it means: define AI for materials science clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Medicine
- What it means: define AI for medicine clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Mathematics
- What it means: define AI for mathematics clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Scientific agents
- What it means: explain how scientific agents change the interaction between human intent, model behavior, external information, and final output.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
Practice and evidence of learning
- Learners complete or discuss: Map one research area to AI opportunity types.
- Learners produce: AI-for-science opportunity map; a minimal schema sketch follows this list.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
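One lightweight way to capture the opportunity map is as structured data rather than free text, so it can be reused in later modules. The sketch below is a minimal, hypothetical schema; the field names are illustrative, not prescribed by the course.

```python
# A minimal, illustrative schema for an AI-for-science opportunity map.
# Field names are hypothetical; adapt them to the research area being mapped.
from dataclasses import dataclass

@dataclass
class Opportunity:
    workflow_stage: str      # e.g., "literature", "data", "simulation", "experiment"
    task: str                # the concrete scientific task
    ai_role: str             # "draft", "rank", "extract", "optimize", "automate"
    human_checkpoint: str    # what a human must verify before the output is used
    risk_notes: str = ""     # dual-use, data-sensitivity, or overclaiming concerns

opportunity_map = [
    Opportunity("literature", "screen 500 abstracts for inclusion",
                "rank", "spot-check 10% of exclusions"),
    Opportunity("experiment", "suggest next synthesis conditions",
                "optimize", "domain scientist approves each batch"),
]

for opp in opportunity_map:
    print(f"{opp.workflow_stage}: {opp.task} -> AI role: {opp.ai_role}")
```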
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 2. Scientific literature workflows
Module focus: Search, screening, extraction, citation graphs, claims, contradictions, evidence tables. Primary live activity or lab: Build a structured literature matrix. Expected take-home output: Literature evidence table.
Topics and coverage
Search
- What it means: define literature search clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Screening
- What it means: define screening clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Extraction
- What it means: define extraction clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Citation graphs
- What it means: define citation graphs clearly and connect them to the module focus above; a small graph-building sketch follows this block.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
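To make the citation-graph demonstration concrete in a notebook, a small directed graph is enough. The sketch below assumes the networkx library is available; the paper IDs are invented.

```python
# Minimal citation-graph sketch using networkx (paper IDs are invented).
# An edge A -> B means "paper A cites paper B".
import networkx as nx

g = nx.DiGraph()
g.add_edges_from([
    ("smith2021", "lee2019"),
    ("smith2021", "park2018"),
    ("chen2022", "lee2019"),
    ("chen2022", "smith2021"),
])

# In-degree approximates citation count within this local graph;
# it measures local connectivity, not quality.
for paper, citations in sorted(g.in_degree(), key=lambda x: -x[1]):
    print(f"{paper}: cited {citations} time(s) in this subgraph")
```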
Claims
- What it means: define claims clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Contradictions
- What it means: define contradictions clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Evidence tables
- What it means: define evidence tables clearly and connect them to the module focus above; a row-schema sketch follows this block.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
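The literature matrix lab translates naturally into a typed row schema. The sketch below shows one possible shape; the fields are suggestions to align with your screening protocol, not a standard.

```python
# One possible row schema for a literature evidence table.
# Field names are suggestions; align them with your screening protocol.
from dataclasses import dataclass

@dataclass
class EvidenceRow:
    citation: str            # e.g., "Doe et al. 2020"
    claim: str               # the specific claim the paper supports or disputes
    method: str              # study design or experimental method
    sample_or_scale: str     # n, dataset size, or system studied
    direction: str           # "supports", "contradicts", "mixed"
    evidence_strength: str   # e.g., "strong", "moderate", "weak" per your rubric
    open_questions: str = ""

row = EvidenceRow(
    citation="Doe et al. 2020",
    claim="Additive X raises yield above 80 C",
    method="randomized bench trials",
    sample_or_scale="n=24 runs",
    direction="supports",
    evidence_strength="moderate",
)
print(row)
```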
Practice and evidence of learning
- Learners complete or discuss: Build a structured literature matrix.
- Learners produce: Literature evidence table.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 3. Hypothesis generation and critique
Module focus: LLM brainstorming, mechanistic reasoning, falsifiability, novelty, confounders, failure modes. Primary live activity or lab: Generate hypotheses and critique them against evidence. Expected take-home output: Hypothesis shortlist.
Topics and coverage
LLM brainstorming
- What it means: explain how LLM brainstorming changes the interaction between human intent, model behavior, external information, and final output.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
Mechanistic reasoning
- What it means: define mechanistic reasoning clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Falsifiability
- What it means: define falsifiability clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Novelty
- What it means: define novelty clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Confounders
- What it means: define confounders clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Failure modes
- What it means: define failure modes clearly and connect them to the module focus above; a structured hypothesis-record sketch follows this block.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
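This module's topics combine into a single structured hypothesis record: a claim plus mechanism, a falsifiable prediction, and the confounders that could explain the result away. A minimal sketch, with illustrative field names:

```python
# Minimal structured hypothesis record; fields mirror this module's topics.
from dataclasses import dataclass, field

@dataclass
class Hypothesis:
    claim: str                      # what is being asserted
    mechanism: str                  # proposed causal mechanism
    falsifiable_prediction: str     # an observation that would refute the claim
    confounders: list[str] = field(default_factory=list)
    novelty_note: str = ""          # how it differs from prior work
    known_failure_modes: str = ""   # how the test itself could mislead

h = Hypothesis(
    claim="Compound A inhibits enzyme E in vivo",
    mechanism="binds the allosteric site identified in vitro",
    falsifiable_prediction="no activity change in an E-knockout line",
    confounders=["off-target binding", "dose-dependent toxicity"],
)
print(h.falsifiable_prediction)
```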
Practice and evidence of learning
- Learners complete or discuss: Generate hypotheses and critique them against evidence.
- Learners produce: Hypothesis shortlist.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 4. Data and representation
Module focus: Scientific datasets, metadata, embeddings, ontologies, features, leakage, provenance. Primary live activity or lab: Design a data schema for a scientific problem. Expected take-home output: Data card.
Topics and coverage
Scientific datasets
- What it means: connect scientific datasets to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
- What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
- Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
- Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.
Metadata
- What it means: connect metadata to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
- What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
- Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
- Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.
Embeddings
- What it means: explain how embeddings change the interaction between human intent, model behavior, external information, and final output; a similarity sketch follows this block.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
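For a notebook demonstration that needs no model download, TF-IDF vectors can stand in for learned embeddings; the cosine-similarity step stays the same when you later swap in a domain embedding model. A minimal sketch using scikit-learn:

```python
# Embedding-style similarity sketch. TF-IDF stands in for learned embeddings
# so the example runs offline; a real workflow would swap in a domain
# embedding model and keep the cosine-similarity step unchanged.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

abstracts = [
    "Perovskite solar cells degrade under humidity.",
    "Encapsulation improves perovskite stability in humid air.",
    "Transformer models predict protein secondary structure.",
]

vectors = TfidfVectorizer().fit_transform(abstracts)
sims = cosine_similarity(vectors)

# Nearest neighbor of the first abstract (excluding itself).
best = sims[0, 1:].argmax() + 1
print(f"Most similar to abstract 0: abstract {best} (score {sims[0, best]:.2f})")
```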
Ontologies
- What it means: define ontologies clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Features
- What it means: connect features to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
- What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
- Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
- Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.
Leakage
- What it means: define leakage clearly and connect it to the module focus above; a group-split sketch follows this block.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
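A concrete leakage demonstration: when several samples come from the same source (batch, patient, site), a random split puts near-duplicates in both train and test, and scores look better than they are. Splitting by group avoids this. A sketch using scikit-learn on synthetic data:

```python
# Group-aware splitting to avoid leakage: samples from the same source
# (batch, patient, instrument run) must not straddle train and test.
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

rng = np.random.default_rng(0)
n = 100
groups = rng.integers(0, 10, size=n)   # 10 sources, ~10 samples each
X = rng.normal(size=(n, 5))
y = rng.normal(size=n)

splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=groups))

# Verify: no group appears on both sides of the split.
assert set(groups[train_idx]).isdisjoint(groups[test_idx])
print(f"train groups: {sorted(set(groups[train_idx]))}")
print(f"test groups:  {sorted(set(groups[test_idx]))}")
```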
Provenance
- What it means: define provenance clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Design a data schema for a scientific problem.
- Learners produce: Data card.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 5. Models, simulators, and surrogates
Module focus: Physical simulators, ML surrogates, uncertainty, calibration, validation, multi-objective problems. Primary live activity or lab: Sketch simulator + surrogate workflow. Expected take-home output: Modeling plan.
Topics and coverage
Physical simulators
- What it means: define physical simulators clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
ML surrogates
- What it means: define ML surrogates clearly and connect them to the module focus above; a surrogate-fitting sketch follows this block.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
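A minimal surrogate demonstration: fit a Gaussian process to a handful of evaluations of a cheap stand-in "simulator" and read out both predictions and uncertainty. A sketch using scikit-learn; the simulator function is invented so the example runs anywhere:

```python
# Surrogate-model sketch: a Gaussian process emulates an "expensive" simulator.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ConstantKernel

def simulator(x):
    # Stand-in for an expensive physics code or wet-lab measurement.
    return np.sin(3 * x) + 0.5 * x

X_train = np.linspace(0, 2, 6).reshape(-1, 1)   # six "expensive" runs
y_train = simulator(X_train).ravel()

gp = GaussianProcessRegressor(kernel=ConstantKernel() * RBF(), normalize_y=True)
gp.fit(X_train, y_train)

X_query = np.linspace(0, 2, 5).reshape(-1, 1)
mean, std = gp.predict(X_query, return_std=True)
for x, m, s in zip(X_query.ravel(), mean, std):
    # std is the surrogate's own uncertainty; it must still be validated
    # against held-out simulator runs before anyone trusts it.
    print(f"x={x:.2f}  prediction={m:+.2f} +/- {s:.2f}")
```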
Uncertainty
- What it means: define uncertainty clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Calibration
- What it means: define calibration clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Validation
- What it means: define validation clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Multi-objective problems
- What it means: define multi-objective problems clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Sketch simulator + surrogate workflow.
- Learners produce: Modeling plan.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 6. Active learning and Bayesian optimization
Module focus: Acquisition functions, exploration/exploitation, expensive experiments, constraints, batch design. Primary live activity or lab: Design an active-learning loop for a fictional experiment. Expected take-home output: Bayesian optimization (BO) loop diagram.
Topics and coverage
Acquisition functions
- What it means: define acquisition functions clearly and connect them to the module focus above; an expected-improvement sketch follows this block.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
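Expected improvement (EI) is one standard acquisition function and makes a good first demonstration. The sketch below computes EI from a Gaussian-process posterior on a toy maximization objective; everything beyond scikit-learn and SciPy is invented for illustration:

```python
# Expected-improvement sketch: score candidate points by how much they are
# expected to improve on the best observation, given a GP posterior.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

def objective(x):
    # Invented toy objective standing in for an expensive experiment.
    return -(x - 0.6) ** 2

X_obs = np.array([[0.1], [0.4], [0.9]])      # three completed "experiments"
y_obs = objective(X_obs).ravel()

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.2)).fit(X_obs, y_obs)

def expected_improvement(X_cand, best_y, xi=0.01):
    mean, std = gp.predict(X_cand, return_std=True)
    std = np.maximum(std, 1e-9)              # avoid division by zero
    z = (mean - best_y - xi) / std
    # Closed form for maximization: EI = (mu - f* - xi) Phi(z) + sigma phi(z)
    return (mean - best_y - xi) * norm.cdf(z) + std * norm.pdf(z)

X_cand = np.linspace(0, 1, 101).reshape(-1, 1)
ei = expected_improvement(X_cand, y_obs.max())
print(f"next suggested x: {X_cand[ei.argmax()][0]:.2f}")
```

Note how the suggestion balances a promising posterior mean against unexplored, high-uncertainty regions; that balance is the exploration/exploitation tradeoff covered next.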
Exploration/exploitation
- What it means: define the exploration/exploitation tradeoff clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Expensive experiments
- What it means: define expensive experiments clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Constraints
- What it means: define constraints clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Batch design
- What it means: show where batch design appears in the learner's real workflow and which parts are judgment-heavy versus draftable.
- What to cover: current workflow, pain points, AI-assisted steps, human review checkpoints, quality standard, and ownership of the final decision.
- Demonstration: convert one messy real-world input into a structured brief, draft, analysis, checklist, or next action.
- Evidence of learning: learners produce a reusable template or playbook entry that can be used after the course.
Practice and evidence of learning
- Learners complete or discuss: Design an active-learning loop for a fictional experiment.
- Learners produce: BO loop diagram.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 7. Scientific agents and tools
Module focus: Tool-using agents, literature tools, code tools, databases, experiment planning, lab notebooks. Primary live activity or lab: Design an agent harness for a scientific task. Expected take-home output: Scientific agent spec.
Topics and coverage
Tool-using agents
- What it means: explain how tool-using agents change the interaction between human intent, model behavior, external information, and final output; a harness skeleton follows this block.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
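The agent-harness idea can be demonstrated without any model API: a registry of tools plus a loop that dispatches requested calls and logs them. In the sketch below the "model" is a scripted stub; a real harness would put an LLM in its place and keep the same registry, dispatch, and audit-log structure:

```python
# Minimal agent-harness skeleton. The "model" is a scripted stub so the
# example runs offline; a real harness would call an LLM here but keep the
# same tool registry, dispatch loop, and audit log.

def search_literature(query: str) -> str:
    return f"[stub] top hits for '{query}'"       # placeholder tool

def run_analysis(expr: str) -> str:
    return f"[stub] result of {expr}"             # placeholder tool

TOOLS = {"search_literature": search_literature, "run_analysis": run_analysis}

# Scripted stand-in for model decisions: (tool name, argument) pairs.
scripted_plan = [
    ("search_literature", "perovskite humidity degradation"),
    ("run_analysis", "mean(degradation_rate)"),
]

audit_log = []
for tool_name, arg in scripted_plan:
    if tool_name not in TOOLS:
        raise ValueError(f"unknown tool requested: {tool_name}")  # refuse, don't guess
    result = TOOLS[tool_name](arg)
    audit_log.append((tool_name, arg, result))    # every call is recorded for review

for entry in audit_log:
    print(entry)
```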
Literature tools
- What it means: define literature tools clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Code tools
- What it means: define code tools clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Databases
- What it means: connect databases to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
- What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
- Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
- Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.
Experiment planning
- What it means: define experiment planning clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Lab notebooks
- What it means: define lab notebooks clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Design an agent harness for a scientific task.
- Learners produce: Scientific agent spec.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 8. Evaluation and reproducibility
Module focus: Benchmarks, held-out tests, baselines, ablations, uncertainty, reproducible pipelines, data/version control. Primary live activity or lab: Create an evaluation plan for a proposed system. Expected take-home output: Evaluation checklist.
Topics and coverage
Benchmarks
- What it means: define benchmarks clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Held-out tests
- What it means: define held-out tests clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Baselines
- What it means: define baselines clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Ablations
- What it means: define ablations clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Uncertainty
- What it means: define uncertainty clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Reproducible pipelines
- What it means: define reproducible pipelines clearly and connect them to the module focus above; a seed-and-hash sketch follows this block.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
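One small habit that makes pipelines auditable: fix random seeds and record a content hash of the exact input data next to every result. A minimal sketch; the demo file and the run-record fields are invented for illustration:

```python
# Reproducibility sketch: fix randomness and fingerprint the input data so a
# result can be traced to the exact bytes and seed that produced it.
import hashlib
import json
import random

import numpy as np

SEED = 42
random.seed(SEED)
np.random.seed(SEED)

def sha256_of_file(path: str) -> str:
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

# Create a tiny demo file so the sketch runs anywhere; in practice you hash
# the real input dataset instead.
with open("measurements.csv", "w") as f:
    f.write("run,value\n1,0.42\n")

run_record = {
    "seed": SEED,
    "data_sha256": sha256_of_file("measurements.csv"),
    "code_version": "fill in from your version control, e.g. a commit hash",
}
print(json.dumps(run_record, indent=2))
```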
Data/version control
- What it means: connect data/version control to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
- What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
- Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
- Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.
Practice and evidence of learning
- Learners complete or discuss: Create an evaluation plan for a proposed system.
- Learners produce: Evaluation checklist.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 9. Ethics, safety, and dual-use
Module focus: Biosecurity, research integrity, data rights, environmental impact, overclaiming, responsible release. Primary live activity or lab: Risk assessment for an AI-for-science project. Expected take-home output: Responsible research note.
Topics and coverage
Biosecurity
- What it means in this course: define biosecurity in operational terms, not as an abstract principle.
- What to cover: sensitive data boundaries, affected stakeholders, approval paths, documentation, and what researchers, graduate students, postdocs, R&D teams, and scientific software builders must never delegate blindly to AI.
- Use case: present one acceptable use, one borderline use, and one prohibited use, then ask learners to justify the classification.
- Evidence of learning: learners add a risk control, review step, or escalation rule to their course project.
Research integrity
- What it means: show where research integrity appears in the learner's real workflow and which parts are judgment-heavy versus draftable.
- What to cover: current workflow, pain points, AI-assisted steps, human review checkpoints, quality standard, and ownership of the final decision.
- Demonstration: convert one messy real-world input into a structured brief, draft, analysis, checklist, or next action.
- Evidence of learning: learners produce a reusable template or playbook entry that can be used after the course.
Data rights
- What it means in this course: define data rights in operational terms, not as an abstract principle.
- What to cover: sensitive data boundaries, affected stakeholders, approval paths, documentation, and what researchers, graduate students, postdocs, R&D teams, and scientific software builders must never delegate blindly to AI.
- Use case: present one acceptable use, one borderline use, and one prohibited use, then ask learners to justify the classification.
- Evidence of learning: learners add a risk control, review step, or escalation rule to their course project.
Environmental impact
- What it means: define environmental impact clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Overclaiming
- What it means: define overclaiming clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Responsible release
- What it means: define responsible release clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Risk assessment for an AI-for-science project.
- Learners produce: Responsible research note.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 10. Capstone proposal/prototype
Module focus: Problem, data, models, oracle, loop, metrics, risks, roadmap. Primary live activity or lab: Present an AI-for-science project. Expected take-home output: Proposal or prototype.
Topics and coverage
Problem
- What it means: define the problem statement clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Data
- What it means: connect data to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
- What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
- Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
- Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.
Models
- What it means: place models inside the AI system stack so learners know what problems they solve and what tradeoffs they introduce.
- What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
- Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
- Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.
Oracle
- What it means: define the oracle, the source of ground-truth feedback, clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Loop
- What it means: define the discovery loop clearly and connect it to the module focus above; a loop skeleton follows this block.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
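The propose-evaluate-update loop that ties the capstone together can be sketched in a few lines. Here random search stands in for the proposal model and an invented function stands in for the oracle; a real project swaps in its own model, experiment, or simulator:

```python
# Closed-loop skeleton: propose -> evaluate against the oracle -> update.
# Random proposals and an invented oracle keep the sketch self-contained.
import random

random.seed(0)

def oracle(x: float) -> float:
    # Stand-in for ground truth: an experiment, simulator, or held-out data.
    return -(x - 0.3) ** 2

history = []                      # the loop's memory: (candidate, score) pairs
best = (None, float("-inf"))

for step in range(20):
    candidate = random.uniform(0, 1)        # "propose" (replace with model/BO)
    score = oracle(candidate)               # "evaluate" (the expensive step)
    history.append((candidate, score))      # "update" what the proposer sees
    if score > best[1]:
        best = (candidate, score)

print(f"best candidate after 20 evaluations: x={best[0]:.3f}, score={best[1]:.4f}")
```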
Metrics
- What it means: define metrics clearly and connect them to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Risks
- What it means in this course: define risks in operational terms, not as an abstract principle.
- What to cover: sensitive data boundaries, affected stakeholders, approval paths, documentation, and what researchers, graduate students, postdocs, R&D teams, and scientific software builders must never delegate blindly to AI.
- Use case: present one acceptable use, one borderline use, and one prohibited use, then ask learners to justify the classification.
- Evidence of learning: learners add a risk control, review step, or escalation rule to their course project.
Roadmap
- What it means: define the roadmap clearly and connect it to the module focus above.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Present an AI-for-science project.
- Learners produce: Proposal or prototype.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Labs, projects, and assessments
- Lab 1: Literature matrix with claims, evidence strength, contradictions, and open questions.
- Lab 2: Design a closed-loop scientific discovery workflow with data, model, oracle, and experiment/simulation feedback.
- Lab 3: Build a small notebook prototype or no-code workflow for scientific extraction, analysis, or optimization.
- Capstone: AI-for-science mini-proposal or prototype with evaluation, reproducibility, and responsible research plan.
Evaluation approach
- 20% literature and evidence matrix.
- 20% hypothesis and critique exercise.
- 20% active learning or simulator workflow.
- 20% evaluation/reproducibility plan.
- 20% capstone proposal or prototype.
Recommended tools and materials
- AI assistant, literature databases, Zotero, Python notebooks, domain datasets, Git, experiment tracking, optional BO libraries and vector search.
- Optional: domain-specific foundation models, simulation tools, and lab information systems.
Safety, ethics, and governance emphasis
- Scientific claims generated by AI must be checked against primary literature or experiments.
- Avoid unsafe biological, chemical, clinical, or dual-use operational instructions.
- Require transparent documentation of data provenance, prompts, code, model versions, and limitations.
Delivery notes
- Adapt examples to the audience: biology, climate, materials, agriculture, medicine, economics, or math.
- This course can become a proposal-writing incubator for research grants.
Instructor Build Checklist
- Prepare one short demo for each module and one learner activity that creates a saved artifact.
- Prepare examples that match the audience, local context, and likely tools learners can access.
- Add a verification step to every AI-generated output: factual check, source check, data sensitivity check, and quality review.
- Keep a running portfolio folder so each module contributes to the final project or learner playbook.
- Reserve time for reflection on what the learner did, what AI did, what was checked, and what remains uncertain.