19. AI Agents and Harnesses
Course Positioning
This course teaches the design of AI agents and the surrounding harnesses that make them reliable: tools, memory, planning, orchestration, evaluation, state management, permissions, logging, guardrails, and deployment. The focus is production-minded agent design rather than hype.
Learning outcomes
- Explain the difference between a chatbot, workflow automation, tool-using assistant, and autonomous agent.
- Design agent harnesses with tools, memory, state, policies, human approvals, and failure handling.
- Implement simple agents that call APIs, retrieve information, write files, and coordinate steps safely.
- Evaluate agents using task suites, traces, regression tests, and human review.
- Build a deployable agent prototype with guardrails, monitoring, and documentation.
Expanded Topic-by-Topic Coverage
Module 1. Agent concepts and myths
Module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes. Primary live activity or lab: Classify example systems by autonomy and risk. Expected take-home output: Agent taxonomy map.
Topics and coverage
Chatbots
- What it means: explain how Chatbots changes the interaction between human intent, model behavior, external information, and final output.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
chains
- What it means: define chains clearly and connect it to the module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
workflows
- What it means: show where workflows appears in the learner's real workflow and which parts are judgment-heavy versus draftable.
- What to cover: current workflow, pain points, AI-assisted steps, human review checkpoints, quality standard, and ownership of the final decision.
- Demonstration: convert one messy real-world input into a structured brief, draft, analysis, checklist, or next action.
- Evidence of learning: learners produce a reusable template or playbook entry that can be used after the course.
planners
- What it means: define planners clearly and connect it to the module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
tool use
- What it means: explain how tool use changes the interaction between human intent, model behavior, external information, and final output.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
autonomy levels
- What it means: define autonomy levels clearly and connect it to the module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
failure modes
- What it means: define failure modes clearly and connect it to the module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Classify example systems by autonomy and risk.
- Learners produce: Agent taxonomy map.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 2. Harness architecture
Module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI. Primary live activity or lab: Draw a harness for a real task. Expected take-home output: Agent architecture diagram.
Topics and coverage
System prompt
- What it means: explain how System prompt changes the interaction between human intent, model behavior, external information, and final output.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
state
- What it means: define state clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
tools
- What it means: define tools clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
memory
- What it means: define memory clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
planner
- What it means: define planner clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
executor
- What it means: define executor clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
evaluator
- What it means: define evaluator clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
guardrails
- What it means: define guardrails clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
UI
- What it means: define UI clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Draw a harness for a real task.
- Learners produce: Agent architecture diagram.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 3. Tools and permissions
Module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege. Primary live activity or lab: Design a tool permission policy. Expected take-home output: Tool spec and policy.
Topics and coverage
Function calling
- What it means: explain how Function calling changes the interaction between human intent, model behavior, external information, and final output.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
APIs
- What it means: define APIs clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
databases
- What it means: connect databases to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
- What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
- Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
- Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.
files
- What it means: define files clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
browsers
- What it means: define browsers clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
calendars
- What it means: define calendars clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
- What it means: define email clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
side effects
- What it means: define side effects clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
least privilege
- What it means in this course: define least privilege in operational terms, not as an abstract principle.
- What to cover: sensitive data boundaries, affected stakeholders, approval paths, documentation, and what engineers, technical founders, automation builders, advanced students, AI product teams must never delegate blindly to AI.
- Use case: present one acceptable use, one borderline use, and one prohibited use, then ask learners to justify the classification.
- Evidence of learning: learners add a risk control, review step, or escalation rule to their course project.
Practice and evidence of learning
- Learners complete or discuss: Design a tool permission policy.
- Learners produce: Tool spec and policy.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 4. Memory and state
Module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting. Primary live activity or lab: Choose memory design for three tasks. Expected take-home output: Memory design note.
Topics and coverage
Short-term context
- What it means: define Short-term context clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
long-term memory
- What it means: define long-term memory clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
vector stores
- What it means: define vector stores clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
user profiles
- What it means: define user profiles clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
privacy
- What it means in this course: define privacy in operational terms, not as an abstract principle.
- What to cover: sensitive data boundaries, affected stakeholders, approval paths, documentation, and what engineers, technical founders, automation builders, advanced students, AI product teams must never delegate blindly to AI.
- Use case: present one acceptable use, one borderline use, and one prohibited use, then ask learners to justify the classification.
- Evidence of learning: learners add a risk control, review step, or escalation rule to their course project.
forgetting
- What it means: define forgetting clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Choose memory design for three tasks.
- Learners produce: Memory design note.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 5. Planning and control flow
Module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints. Primary live activity or lab: Build a simple plan-execute loop or workflow graph. Expected take-home output: Planner prototype.
Topics and coverage
ReAct
- What it means: define ReAct clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
plan-act-reflect
- What it means: define plan-act-reflect clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
workflows vs open agents
- What it means: explain how workflows vs open agents changes the interaction between human intent, model behavior, external information, and final output.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
retries
- What it means: define retries clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
termination
- What it means: define termination clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
checkpoints
- What it means: define checkpoints clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Build a simple plan-execute loop or workflow graph.
- Learners produce: Planner prototype.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 6. RAG and knowledge grounding
Module focus: Document stores, retrieval, citations, freshness, prompt injection, access control. Primary live activity or lab: Connect an agent to a small document base. Expected take-home output: Grounded assistant.
Topics and coverage
Document stores
- What it means: define Document stores clearly and connect it to the module focus: Document stores, retrieval, citations, freshness, prompt injection, access control.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
retrieval
- What it means: explain how retrieval changes the interaction between human intent, model behavior, external information, and final output.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
citations
- What it means: define citations clearly and connect it to the module focus: Document stores, retrieval, citations, freshness, prompt injection, access control.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
freshness
- What it means: define freshness clearly and connect it to the module focus: Document stores, retrieval, citations, freshness, prompt injection, access control.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
prompt injection
- What it means: explain how prompt injection changes the interaction between human intent, model behavior, external information, and final output.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
access control
- What it means: define access control clearly and connect it to the module focus: Document stores, retrieval, citations, freshness, prompt injection, access control.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Connect an agent to a small document base.
- Learners produce: Grounded assistant.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 7. Evaluation harnesses
Module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria. Primary live activity or lab: Create tests for an agent task. Expected take-home output: Agent eval suite.
Topics and coverage
Task suites
- What it means: define Task suites clearly and connect it to the module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
golden traces
- What it means: define golden traces clearly and connect it to the module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
simulated users
- What it means: define simulated users clearly and connect it to the module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
LLM judge limitations
- What it means: explain how LLM judge limitations changes the interaction between human intent, model behavior, external information, and final output.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
pass/fail criteria
- What it means: define pass/fail criteria clearly and connect it to the module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Create tests for an agent task.
- Learners produce: Agent eval suite.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 8. Human-in-the-loop design
Module focus: Approval gates, escalation, uncertainty, role-based review, audit logs. Primary live activity or lab: Add an approval step to a risky workflow. Expected take-home output: Approval workflow.
Topics and coverage
Approval gates
- What it means: define Approval gates clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
escalation
- What it means: define escalation clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
uncertainty
- What it means: define uncertainty clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
role-based review
- What it means: define role-based review clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
audit logs
- What it means: define audit logs clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Add an approval step to a risky workflow.
- Learners produce: Approval workflow.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 9. Reliability, security, and monitoring
Module focus: Prompt injection, data leaks, rate limits, observability, rollbacks, incident response. Primary live activity or lab: Red-team an agent with malicious instructions. Expected take-home output: Security test report.
Topics and coverage
Prompt injection
- What it means: explain how Prompt injection changes the interaction between human intent, model behavior, external information, and final output.
- What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
- Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
- Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.
data leaks
- What it means: connect data leaks to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
- What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
- Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
- Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.
rate limits
- What it means: define rate limits clearly and connect it to the module focus: Prompt injection, data leaks, rate limits, observability, rollbacks, incident response.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
observability
- What it means: place observability inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
- What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
- Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
- Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.
rollbacks
- What it means: define rollbacks clearly and connect it to the module focus: Prompt injection, data leaks, rate limits, observability, rollbacks, incident response.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
incident response
- What it means: define incident response clearly and connect it to the module focus: Prompt injection, data leaks, rate limits, observability, rollbacks, incident response.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Red-team an agent with malicious instructions.
- Learners produce: Security test report.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 10. Deployment studio
Module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation. Primary live activity or lab: Build and present final agent. Expected take-home output: Deployed or demo-ready agent.
Topics and coverage
Packaging
- What it means: define Packaging clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
API/UI
- What it means: define API/UI clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
auth
- What it means: define auth clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
logging
- What it means: define logging clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
costs
- What it means: place costs inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
- What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
- Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
- Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.
maintenance
- What it means: define maintenance clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
documentation
- What it means: define documentation clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Build and present final agent.
- Learners produce: Deployed or demo-ready agent.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Labs, projects, and assessments
- Lab 1: Build a tool-calling assistant with two safe tools and structured output.
- Lab 2: Add retrieval over a controlled document set and test citation quality.
- Lab 3: Create an evaluation suite with normal, edge-case, and adversarial tasks.
- Lab 4: Add human approval for any external side effect.
- Capstone: Agent prototype with architecture diagram, tool policy, eval results, red-team notes, and deployment plan.
Evaluation approach
- 15% architecture and tool design.
- 25% implementation labs.
- 20% evaluation harness.
- 20% security and governance review.
- 20% final agent demo and documentation.
Recommended tools and materials
- Python/TypeScript, LLM APIs or local models, LangGraph/CrewAI/AutoGen or custom lightweight framework, vector store, FastAPI/Next.js/Streamlit optional, GitHub, Docker optional.
- Optional: browser automation sandbox, queue system, logging/tracing tool.
Safety, ethics, and governance emphasis
- Agents with real-world side effects need explicit user confirmation, logging, and rollback plans.
- Use least-privilege tool access and isolate credentials.
- Test for prompt injection, tool misuse, infinite loops, data exfiltration, and unsafe escalation.
Delivery notes
- Teach deterministic workflows first, then controlled autonomy.
- Require demos to show failure handling, not just success paths.
- Appendix A: Portfolio artifacts by course.
- Appendix B: Universal AI-use disclosure template.
- Learners should submit this short disclosure with major assignments, projects, or professional outputs:.
- What AI tool(s) did I use?
- What did I ask the tool to do?
- Which parts of the final output were produced or significantly shaped by AI?
- What did I verify independently?
- What decisions did I make myself?
- What limitations or risks remain?
- Appendix C: Standard verification checklist.
- Check factual claims against primary or trusted sources.
- Check numbers, dates, names, citations, regulations, prices, and medical/legal/financial statements manually.
- Ask the model for uncertainty, assumptions, and possible counterarguments.
- Use source hierarchy: primary sources first, then reputable secondary sources, then general commentary.
- For professional outputs, use a qualified human reviewer before client, patient, employee, or public use.
- Document the review process for high-stakes work.
- Appendix D: Delivery formats.
- Appendix E: Trainer requirements.
- Appendix F: Reference guidance informing the curriculum.
- The curriculum structure is aligned with current public guidance and trends around AI literacy, effective educational use, risk management, and domain governance. Recommended references for curriculum owners:.
- UNESCO work on AI in education and AI competency frameworks for teachers and students.
- OECD Digital Education Outlook 2026 on effective uses of generative AI in education.
- Stanford HAI AI Index reports for tracking the broader AI landscape, adoption, technical progress, and education trends.
- NIST AI Risk Management Framework and Generative AI Profile for governance and risk language.
- WHO guidance on ethics and governance of large multimodal models in health for healthcare-specific safety framing.
Instructor Build Checklist
- Prepare one short demo for each module and one learner activity that creates a saved artifact.
- Prepare examples that match the audience, local context, and likely tools learners can access.
- Add a verification step to every AI-generated output: factual check, source check, data sensitivity check, and quality review.
- Keep a running portfolio folder so each module contributes to the final project or learner playbook.
- Reserve time for reflection on what the learner did, what AI did, what was checked, and what remains uncertain.