19. AI Agents and Harnesses

Course Positioning

This course teaches the design of AI agents and the surrounding harnesses that make them reliable: tools, memory, planning, orchestration, evaluation, state management, permissions, logging, guardrails, and deployment. The focus is production-minded agent design rather than hype.

Learning outcomes

Explain the difference between a chatbot, workflow automation, tool-using assistant, and autonomous agent.
Design agent harnesses with tools, memory, state, policies, human approvals, and failure handling.
Implement simple agents that call APIs, retrieve information, write files, and coordinate steps safely.
Evaluate agents using task suites, traces, regression tests, and human review.
Build a deployable agent prototype with guardrails, monitoring, and documentation.

Expanded Topic-by-Topic Coverage

Module 1. Agent concepts and myths

Module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes. Primary live activity or lab: Classify example systems by autonomy and risk. Expected take-home output: Agent taxonomy map.

Topics and coverage

Chatbots

What it means: explain how Chatbots changes the interaction between human intent, model behavior, external information, and final output.
What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

chains

What it means: define chains clearly and connect it to the module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

workflows

What it means: show where workflows appears in the learner's real workflow and which parts are judgment-heavy versus draftable.
What to cover: current workflow, pain points, AI-assisted steps, human review checkpoints, quality standard, and ownership of the final decision.
Demonstration: convert one messy real-world input into a structured brief, draft, analysis, checklist, or next action.
Evidence of learning: learners produce a reusable template or playbook entry that can be used after the course.

planners

What it means: define planners clearly and connect it to the module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

tool use

What it means: explain how tool use changes the interaction between human intent, model behavior, external information, and final output.
What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

autonomy levels

What it means: define autonomy levels clearly and connect it to the module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

failure modes

What it means: define failure modes clearly and connect it to the module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

Learners complete or discuss: Classify example systems by autonomy and risk.
Learners produce: Agent taxonomy map.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 2. Harness architecture

Module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI. Primary live activity or lab: Draw a harness for a real task. Expected take-home output: Agent architecture diagram.

Topics and coverage

System prompt

What it means: explain how System prompt changes the interaction between human intent, model behavior, external information, and final output.
What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

state

What it means: define state clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

tools

What it means: define tools clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

memory

What it means: define memory clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

planner

What it means: define planner clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

executor

What it means: define executor clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

evaluator

What it means: define evaluator clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

guardrails

What it means: define guardrails clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

UI

What it means: define UI clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

Learners complete or discuss: Draw a harness for a real task.
Learners produce: Agent architecture diagram.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 3. Tools and permissions

Module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege. Primary live activity or lab: Design a tool permission policy. Expected take-home output: Tool spec and policy.

Topics and coverage

Function calling

What it means: explain how Function calling changes the interaction between human intent, model behavior, external information, and final output.
What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

APIs

What it means: define APIs clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

databases

What it means: connect databases to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.

files

What it means: define files clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

browsers

What it means: define browsers clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

calendars

What it means: define calendars clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

email

What it means: define email clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

side effects

What it means: define side effects clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

least privilege

What it means in this course: define least privilege in operational terms, not as an abstract principle.
What to cover: sensitive data boundaries, affected stakeholders, approval paths, documentation, and what engineers, technical founders, automation builders, advanced students, AI product teams must never delegate blindly to AI.
Use case: present one acceptable use, one borderline use, and one prohibited use, then ask learners to justify the classification.
Evidence of learning: learners add a risk control, review step, or escalation rule to their course project.

Practice and evidence of learning

Learners complete or discuss: Design a tool permission policy.
Learners produce: Tool spec and policy.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 4. Memory and state

Module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting. Primary live activity or lab: Choose memory design for three tasks. Expected take-home output: Memory design note.

Topics and coverage

Short-term context

What it means: define Short-term context clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

long-term memory

What it means: define long-term memory clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

vector stores

What it means: define vector stores clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

user profiles

What it means: define user profiles clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

privacy

What it means in this course: define privacy in operational terms, not as an abstract principle.
What to cover: sensitive data boundaries, affected stakeholders, approval paths, documentation, and what engineers, technical founders, automation builders, advanced students, AI product teams must never delegate blindly to AI.
Use case: present one acceptable use, one borderline use, and one prohibited use, then ask learners to justify the classification.
Evidence of learning: learners add a risk control, review step, or escalation rule to their course project.

forgetting

What it means: define forgetting clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

Learners complete or discuss: Choose memory design for three tasks.
Learners produce: Memory design note.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 5. Planning and control flow

Module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints. Primary live activity or lab: Build a simple plan-execute loop or workflow graph. Expected take-home output: Planner prototype.

Topics and coverage

ReAct

What it means: define ReAct clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

plan-act-reflect

What it means: define plan-act-reflect clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

workflows vs open agents

What it means: explain how workflows vs open agents changes the interaction between human intent, model behavior, external information, and final output.
What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

retries

What it means: define retries clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

termination

What it means: define termination clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

checkpoints

What it means: define checkpoints clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

Learners complete or discuss: Build a simple plan-execute loop or workflow graph.
Learners produce: Planner prototype.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 6. RAG and knowledge grounding

Module focus: Document stores, retrieval, citations, freshness, prompt injection, access control. Primary live activity or lab: Connect an agent to a small document base. Expected take-home output: Grounded assistant.

Topics and coverage

Document stores

What it means: define Document stores clearly and connect it to the module focus: Document stores, retrieval, citations, freshness, prompt injection, access control.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

retrieval

What it means: explain how retrieval changes the interaction between human intent, model behavior, external information, and final output.
What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

citations

What it means: define citations clearly and connect it to the module focus: Document stores, retrieval, citations, freshness, prompt injection, access control.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

freshness

What it means: define freshness clearly and connect it to the module focus: Document stores, retrieval, citations, freshness, prompt injection, access control.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

prompt injection

What it means: explain how prompt injection changes the interaction between human intent, model behavior, external information, and final output.
What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

access control

What it means: define access control clearly and connect it to the module focus: Document stores, retrieval, citations, freshness, prompt injection, access control.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

Learners complete or discuss: Connect an agent to a small document base.
Learners produce: Grounded assistant.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 7. Evaluation harnesses

Module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria. Primary live activity or lab: Create tests for an agent task. Expected take-home output: Agent eval suite.

Topics and coverage

Task suites

What it means: define Task suites clearly and connect it to the module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

golden traces

What it means: define golden traces clearly and connect it to the module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

simulated users

What it means: define simulated users clearly and connect it to the module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

LLM judge limitations

What it means: explain how LLM judge limitations changes the interaction between human intent, model behavior, external information, and final output.
What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

pass/fail criteria

What it means: define pass/fail criteria clearly and connect it to the module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

Learners complete or discuss: Create tests for an agent task.
Learners produce: Agent eval suite.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 8. Human-in-the-loop design

Module focus: Approval gates, escalation, uncertainty, role-based review, audit logs. Primary live activity or lab: Add an approval step to a risky workflow. Expected take-home output: Approval workflow.

Topics and coverage

Approval gates

What it means: define Approval gates clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

escalation

What it means: define escalation clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

uncertainty

What it means: define uncertainty clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

role-based review

What it means: define role-based review clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

audit logs

What it means: define audit logs clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

Learners complete or discuss: Add an approval step to a risky workflow.
Learners produce: Approval workflow.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 9. Reliability, security, and monitoring

Module focus: Prompt injection, data leaks, rate limits, observability, rollbacks, incident response. Primary live activity or lab: Red-team an agent with malicious instructions. Expected take-home output: Security test report.

Topics and coverage

Prompt injection

What it means: explain how Prompt injection changes the interaction between human intent, model behavior, external information, and final output.
What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

data leaks

What it means: connect data leaks to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.

rate limits

What it means: define rate limits clearly and connect it to the module focus: Prompt injection, data leaks, rate limits, observability, rollbacks, incident response.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

observability

What it means: place observability inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

rollbacks

What it means: define rollbacks clearly and connect it to the module focus: Prompt injection, data leaks, rate limits, observability, rollbacks, incident response.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

incident response

What it means: define incident response clearly and connect it to the module focus: Prompt injection, data leaks, rate limits, observability, rollbacks, incident response.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

Learners complete or discuss: Red-team an agent with malicious instructions.
Learners produce: Security test report.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 10. Deployment studio

Module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation. Primary live activity or lab: Build and present final agent. Expected take-home output: Deployed or demo-ready agent.

Topics and coverage

Packaging

What it means: define Packaging clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

API/UI

What it means: define API/UI clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

auth

What it means: define auth clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

logging

What it means: define logging clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

costs

What it means: place costs inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

maintenance

What it means: define maintenance clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

documentation

What it means: define documentation clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

Learners complete or discuss: Build and present final agent.
Learners produce: Deployed or demo-ready agent.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Labs, projects, and assessments

Lab 1: Build a tool-calling assistant with two safe tools and structured output.
Lab 2: Add retrieval over a controlled document set and test citation quality.
Lab 3: Create an evaluation suite with normal, edge-case, and adversarial tasks.
Lab 4: Add human approval for any external side effect.
Capstone: Agent prototype with architecture diagram, tool policy, eval results, red-team notes, and deployment plan.

Evaluation approach

15% architecture and tool design.
25% implementation labs.
20% evaluation harness.
20% security and governance review.
20% final agent demo and documentation.

Recommended tools and materials

Python/TypeScript, LLM APIs or local models, LangGraph/CrewAI/AutoGen or custom lightweight framework, vector store, FastAPI/Next.js/Streamlit optional, GitHub, Docker optional.
Optional: browser automation sandbox, queue system, logging/tracing tool.

Safety, ethics, and governance emphasis

Agents with real-world side effects need explicit user confirmation, logging, and rollback plans.
Use least-privilege tool access and isolate credentials.
Test for prompt injection, tool misuse, infinite loops, data exfiltration, and unsafe escalation.

Delivery notes

Teach deterministic workflows first, then controlled autonomy.
Require demos to show failure handling, not just success paths.
Appendix A: Portfolio artifacts by course.
Appendix B: Universal AI-use disclosure template.
Learners should submit this short disclosure with major assignments, projects, or professional outputs:.
What AI tool(s) did I use?
What did I ask the tool to do?
Which parts of the final output were produced or significantly shaped by AI?
What did I verify independently?
What decisions did I make myself?
What limitations or risks remain?
Appendix C: Standard verification checklist.
Check factual claims against primary or trusted sources.
Check numbers, dates, names, citations, regulations, prices, and medical/legal/financial statements manually.
Ask the model for uncertainty, assumptions, and possible counterarguments.
Use source hierarchy: primary sources first, then reputable secondary sources, then general commentary.
For professional outputs, use a qualified human reviewer before client, patient, employee, or public use.
Document the review process for high-stakes work.
Appendix D: Delivery formats.
Appendix E: Trainer requirements.
Appendix F: Reference guidance informing the curriculum.
The curriculum structure is aligned with current public guidance and trends around AI literacy, effective educational use, risk management, and domain governance. Recommended references for curriculum owners:.
UNESCO work on AI in education and AI competency frameworks for teachers and students.
OECD Digital Education Outlook 2026 on effective uses of generative AI in education.
Stanford HAI AI Index reports for tracking the broader AI landscape, adoption, technical progress, and education trends.
NIST AI Risk Management Framework and Generative AI Profile for governance and risk language.
WHO guidance on ethics and governance of large multimodal models in health for healthcare-specific safety framing.

Instructor Build Checklist

Prepare one short demo for each module and one learner activity that creates a saved artifact.
Prepare examples that match the audience, local context, and likely tools learners can access.
Add a verification step to every AI-generated output: factual check, source check, data sensitivity check, and quality review.
Keep a running portfolio folder so each module contributes to the final project or learner playbook.
Reserve time for reflection on what the learner did, what AI did, what was checked, and what remains uncertain.