AI AI EducationCurriculum Library
All courses

AI Curriculum

19. AI Agents and Harnesses

Audienceengineers, technical founders, automation builders, advanced students, AI product teams
Duration24-40 hours
Modules10

19. AI Agents and Harnesses

Course Positioning

This course teaches the design of AI agents and the surrounding harnesses that make them reliable: tools, memory, planning, orchestration, evaluation, state management, permissions, logging, guardrails, and deployment. The focus is production-minded agent design rather than hype.

Learning outcomes

  • Explain the difference between a chatbot, workflow automation, tool-using assistant, and autonomous agent.
  • Design agent harnesses with tools, memory, state, policies, human approvals, and failure handling.
  • Implement simple agents that call APIs, retrieve information, write files, and coordinate steps safely.
  • Evaluate agents using task suites, traces, regression tests, and human review.
  • Build a deployable agent prototype with guardrails, monitoring, and documentation.

Expanded Topic-by-Topic Coverage

Module 1. Agent concepts and myths

Module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes. Primary live activity or lab: Classify example systems by autonomy and risk. Expected take-home output: Agent taxonomy map.

Topics and coverage

Chatbots

  • What it means: explain how Chatbots changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

chains

  • What it means: define chains clearly and connect it to the module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

workflows

  • What it means: show where workflows appears in the learner's real workflow and which parts are judgment-heavy versus draftable.
  • What to cover: current workflow, pain points, AI-assisted steps, human review checkpoints, quality standard, and ownership of the final decision.
  • Demonstration: convert one messy real-world input into a structured brief, draft, analysis, checklist, or next action.
  • Evidence of learning: learners produce a reusable template or playbook entry that can be used after the course.

planners

  • What it means: define planners clearly and connect it to the module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

tool use

  • What it means: explain how tool use changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

autonomy levels

  • What it means: define autonomy levels clearly and connect it to the module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

failure modes

  • What it means: define failure modes clearly and connect it to the module focus: Chatbots, chains, workflows, planners, tool use, autonomy levels, failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Classify example systems by autonomy and risk.
  • Learners produce: Agent taxonomy map.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 2. Harness architecture

Module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI. Primary live activity or lab: Draw a harness for a real task. Expected take-home output: Agent architecture diagram.

Topics and coverage

System prompt

  • What it means: explain how System prompt changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

state

  • What it means: define state clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

tools

  • What it means: define tools clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

memory

  • What it means: define memory clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

planner

  • What it means: define planner clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

executor

  • What it means: define executor clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

evaluator

  • What it means: define evaluator clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

guardrails

  • What it means: define guardrails clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

UI

  • What it means: define UI clearly and connect it to the module focus: System prompt, state, tools, memory, planner, executor, evaluator, guardrails, UI.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Draw a harness for a real task.
  • Learners produce: Agent architecture diagram.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 3. Tools and permissions

Module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege. Primary live activity or lab: Design a tool permission policy. Expected take-home output: Tool spec and policy.

Topics and coverage

Function calling

  • What it means: explain how Function calling changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

APIs

  • What it means: define APIs clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

databases

  • What it means: connect databases to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
  • What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
  • Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
  • Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.

files

  • What it means: define files clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

browsers

  • What it means: define browsers clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

calendars

  • What it means: define calendars clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

email

  • What it means: define email clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

side effects

  • What it means: define side effects clearly and connect it to the module focus: Function calling, APIs, databases, files, browsers, calendars, email, side effects, least privilege.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

least privilege

  • What it means in this course: define least privilege in operational terms, not as an abstract principle.
  • What to cover: sensitive data boundaries, affected stakeholders, approval paths, documentation, and what engineers, technical founders, automation builders, advanced students, AI product teams must never delegate blindly to AI.
  • Use case: present one acceptable use, one borderline use, and one prohibited use, then ask learners to justify the classification.
  • Evidence of learning: learners add a risk control, review step, or escalation rule to their course project.

Practice and evidence of learning

  • Learners complete or discuss: Design a tool permission policy.
  • Learners produce: Tool spec and policy.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 4. Memory and state

Module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting. Primary live activity or lab: Choose memory design for three tasks. Expected take-home output: Memory design note.

Topics and coverage

Short-term context

  • What it means: define Short-term context clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

long-term memory

  • What it means: define long-term memory clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

vector stores

  • What it means: define vector stores clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

user profiles

  • What it means: define user profiles clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

privacy

  • What it means in this course: define privacy in operational terms, not as an abstract principle.
  • What to cover: sensitive data boundaries, affected stakeholders, approval paths, documentation, and what engineers, technical founders, automation builders, advanced students, AI product teams must never delegate blindly to AI.
  • Use case: present one acceptable use, one borderline use, and one prohibited use, then ask learners to justify the classification.
  • Evidence of learning: learners add a risk control, review step, or escalation rule to their course project.

forgetting

  • What it means: define forgetting clearly and connect it to the module focus: Short-term context, long-term memory, vector stores, user profiles, privacy, forgetting.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Choose memory design for three tasks.
  • Learners produce: Memory design note.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 5. Planning and control flow

Module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints. Primary live activity or lab: Build a simple plan-execute loop or workflow graph. Expected take-home output: Planner prototype.

Topics and coverage

ReAct

  • What it means: define ReAct clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

plan-act-reflect

  • What it means: define plan-act-reflect clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

workflows vs open agents

  • What it means: explain how workflows vs open agents changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

retries

  • What it means: define retries clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

termination

  • What it means: define termination clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

checkpoints

  • What it means: define checkpoints clearly and connect it to the module focus: ReAct, plan-act-reflect, workflows vs open agents, retries, termination, checkpoints.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Build a simple plan-execute loop or workflow graph.
  • Learners produce: Planner prototype.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 6. RAG and knowledge grounding

Module focus: Document stores, retrieval, citations, freshness, prompt injection, access control. Primary live activity or lab: Connect an agent to a small document base. Expected take-home output: Grounded assistant.

Topics and coverage

Document stores

  • What it means: define Document stores clearly and connect it to the module focus: Document stores, retrieval, citations, freshness, prompt injection, access control.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

retrieval

  • What it means: explain how retrieval changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

citations

  • What it means: define citations clearly and connect it to the module focus: Document stores, retrieval, citations, freshness, prompt injection, access control.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

freshness

  • What it means: define freshness clearly and connect it to the module focus: Document stores, retrieval, citations, freshness, prompt injection, access control.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

prompt injection

  • What it means: explain how prompt injection changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

access control

  • What it means: define access control clearly and connect it to the module focus: Document stores, retrieval, citations, freshness, prompt injection, access control.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Connect an agent to a small document base.
  • Learners produce: Grounded assistant.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 7. Evaluation harnesses

Module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria. Primary live activity or lab: Create tests for an agent task. Expected take-home output: Agent eval suite.

Topics and coverage

Task suites

  • What it means: define Task suites clearly and connect it to the module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

golden traces

  • What it means: define golden traces clearly and connect it to the module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

simulated users

  • What it means: define simulated users clearly and connect it to the module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

LLM judge limitations

  • What it means: explain how LLM judge limitations changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

pass/fail criteria

  • What it means: define pass/fail criteria clearly and connect it to the module focus: Task suites, golden traces, simulated users, LLM judge limitations, pass/fail criteria.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Create tests for an agent task.
  • Learners produce: Agent eval suite.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 8. Human-in-the-loop design

Module focus: Approval gates, escalation, uncertainty, role-based review, audit logs. Primary live activity or lab: Add an approval step to a risky workflow. Expected take-home output: Approval workflow.

Topics and coverage

Approval gates

  • What it means: define Approval gates clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

escalation

  • What it means: define escalation clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

uncertainty

  • What it means: define uncertainty clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

role-based review

  • What it means: define role-based review clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

audit logs

  • What it means: define audit logs clearly and connect it to the module focus: Approval gates, escalation, uncertainty, role-based review, audit logs.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Add an approval step to a risky workflow.
  • Learners produce: Approval workflow.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 9. Reliability, security, and monitoring

Module focus: Prompt injection, data leaks, rate limits, observability, rollbacks, incident response. Primary live activity or lab: Red-team an agent with malicious instructions. Expected take-home output: Security test report.

Topics and coverage

Prompt injection

  • What it means: explain how Prompt injection changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

data leaks

  • What it means: connect data leaks to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
  • What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
  • Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
  • Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.

rate limits

  • What it means: define rate limits clearly and connect it to the module focus: Prompt injection, data leaks, rate limits, observability, rollbacks, incident response.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

observability

  • What it means: place observability inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
  • What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
  • Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
  • Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

rollbacks

  • What it means: define rollbacks clearly and connect it to the module focus: Prompt injection, data leaks, rate limits, observability, rollbacks, incident response.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

incident response

  • What it means: define incident response clearly and connect it to the module focus: Prompt injection, data leaks, rate limits, observability, rollbacks, incident response.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Red-team an agent with malicious instructions.
  • Learners produce: Security test report.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 10. Deployment studio

Module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation. Primary live activity or lab: Build and present final agent. Expected take-home output: Deployed or demo-ready agent.

Topics and coverage

Packaging

  • What it means: define Packaging clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

API/UI

  • What it means: define API/UI clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

auth

  • What it means: define auth clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

logging

  • What it means: define logging clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

costs

  • What it means: place costs inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
  • What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
  • Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
  • Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

maintenance

  • What it means: define maintenance clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

documentation

  • What it means: define documentation clearly and connect it to the module focus: Packaging, API/UI, auth, logging, costs, maintenance, documentation.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Build and present final agent.
  • Learners produce: Deployed or demo-ready agent.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Labs, projects, and assessments

  • Lab 1: Build a tool-calling assistant with two safe tools and structured output.
  • Lab 2: Add retrieval over a controlled document set and test citation quality.
  • Lab 3: Create an evaluation suite with normal, edge-case, and adversarial tasks.
  • Lab 4: Add human approval for any external side effect.
  • Capstone: Agent prototype with architecture diagram, tool policy, eval results, red-team notes, and deployment plan.

Evaluation approach

  • 15% architecture and tool design.
  • 25% implementation labs.
  • 20% evaluation harness.
  • 20% security and governance review.
  • 20% final agent demo and documentation.
  • Python/TypeScript, LLM APIs or local models, LangGraph/CrewAI/AutoGen or custom lightweight framework, vector store, FastAPI/Next.js/Streamlit optional, GitHub, Docker optional.
  • Optional: browser automation sandbox, queue system, logging/tracing tool.

Safety, ethics, and governance emphasis

  • Agents with real-world side effects need explicit user confirmation, logging, and rollback plans.
  • Use least-privilege tool access and isolate credentials.
  • Test for prompt injection, tool misuse, infinite loops, data exfiltration, and unsafe escalation.

Delivery notes

  • Teach deterministic workflows first, then controlled autonomy.
  • Require demos to show failure handling, not just success paths.
  • Appendix A: Portfolio artifacts by course.
  • Appendix B: Universal AI-use disclosure template.
  • Learners should submit this short disclosure with major assignments, projects, or professional outputs:.
  • What AI tool(s) did I use?
  • What did I ask the tool to do?
  • Which parts of the final output were produced or significantly shaped by AI?
  • What did I verify independently?
  • What decisions did I make myself?
  • What limitations or risks remain?
  • Appendix C: Standard verification checklist.
  • Check factual claims against primary or trusted sources.
  • Check numbers, dates, names, citations, regulations, prices, and medical/legal/financial statements manually.
  • Ask the model for uncertainty, assumptions, and possible counterarguments.
  • Use source hierarchy: primary sources first, then reputable secondary sources, then general commentary.
  • For professional outputs, use a qualified human reviewer before client, patient, employee, or public use.
  • Document the review process for high-stakes work.
  • Appendix D: Delivery formats.
  • Appendix E: Trainer requirements.
  • Appendix F: Reference guidance informing the curriculum.
  • The curriculum structure is aligned with current public guidance and trends around AI literacy, effective educational use, risk management, and domain governance. Recommended references for curriculum owners:.
  • UNESCO work on AI in education and AI competency frameworks for teachers and students.
  • OECD Digital Education Outlook 2026 on effective uses of generative AI in education.
  • Stanford HAI AI Index reports for tracking the broader AI landscape, adoption, technical progress, and education trends.
  • NIST AI Risk Management Framework and Generative AI Profile for governance and risk language.
  • WHO guidance on ethics and governance of large multimodal models in health for healthcare-specific safety framing.

Instructor Build Checklist

  • Prepare one short demo for each module and one learner activity that creates a saved artifact.
  • Prepare examples that match the audience, local context, and likely tools learners can access.
  • Add a verification step to every AI-generated output: factual check, source check, data sensitivity check, and quality review.
  • Keep a running portfolio folder so each module contributes to the final project or learner playbook.
  • Reserve time for reflection on what the learner did, what AI did, what was checked, and what remains uncertain.