AI AI EducationCurriculum Library
All courses

AI Curriculum

17. AI Inference Course: How Modern AI Systems Think, Retrieve, Call Tools, and Serve Users

Audienceengineers, technical PMs, researchers, advanced students, founders
Duration24-40 hours
Modules10

17. AI Inference Course: How Modern AI Systems Think, Retrieve, Call Tools, and Serve Users

Course Positioning

This is a technical course on what happens after a model exists: prompting, tokenization, decoding, context management, retrieval, tool use, structured outputs, evaluation, serving, latency, cost, safety, and inference-time optimization. It is ideal for people building AI products without training frontier models.

Learning outcomes

  • Explain the inference-time behavior of transformer language models: tokens, logits, sampling, context windows, and decoding.
  • Build reliable LLM applications using prompt engineering, RAG, function calling, structured outputs, and guardrails.
  • Evaluate model outputs with task-specific metrics, golden datasets, human review, and automated checks.
  • Optimize for latency, cost, reliability, safety, and user experience.
  • Prototype an inference-time system that improves model usefulness without retraining.

Expanded Topic-by-Topic Coverage

Module 1. The inference stack

Module focus: Model, tokenizer, prompt, context, decoding, output parser, tools, retrieval, application layer. Primary live activity or lab: Trace a user request through an LLM app architecture. Expected take-home output: Inference system diagram.

Topics and coverage

Model

  • What it means: place Model inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
  • What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
  • Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
  • Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

tokenizer

  • What it means: define tokenizer clearly and connect it to the module focus: Model, tokenizer, prompt, context, decoding, output parser, tools, retrieval, application layer.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

prompt

  • What it means: explain how prompt changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

context

  • What it means: define context clearly and connect it to the module focus: Model, tokenizer, prompt, context, decoding, output parser, tools, retrieval, application layer.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

decoding

  • What it means: define decoding clearly and connect it to the module focus: Model, tokenizer, prompt, context, decoding, output parser, tools, retrieval, application layer.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

output parser

  • What it means: define output parser clearly and connect it to the module focus: Model, tokenizer, prompt, context, decoding, output parser, tools, retrieval, application layer.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

tools

  • What it means: define tools clearly and connect it to the module focus: Model, tokenizer, prompt, context, decoding, output parser, tools, retrieval, application layer.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

retrieval

  • What it means: explain how retrieval changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

application layer

  • What it means: define application layer clearly and connect it to the module focus: Model, tokenizer, prompt, context, decoding, output parser, tools, retrieval, application layer.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Trace a user request through an LLM app architecture.
  • Learners produce: Inference system diagram.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 2. Tokenization and context

Module focus: Tokens, context windows, truncation, prompt packing, long-context failure modes. Primary live activity or lab: Inspect tokenization and design context budget. Expected take-home output: Context budget plan.

Topics and coverage

Tokens

  • What it means: define Tokens clearly and connect it to the module focus: Tokens, context windows, truncation, prompt packing, long-context failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

context windows

  • What it means: define context windows clearly and connect it to the module focus: Tokens, context windows, truncation, prompt packing, long-context failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

truncation

  • What it means: define truncation clearly and connect it to the module focus: Tokens, context windows, truncation, prompt packing, long-context failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

prompt packing

  • What it means: explain how prompt packing changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

long-context failure modes

  • What it means: define long-context failure modes clearly and connect it to the module focus: Tokens, context windows, truncation, prompt packing, long-context failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Inspect tokenization and design context budget.
  • Learners produce: Context budget plan.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 3. Decoding and sampling

Module focus: Greedy decoding, temperature, top-p, top-k, beams, repetition, determinism, diversity. Primary live activity or lab: Compare outputs across decoding settings. Expected take-home output: Decoding experiment log.

Topics and coverage

Greedy decoding

  • What it means: define Greedy decoding clearly and connect it to the module focus: Greedy decoding, temperature, top-p, top-k, beams, repetition, determinism, diversity.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

temperature

  • What it means: define temperature clearly and connect it to the module focus: Greedy decoding, temperature, top-p, top-k, beams, repetition, determinism, diversity.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

top-p

  • What it means: define top-p clearly and connect it to the module focus: Greedy decoding, temperature, top-p, top-k, beams, repetition, determinism, diversity.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

top-k

  • What it means: define top-k clearly and connect it to the module focus: Greedy decoding, temperature, top-p, top-k, beams, repetition, determinism, diversity.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

beams

  • What it means: define beams clearly and connect it to the module focus: Greedy decoding, temperature, top-p, top-k, beams, repetition, determinism, diversity.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

repetition

  • What it means: define repetition clearly and connect it to the module focus: Greedy decoding, temperature, top-p, top-k, beams, repetition, determinism, diversity.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

determinism

  • What it means: define determinism clearly and connect it to the module focus: Greedy decoding, temperature, top-p, top-k, beams, repetition, determinism, diversity.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

diversity

  • What it means: define diversity clearly and connect it to the module focus: Greedy decoding, temperature, top-p, top-k, beams, repetition, determinism, diversity.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Compare outputs across decoding settings.
  • Learners produce: Decoding experiment log.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 4. Prompt programs and structured outputs

Module focus: System/developer/user prompts, schemas, JSON, constrained decoding, output validation. Primary live activity or lab: Build a structured extraction prompt with validation. Expected take-home output: Schema-based extractor.

Topics and coverage

System/developer/user prompts

  • What it means: explain how System/developer/user prompts changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

schemas

  • What it means: define schemas clearly and connect it to the module focus: System/developer/user prompts, schemas, JSON, constrained decoding, output validation.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

JSON

  • What it means: define JSON clearly and connect it to the module focus: System/developer/user prompts, schemas, JSON, constrained decoding, output validation.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

constrained decoding

  • What it means: define constrained decoding clearly and connect it to the module focus: System/developer/user prompts, schemas, JSON, constrained decoding, output validation.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

output validation

  • What it means: define output validation clearly and connect it to the module focus: System/developer/user prompts, schemas, JSON, constrained decoding, output validation.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Build a structured extraction prompt with validation.
  • Learners produce: Schema-based extractor.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 5. Retrieval augmented generation

Module focus: Embeddings, chunking, vector search, hybrid search, reranking, citations, grounding, failure modes. Primary live activity or lab: Build a mini RAG pipeline over sample documents. Expected take-home output: RAG prototype.

Topics and coverage

Embeddings

  • What it means: explain how Embeddings changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

chunking

  • What it means: define chunking clearly and connect it to the module focus: Embeddings, chunking, vector search, hybrid search, reranking, citations, grounding, failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
  • What it means: define vector search clearly and connect it to the module focus: Embeddings, chunking, vector search, hybrid search, reranking, citations, grounding, failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
  • What it means: define hybrid search clearly and connect it to the module focus: Embeddings, chunking, vector search, hybrid search, reranking, citations, grounding, failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

reranking

  • What it means: define reranking clearly and connect it to the module focus: Embeddings, chunking, vector search, hybrid search, reranking, citations, grounding, failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

citations

  • What it means: define citations clearly and connect it to the module focus: Embeddings, chunking, vector search, hybrid search, reranking, citations, grounding, failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

grounding

  • What it means: define grounding clearly and connect it to the module focus: Embeddings, chunking, vector search, hybrid search, reranking, citations, grounding, failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

failure modes

  • What it means: define failure modes clearly and connect it to the module focus: Embeddings, chunking, vector search, hybrid search, reranking, citations, grounding, failure modes.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Build a mini RAG pipeline over sample documents.
  • Learners produce: RAG prototype.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 6. Tool use and function calling

Module focus: APIs, calculators, search, databases, tool selection, error handling, permissions. Primary live activity or lab: Create a function-calling workflow for a simple task. Expected take-home output: Tool-using assistant.

Topics and coverage

APIs

  • What it means: define APIs clearly and connect it to the module focus: APIs, calculators, search, databases, tool selection, error handling, permissions.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

calculators

  • What it means: define calculators clearly and connect it to the module focus: APIs, calculators, search, databases, tool selection, error handling, permissions.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
  • What it means: define search clearly and connect it to the module focus: APIs, calculators, search, databases, tool selection, error handling, permissions.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

databases

  • What it means: connect databases to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
  • What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
  • Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
  • Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.

tool selection

  • What it means: define tool selection clearly and connect it to the module focus: APIs, calculators, search, databases, tool selection, error handling, permissions.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

error handling

  • What it means: define error handling clearly and connect it to the module focus: APIs, calculators, search, databases, tool selection, error handling, permissions.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

permissions

  • What it means: define permissions clearly and connect it to the module focus: APIs, calculators, search, databases, tool selection, error handling, permissions.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Create a function-calling workflow for a simple task.
  • Learners produce: Tool-using assistant.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 7. Evaluation and observability

Module focus: Gold sets, unit tests for prompts, LLM-as-judge caveats, regression testing, logging, traces. Primary live activity or lab: Design an evaluation set and scoring rubric. Expected take-home output: Eval harness.

Topics and coverage

Gold sets

  • What it means: define Gold sets clearly and connect it to the module focus: Gold sets, unit tests for prompts, LLM-as-judge caveats, regression testing, logging, traces.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

unit tests for prompts

  • What it means: explain how unit tests for prompts changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

LLM-as-judge caveats

  • What it means: explain how LLM-as-judge caveats changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

regression testing

  • What it means: place regression testing inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
  • What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
  • Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
  • Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

logging

  • What it means: define logging clearly and connect it to the module focus: Gold sets, unit tests for prompts, LLM-as-judge caveats, regression testing, logging, traces.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

traces

  • What it means: define traces clearly and connect it to the module focus: Gold sets, unit tests for prompts, LLM-as-judge caveats, regression testing, logging, traces.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Design an evaluation set and scoring rubric.
  • Learners produce: Eval harness.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 8. Latency, cost, and reliability

Module focus: Caching, batching, streaming, model routing, smaller models, fallbacks, rate limits. Primary live activity or lab: Estimate cost/latency for three architectures. Expected take-home output: Serving plan.

Topics and coverage

Caching

  • What it means: define Caching clearly and connect it to the module focus: Caching, batching, streaming, model routing, smaller models, fallbacks, rate limits.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

batching

  • What it means: define batching clearly and connect it to the module focus: Caching, batching, streaming, model routing, smaller models, fallbacks, rate limits.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

streaming

  • What it means: define streaming clearly and connect it to the module focus: Caching, batching, streaming, model routing, smaller models, fallbacks, rate limits.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

model routing

  • What it means: place model routing inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
  • What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
  • Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
  • Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

smaller models

  • What it means: place smaller models inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
  • What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
  • Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
  • Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

fallbacks

  • What it means: define fallbacks clearly and connect it to the module focus: Caching, batching, streaming, model routing, smaller models, fallbacks, rate limits.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

rate limits

  • What it means: define rate limits clearly and connect it to the module focus: Caching, batching, streaming, model routing, smaller models, fallbacks, rate limits.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Estimate cost/latency for three architectures.
  • Learners produce: Serving plan.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 9. Safety and guardrails

Module focus: Input filtering, output moderation, prompt injection, data leakage, privacy, abuse cases. Primary live activity or lab: Red-team a RAG/tool app. Expected take-home output: Safety test report.

Topics and coverage

Input filtering

  • What it means: define Input filtering clearly and connect it to the module focus: Input filtering, output moderation, prompt injection, data leakage, privacy, abuse cases.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

output moderation

  • What it means: define output moderation clearly and connect it to the module focus: Input filtering, output moderation, prompt injection, data leakage, privacy, abuse cases.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

prompt injection

  • What it means: explain how prompt injection changes the interaction between human intent, model behavior, external information, and final output.
  • What to cover: inputs, constraints, examples, output format, grounding, iteration, failure modes, and when a human must intervene.
  • Demonstration: show a weak attempt, a stronger structured attempt, and a reviewed final version with explicit checks.
  • Evidence of learning: learners create a reusable prompt, schema, retrieval note, or workflow pattern and test it on at least two examples.

data leakage

  • What it means: connect data leakage to the data lifecycle from source and structure through analysis, interpretation, and decision-making.
  • What to cover: source reliability, missing or biased data, leakage, assumptions, calculations, and the difference between correlation and decision-ready evidence.
  • Demonstration: walk through a small dataset or example table and mark the checks required before trusting the result.
  • Evidence of learning: learners produce a short analysis note that includes assumptions, limitations, and verification steps.

privacy

  • What it means in this course: define privacy in operational terms, not as an abstract principle.
  • What to cover: sensitive data boundaries, affected stakeholders, approval paths, documentation, and what engineers, technical PMs, researchers, advanced students, founders must never delegate blindly to AI.
  • Use case: present one acceptable use, one borderline use, and one prohibited use, then ask learners to justify the classification.
  • Evidence of learning: learners add a risk control, review step, or escalation rule to their course project.

abuse cases

  • What it means: define abuse cases clearly and connect it to the module focus: Input filtering, output moderation, prompt injection, data leakage, privacy, abuse cases.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Red-team a RAG/tool app.
  • Learners produce: Safety test report.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 10. Inference-time optimization project

Module focus: Self-consistency, critique loops, verifier models, search, reranking, memory, planning. Primary live activity or lab: Improve a baseline assistant on a task. Expected take-home output: Final technical prototype.

Topics and coverage

Self-consistency

  • What it means: define Self-consistency clearly and connect it to the module focus: Self-consistency, critique loops, verifier models, search, reranking, memory, planning.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

critique loops

  • What it means: define critique loops clearly and connect it to the module focus: Self-consistency, critique loops, verifier models, search, reranking, memory, planning.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

verifier models

  • What it means: place verifier models inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
  • What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
  • Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
  • Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

search

  • What it means: define search clearly and connect it to the module focus: Self-consistency, critique loops, verifier models, search, reranking, memory, planning.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

reranking

  • What it means: define reranking clearly and connect it to the module focus: Self-consistency, critique loops, verifier models, search, reranking, memory, planning.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

memory

  • What it means: define memory clearly and connect it to the module focus: Self-consistency, critique loops, verifier models, search, reranking, memory, planning.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

planning

  • What it means: define planning clearly and connect it to the module focus: Self-consistency, critique loops, verifier models, search, reranking, memory, planning.
  • What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
  • Demonstration: give one simple example, one realistic example, and one failure or limitation example.
  • Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

  • Learners complete or discuss: Improve a baseline assistant on a task.
  • Learners produce: Final technical prototype.
  • Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
  • Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

  • Learners can explain the module vocabulary without relying on tool-generated text.
  • Learners have seen one worked example, one hands-on application, and one limitation or failure case.
  • Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Labs, projects, and assessments

  • Lab 1: Token/context budgeting and decoding experiments.
  • Lab 2: Structured output extractor with schema validation and failure handling.
  • Lab 3: Mini RAG system with evaluation set and citation checks.
  • Lab 4: Tool-calling assistant with logs and retries.
  • Capstone: Inference-time application with eval harness, cost/latency notes, safety tests, and deployment plan.

Evaluation approach

  • 15% conceptual quizzes.
  • 20% structured output and decoding labs.
  • 25% RAG/tool-use implementation.
  • 20% evaluation harness.
  • 20% final prototype and technical report.
  • Python, notebooks, LLM APIs or local models, vector database, embedding model, LangChain/LlamaIndex or lightweight custom harness, FastAPI/Streamlit optional.
  • Optional: OpenTelemetry-style tracing, promptfoo/evals tools, Docker.

Safety, ethics, and governance emphasis

  • Teach prompt injection and data exfiltration as first-class risks.
  • Use synthetic or public datasets for labs.
  • Do not connect tools with real-world side effects until permissions, logging, and human approval are implemented.

Delivery notes

  • This course should be code-heavy and lab-driven.
  • A lighter technical-PM version can remove implementation details and focus on architecture, tradeoffs, and vendor evaluation.

Instructor Build Checklist

  • Prepare one short demo for each module and one learner activity that creates a saved artifact.
  • Prepare examples that match the audience, local context, and likely tools learners can access.
  • Add a verification step to every AI-generated output: factual check, source check, data sensitivity check, and quality review.
  • Keep a running portfolio folder so each module contributes to the final project or learner playbook.
  • Reserve time for reflection on what the learner did, what AI did, what was checked, and what remains uncertain.