9. AI Hardware: How AI Works on Chips, Servers, and Data Centers
Course Positioning
A technical systems course that explains the physical and computational stack behind AI: chips, memory, networking, compilers, servers, data centers, and inference economics.
Learning outcomes
- Explain why AI workloads are dominated by matrix multiplication, memory bandwidth, parallelism, and data movement.
- Compare CPUs, GPUs, TPUs, NPUs, ASICs, FPGAs, edge accelerators, and data center clusters.
- Understand training vs inference hardware requirements, precision formats, batching, caching, and latency constraints.
- Analyze how memory, networking, storage, cooling, and power delivery shape AI system performance and cost.
- Estimate the hardware and cloud cost implications of model size, context length, throughput, and service-level targets.
- Understand the AI hardware value chain from chip design to deployment.
Course Design Snapshot
- Audience: Engineers, founders, investors, students, IT leaders, procurement teams, and technically curious professionals.
- Duration: 8-10 weeks, with optional hardware lab extensions.
- Prerequisites: Basic computer architecture helpful but not required. Some math and Python familiarity recommended.
- Format: Concept lectures, diagrams, hardware teardown videos, profiling labs, cost modeling, and architecture comparisons.
Expanded Topic-by-Topic Coverage
Module 1. From transistor to tensor
Module focus: From transistor to tensor: bits, floating point, matrix multiplication, parallelism, and why AI loves accelerators.
Primary live activity or lab: Manually compute a tiny matrix multiply and estimate operation count.
Topics and coverage
bits
- What it means: define bits clearly and connect them to the module focus: From transistor to tensor: bits, floating point, matrix multiplication, parallelism, and why AI loves accelerators.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
floating point
- What it means: define floating point clearly and connect it to the module focus: From transistor to tensor: bits, floating point, matrix multiplication, parallelism, and why AI loves accelerators.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
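One possible simple demonstration for this topic, sketched under stated assumptions: round-trip a value through 16-bit and 32-bit floating point and compare the rounding error. The helper name round_trip is hypothetical; the sketch uses only Python's standard struct module, whose "e" and "f" format characters pack IEEE 754 half and single precision values.

```python
import struct

def round_trip(value, fmt):
    """Pack a Python float into the given binary format and read it back."""
    return struct.unpack(fmt, struct.pack(fmt, value))[0]

x = 0.1
as_fp32 = round_trip(x, "<f")  # 32-bit single precision
as_fp16 = round_trip(x, "<e")  # 16-bit half precision

# Fewer bits means a coarser grid of representable values, so larger error.
print("fp32 error:", abs(as_fp32 - x))
print("fp16 error:", abs(as_fp16 - x))

# Half precision has an 11-bit significand, so not every integer above 2048
# can be stored exactly.
print("2049 stored in fp16:", round_trip(2049.0, "<e"))
```

The same pattern can be reused to show learners why lower-precision formats trade accuracy for smaller memory footprint and higher throughput.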
matrix multiplication
- What it means: define matrix multiplication clearly and connect it to the module focus: From transistor to tensor: bits, floating point, matrix multiplication, parallelism, and why AI loves accelerators.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
parallelism
- What it means: define parallelism clearly and connect it to the module focus: From transistor to tensor: bits, floating point, matrix multiplication, parallelism, and why AI loves accelerators.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
why AI loves accelerators
- What it means: explain why AI loves accelerators and connect the explanation to the module focus: From transistor to tensor: bits, floating point, matrix multiplication, parallelism, and why AI loves accelerators.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: manually compute a tiny matrix multiply and estimate its operation count.
- Learners produce: the worked matrix multiply together with its operation-count estimate.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
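The tiny matrix multiply activity can be checked against a short script. This is a minimal sketch, not a prescribed solution: the function name matmul_with_count is hypothetical, and it counts one multiply and one add per inner-loop step, which gives the common 2 * m * n * k estimate (a stricter count would use k - 1 adds per output element).

```python
def matmul_with_count(a, b):
    """Multiply two matrices given as lists of rows and count scalar operations."""
    m, k = len(a), len(a[0])
    k2, n = len(b), len(b[0])
    if k != k2:
        raise ValueError("inner dimensions must match")
    c = [[0] * n for _ in range(m)]
    multiplies = 0
    adds = 0
    for i in range(m):
        for j in range(n):
            total = 0
            for p in range(k):
                total += a[i][p] * b[p][j]  # one multiply and one add
                multiplies += 1
                adds += 1
            c[i][j] = total
    return c, multiplies, adds

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c, multiplies, adds = matmul_with_count(a, b)
print(c)                  # [[19, 22], [43, 50]]
print(multiplies + adds)  # 16, which equals 2 * 2 * 2 * 2
```

Learners can change the matrix shapes and confirm that the operation count grows as 2 * m * n * k.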
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 2. CPU vs GPU
Module focus: CPU vs GPU: cores, SIMD/SIMT, memory bandwidth, kernels, CUDA intuition, and why GPUs won deep learning.
Primary live activity or lab: Profile a CPU vs GPU matrix multiplication if hardware is available.
Topics and coverage
cores
- What it means: define cores clearly and connect them to the module focus: CPU vs GPU: cores, SIMD/SIMT, memory bandwidth, kernels, CUDA intuition, and why GPUs won deep learning.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
SIMD/SIMT
- What it means: define SIMD (single instruction, multiple data) and SIMT (single instruction, multiple threads) clearly and connect them to the module focus: CPU vs GPU: cores, SIMD/SIMT, memory bandwidth, kernels, CUDA intuition, and why GPUs won deep learning.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
memory bandwidth
- What it means: define memory bandwidth clearly and connect it to the module focus: CPU vs GPU: cores, SIMD/SIMT, memory bandwidth, kernels, CUDA intuition, and why GPUs won deep learning.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
kernels
- What it means: define kernels clearly and connect them to the module focus: CPU vs GPU: cores, SIMD/SIMT, memory bandwidth, kernels, CUDA intuition, and why GPUs won deep learning.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
CUDA intuition
- What it means: build CUDA intuition (how threads, blocks, and grids map work onto a GPU) and connect it to the module focus: CPU vs GPU: cores, SIMD/SIMT, memory bandwidth, kernels, CUDA intuition, and why GPUs won deep learning.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
why GPUs won deep learning
- What it means: explain why GPUs won deep learning and place that outcome inside the AI system stack so learners know what problem GPUs solved and what tradeoffs they introduce.
- What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
- Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
- Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.
Practice and evidence of learning
- Learners complete or discuss: profile a CPU vs GPU matrix multiplication if hardware is available.
- Learners produce: a short write-up of the CPU vs GPU matrix multiplication profiling results.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
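Where no GPU is available for profiling, a back-of-the-envelope roofline estimate can stand in for the measurement. The sketch below is illustrative only: the function names are hypothetical, the peak compute and memory bandwidth figures are made-up placeholders rather than the specifications of any real device, and the byte count assumes each matrix is read or written exactly once.

```python
def matmul_arithmetic_intensity(n, bytes_per_element):
    """FLOPs per byte for an n x n matmul, assuming ideal data reuse."""
    flops = 2 * n ** 3                           # one multiply and one add per step
    bytes_moved = 3 * n * n * bytes_per_element  # read A, read B, write C once each
    return flops / bytes_moved

def attainable_flops(intensity, peak_flops, memory_bandwidth):
    """Roofline model: performance is capped by compute or by memory traffic."""
    return min(peak_flops, intensity * memory_bandwidth)

# Hypothetical device numbers, chosen only to make the arithmetic easy to follow.
PEAK_FLOPS = 10e12         # 10 TFLOP/s
MEMORY_BANDWIDTH = 500e9   # 500 GB/s

for n in (64, 4096):
    intensity = matmul_arithmetic_intensity(n, bytes_per_element=4)
    limit = attainable_flops(intensity, PEAK_FLOPS, MEMORY_BANDWIDTH)
    bound = "compute-bound" if limit == PEAK_FLOPS else "memory-bound"
    print(f"n={n}: {intensity:.1f} FLOP/byte, {limit / 1e12:.2f} TFLOP/s attainable, {bound}")
```

Under these assumptions the small multiply is limited by memory bandwidth and the large one by compute, which is a useful framing for interpreting real profiler output.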
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 3. TPUs, NPUs, ASICs, and edge accelerators
Module focus: TPUs, NPUs, ASICs, and edge accelerators: systolic arrays, specialization, energy efficiency, and deployment constraints.
Primary live activity or lab: Compare accelerator types for mobile, cloud training, and real-time inference.
Topics and coverage
systolic arrays
- What it means: define systolic arrays clearly and connect them to the module focus: TPUs, NPUs, ASICs, and edge accelerators: systolic arrays, specialization, energy efficiency, and deployment constraints.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
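A small timing model can make the systolic array idea visible. The sketch below is a simplified, assumed model of an output-stationary array, not a description of any specific chip: processing element (i, j) accumulates output C[i][j], rows of A enter from the left and columns of B from the top with a one-cycle skew per row or column, so element p of the dot product reaches element (i, j) at cycle i + j + p. The function name systolic_matmul is hypothetical.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary systolic array computing a @ b, cycle by cycle."""
    m, k, n = len(a), len(a[0]), len(b[0])
    acc = [[0] * n for _ in range(m)]  # one accumulator per processing element
    total_cycles = m + n + k - 2       # the last operand pair arrives at cycle m + n + k - 3
    for t in range(total_cycles):
        for i in range(m):
            for j in range(n):
                p = t - i - j          # which operand pair reaches PE (i, j) this cycle
                if 0 <= p < k:
                    acc[i][j] += a[i][p] * b[p][j]
    return acc, total_cycles

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
result, cycles = systolic_matmul(a, b)
print(result)  # [[19, 22], [43, 50]]
print(cycles)  # 4
```

The model shows that all processing elements work in parallel once the pipeline fills, and that operands are passed between neighbors instead of being refetched from memory for each output.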
specialization
- What it means: define specialization clearly and connect it to the module focus: TPUs, NPUs, ASICs, and edge accelerators: systolic arrays, specialization, energy efficiency, and deployment constraints.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
energy efficiency
- What it means: define energy efficiency clearly and connect it to the module focus: TPUs, NPUs, ASICs, and edge accelerators: systolic arrays, specialization, energy efficiency, and deployment constraints.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
deployment constraints
- What it means: place deployment constraints inside the AI system stack so learners know which limits apply and what tradeoffs they force.
- What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
- Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
- Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.
Practice and evidence of learning
- Learners complete or discuss: compare accelerator types for mobile, cloud training, and real-time inference.
- Learners produce: a comparison of accelerator types for mobile, cloud training, and real-time inference.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 4. Memory hierarchy
Module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
Primary live activity or lab: Estimate KV cache memory for different model sizes and context lengths.
Topics and coverage
registers
- What it means: define registers clearly and connect them to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
SRAM
- What it means: define SRAM clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
HBM
- What it means: define HBM clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
DRAM
- What it means: define DRAM clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
storage
- What it means: define storage clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
KV cache
- What it means: define KV cache clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
context length
- What it means: define context length clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
the memory wall
- What it means: define the memory wall clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: estimate KV cache memory for different model sizes and context lengths.
- Learners produce: a KV cache memory estimate for different model sizes and context lengths.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
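The KV cache estimate can be sketched in a few lines. This is a rough estimate under stated assumptions: a standard transformer decoder that stores one key and one value vector per layer, per KV head, per token, with no cache compression or paging overhead. The function name kv_cache_bytes and the model configuration below are hypothetical and do not describe any particular model.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, context_length,
                   batch_size=1, bytes_per_element=2):
    """Estimate KV cache size: keys plus values for every layer, head, and token."""
    keys_and_values = 2
    return (keys_and_values * num_layers * num_kv_heads * head_dim
            * context_length * batch_size * bytes_per_element)

GIB = 1024 ** 3

# Hypothetical configuration: 32 layers, head dimension 128, 16-bit cache entries.
full_heads = kv_cache_bytes(num_layers=32, num_kv_heads=32, head_dim=128,
                            context_length=4096)
grouped_heads = kv_cache_bytes(num_layers=32, num_kv_heads=8, head_dim=128,
                               context_length=4096)

print(full_heads / GIB)     # 2.0 GiB per sequence with 32 KV heads
print(grouped_heads / GIB)  # 0.5 GiB per sequence with 8 shared KV heads
```

Because the estimate is linear in context length and batch size, learners can scale either input and see when the cache, rather than the model weights, becomes the dominant memory cost.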
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 5. Training stack
Module focus: Training stack: data pipelines, distributed training, tensor/data/pipeline parallelism, checkpointing, and interconnect.
Primary live activity or lab: Diagram a distributed training system and identify bottlenecks.
Topics and coverage
data pipelines
- What it means: place data pipelines inside the AI system stack so learners know what problem they solve and what tradeoffs they introduce.
- What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
- Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
- Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.
distributed training
- What it means: place distributed training inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
- What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
- Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
- Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.
tensor/data/pipeline parallelism
- What it means: place tensor, data, and pipeline parallelism inside the AI system stack so learners know what problem each one solves and what tradeoffs it introduces.
- What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
- Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
- Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.
checkpointing
- What it means: define checkpointing clearly and connect it to the module focus: Training stack: data pipelines, distributed training, tensor/data/pipeline parallelism, checkpointing, and interconnect.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
interconnect
- What it means: define interconnect clearly and connect it to the module focus: Training stack: data pipelines, distributed training, tensor/data/pipeline parallelism, checkpointing, and interconnect.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: diagram a distributed training system and identify bottlenecks.
- Learners produce: a diagram of a distributed training system with its bottlenecks identified.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
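When identifying bottlenecks, a quick communication estimate helps show whether the interconnect matters. The sketch below assumes data parallelism with a bandwidth-optimal ring all-reduce, in which each device sends and receives 2 * (N - 1) / N times the gradient payload, and it ignores latency and any overlap with compute. The function name and all numbers are hypothetical.

```python
def ring_all_reduce_seconds(payload_bytes, num_devices, link_bytes_per_second):
    """Bandwidth-only time estimate for a ring all-reduce across num_devices."""
    if num_devices < 2:
        return 0.0  # nothing to synchronize on a single device
    traffic_per_device = 2 * (num_devices - 1) / num_devices * payload_bytes
    return traffic_per_device / link_bytes_per_second

# Hypothetical setup: 1 billion parameters, 16-bit gradients, 8 devices, 100 GB/s links.
gradient_bytes = 1_000_000_000 * 2
seconds = ring_all_reduce_seconds(gradient_bytes, num_devices=8,
                                  link_bytes_per_second=100e9)
print(f"{seconds * 1000:.1f} ms per gradient synchronization")
```

Comparing this figure with the compute time of one training step gives a first indication of whether a design is communication-bound.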
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 6. Inference stack
Module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving.
Primary live activity or lab: Build an inference cost and latency worksheet for an LLM API service.
Topics and coverage
batching
- What it means: define batching clearly and connect it to the module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
quantization
- What it means: define quantization clearly and connect it to the module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
speculative decoding
- What it means: define speculative decoding clearly and connect it to the module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
model routing
- What it means: place model routing inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
- What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
- Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
- Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.
caching
- What it means: define caching clearly and connect it to the module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
streaming
- What it means: define streaming clearly and connect it to the module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
serving
- What it means: define serving clearly and connect it to the module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: build an inference cost and latency worksheet for an LLM API service.
- Learners produce: an inference cost and latency worksheet for an LLM API service.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
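The worksheet can start from two simple formulas, sketched here with hypothetical inputs. The model assumes cost is driven only by accelerator rental price, aggregate token throughput, and average utilization, and that response latency is the time to first token plus a fixed time per output token. Function names and every number below are placeholders, not measured values or real prices.

```python
def cost_per_million_tokens(gpu_cost_per_hour, tokens_per_second, utilization):
    """Serving cost per million tokens for one accelerator at a given utilization."""
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_cost_per_hour / tokens_per_hour * 1_000_000

def response_latency_seconds(time_to_first_token, output_tokens, seconds_per_output_token):
    """End-to-end latency for a streamed response of output_tokens tokens."""
    return time_to_first_token + output_tokens * seconds_per_output_token

# Hypothetical inputs for illustration only.
cost = cost_per_million_tokens(gpu_cost_per_hour=2.00, tokens_per_second=1000,
                               utilization=0.5)
latency = response_latency_seconds(time_to_first_token=0.2, output_tokens=200,
                                   seconds_per_output_token=0.02)

print(f"${cost:.2f} per million tokens")
print(f"{latency:.1f} s for a 200-token response")
```

Larger batches usually raise tokens per second and lower cost, but they also tend to raise the time per output token, so learners should vary both inputs together.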
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 7. Networking and data centers
Module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
Primary live activity or lab: Design a simplified AI cluster architecture and power budget.
Topics and coverage
NVLink
- What it means: define NVLink clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
InfiniBand
- What it means: define InfiniBand clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Ethernet
- What it means: define Ethernet clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
rack design
- What it means: define rack design clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
power
- What it means: define power clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
cooling
- What it means: define cooling clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
uptime
- What it means: define uptime clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
operations
- What it means: define operations clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Practice and evidence of learning
- Learners complete or discuss: Design a simplified AI cluster architecture and power budget.
- Learners produce: a simplified AI cluster architecture and power budget.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 8. Compilers and software
Module focus: Compilers and software: CUDA, XLA, Triton, graph optimization, kernels, quantization libraries, and deployment runtimes. Primary live activity or lab: Trace how a model operation becomes hardware instructions at a conceptual level.
Topics and coverage
CUDA
- What it means: define CUDA clearly and connect it to the module focus: Compilers and software: CUDA, XLA, Triton, graph optimization, kernels, quantization libraries, and deployment runtimes.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
XLA
- What it means: define XLA clearly and connect it to the module focus: Compilers and software: CUDA, XLA, Triton, graph optimization, kernels, quantization libraries, and deployment runtimes.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
Triton
- What it means: define Triton clearly and connect it to the module focus: Compilers and software: CUDA, XLA, Triton, graph optimization, kernels, quantization libraries, and deployment runtimes.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
graph optimization
- What it means: place graph optimization inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
- What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
- Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
- Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.
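A small code sample for graph optimization could show constant folding, one of the simplest graph rewrites: any subgraph whose inputs are all constants is computed once ahead of time. The tuple-based graph format and function names below are hypothetical teaching devices, not the representation used by any real compiler.

```python
def fold_constants(node):
    """Return an equivalent graph with constant-only subgraphs precomputed."""
    kind = node[0]
    if kind in ("const", "input"):
        return node
    if kind not in ("add", "mul"):
        raise ValueError(f"unknown node kind: {kind}")
    left = fold_constants(node[1])
    right = fold_constants(node[2])
    # If both operands are known at compile time, evaluate the op now.
    if left[0] == "const" and right[0] == "const":
        value = left[1] + right[1] if kind == "add" else left[1] * right[1]
        return ("const", value)
    return (kind, left, right)


def count_ops(node):
    """Count the add/mul operations that would run at inference time."""
    if node[0] in ("const", "input"):
        return 0
    return 1 + count_ops(node[1]) + count_ops(node[2])


# x * (2 + 3): the addition does not depend on the input x.
graph = ("mul", ("input", "x"), ("add", ("const", 2.0), ("const", 3.0)))
optimized = fold_constants(graph)
print(optimized)
print(count_ops(graph), "ops before,", count_ops(optimized), "op after")
```

Learners can compare the two graphs and discuss what the rewrite saves at run time versus what it costs at compile time.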
kernels
- What it means: define kernels clearly and connect them to the module focus: Compilers and software: CUDA, XLA, Triton, graph optimization, kernels, quantization libraries, and deployment runtimes.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
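For the kernels topic, a simple example could contrast a naive matrix multiply with a tiled one. The sketch below runs on the CPU in plain Python and only illustrates the loop structure that tiled kernels use to reuse small blocks of data; it is not GPU code and makes no performance claim. The function names and tile size are assumptions.

```python
def matmul_naive(a, b):
    """Multiply matrices given as lists of rows, one output element at a time."""
    n, k, m = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(k)) for j in range(m)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Same result as matmul_naive, computed block by block."""
    n, k, m = len(a), len(b), len(b[0])
    c = [[0] * m for _ in range(n)]
    # Outer loops walk over tiles; inner loops work inside one tile.
    for i0 in range(0, n, tile):
        for j0 in range(0, m, tile):
            for p0 in range(0, k, tile):
                for i in range(i0, min(i0 + tile, n)):
                    for j in range(j0, min(j0 + tile, m)):
                        acc = c[i][j]
                        for p in range(p0, min(p0 + tile, k)):
                            acc += a[i][p] * b[p][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6]]
b = [[7, 8], [9, 10], [11, 12]]
print(matmul_naive(a, b))
print(matmul_tiled(a, b))
```

On real accelerators the tile would be sized to fit fast on-chip memory, which is the design choice worth discussing with learners.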
quantization libraries
- What it means: define quantization libraries clearly and connect them to the module focus: Compilers and software: CUDA, XLA, Triton, graph optimization, kernels, quantization libraries, and deployment runtimes.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
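Before introducing a real quantization library, a simple example could show what such a library does at its core. The sketch below implements symmetric per-tensor int8 quantization in plain Python; it is a teaching simplification, not the algorithm of any particular library, and the weights and function names are hypothetical.

```python
def quantize_int8(values):
    """Map floats to integers in [-127, 127] using one shared scale."""
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the integer codes."""
    return [q * scale for q in quantized]


# Hypothetical weights; the tiny value 0.003 is included on purpose.
weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)
print("codes:", q)
print("scale:", scale)
print("recovered:", recovered)
```

The small weight 0.003 rounds to code 0 and is lost entirely, which can serve as the limitation example: one large value sets the scale for the whole tensor.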
deployment runtimes
- What it means: place deployment runtimes inside the AI system stack so learners know what problem they solve and what tradeoffs they introduce.
- What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
- Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
- Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.
Practice and evidence of learning
- Learners complete or discuss: Trace how a model operation becomes hardware instructions at a conceptual level.
- Learners produce: a conceptual trace of how a model operation becomes hardware instructions.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 9. Hardware economics and geopolitics
Module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing. Primary live activity or lab: Map the AI hardware value chain and identify strategic choke points.
Topics and coverage
capex
- What it means: define capex clearly and connect it to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
depreciation
- What it means: define depreciation clearly and connect it to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
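For the depreciation topic, a simple example could turn a purchase price into a cost per utilized hour using straight-line depreciation. This is a sketch under stated assumptions: the price, salvage value, 4-year life, and 60% utilization are illustrative figures, not market data, and the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24


def hourly_depreciation(purchase_price, salvage_value, useful_life_years, utilization):
    """Straight-line depreciation spread over the hours the hardware is actually used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    # Straight-line: the same amount is written off every year.
    annual_depreciation = (purchase_price - salvage_value) / useful_life_years
    used_hours_per_year = HOURS_PER_YEAR * utilization
    return annual_depreciation / used_hours_per_year


# Hypothetical accelerator: 30,000 purchase, 3,000 salvage, 4 years, 60% utilized.
cost = hourly_depreciation(30_000, 3_000, 4, 0.6)
print(f"Depreciation per utilized hour: {cost:.2f}")
```

Rerunning with a lower utilization shows why idle hardware is expensive: the yearly write-off is fixed, so each useful hour carries more of it.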
supply chain
- What it means: define supply chain clearly and connect it to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
foundries
- What it means: define foundries clearly and connect them to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
packaging
- What it means: define packaging clearly and connect it to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
export controls
- What it means: define export controls clearly and connect them to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
cloud pricing
- What it means: define cloud pricing clearly and connect it to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
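For the cloud pricing topic, a simple example could compare renting with buying by computing a break-even point. The sketch below ignores financing, staffing, and resale value; the prices are illustrative assumptions rather than quotes from any provider, and the function name is hypothetical.

```python
def break_even_hours(purchase_cost, owned_hourly_opex, cloud_hourly_price):
    """Hours of use after which owning is cheaper than renting, or None if it never is."""
    hourly_saving = cloud_hourly_price - owned_hourly_opex
    if hourly_saving <= 0:
        # Renting is at least as cheap per hour, so the purchase never pays back.
        return None
    return purchase_cost / hourly_saving


# Hypothetical figures: 30,000 to buy, 0.50/hour to run, 2.50/hour to rent.
hours = break_even_hours(30_000, 0.50, 2.50)
print(f"Break-even after {hours:.0f} hours of use")
print(f"That is about {hours / HOURS_PER_MONTH:.1f} months at full utilization"
      if (HOURS_PER_MONTH := 730) else "")
```

Learners can then ask how the answer changes if the hardware is only busy part of the time, since idle hours do not count toward break-even.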
Practice and evidence of learning
- Learners complete or discuss: Map the AI hardware value chain and identify strategic choke points.
- Learners produce: a map of the AI hardware value chain with strategic choke points identified.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Module 10. Future hardware
Module focus: Future hardware: photonics, neuromorphic ideas, wafer-scale systems, memory-centric compute, and edge AI. Primary live activity or lab: Prepare a hardware roadmap thesis for one application domain.
Topics and coverage
photonics
- What it means: define photonics clearly and connect it to the module focus: Future hardware: photonics, neuromorphic ideas, wafer-scale systems, memory-centric compute, and edge AI.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
neuromorphic ideas
- What it means: define neuromorphic ideas clearly and connect them to the module focus: Future hardware: photonics, neuromorphic ideas, wafer-scale systems, memory-centric compute, and edge AI.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
wafer-scale systems
- What it means: define wafer-scale systems clearly and connect them to the module focus: Future hardware: photonics, neuromorphic ideas, wafer-scale systems, memory-centric compute, and edge AI.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
memory-centric compute
- What it means: define memory-centric compute clearly and connect it to the module focus: Future hardware: photonics, neuromorphic ideas, wafer-scale systems, memory-centric compute, and edge AI.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
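To motivate memory-centric compute, a simple example could use a roofline-style estimate: attainable throughput is the smaller of peak compute and memory bandwidth times arithmetic intensity (FLOPs per byte moved). The sketch below uses made-up hardware figures (100 TFLOP/s peak, 2 TB/s bandwidth) purely for illustration; the function name is hypothetical.

```python
def attainable_tflops(peak_tflops, bandwidth_tb_per_s, flops_per_byte):
    """Roofline-style bound: min(peak compute, bandwidth * arithmetic intensity)."""
    # TB/s multiplied by FLOP/byte gives TFLOP/s.
    memory_bound_tflops = bandwidth_tb_per_s * flops_per_byte
    return min(peak_tflops, memory_bound_tflops)


# Hypothetical chip: 100 TFLOP/s peak compute, 2 TB/s memory bandwidth.
low_intensity = attainable_tflops(100.0, 2.0, 10.0)    # few FLOPs per byte moved
high_intensity = attainable_tflops(100.0, 2.0, 100.0)  # many FLOPs per byte moved
print(f"Low intensity workload:  {low_intensity} TFLOP/s (memory bound)")
print(f"High intensity workload: {high_intensity} TFLOP/s (compute bound)")
```

When a workload sits on the memory-bound side, adding compute does not help, which is the argument for moving compute closer to memory.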
edge AI
- What it means: define edge AI clearly and connect it to the module focus: Future hardware: photonics, neuromorphic ideas, wafer-scale systems, memory-centric compute, and edge AI.
- What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
- Demonstration: give one simple example, one realistic example, and one failure or limitation example.
- Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
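For the edge AI topic, a simple example could check whether a model's weights fit in a device's memory at different precisions. The sketch counts weights only and ignores activations, caches, and runtime overhead; the 3-billion-parameter model and 4 GB device are hypothetical, as are the names used.

```python
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1, "int4": 0.5}


def weight_memory_gb(num_params, precision):
    """Memory needed for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Hypothetical edge device with 4 GB of memory and a 3-billion-parameter model.
device_memory_gb = 4.0
num_params = 3_000_000_000
for precision in ("fp32", "fp16", "int8", "int4"):
    needed = weight_memory_gb(num_params, precision)
    verdict = "fits" if needed <= device_memory_gb else "does not fit"
    print(f"{precision}: {needed:.1f} GB, {verdict}")
```

A realistic follow-up is to ask learners what else must share that memory, since fitting the weights is necessary but not sufficient.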
Practice and evidence of learning
- Learners complete or discuss: Prepare a hardware roadmap thesis for one application domain.
- Learners produce: a hardware roadmap thesis for one application domain.
- Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
- Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
Minimum coverage before moving on
- Learners can explain the module vocabulary without relying on tool-generated text.
- Learners have seen one worked example, one hands-on application, and one limitation or failure case.
- Learners know what must be verified, what data must be protected, and who remains accountable for the output.
Core labs and builds
- Matrix multiplication and FLOP estimation lab.
- Quantization lab: compare size, speed, and quality tradeoffs.
- Inference economics lab: cost per 1,000 requests under different models and latency targets.
- Hardware value-chain lab: chip designer, foundry, packaging, memory, networking, data center, cloud, application.
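As a starting point for the FLOP estimation and inference economics labs above, the sketch below shows two calculations learners might begin with. It uses the standard count of 2*m*k*n floating-point operations for an (m x k) by (k x n) matrix multiply (one multiply and one add per term). The hourly cost and request rate are illustrative assumptions, and the function names are hypothetical.

```python
def matmul_flops(m, k, n):
    """FLOPs for multiplying an (m x k) matrix by a (k x n) matrix."""
    # Each of the m*n outputs needs k multiplies and k adds (counted as 2k).
    return 2 * m * k * n


def cost_per_1000_requests(hourly_cost, requests_per_second):
    """Serving cost per 1,000 requests at a sustained request rate."""
    requests_per_hour = requests_per_second * 3600
    return hourly_cost / requests_per_hour * 1000


print("FLOPs for a (2x3) by (3x2) multiply:", matmul_flops(2, 3, 2))

# Hypothetical server: 2.50 per hour, sustaining 5 requests per second.
cost = cost_per_1000_requests(2.50, 5)
print(f"Cost per 1,000 requests: {cost:.4f}")
```

Tightening a latency target usually lowers the sustainable request rate per server, so learners can rerun the second function with a smaller rate to see the cost effect.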
Capstone
- Design an AI hardware deployment plan for one workload such as chatbot inference, document AI, medical imaging, video generation, classroom AI lab, call-center agents, or edge camera inspection. The plan includes workload profile, hardware choice, cost model, bottlenecks, and scaling strategy.
Assessment design
- Hardware comparison memo.
- Memory and inference cost calculations.
- Cluster architecture diagram.
- Final deployment plan.
Recommended tools and datasets
- Python notebooks, GPU profiler examples, cloud calculators, model parameter calculators, hardware spec sheets, compiler diagrams, data-center architecture diagrams.
Instructor notes
- This course is especially valuable for business and investing audiences because it reveals why AI economics depend on bottlenecks outside the model itself: memory, interconnect, energy, utilization, and supply chain.
Instructor Build Checklist
- Prepare one short demo for each module and one learner activity that creates a saved artifact.
- Prepare examples that match the audience, local context, and likely tools learners can access.
- Add a verification step to every AI-generated output: factual check, source check, data sensitivity check, and quality review.
- Keep a running portfolio folder so each module contributes to the final project or learner playbook.
- Reserve time for reflection on what the learner did, what AI did, what was checked, and what remains uncertain.