9. AI Hardware: How AI Works on Chips, Servers, and Data Centers

Course Positioning

A technical systems course that explains the physical and computational stack behind AI: chips, memory, networking, compilers, servers, data centers, and inference economics.

Learning outcomes

Explain why AI workloads are dominated by matrix multiplication, memory bandwidth, parallelism, and data movement.
Compare CPUs, GPUs, TPUs, NPUs, ASICs, FPGAs, edge accelerators, and data center clusters.
Understand training vs inference hardware requirements, precision formats, batching, caching, and latency constraints.
Analyze how memory, networking, storage, cooling, and power delivery shape AI system performance and cost.
Estimate the hardware and cloud cost implications of model size, context length, throughput, and service-level targets.
Understand the AI hardware value chain from chip design to deployment.

Course Design Snapshot

Audience: Engineers, founders, investors, students, IT leaders, procurement teams, and technically curious professionals.
Duration: 8-10 weeks, with optional hardware lab extensions.
Prerequisites: Basic computer architecture helpful but not required. Some math and Python familiarity recommended.
Format: Concept lectures, diagrams, hardware teardown videos, profiling labs, cost modeling, and architecture comparisons.

Expanded Topic-by-Topic Coverage

Module 1. From transistor to tensor

Module focus: From transistor to tensor: bits, floating point, matrix multiplication, parallelism, and why AI loves accelerators. Primary live activity or lab: Manually compute a tiny matrix multiply and estimate operation count.

Topics and coverage

bits

What it means: define bits as the binary digits in which every weight, activation, and instruction is stored, and connect them to the module focus: From transistor to tensor: bits, floating point, matrix multiplication, parallelism, and why AI loves accelerators.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

floating point

What it means: define floating point clearly and connect it to the module focus: From transistor to tensor: bits, floating point, matrix multiplication, parallelism, and why AI loves accelerators.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
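
One way to make the three floating point demonstrations concrete is a short Python sketch. It uses only the standard library; the helper name to_fp16 is an illustrative assumption, not part of any course tooling. It shows a simple rounding surprise in 64-bit floats, the small error introduced by storing a value in 16 bits, and a range where half precision can no longer represent every integer.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a 64-bit Python float to IEEE 754 half precision and back.

    to_fp16 is a hypothetical helper for illustration; the "e" format
    character of the struct module stores a value in 16 bits.
    """
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Simple example: 0.1 and 0.2 have no exact binary representation,
# so their sum is not exactly 0.3 even in 64-bit precision.
print(0.1 + 0.2 == 0.3)  # False

# Realistic example: storing a weight in 16 bits shifts its value slightly.
weight = 0.1
print(weight, "->", to_fp16(weight), "error:", abs(weight - to_fp16(weight)))

# Limitation example: half precision has an 11-bit significand, so between
# 2048 and 4096 it can only represent even integers.
print(2049.0, "->", to_fp16(2049.0))
```

The same pattern extends to other formats learners meet later in the course, but the bit layouts of those formats are not shown here.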

matrix multiplication

What it means: define matrix multiplication clearly and connect it to the module focus: From transistor to tensor: bits, floating point, matrix multiplication, parallelism, and why AI loves accelerators.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
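
A minimal sketch of the "tiny matrix multiply" idea, assuming plain Python lists as matrices. The function name matmul_with_count is hypothetical. It counts each multiply-accumulate as one multiplication and one addition, which is the usual convention behind the estimate of 2 x M x K x N operations for an M-by-K times K-by-N product.

```python
def matmul_with_count(a, b):
    """Multiply two matrices given as lists of rows and count operations.

    Returns (product, multiplications, additions). Each multiply-accumulate
    step is counted as one multiplication and one addition.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions do not match")

    product = [[0] * n for _ in range(m)]
    mults = adds = 0
    for i in range(m):
        for j in range(n):
            total = 0
            for p in range(k):
                total += a[i][p] * b[p][j]
                mults += 1
                adds += 1
            product[i][j] = total
    return product, mults, adds


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
c, mults, adds = matmul_with_count(a, b)
print(c)             # [[19, 22], [43, 50]]
print(mults + adds)  # 16, which equals 2 * 2 * 2 * 2
```

Learners can check the printed product by hand, then scale the dimensions mentally to see how quickly the operation count grows.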

parallelism

What it means: define parallelism clearly and connect it to the module focus: From transistor to tensor: bits, floating point, matrix multiplication, parallelism, and why AI loves accelerators.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

why AI loves accelerators

What it means: explain why AI loves accelerators, namely that neural network workloads consist mostly of large, regular, parallel arithmetic, and connect that explanation to the module focus: From transistor to tensor: bits, floating point, matrix multiplication, parallelism, and why AI loves accelerators.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

Learners complete or discuss: Manually compute a tiny matrix multiply and estimate operation count.
Learners produce: a worked matrix product, computed by hand, with a count of the multiplications and additions it required.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 2. CPU vs GPU

Module focus: CPU vs GPU: cores, SIMD/SIMT, memory bandwidth, kernels, CUDA intuition, and why GPUs won deep learning. Primary live activity or lab: Profile a CPU vs GPU matrix multiplication if hardware is available.

Topics and coverage

cores

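What it means: define cores as the independent execution units of a processor, contrast a few large CPU cores with many simpler GPU cores, and connect them to the module focus: CPU vs GPU: cores, SIMD/SIMT, memory bandwidth, kernels, CUDA intuition, and why GPUs won deep learning.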
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

SIMD/SIMT

What it means: define SIMD/SIMT clearly and connect it to the module focus: CPU vs GPU: cores, SIMD/SIMT, memory bandwidth, kernels, CUDA intuition, and why GPUs won deep learning.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

memory bandwidth

What it means: define memory bandwidth clearly and connect it to the module focus: CPU vs GPU: cores, SIMD/SIMT, memory bandwidth, kernels, CUDA intuition, and why GPUs won deep learning.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
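
A rough way to show why memory bandwidth matters is a roofline-style estimate: compare the arithmetic a matrix multiplication performs with the bytes it must move. The sketch below is a simplified model under stated assumptions. The peak compute and bandwidth figures are hypothetical round numbers, not the specification of any real chip, and the byte count assumes each matrix is read or written exactly once.

```python
def matmul_intensity(m, k, n, bytes_per_element):
    """Arithmetic intensity (FLOPs per byte) of an M-by-K times K-by-N product.

    Assumes each input and output matrix crosses the memory bus exactly once,
    which is an idealization.
    """
    flops = 2 * m * k * n
    bytes_moved = (m * k + k * n + m * n) * bytes_per_element
    return flops / bytes_moved


def attainable_flops(intensity, peak_flops, mem_bandwidth):
    """Roofline estimate: limited by compute or by memory, whichever is lower."""
    return min(peak_flops, intensity * mem_bandwidth)


# Hypothetical accelerator, chosen only to make the arithmetic easy to follow.
PEAK_FLOPS = 100e12    # 100 TFLOP/s
MEM_BANDWIDTH = 1e12   # 1 TB/s

big = matmul_intensity(4096, 4096, 4096, bytes_per_element=2)
vec = matmul_intensity(1, 4096, 4096, bytes_per_element=2)

print("large matmul :", attainable_flops(big, PEAK_FLOPS, MEM_BANDWIDTH))
print("matrix-vector:", attainable_flops(vec, PEAK_FLOPS, MEM_BANDWIDTH))
```

Under these assumptions the large square product reaches the compute limit, while the matrix-vector product is held to about one percent of peak by memory bandwidth. That contrast is the limitation example: the same chip can look fast or slow depending on the shape of the work.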

kernels

What it means: define kernels as the functions launched to run in parallel on an accelerator (not operating system kernels), and connect them to the module focus: CPU vs GPU: cores, SIMD/SIMT, memory bandwidth, kernels, CUDA intuition, and why GPUs won deep learning.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

CUDA intuition

What it means: define CUDA intuition clearly and connect it to the module focus: CPU vs GPU: cores, SIMD/SIMT, memory bandwidth, kernels, CUDA intuition, and why GPUs won deep learning.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

why GPUs won deep learning

What it means: place the reasons GPUs won deep learning inside the AI system stack so learners know what problem GPUs solve and what tradeoffs they introduce.
What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

Practice and evidence of learning

Learners complete or discuss: Profile a CPU vs GPU matrix multiplication if hardware is available.
Learners produce: a short profiling report comparing CPU and GPU matrix multiplication times, where hardware is available.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 3. TPUs, NPUs, ASICs, and edge accelerators

Module focus: TPUs, NPUs, ASICs, and edge accelerators: systolic arrays, specialization, energy efficiency, and deployment constraints. Primary live activity or lab: Compare accelerator types for mobile, cloud training, and real-time inference.

Topics and coverage

systolic arrays

What it means: define systolic arrays as grids of multiply-accumulate units that pass operands to their neighbors on every cycle, and connect them to the module focus: TPUs, NPUs, ASICs, and edge accelerators: systolic arrays, specialization, energy efficiency, and deployment constraints.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
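
The data flow of a systolic array can be imitated with a small timing model. The sketch below assumes an output-stationary N-by-N array: row i of A enters from the left delayed by i cycles, column j of B enters from the top delayed by j cycles, and each processing element keeps its own running sum. It is a teaching model of when operands meet, not a description of any specific chip, and the function name systolic_matmul is hypothetical.

```python
def systolic_matmul(a, b):
    """Simulate an output-stationary N-by-N systolic array, cycle by cycle.

    With skewed inputs, a[i][k] and b[k][j] arrive at processing element
    (i, j) together at cycle i + j + k, so the last product is formed at
    cycle 3N - 3 and the whole run takes 3N - 2 cycles.
    """
    n = len(a)
    acc = [[0] * n for _ in range(n)]
    cycles = 3 * n - 2
    for t in range(cycles):
        for i in range(n):
            for j in range(n):
                k = t - i - j  # index of the operand pair arriving this cycle
                if 0 <= k < n:
                    acc[i][j] += a[i][k] * b[k][j]
    return acc, cycles


result, cycles = systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]])
print(result, "in", cycles, "cycles")
```

The design point worth discussing is that each operand is fetched from memory once and then reused as it moves between neighbors, which is where the energy savings of this layout come from.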

specialization

What it means: define specialization clearly and connect it to the module focus: TPUs, NPUs, ASICs, and edge accelerators: systolic arrays, specialization, energy efficiency, and deployment constraints.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

energy efficiency

What it means: define energy efficiency clearly and connect it to the module focus: TPUs, NPUs, ASICs, and edge accelerators: systolic arrays, specialization, energy efficiency, and deployment constraints.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

deployment constraints

What it means: place deployment constraints inside the AI system stack so learners know which limits (power, memory, latency, cost, and form factor) shape accelerator choice and what tradeoffs they force.
What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

Practice and evidence of learning

Learners complete or discuss: Compare accelerator types for mobile, cloud training, and real-time inference.
Learners produce: a comparison table of accelerator types for mobile, cloud training, and real-time inference.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 4. Memory hierarchy

Module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall. Primary live activity or lab: Estimate KV cache memory for different model sizes and context lengths.

Topics and coverage

registers

What it means: define registers as the smallest and fastest storage, located inside the processor's execution units, and connect them to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

SRAM

What it means: define SRAM clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

HBM

What it means: define HBM clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

DRAM

What it means: define DRAM clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

storage

What it means: define storage (local SSDs, disks, and networked file or object stores) clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

KV cache

What it means: define KV cache clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
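
A back-of-envelope KV cache estimate can be sketched in a few lines. The formula assumes a standard transformer decoder that stores one key vector and one value vector per token, per layer, per key-value head. The model shape used below is a hypothetical configuration chosen for round numbers, not a specific published model.

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len,
                   batch=1, bytes_per_element=2):
    """Estimate KV cache size in bytes.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_element=2 assumes a 16-bit format such as FP16 or BF16.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * batch * bytes_per_element


GIB = 1024 ** 3

# Hypothetical model shape: 32 layers, 32 key-value heads, head dimension 128.
for context in (4096, 32768):
    size = kv_cache_bytes(layers=32, kv_heads=32, head_dim=128, seq_len=context)
    print(f"context {context:>6}: {size / GIB:.1f} GiB per sequence")

# Sharing key-value heads across query heads (8 instead of 32) shrinks the cache.
shared = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128, seq_len=4096)
print(f"8 KV heads, context 4096: {shared / GIB:.1f} GiB per sequence")
```

Because the estimate is linear in context length and batch size, learners can use it to see how quickly long contexts and many concurrent users consume accelerator memory.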

context length

What it means: define context length clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

the memory wall

What it means: define the memory wall clearly and connect it to the module focus: Memory hierarchy: registers, SRAM, HBM, DRAM, storage, KV cache, context length, and the memory wall.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

Learners complete or discuss: Estimate KV cache memory for different model sizes and context lengths.
Learners produce: a table of estimated KV cache memory for different model sizes and context lengths.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 5. Training stack

Module focus: Training stack: data pipelines, distributed training, tensor/data/pipeline parallelism, checkpointing, and interconnect. Primary live activity or lab: Diagram a distributed training system and identify bottlenecks.

Topics and coverage

data pipelines

What it means: place data pipelines inside the training stack: how training data is read from storage, decoded, preprocessed, and delivered to the accelerators fast enough to keep them busy.
What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases such as accelerators idling while they wait for data.
Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

distributed training

What it means: place distributed training inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

tensor/data/pipeline parallelism

What it means: define tensor, data, and pipeline parallelism as three ways of splitting training across devices: slicing individual layers, replicating the model over different batches, and assigning consecutive layers to different devices.
What to cover: memory use per device, communication volume, idle time, cost or latency implications, and common failure cases.
Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

checkpointing

What it means: define checkpointing clearly and connect it to the module focus: Training stack: data pipelines, distributed training, tensor/data/pipeline parallelism, checkpointing, and interconnect.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

interconnect

What it means: define interconnect clearly and connect it to the module focus: Training stack: data pipelines, distributed training, tensor/data/pipeline parallelism, checkpointing, and interconnect.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
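
To see why interconnect bandwidth can become a training bottleneck, it helps to estimate how long one gradient synchronization takes. The sketch below uses the standard bandwidth-only model of a ring all-reduce, in which each device sends and receives 2 x (N - 1) / N times the payload. It ignores latency, overlap with computation, and network topology, and the parameter count and link speed are hypothetical.

```python
def ring_allreduce_seconds(payload_bytes, num_devices, link_bytes_per_s):
    """Bandwidth-only time estimate for a ring all-reduce.

    Each device transfers 2 * (N - 1) / N times the payload over its link.
    Per-message latency and compute overlap are deliberately ignored.
    """
    if num_devices < 2:
        return 0.0  # nothing to synchronize with a single device
    traffic = 2 * (num_devices - 1) / num_devices * payload_bytes
    return traffic / link_bytes_per_s


# Hypothetical setup: 1 billion parameters, 16-bit gradients, 8 devices.
gradient_bytes = 1_000_000_000 * 2

for name, bandwidth in (("50 GB/s link", 50e9), ("5 GB/s link", 5e9)):
    t = ring_allreduce_seconds(gradient_bytes, num_devices=8,
                               link_bytes_per_s=bandwidth)
    print(f"{name}: {t * 1000:.0f} ms per synchronization")
```

If a training step's computation takes less time than the synchronization estimate, the interconnect rather than the accelerator sets the pace, which is the bottleneck learners are asked to identify in this module's diagram.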

Practice and evidence of learning

Learners complete or discuss: Diagram a distributed training system and identify bottlenecks.
Learners produce: a diagram of a distributed training system with its bottlenecks labeled.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 6. Inference stack

Module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving. Primary live activity or lab: Build an inference cost and latency worksheet for an LLM API service.

Topics and coverage

batching

What it means: define batching clearly and connect it to the module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

quantization

What it means: define quantization clearly and connect it to the module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
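
A minimal sketch of quantization, assuming the simplest scheme: symmetric, per-tensor, 8-bit integers with a single scale factor. Production libraries use more elaborate variants (per-channel scales, calibration, outlier handling), so this is an illustration of the idea rather than how any particular library works. The function names are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127].

    Returns (quantized_integers, scale). The scale maps the largest
    absolute value in the input onto 127.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Map quantized integers back to approximate floating point values."""
    return [q * scale for q in quantized]


weights = [0.5, -1.27, 0.003, 1.0]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)

print("integers:", q)
print("restored:", restored)
print("max error:", max(abs(w - r) for w, r in zip(weights, restored)))
```

The small weight 0.003 is rounded to zero, which serves as the limitation example: one scale shared by the whole tensor erases values far smaller than the largest one, and a single large outlier makes this worse by stretching the scale.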

speculative decoding

What it means: define speculative decoding clearly and connect it to the module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

model routing

What it means: place model routing inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

caching

What it means: define caching clearly and connect it to the module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

streaming

What it means: define streaming clearly and connect it to the module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

serving

What it means: define serving clearly and connect it to the module focus: Inference stack: batching, quantization, speculative decoding, model routing, caching, streaming, and serving.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

Learners complete or discuss: Build an inference cost and latency worksheet for an LLM API service.
Learners produce: an inference cost and latency worksheet for an LLM API service.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
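
As a starting point for the cost and latency worksheet, the two core calculations can be sketched as small Python functions. Every input below (hourly price, throughput, utilization, token timings) is a hypothetical placeholder that learners should replace with measured or quoted figures. The model is deliberately simple: it ignores input-token processing cost, queueing, and multi-GPU serving.

```python
def cost_per_million_tokens(gpu_hourly_usd, tokens_per_second, utilization=1.0):
    """Serving cost in USD per one million generated tokens.

    utilization is the fraction of each hour the GPU spends doing paid work.
    """
    tokens_per_hour = tokens_per_second * 3600 * utilization
    return gpu_hourly_usd / tokens_per_hour * 1_000_000


def response_latency_seconds(time_to_first_token_s, output_tokens, seconds_per_token):
    """End-to-end latency of one streamed response.

    The first token arrives after time_to_first_token_s; each later token
    adds seconds_per_token.
    """
    return time_to_first_token_s + max(output_tokens - 1, 0) * seconds_per_token


# Hypothetical service: $2 per GPU-hour, 1000 tokens/s aggregate throughput
# across the batch, GPU busy half the time.
print(f"${cost_per_million_tokens(2.0, 1000, utilization=0.5):.2f} per million tokens")

# Hypothetical request: 0.5 s to first token, 200 output tokens, 20 ms each.
print(f"{response_latency_seconds(0.5, 200, 0.02):.2f} s per response")
```

Larger batches usually raise aggregate tokens per second and lower cost, while slowing each individual request, so the two functions together expose the tradeoff the worksheet is meant to capture.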

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 7. Networking and data centers

Module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations. Primary live activity or lab: Design a simplified AI cluster architecture and power budget.

Topics and coverage

NVLink

What it means: define NVLink clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

InfiniBand

What it means: define InfiniBand clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Ethernet

What it means: define Ethernet clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

rack design

What it means: place rack design (how servers, switches, power distribution, and cooling are packed into a rack) inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

power

What it means: define power clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

cooling

What it means: define cooling clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

uptime

What it means: define uptime clearly and connect it to the module focus: Networking and data centers: NVLink, InfiniBand, Ethernet, rack design, power, cooling, uptime, and operations.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
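
Uptime targets are easier to reason about once they are converted into allowed downtime. The short sketch below does that conversion for a 365-day year; the function name is hypothetical and the availability levels are common illustrative targets, not commitments from any provider.

```python
def downtime_hours_per_year(availability_percent):
    """Hours of downtime per 365-day year allowed by an availability target."""
    hours_per_year = 365 * 24
    return (1 - availability_percent / 100) * hours_per_year


for target in (99.0, 99.9, 99.99):
    print(f"{target}% uptime allows {downtime_hours_per_year(target):.2f} hours down per year")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is a useful anchor when learners discuss what redundancy a given target requires.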

operations

What it means: place operations (monitoring, maintenance, failure handling, and capacity planning) inside the AI system stack so learners know what problems it solves and what tradeoffs it introduces.
What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

Practice and evidence of learning

Learners complete or discuss: Design a simplified AI cluster architecture and power budget.
Learners produce: a simplified AI cluster architecture diagram and power budget.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.
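
A power budget for this activity could start from a rough calculation like the one below. It is a minimal sketch under stated assumptions: the server count, GPUs per server, wattages, and PUE value are hypothetical placeholders, and learners should replace them with numbers from real spec sheets. PUE (power usage effectiveness) is total facility power divided by IT equipment power.

```python
def cluster_power_kw(num_servers, gpus_per_server, gpu_watts,
                     other_watts_per_server, pue):
    """Return (IT power, facility power) in kilowatts for a simple cluster."""
    # IT load: GPUs plus everything else in each server (CPUs, memory, fans, NICs).
    it_watts = num_servers * (gpus_per_server * gpu_watts + other_watts_per_server)
    # Facility load scales the IT load by PUE to cover cooling and power losses.
    facility_watts = it_watts * pue
    return it_watts / 1000, facility_watts / 1000


# Hypothetical inputs: 16 servers, 8 GPUs each at 700 W, 2,000 W of other
# load per server, and a PUE of 1.3.
it_kw, facility_kw = cluster_power_kw(16, 8, 700, 2000, 1.3)
print(f"IT load: {it_kw:.1f} kW, facility load: {facility_kw:.1f} kW")
```

The gap between the two numbers is the cooling and power-delivery overhead, which gives learners a concrete quantity to discuss when they compare designs.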

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 8. Compilers and software

Module focus: Compilers and software: CUDA, XLA, Triton, graph optimization, kernels, quantization libraries, and deployment runtimes. Primary live activity or lab: Trace how a model operation becomes hardware instructions at a conceptual level.

Topics and coverage

CUDA

What it means: define CUDA clearly and connect it to the module focus: Compilers and software: CUDA, XLA, Triton, graph optimization, kernels, quantization libraries, and deployment runtimes.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

XLA

What it means: define XLA clearly and connect it to the module focus: Compilers and software: CUDA, XLA, Triton, graph optimization, kernels, quantization libraries, and deployment runtimes.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Triton

What it means: define Triton clearly and connect it to the module focus: Compilers and software: CUDA, XLA, Triton, graph optimization, kernels, quantization libraries, and deployment runtimes.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

graph optimization

What it means: place graph optimization inside the AI system stack so learners know what problem it solves and what tradeoffs it introduces.
What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.
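
As one way to make the mechanism visible, the toy pass below fuses a matrix multiply followed by an add into a single operation. It is a deliberately simplified sketch: the graph is assumed to be a flat list of operation names, and the names "matmul", "add", and "matmul_add" are hypothetical. Real compilers work on data-flow graphs and check data dependencies before fusing.

```python
def fuse_matmul_add(ops):
    """Replace each adjacent ("matmul", "add") pair with one "matmul_add" op."""
    fused = []
    i = 0
    while i < len(ops):
        is_pair = ops[i] == "matmul" and i + 1 < len(ops) and ops[i + 1] == "add"
        if is_pair:
            fused.append("matmul_add")  # one kernel launch instead of two
            i += 2
        else:
            fused.append(ops[i])
            i += 1
    return fused


graph = ["matmul", "add", "relu", "matmul", "add"]
print(fuse_matmul_add(graph))  # ['matmul_add', 'relu', 'matmul_add']
```

Fusing operations can reduce kernel launches and trips to memory, but a fused operation is harder to inspect on its own, which is the kind of tradeoff learners can be asked to weigh.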

kernels

What it means: define kernels clearly and connect them to the module focus: Compilers and software: CUDA, XLA, Triton, graph optimization, kernels, quantization libraries, and deployment runtimes.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

quantization libraries

What it means: define quantization libraries clearly and connect them to the module focus: Compilers and software: CUDA, XLA, Triton, graph optimization, kernels, quantization libraries, and deployment runtimes.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
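
A simple example for this topic could show what a quantization library does at its core. The sketch below is a hand-written illustration in plain Python, not the API of any real library: it assumes symmetric per-tensor int8 quantization, and the function names and sample weights are hypothetical.

```python
def quantize_int8(values):
    """Symmetric per-tensor quantization: map floats to integers in [-127, 127]."""
    max_abs = max(abs(v) for v in values)
    # One shared scale for the whole tensor; fall back to 1.0 for all-zero input.
    scale = max_abs / 127 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float values from the stored integers."""
    return [q * scale for q in quantized]


weights = [0.02, -1.27, 0.5, 0.9]  # hypothetical weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q, scale, max_error)
```

Each weight now fits in one byte instead of four, at the cost of a rounding error of at most half the scale per value. Production libraries add refinements such as per-channel scales and calibration data, which this sketch leaves out.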

deployment runtimes

What it means: place deployment runtimes inside the AI system stack so learners know what problem they solve and what tradeoffs they introduce.
What to cover: inputs, outputs, system boundaries, evaluation criteria, cost or latency implications, and common failure cases.
Demonstration: use a diagram, small code sample, worksheet, or tool trace to make the mechanism visible.
Evidence of learning: learners compare two approaches and explain which one they would choose for a realistic constraint.

Practice and evidence of learning

Learners complete or discuss: Trace how a model operation becomes hardware instructions at a conceptual level.
Learners produce: a conceptual trace of how a model operation becomes hardware instructions.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 9. Hardware economics and geopolitics

Module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing. Primary live activity or lab: Map the AI hardware value chain and identify strategic choke points.

Topics and coverage

capex

What it means: define capex clearly and connect it to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

depreciation

What it means: define depreciation clearly and connect it to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
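
A simple example for depreciation could turn a purchase price into a cost per GPU-hour. The sketch below assumes straight-line depreciation, and every input (price, salvage value, useful life, utilization) is a hypothetical placeholder rather than market data.

```python
HOURS_PER_YEAR = 365 * 24


def depreciation_per_gpu_hour(purchase_price, salvage_value,
                              useful_life_years, utilization):
    """Straight-line depreciation spread over the hours the GPU is actually used."""
    annual_depreciation = (purchase_price - salvage_value) / useful_life_years
    used_hours_per_year = HOURS_PER_YEAR * utilization
    return annual_depreciation / used_hours_per_year


# Hypothetical inputs: a 30,000 accelerator worth 3,000 after 4 years,
# busy 60% of the time.
cost = depreciation_per_gpu_hour(30_000, 3_000, 4, 0.6)
print(f"Depreciation per used GPU-hour: {cost:.2f}")
```

Changing the useful life or the utilization moves the result substantially, which is a useful prompt for discussing how accounting assumptions shape reported AI costs.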

supply chain

What it means: define supply chain clearly and connect it to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

foundries

What it means: define foundries clearly and connect them to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

packaging

What it means: define packaging clearly and connect it to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

export controls

What it means: define export controls clearly and connect them to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

cloud pricing

What it means: define cloud pricing clearly and connect it to the module focus: Hardware economics and geopolitics: capex, depreciation, supply chain, foundries, packaging, export controls, and cloud pricing.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.
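
A realistic example for cloud pricing could compare renting against owning. The sketch below estimates the break-even point in hours of use; the function name and all prices are assumptions made for illustration, and it ignores financing, staffing, and resale value.

```python
def break_even_hours(purchase_price, own_hourly_operating_cost, cloud_hourly_rate):
    """Hours of use after which owning costs less than renting.

    Returns None when renting is never more expensive per hour than owning.
    """
    hourly_saving = cloud_hourly_rate - own_hourly_operating_cost
    if hourly_saving <= 0:
        return None
    return purchase_price / hourly_saving


# Hypothetical inputs: 30,000 to buy, 0.50 per hour to power and host,
# 3.00 per hour to rent an equivalent cloud instance.
hours = break_even_hours(30_000, 0.50, 3.00)
print(f"Break-even after {hours:,.0f} hours of use")
```

Dividing the result by expected hours of use per year shows whether the break-even point arrives before the hardware is likely to be replaced.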

Practice and evidence of learning

Learners complete or discuss: Map the AI hardware value chain and identify strategic choke points.
Learners produce: a map of the AI hardware value chain with strategic choke points identified.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Module 10. Future hardware

Module focus: Future hardware: photonics, neuromorphic ideas, wafer-scale systems, memory-centric compute, and edge AI. Primary live activity or lab: Prepare a hardware roadmap thesis for one application domain.

Topics and coverage

photonics

What it means: define photonics clearly and connect it to the module focus: Future hardware: photonics, neuromorphic ideas, wafer-scale systems, memory-centric compute, and edge AI.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

neuromorphic ideas

What it means: define neuromorphic ideas clearly and connect them to the module focus: Future hardware: photonics, neuromorphic ideas, wafer-scale systems, memory-centric compute, and edge AI.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

wafer-scale systems

What it means: define wafer-scale systems clearly and connect them to the module focus: Future hardware: photonics, neuromorphic ideas, wafer-scale systems, memory-centric compute, and edge AI.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

memory-centric compute

What it means: define memory-centric compute clearly and connect it to the module focus: Future hardware: photonics, neuromorphic ideas, wafer-scale systems, memory-centric compute, and edge AI.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

edge AI

What it means: define edge AI clearly and connect it to the module focus: Future hardware: photonics, neuromorphic ideas, wafer-scale systems, memory-centric compute, and edge AI.
What to cover: the core concept, why it matters, what good usage looks like, and where learners are likely to misunderstand it.
Demonstration: give one simple example, one realistic example, and one failure or limitation example.
Evidence of learning: learners explain the topic in their own words and apply it to a small artifact or decision.

Practice and evidence of learning

Learners complete or discuss: Prepare a hardware roadmap thesis for one application domain.
Learners produce: a hardware roadmap thesis for one application domain.
Instructor checks for accuracy, practical usefulness, clear assumptions, appropriate human review, and fit with the course audience.
Learners revise once after feedback so the module contributes to the final project, portfolio, or playbook.

Minimum coverage before moving on

Learners can explain the module vocabulary without relying on tool-generated text.
Learners have seen one worked example, one hands-on application, and one limitation or failure case.
Learners know what must be verified, what data must be protected, and who remains accountable for the output.

Core labs and builds

Matrix multiplication and FLOP estimation lab.
Quantization lab: compare size, speed, and quality tradeoffs.
Inference economics lab: cost per 1,000 requests under different models and latency targets.
Hardware value-chain lab: chip designer, foundry, packaging, memory, networking, data center, cloud, application.
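
The first and third labs above could start from two short calculations like these. This is a sketch under stated assumptions: the function names, the GPU hourly cost, and the request rate are hypothetical, and the FLOP count uses the conventional approximation of one multiply and one add per term.

```python
def matmul_flops(m, k, n):
    """Approximate FLOPs for an (m x k) by (k x n) matrix multiply."""
    # Each of the m * n outputs needs k multiplies and about k adds.
    return 2 * m * k * n


def cost_per_1000_requests(gpu_hourly_cost, requests_per_second):
    """Cost of serving 1,000 requests on one GPU at a sustained request rate."""
    requests_per_hour = requests_per_second * 3600
    return gpu_hourly_cost / requests_per_hour * 1000


print(matmul_flops(2, 3, 4))  # 48

# Hypothetical inputs: a GPU costing 2.50 per hour that sustains 5 requests per second.
print(f"{cost_per_1000_requests(2.50, 5):.4f} per 1,000 requests")
```

Tighter latency targets usually mean smaller batches and fewer requests per second per GPU, so learners can rerun the second function with a lower rate to see how the cost per 1,000 requests rises.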

Capstone

Design an AI hardware deployment plan for one workload such as chatbot inference, document AI, medical imaging, video generation, classroom AI lab, call-center agents, or edge camera inspection. The plan includes workload profile, hardware choice, cost model, bottlenecks, and scaling strategy.

Assessment design

Hardware comparison memo.
Memory and inference cost calculations.
Cluster architecture diagram.
Final deployment plan.

Recommended tools and datasets

Python notebooks, GPU profiler examples, cloud calculators, model parameter calculators, hardware spec sheets, compiler diagrams, data-center architecture diagrams.

Instructor notes

This course is especially valuable for business and investing audiences because it reveals why AI economics depend on bottlenecks outside the model itself: memory, interconnect, energy, utilization, and supply chain.

Instructor Build Checklist

Prepare one short demo for each module and one learner activity that creates a saved artifact.
Prepare examples that match the audience, local context, and likely tools learners can access.
Add a verification step to every AI-generated output: factual check, source check, data sensitivity check, and quality review.
Keep a running portfolio folder so each module contributes to the final project or learner playbook.
Reserve time for reflection on what the learner did, what AI did, what was checked, and what remains uncertain.