Course overview

How to Design Artificial Intelligence

41 modules
176 lessons
—
Part 1

Course Setup and the Incremental Ladder

  1. Course Setup and the Incremental Ladder

  2. Why "Transformers to Intelligence"

  3. How to Use This Course

  4. The Incremental Ladder (Step 0 to Step 7)

  5. The Course Lenses

  6. Diagram Legend and Notation Types

Part 2

Mental Models: Functions, Learning, and Intelligence

  1. Mental Models: Functions, Learning, and Intelligence

  2. Learning as Function Approximation

  3. Intelligence as Compression, Prediction, and Control

  4. Local Optimization, Global Behavior

Part 3

Mathematical Foundations

  1. Mathematical Foundations

  2. Linear Algebra for AI: vectors, matrices, tensors, and why geometry is the hidden structure of models

  3. Probability and Uncertainty: distributions, expectations, Bayes' rule, and calibration as an operational concern

  4. Optimization in Practice: gradients, convex vs non-convex intuition, SGD and common variants

Part 4

Physics-Inspired Views of Learning

  1. Physics-Inspired Views of LearningSign in

  2. Information Theory as a Lens: entropy, mutual information, and compression-driven interpretations of learning

  3. Energy Landscapes and Statistical Mechanics Intuition: why minima, basins, and temperature metaphors explain training behavior

  4. Scaling Limits: compute, energy, bandwidth, and the real costs that constrain "bigger models"

Part 5

Diagramming AI Systems

  1. Diagramming AI Systems

  2. Computation Graphs and Layers: forward/backward passes as the "wiring diagram" of learning.

  3. Training vs Inference Paths: data pipelines, feedback loops, and the boundary where behavior is measured.

  4. System Topologies for AI: model, retrieval, tools, data stores, and users as a coupled system.

Part 6

Step 0 Representations: Vectors, Embeddings, Tokens

  1. Step 0 Representations: Vectors, Embeddings, Tokens

  2. Feature Spaces and Embeddings: one-hot vs dense representations and what "meaning in geometry" implies.

  3. Tokenization Across Modalities: text, images, and structured inputs as discrete interfaces to continuous models.

  4. Normalization and Scaling: preprocessing as a stability tool, not a cosmetic step.

Part 7

Step 0 Models: Linear and Logistic Regression

  1. Step 0 Models: Linear and Logistic Regression

  2. Linear Regression and MSE: fitting as projection and what linearity limits you to.

  3. Logistic Regression and Softmax: decision boundaries, cross-entropy, and multi-class classification.

  4. Training Loops and Optimization Basics: learning rates, SGD, and the earliest forms of regularization.

Part 8

Step 0 Delivery: Packaging, Serving, and Evaluating Linear Models

  1. Step 0 Delivery: Packaging, Serving, and Evaluating Linear Models

  2. From Notebook to API: packaging artifacts, input contracts, and reproducible inference.

  3. Offline vs Online Evaluation: calibration, drift, and stability as production realities.

  4. When Linear is Enough: diagnosing "ceiling effects" and knowing when representation learning is required.

Part 9

Step 1 MLPs and Depth

  1. Step 1 MLPs and Depth

  2. Why Depth Matters: universal approximation versus practical learnability.

  3. Activations and Initialization: vanishing/exploding gradients as architectural constraints.

  4. Designing MLPs for Structured Data: inductive bias for tabular and mixed-feature inputs.

Part 10

Step 1 Convolutions and Local Structure

  1. Step 1 Convolutions and Local Structure

  2. Convolution as an Inductive Bias: receptive fields, weight sharing, and locality.

  3. Pooling, Stride, and Invariance: what invariance buys and what it breaks.

  4. Beyond Images: extending convolutional structure to audio, time series, and other domains.

Part 11

Step 1 Representation Learning and Regularization

  1. Step 1 Representation Learning and Regularization

  2. Autoencoders and Bottlenecks: learning feature spaces by compression

  3. Regularization Toolbox: dropout, batch normalization, weight decay, and how they change training dynamics

  4. Transfer Learning: pretrain then adapt, and why data boundary choices matter

Part 12

Step 2 Recurrent Networks and Temporal Dependencies

  1. Step 2 Recurrent Networks and Temporal Dependencies

  2. RNNs, LSTMs, GRUs: representing state over time and what "memory" means in a model

  3. Training Instability in Time: truncated BPTT, gradient pathologies, and practical mitigation

  4. Seq2Seq Abstractions: encoder-decoder thinking and where attention first enters as a remedy

Part 13

Step 2 Attention Mechanisms

  1. Step 2 Attention Mechanisms

  2. Query-Key-Value: attention as differentiable selection and routing

  3. Soft vs Hard Attention: trade-offs in differentiability, interpretability, and optimization

  4. Longer Contexts: scaling attention and the costs that push you toward transformer-style designs

Part 14

Step 2 Deployment: Architecting and Serving Sequence Models

  1. Step 2 Deployment: Architecting and Serving Sequence Models

  2. Streaming vs Offline Inference: boundary choices for latency and correctness

  3. Memory Footprint and Sequence Length: performance ceilings and practical constraints

  4. Evaluating Sequence Tasks: perplexity, BLEU/ROUGE-style metrics, and domain-specific success definitions

Part 15

Anatomy of a Transformer

  1. Anatomy of a Transformer

  2. Multi-Head Self-Attention: context mixing as the core computational primitive

  3. Positional Encoding: absolute vs relative schemes and what they enable or prevent

  4. Feed-Forward, Residuals, Normalization: stability mechanisms and depth scaling behavior

Part 16

Transformer Variants

  1. Transformer Variants

  2. Encoder-Only, Decoder-Only, Encoder-Decoder: matching architecture to task constraints

  3. Long-Context Strategies: sparse and linearized attention and the costs they shift elsewhere

  4. Mixture-of-Experts and Scaling Variants: capacity, routing, and operational complexity

Part 17

Training Transformers

  1. Training Transformers

  2. Data and Tokenization Pipelines: corpus construction as the dominant design lever

  3. Distributed Training Patterns: parallelism as a systems boundary problem

  4. Stability and Schedules: making training predictable under scale

Part 18

Inference, Optimization, and Compression

  1. Inference, Optimization, and Compression

  2. Inference Graphs and KV Caching: throughput/latency trade-offs and streaming implications

  3. Quantization, Pruning, Distillation: compressing capability into deployable footprints

  4. Hardware-Aware Deployment: CPUs, GPUs, accelerators, and where bottlenecks migrate

Part 19

What Is a Foundation Model?

  1. What Is a Foundation Model?

  2. Self-Supervision and Emergence: why next-token and related objectives produce broad capabilities.

  3. Objective Families: masked, causal, and hybrid objectives as behavior-shaping choices.

  4. Scaling Laws Intuition: the data/compute/parameters triangle and what it predicts well (and poorly).

Part 20

LLMs and SLMs as Design Points

  1. LLMs and SLMs as Design Points

  2. LLM Design Posture: generality, capability, and the costs you accept to get them.

  3. SLM Design Posture: specialization, edge constraints, and operational privacy/cost advantages.

  4. Choosing a Point in the Trade Space: capability, latency, privacy, and unit economics.

Part 21

Fine-Tuning, Adaptation, and Instruction Following

  1. Fine-Tuning, Adaptation, and Instruction Following

  2. Supervised Fine-Tuning: task shaping and the risks of brittle specialization.

  3. Adapters and LoRA-Style Methods: low-rank adaptation as an operationally tractable compromise.

  4. Instruction Tuning and Preference Data: steering behavior via curated interaction distributions.

Part 22

Alignment, Safety, and Guardrails

  1. Alignment, Safety, and Guardrails

  2. RLHF and Related Approaches: what is optimized, what is approximated, and where reward hacking appears.

  3. Red-Teaming and Safety Policies: turning anticipated misuse into testable evaluation artifacts.

  4. Guardrails and Constraints: balancing helpfulness, honesty, and harmlessness as a system design problem.

Part 23

Evaluation and Benchmarking of Foundation Models

  1. Evaluation and Benchmarking of Foundation Models

  2. Benchmarks vs Holistic Evaluation: what benchmarks measure, and what they systematically miss.

  3. Robustness, Calibration, Bias, Fairness: evaluation as continuous monitoring of failure surfaces.

  4. Human Evaluation Loops: feedback pipelines, labeler variance, and governance of subjective judgments.

Part 24

Multi-Modal Architectures

  1. Multi-Modal Architectures

  2. Modality Encoders: vision, audio, code, and structured data as distinct input contracts.

  3. Fusion Strategies: cross-attention, late fusion, and joint embedding spaces.

  4. Contrastive and Joint Training: aligning modalities and the failure modes of misalignment.

Part 25

Retrieval-Augmented Generation (RAG)

  1. Retrieval-Augmented Generation (RAG)

  2. Vector Stores and Indexes: retrieval as an external memory boundary.

  3. Chunking and Context Construction: grounding as an engineering problem, not a slogan.

  4. RAG Failure Modes: hallucination, stale knowledge, retrieval miss, and robustness strategies.

Part 26

Tool-Using Models

  1. Tool-Using Models

  2. Function Calling and Structured Outputs: schemas as the interface contract between models and tools.

  3. Designing Tool APIs for Models: affordances, constraints, and making actions safer than free-form text.

  4. Access Control and Safety Around Tools: permissions, auditing, and failure containment.

Part 27

Model Ecosystems and Composition

  1. Model Ecosystems and Composition

  2. Routing and Composition: orchestrating specialized models as a system-level mixture of experts.

  3. Specialized Helpers: code, math, extraction, planning, and when decomposition improves reliability.

  4. Latency, Reliability, and Cost: managing budgets and failure propagation in composed systems.

Part 28

Agents and Feedback Loops

  1. Agents and Feedback Loops

  2. Agent Loops: observe-think-act-learn as a control system, not just an app pattern.

  3. Planning and Reflection: when self-critique helps and when it creates new failure modes.

  4. Memory Architectures: short-term context versus long-term storage and retrieval boundaries.

Part 29

Multi-Agent Systems

  1. Multi-Agent Systems

  2. Coordination and Communication: protocols, roles, and division of labor among agents.

  3. Emergence and Simulation: what can arise from local rules and why it is hard to predict.

  4. Governance and Control: bounding agent societies with policies, incentives, and oversight.

Part 30

Orchestration Layers and Workflows

  1. Orchestration Layers and Workflows

  2. Orchestration Frameworks and Policy Engines: separating execution from governance.

  3. Tool Routing, Retries, Backoff: reliability engineering for non-deterministic components.

  4. Integrating with Platforms and UIs: connecting agents to data, business systems, and human workflows.

Part 31

Operating AI Systems in Production

  1. Operating AI Systems in Production

  2. Monitoring the Right Things: quality, safety, latency, and cost as co-equal SLOs.

  3. Incident Response for AI: rollbacks, kill switches, override mechanisms, and postmortems.

  4. Continuous Improvement: A/B testing, canaries, and closing the loop without contaminating evaluation.

Part 32

Organizational Design for AI Systems

  1. Organizational Design for AI Systems

  2. Roles and Interfaces: research, engineering, safety, product, and policy as coupled responsibilities.

  3. Data Lifecycle Governance: collection, labeling, privacy, and retention as design constraints.

  4. Documentation and Accountability: audits, model cards, and external reporting as operational necessities.

Part 33

World Models and Internal Simulators

  1. World Models and Internal Simulators

  2. World Models in RL: latent dynamics and why prediction becomes simulation.

  3. Learning Environment Structure: representations of causality, dynamics, and uncertainty.

  4. Imagination and Counterfactuals: using simulators for planning and robustness.

Part 34

Planning and Control with Learned Models

  1. Planning and Control with Learned Models

  2. Model Predictive Control: planning under constraints with learned dynamics.

  3. Integrating LMs with World Models: division of labor between language and dynamics.

  4. Closed-Loop Systems: perception -> model -> plan -> act as an end-to-end safety boundary.

Part 35

Continual Learning, Memory, and Identity

  1. Continual Learning, Memory, and Identity

  2. Lifelong Learning and Forgetting: catastrophic forgetting as a systems reliability problem.

  3. Memory Structures: internal vs external memory and how they change failure surfaces.

  4. Identity and Versioning Over Time: stability, updates, and continuity as product promises.

Part 36

AI in the Real World: Humans, Organizations, Society

  1. AI in the Real World: Humans, Organizations, Society

  2. Human-AI Interaction Patterns: interface design as behavior shaping and risk control.

  3. Societal Embedding: labor, creativity, decision-making, and organizational adoption dynamics.

  4. Governance and Long-Term Risk: regulation, accountability frameworks, and durable operational norms.

Part 37

Architectural Patterns for AI Models

  1. Architectural Patterns for AI Models

  2. Encoder/Decoder Families: autoencoders and representation bottlenecks as reusable motifs

  3. Diffusion and Energy-Based Perspectives: generative patterns and where they fit operationally

  4. Hybrid Architectures: combining symbolic and neural components and the boundary management required

  5. Architecture Selection by Step: matching pattern choice to ladder rung constraints

  6. Failure Surfaces by Architecture: typical breakdown modes and what to test first

Part 38

Data, Datasets, and Evaluation Patterns

  1. Data, Datasets, and Evaluation Patterns

  2. Data Curation: filtering, deduplication, and provenance as model-shaping forces

  3. Privacy and Data Risk: sensitive data handling and the costs of leakage

  4. Synthetic Data and Augmentation: when synthetic helps, and how it can mislead

  5. Evaluation Suites and Dashboards: continuous measurement as part of the system

  6. Dataset Shift and Drift: maintaining meaning as environments and users change

Part 39

Infrastructure and Scaling Patterns

  1. Infrastructure and Scaling Patterns

  2. Training Clusters: compute topology, throughput constraints, and distributed training patterns

  3. Inference Infrastructure: online, batch, streaming, and the operational consequences of each

  4. Cost Optimization and Elasticity: scaling down as a first-class design goal

  5. Hardware-Aware Design: choosing architectures and serving strategies that match accelerators

  6. Reliability Under Load: degradation strategies when latency, cost, and quality collide

Part 40

Safety, Security, and Robustness Patterns

  1. Safety, Security, and Robustness Patterns

  2. Defense-in-Depth for AI: layered mitigations across model, data, and system boundaries

  3. Adversarial Robustness and Abuse Resistance: anticipating malicious inputs and misuse incentives

  4. Policy, Logging, and Auditability: making failures diagnosable and decisions reviewable

  5. Red-Teaming as an Engineering Practice: turning threats into test plans and regression suites

  6. Post-Hoc Analysis and Remediation: learning loops that improve safety without breaking trust

Part 41

Design Patterns for AI Products

  1. Design Patterns for AI Products

  2. Zero-Shot and Few-Shot UX: prompt-based interaction patterns and their brittleness

  3. Copilot, Agent, Assistant Archetypes: choosing product posture and responsibility boundaries

  4. Human-in-the-Loop Workflows: escalation paths, approvals, and when automation should stop

  5. Trust and Transparency Patterns: explaining uncertainty, citing sources, and managing expectations

  6. Product Safety in Practice: user reporting, abuse handling, and operational governance