Signals To Streams

How to design Real-Time & Streaming Data Systems

52 modules

210 lessons

Part 1

Appendices

Appendix A - Diagram Templates by Step
Appendix B - Mapping Concepts to Real-World Streaming Stacks
Appendix C - Readiness Checklists
Appendix D - Glossary

Part 2

Course Setup and the Incremental Ladder

Course Setup and the Incremental Ladder
Why Signals to Streams
How to Use This Course
The Incremental Ladder (Step 0 to Step 7)
The Course Lenses
Diagram Legend and Notation Types

Part 3

Signals, Events, and Streams

Signals, Events, and Streams
Signals vs Events
Streams vs Batch
The Core Loop (Producer to Decision)

Part 4

Time in Data Systems

Time in Data Systems
Event Time vs Processing Time
Clocks and Skew
Time Semantics as a Contract

Part 5

Ordering, Lateness, and Out-of-Order Data

Ordering, Lateness, and Out-of-Order Data
Ordering Realities: Per-Partition Ordering vs the Myth of Global Order
Lateness and Out-of-Order: Late Arrivals, Backfills, and What "Correct" Means in Motion
Watermarks (Conceptually): Deciding "How Late Is Too Late" and Operationalizing That Decision

Part 6

From Logs and Metrics to Streams

From Logs and Metrics to Streams
Telemetry as Streams: Logs, Metrics, Traces as Time-Series and Event Streams
Aggregation as Streaming: Metrics Rollups, Downsampling, and Near-Real-Time Views
Streaming Boundaries: Where Observability Pipelines Resemble Product Event Pipelines (and Where They Differ)

Part 7

Diagramming Real-Time Data Systems

Diagramming Real-Time Data Systems
Timeline Diagrams: Event Sequences, Causality, and Ordering Assumptions
Producer-Broker-Consumer Flows: Queues and Logs as System Topologies
Processing DAGs and Stateful Operators: How to Draw Streaming Logic and Its State

Part 8

Step 0 Modeling: Time-Series and Event Shapes

Step 0 Modeling: Time-Series and Event Shapes
Modeling Time-Series: Measurements, Tags/Labels, Values, and Cardinality Risk
Modeling Events: Keys, Timestamps, Payloads, and Append-Only Thinking
Time-Series vs Events vs Logs: Choosing the Right Mental Model for the Question

Part 9

Step 0 Sampling, Granularity, and Aggregation

Step 0 Sampling, Granularity, and Aggregation
Sampling and Downsampling: Choosing Resolution Without Destroying Meaning
Buckets and Rolling Aggregates: Time-Based Grouping and "Sliding" Interpretations
Storage/Query Trade-offs: Write Patterns, Query Patterns, and Retention Constraints

Part 10

Step 0 Minimal Stream Architecture

Step 0 Minimal Stream Architecture
One Producer, One Consumer: The Smallest Working Stream
Minimal Stores: Simple Append, Simple Reads, Simple Replays
Early Failure Modes: Timestamp Mistakes, Inconsistent Keys, and Accidental Distributed Time

Part 11

Step 1 Messaging Models

Step 1 Messaging Models
Point-to-Point vs Pub/Sub: Distribution, Fan-Out, and Coordination Costs
Competing Consumers: Work Distribution and Concurrency Boundaries
Messaging Anti-Patterns: Request/Response Misuse and Coupling That Breaks Scaling

Part 12

Step 1 Queues vs Append-Only Logs

Step 1 Queues vs Append-Only Logs
Queue Semantics: Destructive Consumption and "Who Owns the Work"
Log Semantics: Immutable Append, Offsets, and Replay as a Feature
Choosing the Model: Latency, Durability, Reprocessing, and Cost Trade Spaces

Part 13

Step 1 Producers, Consumers, and Delivery Semantics

Step 1 Producers, Consumers, and Delivery Semantics
Producer Responsibilities: Batching, Retry, Serialization, and Backoff
Consumer Responsibilities: Checkpoints, Idempotency, and Safe Restarts
At-Most vs At-Least Once: What "Duplicates" Mean to Your Downstream

Part 14

Step 1 Offsets, Replay, and Reprocessing

Step 1 Offsets, Replay, and Reprocessing
Offsets as Positions: Why "Where Am I?" Is the Core Recovery Primitive
Rewind and Rebuild: Reconstructing Views From Logs and Designing for Repair
Reprocessing as Normal: Backfills, Bug Fixes, and Schema Evolution

Part 15

Step 1 Message and Event Schema Design

Step 1 Message and Event Schema Design
Event Envelopes: Metadata, Correlation IDs, Trace IDs, and Routing Fields
Compatibility Over Time: Forward/Backward Evolution and Versioning Discipline
Schema Registries (Conceptual): Governance Mechanisms for Change at Scale

Part 16

Streaming Platforms as Distributed Logs

Streaming Platforms as Distributed Logs
Topics and Partitions: Scaling Throughput via Parallel Streams
Replication and Durability: Why "A Log" Becomes "A System"
Control Plane Roles: Brokers, Controllers, and Coordination (Conceptual)

Part 17

Consumer Groups and Parallelism

Consumer Groups and Parallelism
Consumer Groups: Scaling Reads While Preserving Per-Partition Order
Assignment and Rebalancing: Why Coordination Events Cause Latency Spikes
Designing for Rebalance: Idempotency, Warm Caches, and Safe Pauses

Part 18

Retention, Compaction, and Storage Policies

Retention, Compaction, and Storage Policies
Retention as a Product Choice: Time/Size Policies and How Far Back Can We Repair?
Compaction as a Pattern: Latest-Value-Per-Key vs Full History
Storage Planning: Tiering, Cost Controls, and Long-Lived Streams

Part 19

Acknowledgments and Delivery Guarantees

Acknowledgments and Delivery Guarantees
Producer Acks: Durability vs Latency vs Cost
Committing Offsets: Recovery Behavior and Failure Windows
Guarantee Trade-offs: The Practical Meaning of "Durable Enough"

Part 20

Operating Streaming Clusters

Operating Streaming Clusters
Capacity Planning: Throughput, Partitions, Replication, and Disk
Upgrades and Scaling: Operational Risk, Rolling Changes, and Traffic Safety
Multi-Tenancy: Quotas, Isolation, and Noisy-Neighbor Problems

Part 21

Managed vs Self-Managed Streaming Services

Managed vs Self-Managed Streaming Services
Responsibility Split: What You Keep vs What the Provider Owns
Cost and Lock-In: Portability, Hybrid Patterns, and Migration Planning
Operating Model Fit: Team Skill, Reliability Needs, and Compliance Boundaries

Part 22

Batch vs Streaming vs Micro-Batch

Batch vs Streaming vs Micro-Batch
Streaming as "Always-On Computation": Why Latency Budgets Shape Everything
Micro-Batch Models: Throughput Wins, Latency Costs, and Operational Simplicity
Choosing the Model: Workload Shape, Cost, and Correctness Expectations

Part 23

Stateless Stream Processing

Stateless Stream Processing
Core Operators: Map/Filter/Route/Enrich and Simple Aggregates
Lookups and Enrichment: Joining Against Reference Data Without Building State Machines Accidentally
Idempotent Transforms: Safe Replays and Repeatable Outputs

Part 24

Processing Topologies and DAGs

Processing Topologies and DAGs
DAGs as the Mental Model: Sources -> Operators -> Sinks
Partitioning Through the Graph: Where Keys Change and Costs Spike
Operator Boundaries: Coupling, Isolation, and Debugging Ergonomics

Part 25

What Frameworks Provide

What Frameworks Provide
Runtime Scheduling and Execution: What "the Framework" Does for You
Connectors and Integrations: Sources and Sinks as First-Class Reliability Boundaries
Preview of State: Why State Management Is the Hard Part You Are About to Inherit

Part 26

Designing Stream Processing Jobs

Designing Stream Processing Jobs
Decomposing Problems into Operators: Choosing Responsibilities That Survive Change
Debuggability by Design: Observability Hooks and Introspection Points
Operational Readiness: Failure Modes, Restart Behavior, and Safe Rollouts

Part 27

State in Streaming Systems

State in Streaming Systems
Per-Key State: The Unit of Scalability (and the Unit of Pain)
Aggregates and Summaries: Building "Live Views" over Streams
State Growth and Eviction: TTLs, Compaction, and What Happens When State Never Dies

Part 28

Windowing Concepts

Windowing Concepts
Tumbling, Sliding, and Hopping Windows: Three Ways to Slice Time (and Three Ways to Confuse People)
Session Windows: Gap-Based Grouping and Behavioral Interpretation
Late Events and Completion: Allowed Lateness and When Results Become "Final Enough"

Part 29

Streams–Tables Duality

Streams–Tables Duality
Stream as Changelog: Turning Events into a Table of "Current Truth"
Materialized Views: Serving Real-Time Queryable State
Rebuild and Repair: Correctness via Replay and Deterministic Reconstruction

Part 30

Joins in Streaming

Joins in Streaming
Stream-Stream Joins: Windows, State, and Time Alignment
Stream-Table Joins: Enrichment and Dimension Lookups in Motion
Watermarks and State Costs: How "Correctness" Consumes Memory

Part 31

Stateful Operator Design

Stateful Operator Design
Safe State Handling: Snapshots, Migrations, and Rebalances
Scaling Stateful Operators: Partitioning Strategies and Hotspots
Operational Realities: State Bootstrap Time and Recovery SLOs

Part 32

Data Modeling for Streaming Use Cases

Data Modeling for Streaming Use Cases
Entities vs Events in Real Time: Modeling for Joins, Aggregates, and Replays
Keys and Correlation: Multi-Stream Linkage Without Chaos
Designing for the Future: Schema Choices That Keep Reprocessing Viable

Part 33

Failures in Streaming Systems

Failures in Streaming Systems
Failure Taxonomy: Node Loss, Partitions, Slow Tasks, Poison Messages
Partial vs Full Failure: What Restarts Actually Restart
Retries Revisited: Idempotency as the Survival Trait

Part 34

Checkpointing and Recovery

Checkpointing and Recovery
Consistent Snapshots: What It Means to "Save State" in a DAG
Replay from Offsets + State: Rebuilding Exactly What You Had
Checkpoint Frequency Trade-offs: Overhead vs Recovery Point Objectives

Part 35

Delivery Semantics in Depth

Delivery Semantics in Depth
At-Most Once: When Losing Data Is Acceptable (and How to Be Honest About It)
At-Least Once: Deduplication, Idempotent Sinks, and Practical Correctness
Exactly/Effectively Once: End-to-End Constraints and Where Systems Cheat

Part 36

Transactions and Two-Phase Commit (Conceptual)

Transactions and Two-Phase Commit (Conceptual)
Atomic Write + Offset Commit: The Core Problem Statement
When 2PC-Like Patterns Appear: Coordinating External Side Effects
When Not to Use It: Complexity Cliffs and Safer Alternatives

Part 37

Compensating Actions and Sagas in Streaming

Compensating Actions and Sagas in Streaming
Irreversible Side Effects: Why "Undo" Is a Product Decision
Eventual Correctness: Compensation Patterns and Reconciliation
Designing for Human Repair: Auditability and Replayable History

Part 38

Testing and Validating Guarantees

Testing and Validating Guarantees
Failure Injection: Restarts, Partitions, Slowdowns, and Chaos-Style Tests
Synthetic Load Tests: Burst, Steady-State, and Pathological Keys
Proving It in Practice: Observing Semantics Through Metrics and Logs

Part 39

Observability for Streaming Pipelines

Observability for Streaming Pipelines
Core Metrics: Throughput, Lag, End-to-End Latency, Watermark Progression
Logs and Structured Operator Events: Making Failures Diagnosable
Tracing Across Pipelines: Linking Ingest -> Process -> Serve in One Story

Part 40

Backpressure and Flow Control

Backpressure and Flow Control
What Backpressure Means: When Sinks or Operators Cannot Keep Up
Buffers and Bounded Queues: Where Latency Hides and Outages Begin
Shedding and Degradation: Dropping, Sampling, and Protective Limits

Part 41

Scaling and Resource Management

Scaling and Resource Management
Scaling via Partitions and Parallelism: Changing the Shape of Work
Resource Bottlenecks: CPU vs Memory vs Network vs Disk in Practice
Autoscaling and Elasticity: What Can Scale Automatically (and What Cannot)

Part 42

Hotspots and Skew

Hotspots and Skew
Key Skew: The Enemy of Parallelism
Mitigation Patterns: Salting, Repartitioning, Load-Aware Routing
Detecting Skew: Signals in Lag, Latency, and Per-Partition Metrics

Part 43

SLOs and Capacity Planning

SLOs and Capacity Planning
Streaming SLOs: Freshness, Latency, Availability, and Correctness Posture
Capacity Models: Headroom, Peaks, and Failure Scenarios
Planning for Growth: New Producers, New Consumers, and Retention Expansions

Part 44

Operational Playbooks and Runbooks

Operational Playbooks and Runbooks
Incident Types: Lag Spikes, Stuck Partitions, Failing Jobs, Corrupt State
Triage and Mitigation: Pause, Drain, Reroute, Replay, Roll Back
Postmortems and Continuous Improvement: Turning Incidents into Safer Defaults

Part 45

End-to-End Architecture Patterns

End-to-End Architecture Patterns
Layered Architecture: Ingestion -> Transport -> Processing -> Serving
Unified vs Split Architectures: Speed/Batch Layering and Convergence Patterns
Serving Patterns: Materialized Views, OLAP-ish Stores, and Operational Stores

Part 46

Real-Time Analytics and Monitoring

Real-Time Analytics and Monitoring
Observability Pipelines as Streaming: Metrics and Logs as First-Class Streams
Dashboards and Near-Real-Time Queries: Freshness Guarantees and Cost
Integration Boundaries: Streaming -> Time-Series Stores and Query Engines

Part 47

Personalization, Recommendations, and Fraud Detection

Personalization, Recommendations, and Fraud Detection
Event-Driven Feature Pipelines: Turning Behavior into Features in Motion
Real-Time Scoring (Conceptual): Model Calls as Sinks with Strict Budgets
Feedback Loops: Online Learning Risks and Governance Boundaries

Part 48

IoT and Edge Streaming

IoT and Edge Streaming
Device Streams: Intermittent Networks and Store-and-Forward Realities
Gateways and Aggregation: Edge Filtering and Upstream Compression
Buffering and Reconciliation: Correctness Across Connectivity Gaps

Part 49

Multi-Region and Hybrid Architectures

Multi-Region and Hybrid Architectures
Regional Clusters and Locality: Latency and Sovereignty Constraints
Cross-Region Replication: Durability vs Cost vs Consistency Trade-offs
Hybrid Patterns: On-Prem Sources, Cloud Processing, Shared Topics

Part 50

Governance, Schemas, and Evolution

Governance, Schemas, and Evolution
Schema Governance: Registries, Reviews, and Compatibility Policies
Ownership and Naming: Topic Conventions and Domain Boundaries
Migration Patterns: Resharding, Topic Evolution, and Safe Rewrites

Part 51

Platformizing Streaming

Platformizing Streaming
Streaming as an Internal Platform: Self-Service Ingestion and Guardrails
Templates and Golden Paths: Standard Jobs, Standard Connectors, Standard SLOs
Developer Experience: Testing, Local Runs, Staging, and Safe Deployment Workflows

Part 52

Reference Architectures and Case-Style Patterns

Reference Architectures and Case-Style Patterns
Observability Pipeline: From Telemetry to Alerts and Dashboards
Clickstream Analytics and Experimentation Feeds: Joins, Windows, and Backfills
Payments/Fraud Streams and IoT Telemetry: Exactly-Once Posture vs Best-Effort Posture