Course overview

How to Design Search & Recommendation Engines

53 modules
213 lessons
—
Part 1

Content

  1. Diversity as a Constraint: Relevance Plus Coverage Across Categories and Intents

  2. Long-Tail and New Content: Introduction Policies and Cold-Start Mitigation

  3. Avoiding Monotony: List-Level Objectives and How They Change Ranking

Part 2

Course Setup and the Incremental Ladder

  1. Course Setup and the Incremental Ladder

  2. Why "Queries to Knowledge"

  3. How to Use This Course

  4. The Incremental Ladder (Step 0 -> Step 7)

  5. The Course Lenses

  6. Diagram Legend and Notation Types

Part 3

What Is a Search and Recommendation System?

  1. What Is a Search and Recommendation System?

  2. Search, Recommendation, Discovery, and Navigation

  3. Core Objects: Items, Users, Queries, and Interactions

  4. Knowledge as Organized Access

Part 4

Core IR Concepts

  1. Core IR Concepts

  2. Relevance as a Relationship

  3. Precision and Recall Intuition

  4. Retrieval vs Ranking

Part 5

Content and Metadata Modeling

  1. Content and Metadata Modeling

  2. Documents, Fields, Attributes

  3. Items, Categories, Tags, Facets

  4. Structured vs Unstructured vs Semi-Structured

Part 6

Signals: Behavioral, Textual, and Structural

  1. Signals: Behavioral, Textual, and Structural

  2. Text Signals

  3. User Signals

  4. Structural Signals

Part 7

Diagramming Search and Rec Systems

  1. Diagramming Search and Rec Systems

  2. Indexing Pipelines: Freshness and Correctness Boundaries

  3. Query Pipelines: Latency-Budgeted Dataflow

  4. Recommendation Flows: Multi-Stage Decision Pipelines

Part 8

Step 0 Inverted Indexes

  1. Step 0 Inverted Indexes

  2. Terms to Postings Lists: The Core Structure That Makes Retrieval Feasible at Scale

  3. Doc IDs, Positions, Statistics: What You Store Determines What You Can Score Later

  4. Index Size vs Query Speed: Trading Storage for Latency and Feature Richness

Part 9

Step 0 Retrieval Models (Conceptual)

  1. Step 0 Retrieval Models (Conceptual)

  2. Bag-of-Words Framing: What You Gain in Robustness and What You Lose in Meaning

  3. tf-idf Intuition: Why Rarity and Frequency Matter Differently in Scoring

  4. Probabilistic Views at a High Level: Treating Relevance as Uncertainty Rather Than a Binary Property

Part 10

Step 0 Document Scoring and Ranking Signals

  1. Step 0 Document Scoring and Ranking Signals

  2. Term Matching and Field Weights: Turning Index Evidence Into a Baseline Score

  3. Document Features: Recency, Popularity, Quality and How "Business Truth" Enters Ranking

  4. Combining Text and Metadata Signals: Designing Score Functions That Remain Debuggable

Part 11

Step 0 Indexing Pipelines

  1. Step 0 Indexing Pipelines

  2. Full Builds vs Incremental Updates: Choosing a Freshness Posture and Operational Model

  3. Analysis Chains: Parsing, Tokenization, Normalization as the Content-to-Index Contract

  4. Updates, Deletions, Reindexing: Correctness and Lifecycle Mechanics for Changing Corpora

Part 12

Step 0 Basic Search Experience Design

  1. Step 0 Basic Search Experience Design

  2. Result Pages: Snippets, Highlighting, Facets as Interpretation Aids for the User

  3. Pagination vs Infinite Scroll: Interaction Mechanics That Feed Back Into Ranking Signals

  4. First Ranking Tuning: Boosts by Field/Recency/Type as Controlled Changes to Relevance Behavior

Part 13

Step 1 Query Parsing and Tokenization

  1. Step 1 Query Parsing and Tokenization

  2. Tokenization and Normalization: Stemming/Lemmatization Concepts, Casing, and Character Handling

  3. Punctuation, Symbols, Special Cases: Preserving User Intent Without Exploding Complexity

  4. Language Handling: Detection and Multilingual Considerations as Retrieval Boundaries

Part 14

Step 1 Query Operators and Filters

  1. Step 1 Query Operators and Filters

  2. Fielded Queries, Filters, Facets: Separating Retrieval Constraints from Ranking Preferences

  3. Boolean, Phrase, Range Queries: Expressiveness Versus Performance and Explainability

  4. Query Builders vs Free-Form Strings: Controlling User Power While Preventing Ambiguity

Part 15

Step 1 Spell Correction and Did-You-Mean

  1. Step 1 Spell Correction and Did-You-Mean

  2. Detecting Typos Conceptually: Unknown Terms, Noisy Input, and Long-Tail Vocabulary

  3. Candidate Corrections and Ranking: Choosing the "Least Surprising" Fix

  4. Auto-Correct vs Suggest: Managing Trust, Reversibility, and User Control

Part 16

Step 1 Synonyms, Expansions, and Normalization

  1. Step 1 Synonyms, Expansions, and Normalization

  2. Synonym Dictionaries and Expansion: Improving Recall Without Destroying Precision

  3. Abbreviations, Aliases, Alternates: Handling User Language as a Product Surface

  4. Canonical Concepts: Normalizing Inputs to Stable Entities That Survive Content Churn

Part 17

Step 1 Query Rewriting Strategies

  1. Step 1 Query Rewriting Strategies

  2. Recall vs Precision Rewrites: Making Trade-Offs Explicit per Query Class

  3. Session Context as Hints: Using Recent Activity Without Making Behavior Confusing

  4. Business Rules in Rewriting: Controlled Interventions That Remain Observable and Governable

Part 18

Step 1 Query Understanding Beyond Keywords (Conceptual)

  1. Step 1 Query Understanding Beyond Keywords (Conceptual)

  2. Intent Classes: Navigational/Informational/Transactional as Different Ranking Objectives

  3. Entity Recognition and Classification: When Semantic Structure Is Worth the Added System Cost

  4. Cost/Benefit of Understanding: Correctness, Latency, and Maintainability Implications

Part 19

Step 2 Relevance Metrics

  1. Step 2 Relevance Metrics

  2. Precision@k, Recall@k, NDCG Concepts: Why Ranking Quality Is Not a Single Number

  3. Click Metrics and Position Bias: When User Behavior Is Evidence and When It Is Distortion

  4. Offline vs Online Metrics: Using Each for What It Can Truthfully Support

Part 20

Step 2 Labeled Data and Judgments

  1. Step 2 Labeled Data and Judgments

  2. Manual Judgments: Building Ground Truth and Managing Subjectivity

  3. Logs as Weak Labels: Leveraging Scale While Respecting Confounding and Exposure Bias

  4. Maintaining Evaluation Sets: Drift, Coverage, and Keeping Tests Representative

Part 21

Step 2 Offline Evaluation and Experimentation

  1. Step 2 Offline Evaluation and Experimentation

  2. Query Replay: Testing New Ranking Logic Against Frozen Corpora and Judgments

  3. Dashboards and Reports: Making Changes Visible, Comparable, and Auditable

  4. Guardrails Against Regressions: Defining "Must Not Get Worse" Constraints

Part 22

Step 2 Online Testing and A/B Frameworks

  1. Step 2 Online Testing and A/B Frameworks

  2. Bucketing Users or Queries: Designing Experiments That Respect Interference and User Experience

  3. Measuring Impact: Aligning Clicks and Downstream Goals with Product Outcomes

  4. Noisy Results and Duration: Interpreting Uncertainty Without Shipping Randomness

Part 23

Step 2 Relevance Tuning and Rules

  1. Step 2 Relevance Tuning and Rules

  2. Boosting and Demoting Features: Controlled Levers for Recency, Quality, and Business Priorities

  3. Head vs Tail Queries: Where Rules Help and Where They Become Unmaintainable

  4. Business Objectives vs Relevance: Margin and Diversity Constraints as Explicit Policy

Part 24

Step 2 Feature-Based Ranking (Conceptual)

  1. Step 2 Feature-Based Ranking (Conceptual)

  2. Feature Families: Match Strength, Popularity, Freshness, and Early Personalization Signals

  3. Rankers as Functions: Combining Features Without Committing to a Specific Algorithm Story

  4. Feature Iteration as Tuning: How Changes Create New Failure Surfaces and Test Obligations

Part 25

Step 2 Governance and Change Management

  1. Step 2 Governance and Change Management

  2. Review Processes: Making Relevance Changes a Controlled Release, Not an Ad Hoc Tweak

  3. Rollout and Rollback: Safe Delivery Patterns for Ranking Behavior

  4. Change History and Impact: Preserving Institutional Memory to Prevent Cyclic Mistakes

Part 26

Step 3 Recommendations vs Search

  1. Step 3 Recommendations vs Search

  2. Different Contracts: Expressed Intent Versus Inferred Preference

  3. Recommendation Surfaces: Feeds, "You May Also Like," Related Items, and the Objectives They Imply

  4. Blending Recommendations with Search: Inserting Proposals into Query-Driven Flows Without Breaking Trust

Part 27

Step 3 User–Item Interactions as Data

  1. Step 3 User–Item Interactions as Data

  2. Event Types: Views, Clicks, Purchases, Likes, and Ratings as Different Strength Signals

  3. Implicit vs Explicit Feedback: Interpreting Absence and Dealing with Noisy Positives

  4. Matrices and Sparsity: Why Cold-Start and Long-Tail Behavior Are the Default

Part 28

Step 3 Collaborative Filtering (Conceptual)

  1. Step 3 Collaborative Filtering (Conceptual)

  2. Similar Users and Similar Items: Nearest-Neighbor Intuition and the Limits of Co-Occurrence

  3. "People Who Liked X": Turning Local Similarity Into Candidate Generation

  4. Cold-Start Limits: When CF Fails Structurally and How to Detect It Operationally

Part 29

Step 3 Content-Based Recommendation

  1. Step 3 Content-Based Recommendation

  2. Item Features: Text, Categories, Attributes as a Substitute for Missing Interaction History

  3. User Profiles from Content: Matching Preferences to Item Representations

  4. Trade-Offs vs Collaborative: Controllability, Novelty, and Overspecialization Risks

Part 30

Step 3 Candidate Generation and Ranking for Recs

  1. Step 3 Candidate Generation and Ranking for Recs

  2. Two-Stage Thinking: Candidates First, Ranking Second as a Latency and Scale Boundary

  3. Multiple Candidate Sources: CF, Content-Based, Popularity, Recency and How to Blend Them

  4. Ranking Recs with Business Signals: Governance and Explainability Under Product Constraints

Part 31

Step 3 Blending Search and Recs

  1. Step 3 Blending Search and Recs

  2. Query-Time Recommendations: Related Items in Result Contexts and Their Relevance Obligations

  3. Navigational Contexts: Item Pages, Carts, Playlists as Different Recommendation Problems

  4. Avoiding Echo Chambers: Overspecialization and Homogenization as System Dynamics Issues

Part 32

Step 4 Vector Representations and Embeddings

  1. Step 4 Vector Representations and Embeddings

  2. Text, Items, and Users as Vectors: What Representation Buys and What It Hides

  3. Similarity in Vector Space: Nearest Neighbors as a New Retrieval Contract

  4. Contextual vs Static Representations: Stability Versus Sensitivity Trade-Offs

Part 33

Step 4 Vector Indices and ANN Search (Conceptual)

  1. Step 4 Vector Indices and ANN Search (Conceptual)

  2. Nearest Neighbor Search: Why Brute Force Fails and Indexing Becomes Necessary

  3. Approximate vs Exact: Trading Recall for Latency and Cost Under Budgets

  4. Index Structures Conceptually: Understanding Failure Modes Without Committing to One Algorithm

Part 34

Step 4 Semantic Search Pipelines

  1. Step 4 Semantic Search Pipelines

  2. Query and Document Embeddings: Defining the Embedding Boundary and Update Cadence

  3. Vector Retrieval Candidates: Using Similarity to Generate a Candidate Set

  4. Hybrid Search: Lexical + Semantic as Complementary Evidence, Not Competing Dogma

Part 35

Step 4 Reranking and Multi-Stage Retrieval

  1. Step 4 Reranking and Multi-Stage Retrieval

  2. Fast First Stage: Retrieving Enough Candidates Cheaply to Avoid Missing Good Answers

  3. Rich Second Stage: Reranking with More Features Under Tighter Compute Budgets

  4. Handling Hard Queries: Special Cases That Justify Deeper Modeling or Rules

Part 36

Step 4 Embeddings for Recommendations (Conceptual)

  1. Step 4 Embeddings for Recommendations (Conceptual)

  2. Users and Items in One Space: Matching as Geometric Proximity Rather Than Rules

  3. Similarity-Based Recs: Using Vector Neighborhoods for Item-Item and User-Item Proposals

  4. Hybrid Recommendation: Combining Embeddings with CF and Content-Based Methods to Manage Failure Modes

Part 37

Step 4 Quality, Bias, and Interpretability in Semantic Systems

  1. Step 4 Quality, Bias, and Interpretability in Semantic Systems

  2. Semantic Failure Modes: Off-Topic Matches, Oversimilarity, and Missing Constraints

  3. Bias and Representation Issues: High-Level Risks Introduced by Learned Representations

  4. Debugging Semantic Relevance: Tooling and Practices to Make Vectors Operationally Tractable

Part 38

Step 5 Personalization Signals

  1. Step 5 Personalization Signals

  2. Short-Term Behavior: Session Intent and Recency-Weighted Evidence

  3. Long-Term Behavior: History, Preferences, Segments, and the Stability-Novelty Tension

  4. Contextual Signals: Device, Location, Time, and Entry Point as High-Level Modifiers with Governance Needs

Part 39

Step 5 User Profiles and State

  1. Step 5 User Profiles and State

  2. Profile Structures: Interests and Recency-Weighted Histories as Explicit State Models

  3. Storage and Refresh: Consistency, Staleness, and Update Strategies

  4. Per-User vs Cohort Personalization: Scaling Personalization Without Amplifying Privacy Risk

Part 40

Step 5 Session-Based and Contextual Personalization

  1. Step 5 Session-Based and Contextual Personalization

  2. Predicting Next Action Conceptually: Session Signals as a Short-Horizon Model

  3. Context-Aware Ranking: Adjusting Ordering Without Making the System Feel Inconsistent

  4. Avoiding Surprise: Designing Personalization That Remains Legible to Users

Part 41

Step 5 Feedback Loops and System Dynamics

  1. Step 5 Feedback Loops and System Dynamics

  2. Recommendation Shapes Data: Exposure Changes the Distribution of Future Evidence

  3. Popularity Bias and Filter Bubbles: Conceptual Dynamics and How to Detect Them

  4. Exploration vs Exploitation: Managing Learning and User Experience Under Uncertainty

Part 42

Step 5 Diversity, Novelty, and Serendipity

  1. Step 5 Diversity, Novelty, and Serendipity

  2. Diversity as a Constraint: Relevance Plus Coverage Across Categories and Intents

  3. Long-Tail and New Content: Introduction Policies and Cold-Start Mitigation

  4. Avoiding Monotony: List-Level Objectives and How They Change Ranking

Part 43

Step 5 Privacy, Ethics, and Personalization Boundaries

  1. Step 5 Privacy, Ethics, and Personalization Boundaries

  2. Transparent Data Use: Aligning Personalization With Consent and User Expectations

  3. User Control: Opt-Outs, Preference Editing, and Reversible Personalization

  4. High-Level Compliance Constraints: Designing Boundaries That Prevent Accidental Overreach

Part 44

Step 6 Multi-Stage Search Architectures

  1. Step 6 Multi-Stage Search Architectures

  2. Retrieval to Ranking to Post-Processing: The Canonical Multi-Stage Pipeline and Why It Exists

  3. Latency Budgets and SLAs: Allocating Time Across Stages and Enforcing It

  4. Caching and Reuse: Trading Freshness for Speed With Explicit Policy

Part 45

Step 6 Indexing and Freshness at Scale

  1. Step 6 Indexing and Freshness at Scale

  2. Incremental Indexing and CDC: Keeping the Index Aligned With Changing Content

  3. High Update Rates: News, Social, and Marketplaces as Stress Tests for Freshness Semantics

  4. Freshness vs Consistency vs Cost: Choosing Which Failure You Tolerate

Part 46

Step 6 Distributed and Sharded Search

  1. Step 6 Distributed and Sharded Search

  2. Sharding Strategies: ID, Category, Geography, and What Each Implies for Recall and Routing

  3. Fan-Out and Merge: Distributed Query Execution and Ranking Reconciliation

  4. Capacity Planning: Scaling Indices, Replicas, and Query Throughput Under Peak Load

Part 47

Step 6 Large-Scale Recommendation Architectures

  1. Step 6 Large-Scale Recommendation Architectures

  2. Offline Pipelines: Feature and Model Preparation as a Batch Reliability Problem (Conceptual)

  3. Online Serving: Feature Lookup -> Candidates -> Ranking Under Tight Latency and Correctness Constraints

  4. Batch + Streaming Signals: Integrating Freshness Without Destabilizing Ranking Behavior

Part 48

Step 6 Multi-Tenant Search and Rec Platforms

  1. Step 6 Multi-Tenant Search and Rec Platforms

  2. Platform for Many Domains: Configuration Versus Code as a Scaling Strategy

  3. Tenant Isolation: Traffic, Data, and Ranking Behavior as Separate Failure Domains

  4. Domain Customization: Allowing Variation Without Fragmenting the Platform

Part 49

Step 6 Observability and Reliability

  1. Step 6 Observability and Reliability

  2. Core Metrics: Latency, Errors, Freshness, Coverage, and Why Quality Is Also an SLO

  3. Tracing Decisions: Debugging Query Paths and Recommendation Choices End-to-End

  4. SLOs for Quality and Performance: Defining "Good Enough" and Wiring It to Alerts and Governance

Part 50

Step 7 Search and Recs as a Product

  1. Step 7 Search and Recs as a Product

  2. Stakeholders: End-Users, Internal Teams, Content Owners, and the Competing Objectives They Bring

  3. Roadmapping Relevance: Prioritizing Improvements Across Recall, Precision, Latency, and Business Constraints

  4. Communicating Change: Setting Expectations and Interpreting Impact Without Overclaiming Causality

Part 51

Step 7 Tooling for Relevance and Product Teams

  1. Step 7 Tooling for Relevance and Product Teams

  2. Query Analysis Dashboards: Making Failure Cases Discoverable and Debuggable

  3. Relevance Labs: Safe Playgrounds for Ranking and Rec Experiments with Traceable Diffs

  4. Configuration UIs and Rule Management: Enabling Non-Engineers While Preserving Guardrails

Part 52

Step 7 Experimentation and Governance at Scale

  1. Step 7 Experimentation and Governance at Scale

  2. Coordinating Many Experiments: Preventing Interference Across Surfaces and Segments

  3. Guardrails and Global Constraints: Shared Metrics That Prevent Local Optimization from Harming the Platform

  4. Approval and Rollout Processes: Treating Relevance as Controlled Production Behavior

Part 53

Step 7 Reference Architectures and Maturity Models

  1. Step 7 Reference Architectures and Maturity Models

  2. Early Stage: Simple Index and Basic Ranking as a Stable Baseline

  3. Growth Stage: Query Understanding, Recommendations, and Experimentation Loops

  4. Mature Stage: Multi-Stage Pipelines, Semantic Search, Personalization, and Platformization