Learning how to build a recommendation system from initial signals

Given a few initial adopters of a product, how can we target a new set of users who are most likely to adopt it?

Building a Targeting System from Early Adopter Signals

Goal: Given a small set of early adopters, build a scoring model to identify users most likely to adopt a product.

Git Repo: github.com/dinesh-coderepo/targetting-system


🔑 Key Concepts at a Glance

| System | How It Works | Example |
| --- | --- | --- |
| Recommendation | Learn from a user's own patterns → extend to similar items | "You watched X, try Y" |
| Targeting | Learn from early adopters' profiles → find similar non-adopters | "Users like your best customers" |
| Cold Start | Very few signals → traditional collaborative filtering fails | This blog's core challenge |

🏗️ System Architecture

```mermaid
flowchart TD
    Data["📊 User Data<br/>(demographics + behavior)"] --> Features["🔧 Feature Engineering"]
    Adopters["✅ Early Adopters<br/>(labeled = 1)"] --> Features
    Features --> Model["🤖 Propensity Model<br/>(XGBoost / LogReg)"]
    Model --> Scores["📈 Adoption Scores<br/>(0.0 → 1.0)"]
    Scores --> TopK["🎯 Top-K Targets"]
    Scores --> Eval["📊 Evaluation<br/>(AUC, Lift, Precision)"]

    style Adopters fill:#4caf50,color:#fff
    style TopK fill:#ff9800,color:#fff
```

🔧 Background & Prerequisites

1. Types of Recommendation Systems

```mermaid
graph TD
    RS["Recommendation Systems"] --> CF["🤝 Collaborative Filtering"]
    RS --> CB["📄 Content-Based"]
    RS --> HY["🔀 Hybrid"]
    RS --> DL["🧠 Deep Learning"]
    CF --> UserCF["User-Based CF"]
    CF --> ItemCF["Item-Based CF"]
    CF --> MF["Matrix Factorization<br/>(SVD, ALS, NMF)"]
```

| Approach | How It Works | Pros | Cons |
| --- | --- | --- | --- |
| User-based CF | Find similar users → recommend what they liked | Intuitive | Doesn't scale; sparse |
| Item-based CF | Find items similar to ones a user liked → recommend them | Stable | Needs interaction data |
| Matrix Factorization | Decompose the user-item matrix into latent factors | Handles sparsity | Cold start problem |
| Content-Based | Match item features to user preferences | No cold start for items | Limited by feature quality |
| Hybrid | Combine CF + content-based | Best of both worlds | Complex to implement |

💡 Netflix, Spotify, and YouTube all use hybrid approaches combining multiple methods.
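
To make the user-based CF row concrete, here is a minimal sketch over a toy interaction table; the `ratings` data, column names, and the `recommend` helper are invented for illustration.

```python
import pandas as pd
from sklearn.metrics.pairwise import cosine_similarity

# Toy interactions (hypothetical): 1 = the user engaged with the item
ratings = pd.DataFrame(
    [("u1", "i1", 1), ("u1", "i2", 1), ("u2", "i2", 1),
     ("u2", "i3", 1), ("u3", "i1", 1), ("u3", "i3", 1)],
    columns=["user_id", "item_id", "value"],
)

# User-item matrix; missing interactions become 0
matrix = ratings.pivot_table(index="user_id", columns="item_id",
                             values="value", fill_value=0)

# Pairwise user-user cosine similarity
sim = pd.DataFrame(cosine_similarity(matrix), index=matrix.index, columns=matrix.index)

def recommend(user_id: str, k: int = 2) -> pd.Series:
    """Score unseen items by similarity-weighted neighbour interactions."""
    weights = sim[user_id].drop(user_id)              # similarity to every other user
    scores = matrix.loc[weights.index].T @ weights    # weighted sum over neighbours
    scores = scores[matrix.loc[user_id] == 0]         # keep only unseen items
    return scores.sort_values(ascending=False).head(k)

print(recommend("u1"))   # expect i3 to surface via u2 and u3
```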


2. The Cold Start Problem

This is the core challenge for this blog — very few adopters means extreme data sparsity.

```mermaid
graph LR
    Problem["❄️ Cold Start<br/>Few adopters, no history"] --> S1["👤 User Cold Start"]
    Problem --> S2["📦 Item Cold Start"]
    S1 --> Sol1["🎯 Lookalike Modeling"]
    S1 --> Sol2["📋 Onboarding Questions"]
    S1 --> Sol3["📈 Popularity Fallback"]
    S2 --> Sol4["🏷️ Metadata Matching"]
    S2 --> Sol5["🆕 Exploration Boost"]
```

Solutions for targeting with few adopters:

  • 🔹 Feature similarity — Match non-adopters against adopter feature profiles
  • 🔹 Lookalike modeling — Find users who "look like" early adopters (demographics + behavior); see the sketch after this list
  • 🔹 Propensity scoring — Binary classifier: adopter (1) vs non-adopter (0)
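
As a concrete illustration of lookalike modeling, here is a minimal sketch that ranks non-adopters by cosine similarity to the centroid of the adopter profiles; the `users` columns and the `lookalike_scores` helper are assumptions, not code that exists in the repo yet.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

def lookalike_scores(users: pd.DataFrame, adopters: set[str],
                     feature_cols: list[str]) -> pd.Series:
    """Rank non-adopters by similarity to the average (centroid) adopter profile."""
    X = pd.DataFrame(StandardScaler().fit_transform(users[feature_cols]),
                     index=users["user_id"], columns=feature_cols)

    is_adopter = users["user_id"].isin(adopters).to_numpy()
    centroid = X[is_adopter].mean().to_numpy().reshape(1, -1)      # average adopter profile

    non_adopters = X[~is_adopter]
    sims = cosine_similarity(non_adopters, centroid).ravel()       # similarity per non-adopter
    return pd.Series(sims, index=non_adopters.index,
                     name="lookalike_score").sort_values(ascending=False)

# Example (columns are assumed):
# lookalike_scores(users, adopters, ["age", "logins_30d", "features_used", "tenure_days"])
```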

3. Propensity / Targeting Model

The heart of this project — scoring every user by their likelihood to adopt.

```mermaid
flowchart LR
    Features["🔧 Features"] --> Train["🏋️ Train Model"]
    Labels["🏷️ Labels<br/>1 = adopter<br/>0 = non-adopter"] --> Train
    Train --> Predict["🔮 Predict<br/>P(adopt) for all users"]
    Predict --> Rank["📊 Rank & Select<br/>Top targets"]
```

Feature Categories:

| Category | Example Features |
| --- | --- |
| 🧑 Demographic | Age, location, job title, industry |
| 📊 Behavioral | Login frequency, feature usage, time spent, page views |
| 🤝 Social | Connections to existing adopters, team adoption rate |
| ⏱️ Temporal | Recency, frequency, monetary value (RFM analysis) |
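
For the temporal row, a small sketch of deriving RFM features from a raw event log; the events table and its columns (user_id, event_date, amount) are hypothetical.

```python
import pandas as pd

def rfm_features(events: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """Recency / frequency / monetary features per user, computed strictly before `as_of`."""
    as_of_ts = pd.Timestamp(as_of)
    past = events[events["event_date"] < as_of_ts]          # avoid leaking future behaviour
    rfm = past.groupby("user_id").agg(
        last_event=("event_date", "max"),
        frequency=("event_date", "count"),
        monetary=("amount", "sum"),
    )
    rfm["recency_days"] = (as_of_ts - rfm["last_event"]).dt.days
    return rfm[["recency_days", "frequency", "monetary"]].reset_index()

# Example (hypothetical event log): rfm_features(events, as_of="2025-06-01")
```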

Model Choices:

| Model | When to Use |
| --- | --- |
| Logistic Regression | Baseline — interpretable, fast, and the odds ratios are easy to explain |
| Random Forest / XGBoost | Better accuracy, non-linear relationships, feature importance |
| Neural Networks | Large-scale datasets with many features |

⚠️ Class Imbalance: If only 1% are adopters, naive models just predict "no" 99% of the time. Use SMOTE (oversampling), class weights, focal loss, or undersampling.
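
A minimal sketch of two of those remedies, class weights and SMOTE; the SMOTE variant assumes the imbalanced-learn package is installed.

```python
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE            # pip install imbalanced-learn
from imblearn.pipeline import Pipeline as ImbPipeline

# Option 1: reweight the loss so the rare adopter class counts more
weighted_lr = LogisticRegression(class_weight="balanced", max_iter=500)

# Option 2: oversample adopters with synthetic examples; wrapping SMOTE in a
# pipeline keeps the resampling inside the training folds only
smote_lr = ImbPipeline([
    ("smote", SMOTE(random_state=42)),
    ("clf", LogisticRegression(max_iter=500)),
])

# Both are fit the usual way, e.g. weighted_lr.fit(X_train, y_train)
```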


4. Evaluation Metrics

| Metric | What It Measures | Why It Matters |
| --- | --- | --- |
| AUC-ROC | Discrimination ability across all thresholds | Best single metric for targeting |
| Precision@K | Of the top K predictions, how many are actual adopters | Directly measures targeting quality |
| Recall@K | Of all adopters, how many land in the top K | Did we find most adopters? |
| Lift Chart | How much better than random selection | "The top-scored 10% are 5x more likely to adopt than a random pick" |
| NDCG | Ranking quality with position weighting | Are true adopters ranked highest? |

⚠️ Never use accuracy with imbalanced data — it's misleading.

⚠️ Never split randomly — use time-based splits (train on past, test on future) to prevent data leakage.
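
To make Precision@K and Recall@K concrete, a small helper that computes both from held-out labels and model scores; the function name is my own.

```python
import numpy as np

def precision_recall_at_k(y_true, y_score, k: int) -> tuple[float, float]:
    """Precision@K and Recall@K for a ranked targeting list."""
    y_true = np.asarray(y_true)
    top_k = np.argsort(y_score)[::-1][:k]      # indices of the K highest-scored users
    hits = y_true[top_k].sum()                 # actual adopters captured in the top K
    precision = hits / k
    recall = hits / max(y_true.sum(), 1)       # guard against a cohort with no adopters
    return precision, recall

# Example (held-out labels y_val, scores p): precision_recall_at_k(y_val, p, k=100)
```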


5. Tools & Libraries

| Library | Purpose |
| --- | --- |
| scikit-learn | LogisticRegression, RandomForest, metrics, pipelines |
| xgboost / lightgbm | Gradient boosting for targeting models |
| surprise | Collaborative filtering (SVD, KNN, NMF) |
| lightfm | Hybrid recommendations (collaborative + content) |
| implicit | Implicit feedback models (ALS, BPR) |
| pandas + numpy | Data manipulation & feature engineering |
| matplotlib + seaborn | Visualization (lift charts, ROC curves) |
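
As a starting point for the surprise row above (and the matrix-factorization TODO below), a minimal SVD sketch; the interactions DataFrame and its rating scale are placeholders.

```python
import pandas as pd
from surprise import Dataset, Reader, SVD
from surprise.model_selection import cross_validate

# Placeholder interactions: user, item, engagement mapped onto a 1-5 "rating"
interactions = pd.DataFrame({
    "user_id": ["u1", "u1", "u2", "u2", "u3", "u3"],
    "item_id": ["i1", "i2", "i2", "i3", "i1", "i3"],
    "rating":  [5, 3, 4, 2, 4, 5],
})

reader = Reader(rating_scale=(1, 5))
data = Dataset.load_from_df(interactions[["user_id", "item_id", "rating"]], reader)

# Cross-validate a plain SVD factorizer
cross_validate(SVD(n_factors=50), data, measures=["RMSE", "MAE"], cv=3, verbose=True)
```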

✅ TODO — Remaining Work

| # | Task | Priority |
| --- | --- | --- |
| 1 | Implement basic collaborative filtering (user-item matrix, cosine similarity) | 🔴 High |
| 2 | Implement matrix factorization (SVD) with Surprise | 🔴 High |
| 3 | Build propensity model with logistic regression | 🔴 High |
| 4 | Feature engineering pipeline (behavioral + demographic) | 🔴 High |
| 5 | Handle class imbalance (SMOTE, class weights) | 🟡 Medium |
| 6 | Evaluate with AUC-ROC, lift charts, decile analysis | 🟡 Medium |
| 7 | Build cold-start fallback strategy | 🟡 Medium |
| 8 | Compare model approaches in a results table | 🟡 Medium |
| 9 | Add Mermaid architecture diagram of full targeting pipeline | 🟢 Low |
| 10 | Connect to Monolith paper learnings | 🟢 Low |

🔧 Reference Implementation — Propensity Model with Lookalike Scoring

A minimal but complete pipeline: given a tiny set of adopters, score every non-adopter for likelihood to adopt.

```python
# targeting.py
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, average_precision_score

def build_dataset(users: pd.DataFrame, adopters: set[str]) -> tuple[pd.DataFrame, pd.Series]:
    """users has columns: user_id, age, logins_30d, features_used, tenure_days, team_size, industry.
       adopters is a set of user_ids that already converted."""
    df = users.copy()
    df["label"] = df["user_id"].isin(adopters).astype(int)
    y = df.pop("label")
    X = pd.get_dummies(df.drop(columns=["user_id"]), columns=["industry"], drop_first=True)
    return X, y

def train(X: pd.DataFrame, y: pd.Series):
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=42
    )
    scaler = StandardScaler().fit(X_tr)
    X_tr_s = scaler.transform(X_tr); X_val_s = scaler.transform(X_val)

    # Baseline — logistic regression with class_weight for imbalance
    lr = LogisticRegression(class_weight="balanced", max_iter=500).fit(X_tr_s, y_tr)
    # Stronger — gradient boosting handles non-linear interactions
    gb = GradientBoostingClassifier(n_estimators=200, max_depth=3).fit(X_tr, y_tr)

    for name, model, X_eval in [("logreg", lr, X_val_s), ("gbm", gb, X_val)]:
        p = model.predict_proba(X_eval)[:, 1]
        print(f"{name}: AUC={roc_auc_score(y_val, p):.3f}  "
              f"PR-AUC={average_precision_score(y_val, p):.3f}")
    return gb, scaler

def score_and_rank(model, users: pd.DataFrame, adopters: set[str], top_k: int = 1000):
    """Score all non-adopters and return the top-K targets."""
    non_adopters = users[~users["user_id"].isin(adopters)].copy()
    X = pd.get_dummies(non_adopters.drop(columns=["user_id"]),
                       columns=["industry"], drop_first=True)
    # Align dummy columns with the training feature set (unseen categories become 0)
    X = X.reindex(columns=model.feature_names_in_, fill_value=0)
    non_adopters["score"] = model.predict_proba(X)[:, 1]
    ranked = non_adopters.sort_values("score", ascending=False)

    base_rate = len(adopters) / len(users)
    # Lift = positive rate in the top-K / base rate. True labels aren't known for
    # non-adopters, so measure lift on a held-out cohort in practice.
    print(f"Base adoption rate: {base_rate:.3%} | Targeting top {top_k} users")
    return ranked.head(top_k)[["user_id", "score"]]
```

Evaluating with a Proper Time-Based Split

Random splits leak future information. In targeting, the adopters at time T became adopters because of behaviour before T. Evaluate like this:

```python
# Split by signup date, not randomly
cutoff = "2025-06-01"
train_users = users[users["signup_date"] < cutoff]
test_users  = users[users["signup_date"] >= cutoff]

# Adopters in each cohort
train_adopters = adopter_events.query("event_date < @cutoff")["user_id"].unique()
test_adopters  = adopter_events.query("event_date >= @cutoff")["user_id"].unique()
```

Lift Chart — The Right Way to Present Results

```python
def lift_chart(y_true, y_score, deciles=10):
    """Adoption rate and lift per score decile (highest-scored decile first)."""
    df = pd.DataFrame({"y": y_true, "p": y_score}).sort_values("p", ascending=False)
    # Rank before qcut so tied scores don't collapse decile edges
    df["decile"] = pd.qcut(df["p"].rank(method="first"), deciles, labels=False)
    base = df["y"].mean()                                   # overall adoption rate
    table = df.groupby("decile")["y"].mean().rename("rate").to_frame()
    table["lift"] = table["rate"] / base                    # x-times better than random
    return table.sort_index(ascending=False)                # decile 9 = top scores
```
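
Wiring the pieces together on the time-based split above would look roughly like this; it assumes the functions from targeting.py and the train/test cohorts defined earlier.

```python
# Train on the past cohort, evaluate lift on the future cohort
X_train, y_train = build_dataset(train_users, set(train_adopters))
model, _ = train(X_train, y_train)

X_test, y_test = build_dataset(test_users, set(test_adopters))
X_test = X_test.reindex(columns=model.feature_names_in_, fill_value=0)  # align dummy columns
print(lift_chart(y_test, model.predict_proba(X_test)[:, 1]))
```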

A healthy targeting model shows the top decile at 3–10× lift over baseline. If the top decile is only 1.5×, your features aren't predictive — go back to feature engineering before tuning the model.

When every TODO above is ticked and your lift chart shows ≥ 3× in the top decile on a time-based test set, flip this post to status: published.