Building a Targeting System from Early Adopter Signals
Goal: Given a small set of early adopters, build a scoring model to identify users most likely to adopt a product.
Git Repo: github.com/dinesh-coderepo/targetting-system
🔑 Key Concepts at a Glance
| System | How It Works | Example |
|---|---|---|
| Recommendation | Learn from a user's own patterns → extend to similar items | "You watched X, try Y" |
| Targeting | Learn from early adopters' profiles → find similar non-adopters | "Users like your best customers" |
| Cold Start | Very few signals → traditional collaborative filtering fails | This blog's core challenge |
🏗️ System Architecture
```mermaid
flowchart TD
    Data["📊 User Data<br/>(demographics + behavior)"] --> Features["🔧 Feature Engineering"]
    Adopters["✅ Early Adopters<br/>(labeled = 1)"] --> Features
    Features --> Model["🤖 Propensity Model<br/>(XGBoost / LogReg)"]
    Model --> Scores["📈 Adoption Scores<br/>(0.0 → 1.0)"]
    Scores --> TopK["🎯 Top-K Targets"]
    Scores --> Eval["📊 Evaluation<br/>(AUC, Lift, Precision)"]
    style Adopters fill:#4caf50,color:#fff
    style TopK fill:#ff9800,color:#fff
```
🔧 Background & Prerequisites
1. Types of Recommendation Systems
```mermaid
graph TD
    RS["Recommendation Systems"] --> CF["🤝 Collaborative Filtering"]
    RS --> CB["📄 Content-Based"]
    RS --> HY["🔀 Hybrid"]
    RS --> DL["🧠 Deep Learning"]
    CF --> UserCF["User-Based CF"]
    CF --> ItemCF["Item-Based CF"]
    CF --> MF["Matrix Factorization<br/>(SVD, ALS, NMF)"]
```
| Approach | How It Works | Pros | Cons |
|---|---|---|---|
| User-based CF | Find similar users → recommend their preferences | Intuitive | Doesn't scale; sparse |
| Item-based CF | Find similar items → recommend items related to what a user liked | Stable | Needs interaction data |
| Matrix Factorization | Decompose user-item matrix into latent factors | Handles sparsity | Cold start problem |
| Content-Based | Match item features to user preferences | No cold start for items | Limited to feature quality |
| Hybrid | Combine CF + content-based | Best of both worlds | Complex to implement |
💡 Netflix, Spotify, and YouTube all use hybrid approaches combining multiple methods.
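To ground the table above, here is a minimal user-based CF sketch (TODO #1 in the list below) on a toy interaction matrix; the matrix values are made up for illustration:

```python
# user_cf.py: user-based CF on a toy implicit-feedback matrix (illustrative only)
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

# Rows = users, columns = items; 1 = interacted, 0 = not (hypothetical data)
interactions = np.array([
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 1, 1],
])

sim = cosine_similarity(interactions)   # user-user similarity
np.fill_diagonal(sim, 0)                # ignore self-similarity

# Score items as a similarity-weighted sum of other users' interactions
scores = sim @ interactions
scores = np.where(interactions > 0, -np.inf, scores)  # don't re-recommend seen items
print(scores.argmax(axis=1))            # best new item per user
```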
2. The Cold Start Problem
This is the core challenge for this blog — very few adopters means extreme data sparsity.
```mermaid
graph LR
    Problem["❄️ Cold Start<br/>Few adopters, no history"] --> S1["👤 User Cold Start"]
    Problem --> S2["📦 Item Cold Start"]
    S1 --> Sol1["🎯 Lookalike Modeling"]
    S1 --> Sol2["📋 Onboarding Questions"]
    S1 --> Sol3["📈 Popularity Fallback"]
    S2 --> Sol4["🏷️ Metadata Matching"]
    S2 --> Sol5["🆕 Exploration Boost"]
```
Solutions for targeting with few adopters:
- 🔹 Feature similarity — Match non-adopters against adopter feature profiles
- 🔹 Lookalike modeling — Find users who "look like" early adopters (demographics + behavior); see the sketch after this list
- 🔹 Propensity scoring — Binary classifier: adopter (1) vs non-adopter (0)
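A minimal sketch of the first two strategies combined, assuming users are already represented as numeric feature vectors (the adopter-centroid approach is one simple choice among several):

```python
# lookalike.py: score non-adopters by similarity to the adopter centroid
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.metrics.pairwise import cosine_similarity

def lookalike_scores(features: pd.DataFrame, adopter_mask: pd.Series) -> pd.Series:
    """features: numeric user features (rows = users); adopter_mask: boolean per user."""
    X = StandardScaler().fit_transform(features)           # put features on one scale
    centroid = X[adopter_mask.values].mean(axis=0, keepdims=True)
    sims = cosine_similarity(X, centroid).ravel()          # similarity to the "average adopter"
    return pd.Series(sims, index=features.index, name="lookalike_score")
```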
3. Propensity / Targeting Model
The heart of this project — scoring every user by their likelihood to adopt.
```mermaid
flowchart LR
    Features["🔧 Features"] --> Train["🏋️ Train Model"]
    Labels["🏷️ Labels<br/>1 = adopter<br/>0 = non-adopter"] --> Train
    Train --> Predict["🔮 Predict<br/>P(adopt) for all users"]
    Predict --> Rank["📊 Rank & Select<br/>Top targets"]
```
Feature Categories:
| Category | Example Features |
|---|---|
| 🧑 Demographic | Age, location, job title, industry |
| 📊 Behavioral | Login frequency, feature usage, time spent, page views |
| 🤝 Social | Connections to existing adopters, team adoption rate |
| ⏱️ Temporal | Recency, frequency, monetary value (RFM; sketched below) |
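As a concrete example of the temporal bucket, a hedged sketch of RFM feature derivation; the `events` table and its `user_id`, `event_date`, `amount` columns are assumed, not part of the repo:

```python
# rfm.py: derive recency / frequency / monetary features from an event log
import pandas as pd

def rfm_features(events: pd.DataFrame, as_of: str) -> pd.DataFrame:
    """events: one row per user action with user_id, event_date, amount (assumed schema)."""
    cutoff = pd.Timestamp(as_of)
    ev = events.assign(event_date=pd.to_datetime(events["event_date"]))
    past = ev[ev["event_date"] < cutoff]                   # no peeking past the cutoff
    return past.groupby("user_id").agg(
        recency_days=("event_date", lambda s: (cutoff - s.max()).days),
        frequency=("event_date", "count"),
        monetary=("amount", "sum"),
    ).reset_index()
```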
Model Choices:
| Model | When to Use |
|---|---|
| Logistic Regression | Baseline — interpretable, fast. Understand odds ratios. |
| Random Forest / XGBoost | Better accuracy, non-linear relationships, feature importance |
| Neural Networks | Large-scale datasets with many features |
⚠️ Class Imbalance: If only 1% are adopters, naive models just predict "no" 99% of the time. Use SMOTE (oversampling), class weights, focal loss, or undersampling.
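Two of these fixes in a short sketch: class weights (plain scikit-learn) and SMOTE (needs the separate imbalanced-learn package). The toy data stands in for a real user table:

```python
# imbalance.py: two common fixes for a heavily imbalanced label
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from imblearn.over_sampling import SMOTE  # pip install imbalanced-learn

# Toy data with a ~2% positive rate, standing in for adopters
X, y = make_classification(n_samples=5000, weights=[0.98], random_state=42)

# Option 1: reweight the loss so rare positives are not ignored
lr = LogisticRegression(class_weight="balanced", max_iter=500).fit(X, y)

# Option 2: oversample the minority class with synthetic examples, then fit normally
X_res, y_res = SMOTE(random_state=42).fit_resample(X, y)
lr_smote = LogisticRegression(max_iter=500).fit(X_res, y_res)
```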
4. Evaluation Metrics
| Metric | What It Measures | Why It Matters |
|---|---|---|
| AUC-ROC | Discrimination ability across thresholds | Best single metric for targeting |
| Precision@K | Of top K predictions, how many are actual adopters | Directly measures targeting quality |
| Recall@K | Of all adopters, how many are in top K | Did we find most adopters? |
| Lift Chart | How much better than random selection | "Users in the top decile are 5x more likely to adopt than a random pick" |
| NDCG | Ranking quality with position weighting | Are true adopters ranked highest? |
⚠️ Never use accuracy with imbalanced data — it's misleading.
⚠️ Never split randomly — use time-based splits (train on past, test on future) to prevent data leakage.
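Precision@K and Recall@K are easy to compute by hand; a minimal sketch:

```python
# topk_metrics.py: precision and recall at K from scores and true labels
import numpy as np

def precision_recall_at_k(y_true: np.ndarray, y_score: np.ndarray, k: int):
    """Take the K highest-scored users; measure hit rate and coverage of true adopters."""
    top_k = np.argsort(y_score)[::-1][:k]   # indices of the K best scores
    hits = y_true[top_k].sum()
    return hits / k, hits / y_true.sum()    # precision@K, recall@K
```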
5. Tools & Libraries
| Library | Purpose |
|---|---|
| scikit-learn | LogisticRegression, RandomForest, metrics, pipelines |
| xgboost / lightgbm | Gradient boosting for targeting models |
| surprise | Collaborative filtering (SVD, KNN, NMF) |
| lightfm | Hybrid recommendations (collaborative + content) |
| implicit | Implicit feedback models (ALS, BPR) |
| pandas + numpy | Data manipulation & feature engineering |
| matplotlib + seaborn | Visualization (lift charts, ROC curves) |
✅ TODO — Remaining Work
| # | Task | Priority |
|---|---|---|
| 1 | Implement basic collaborative filtering (user-item matrix, cosine similarity) | 🔴 High |
| 2 | Implement matrix factorization (SVD) with Surprise | 🔴 High |
| 3 | Build propensity model with logistic regression | 🔴 High |
| 4 | Feature engineering pipeline (behavioral + demographic) | 🔴 High |
| 5 | Handle class imbalance (SMOTE, class weights) | 🟡 Medium |
| 6 | Evaluate with AUC-ROC, lift charts, decile analysis | 🟡 Medium |
| 7 | Build cold-start fallback strategy | 🟡 Medium |
| 8 | Compare model approaches in a results table | 🟡 Medium |
| 9 | Add Mermaid architecture diagram of full targeting pipeline | 🟢 Low |
| 10 | Connect to Monolith paper learnings | 🟢 Low |
🔧 Reference Implementation — Propensity Model with Lookalike Scoring
A minimal but complete pipeline: given a tiny set of adopters, score every non-adopter for likelihood to adopt.
```python
# targeting.py
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score, average_precision_score


def build_dataset(users: pd.DataFrame, adopters: set[str]) -> tuple[pd.DataFrame, pd.Series]:
    """users has columns: user_id, age, logins_30d, features_used, tenure_days, team_size, industry.
    adopters is a set of user_ids that already converted."""
    df = users.copy()
    df["label"] = df["user_id"].isin(adopters).astype(int)
    y = df.pop("label")
    X = pd.get_dummies(df.drop(columns=["user_id"]), columns=["industry"], drop_first=True)
    return X, y


def train(X: pd.DataFrame, y: pd.Series):
    X_tr, X_val, y_tr, y_val = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=42
    )
    scaler = StandardScaler().fit(X_tr)
    X_tr_s = scaler.transform(X_tr)
    X_val_s = scaler.transform(X_val)
    # Baseline — logistic regression with class_weight for imbalance
    lr = LogisticRegression(class_weight="balanced", max_iter=500).fit(X_tr_s, y_tr)
    # Stronger — gradient boosting handles non-linear interactions (no scaling needed)
    gb = GradientBoostingClassifier(n_estimators=200, max_depth=3).fit(X_tr, y_tr)
    for name, model, X_eval in [("logreg", lr, X_val_s), ("gbm", gb, X_val)]:
        p = model.predict_proba(X_eval)[:, 1]
        print(f"{name}: AUC={roc_auc_score(y_val, p):.3f} "
              f"PR-AUC={average_precision_score(y_val, p):.3f}")
    return gb, scaler


def score_and_rank(model, users: pd.DataFrame, adopters: set[str], top_k: int = 1000):
    """Score all non-adopters and return the top-K targets."""
    non_adopters = users[~users["user_id"].isin(adopters)].copy()
    X = pd.get_dummies(non_adopters.drop(columns=["user_id"]),
                       columns=["industry"], drop_first=True)
    # Align columns with the training matrix — unseen dummies would otherwise break predict
    X = X.reindex(columns=model.feature_names_in_, fill_value=0)
    non_adopters["score"] = model.predict_proba(X)[:, 1]
    ranked = non_adopters.sort_values("score", ascending=False)
    base_rate = len(adopters) / len(users)
    # Lift = model's positive rate in top-K / random base rate
    # (true labels not known for non-adopters — use a held-out cohort to measure lift in practice)
    print(f"Base adoption rate: {base_rate:.3%} | Targeting top {top_k} users")
    return ranked[["user_id", "score"]].head(top_k)
```
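An end-to-end usage sketch; users.csv and adopters.csv are assumptions about your data layout, not files in the repo:

```python
# run.py: wiring the pieces together (assumed file layout)
import pandas as pd
from targeting import build_dataset, train, score_and_rank

users = pd.read_csv("users.csv")                          # hypothetical user table
adopters = set(pd.read_csv("adopters.csv")["user_id"])    # hypothetical adopter list

X, y = build_dataset(users, adopters)
model, scaler = train(X, y)
targets = score_and_rank(model, users, adopters, top_k=500)
print(targets.head(10))
```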
Evaluating with a Proper Time-Based Split
Random splits leak future information. In targeting, the adopters at time T became adopters because of behavior before T. Evaluate like this:
```python
# Split by signup date, not randomly
cutoff = "2025-06-01"
train_users = users[users["signup_date"] < cutoff]
test_users = users[users["signup_date"] >= cutoff]

# Adopters in each cohort
train_adopters = adopter_events.query("event_date < @cutoff")["user_id"].unique()
test_adopters = adopter_events.query("event_date >= @cutoff")["user_id"].unique()
```
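To close the loop, train on the earlier cohort and check whether the later cohort's real adopters surface in the top-K. A sketch reusing the functions above and the variable names from this snippet:

```python
# Fit on the past cohort, then see if future adopters rise to the top
X_tr, y_tr = build_dataset(train_users, set(train_adopters))
model, _ = train(X_tr, y_tr)

# Empty adopter set: score the entire future cohort, nobody excluded
top = score_and_rank(model, test_users, set(), top_k=500)
hit_rate = top["user_id"].isin(test_adopters).mean()
print(f"Precision@500 on the future cohort: {hit_rate:.1%}")
```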
Lift Chart — The Right Way to Present Results
```python
def lift_chart(y_true, y_score, deciles=10):
    """Per-decile adoption rate and lift over the base rate."""
    df = pd.DataFrame({"y": y_true, "p": y_score}).sort_values("p", ascending=False)
    # rank(method="first") breaks ties so qcut always yields equal-sized bins
    df["decile"] = pd.qcut(df["p"].rank(method="first"), deciles, labels=False)
    base = df["y"].mean()
    table = df.groupby("decile")["y"].mean().rename("rate").to_frame()
    table["lift"] = table["rate"] / base
    return table.sort_index(ascending=False)  # top-scoring decile first
```
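Usage, assuming you keep a held-out validation split and its labels around (X_val and y_val here stand in for whatever split you set aside):

```python
# Decile table for the gradient-boosting model on held-out users
p_val = model.predict_proba(X_val)[:, 1]
print(lift_chart(y_val, p_val))
```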
A healthy targeting model shows the top decile at 3–10× lift over baseline. If the top decile is only 1.5×, your features aren't predictive — go back to feature engineering before tuning the model.
When every TODO above is ticked and your lift chart shows ≥ 3× in the top decile on a time-based test set, flip this post to status: published.