The Hidden Risk of Using Source in Predictive Models
- Kelley Hawes
- Sep 17
- 4 min read

Predictive models are powerful tools for making smarter marketing decisions, but as with any model, the details matter. The variables you include can have a huge impact on the model’s usefulness, and they can also introduce some potential pitfalls. (For more background on how predictive modeling helps businesses connect campaign investments with future revenue by identifying the characteristics that determine lead quality, see my blog post here.)
One common consideration is whether to include source as a variable (e.g., the network or campaign that generated a lead). At first glance, this makes sense: different audiences generate different types of leads. But when predictive values are used not just for reporting but for network optimization, including source as a variable can introduce risks that undermine campaign performance.
Why consider including source in your predictive model?
Different sources do perform differently. Paid search, social ads, direct traffic, and organic content all bring in audiences with unique profiles and behaviors. Even within a single network, you may see vastly different performance from users who search on general top-of-funnel keywords compared to those searching on targeted branded terms. From a predictive modeling standpoint, source can often stand out as a variable that correlates with differences in down-funnel value.
What are the potential ramifications?
Models become fragile when campaigns or networks change.
If you change anything about your campaigns (messaging, targeting, landing pages), or if the network itself updates its algorithms, the historical data you used to train the model quickly becomes outdated. While this is true of any variable, source is particularly prone to frequent shifts.
We’ve seen this firsthand with a SaaS client that was using a predictive model to prioritize leads for a sales team with limited capacity. In the historical data used to build the model, leads from paid campaigns converted at a lower rate than direct traffic, so the client assigned a negative offset to every lead attributed to a paid campaign. As our team managed the ads and made campaign optimizations, the quality of leads tracked to paid campaigns increased significantly. But because the client didn’t have a process in place to recalibrate the model, paid leads were still assigned the negative offset. That meant leads from paid campaigns were undervalued in the model and less likely to be assigned to a sales rep. The outcome was a self-fulfilling prophecy: paid leads closed less often, not because they were lower quality, but because they weren’t being worked by sales.
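To make the mechanics concrete, here’s a simplified sketch of that kind of scoring setup. The function names, offsets, and threshold are illustrative, not the client’s actual model:

```python
# Simplified sketch of lead scoring with a static source offset.
# All names and numbers are illustrative, not the client's actual model.

SOURCE_OFFSETS = {
    "paid": -0.15,   # penalty baked in from old historical data
    "direct": 0.0,
}

ASSIGNMENT_THRESHOLD = 0.50  # leads scoring above this go to a sales rep

def prioritized_score(base_score: float, source: str) -> float:
    """Combine the model's base score with a fixed offset for the lead's source."""
    return base_score + SOURCE_OFFSETS.get(source, 0.0)

# After campaign optimizations, a paid lead may look just as strong as a
# direct lead on its underlying characteristics...
paid_lead = prioritized_score(0.60, "paid")      # 0.45 -> below threshold, never worked
direct_lead = prioritized_score(0.60, "direct")  # 0.60 -> assigned to sales

# ...but the stale offset keeps pushing paid leads below the cutoff, which
# "confirms" the old assumption that paid leads close less often.
print(paid_lead, direct_lead)
```

The problem isn’t the offset itself; it’s that nothing in this setup notices when the underlying close rates change.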
When the client did recalibrate their model and removed source as a variable, the impact was clear:
- An 8% increase in deals generated from paid sources
- A 14% increase in paid close rate
Seeing the true value of paid leads enabled the paid campaigns to scale uninhibited by the old model’s outdated data.
It can interfere with network learning.
When using predictive models for network optimization (as with value-based bidding), platforms like Google and Meta optimize on patterns they see across campaigns and audiences. If your predictive model assigns different values to identical users simply because they mapped to different campaigns, the network’s AI may struggle to optimize toward true high-value profiles. The result: poorer value-based bidding performance and less efficient campaigns.
We experienced this pitfall with an online education client that uses a predictive model to feed lead values into Google for value-based bidding. The client’s model applies campaign-level offsets, assigning higher values to leads from the branded campaign than to leads with the same characteristics who searched on a non-brand keyword. While this is a helpful distinction for projecting enrollment volume, it has a negative impact on campaign volume and efficiency.
Because Google’s bidding algorithm relies on aggregate learnings, we’ve seen campaign optimization lag. The network struggles to identify high-value leads amid the “inconsistent” signals it receives, since the same user would receive different values based solely on whether their search term contained a branded keyword. To address this, our team is developing a new, source-agnostic model so that the network can optimize more effectively. As we collect data on close rates by campaign, we can use campaign-level targets to demand a higher return from lower-quality campaigns, improving efficiency while also scaling volume.
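As a rough illustration of that direction, the sketch below keeps uploaded lead values source-agnostic and moves the source differences into campaign-level targets instead. The scaling rule, close rates, and targets are hypothetical, not the client’s actual numbers:

```python
# Hypothetical sketch: source-agnostic lead values, with efficiency enforced
# through campaign-level targets. Numbers and the scaling rule are illustrative.

BASE_TARGET_ROAS = 3.0       # overall efficiency target for the account
BASELINE_CLOSE_RATE = 0.10   # blended close rate across campaigns

observed_close_rates = {
    "brand_search": 0.14,
    "nonbrand_search": 0.07,
}

def uploaded_lead_value(predicted_value: float) -> float:
    """Value sent to the network: based on lead characteristics only, with no
    campaign or source offset, so identical users get identical values."""
    return predicted_value

def campaign_target(campaign: str) -> float:
    """Ask lower-closing campaigns for proportionally more return, so the
    quality difference lives in the target rather than in the lead values."""
    return BASE_TARGET_ROAS * (BASELINE_CLOSE_RATE / observed_close_rates[campaign])

for campaign in observed_close_rates:
    print(campaign, round(campaign_target(campaign), 2))
# brand_search 2.14    -> can scale against a looser target
# nonbrand_search 4.29 -> has to clear a higher bar to spend
```

The network sees one consistent definition of a valuable lead, while the account still holds weaker campaigns to a higher standard.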
Final Thoughts
Including source in a predictive model is tempting - it seems logical, and it can be useful when the model is used strictly for internal reporting or projections. But if the values from that model are uploaded back into ad platforms for optimization, source introduces risks that can hamper both campaign performance and network learning.
A better approach:
- Use campaign targets to set the return you want from different audiences, rather than coding source into the lead values you upload.
- Recalibrate frequently. Models should be updated regularly as performance shifts, and scheduling frequent reviews of model variables helps flag when values are changing so you can keep your model aligned with reality (a simple check like the sketch below can surface that drift).
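Here’s a minimal sketch of what that kind of recurring check could look like, comparing the close rates a model was trained on against what you’ve seen since. The segments, rates, and threshold are illustrative:

```python
# Minimal sketch of a drift check for model recalibration.
# Segment names, rates, and the threshold are illustrative.

TRAINING_CLOSE_RATES = {   # close rates baked into the current model
    "paid": 0.06,
    "direct": 0.11,
}

RECENT_CLOSE_RATES = {     # close rates observed since the last refresh
    "paid": 0.10,
    "direct": 0.11,
}

DRIFT_THRESHOLD = 0.25     # flag a segment that moves more than 25% vs. training

def drifted_segments(training: dict, recent: dict, threshold: float) -> list:
    """Return segments whose recent close rate differs from the training-period
    rate by more than the allowed relative threshold."""
    flagged = []
    for segment, trained_rate in training.items():
        relative_change = abs(recent[segment] - trained_rate) / trained_rate
        if relative_change > threshold:
            flagged.append(segment)
    return flagged

print(drifted_segments(TRAINING_CLOSE_RATES, RECENT_CLOSE_RATES, DRIFT_THRESHOLD))
# ['paid'] -> time to recalibrate before a stale offset becomes a self-fulfilling prophecy
```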
Predictive models are only as good as the assumptions behind them. By being thoughtful about variables like source, you can avoid self-inflicted blind spots and make sure your models support, rather than hinder, campaign growth.


