Picture the meeting. Someone from marketing pulls up the slide. "We generated 847 MQLs this quarter." Sales nods politely, the way people do when they've already decided not to believe something. "How many converted?" Pause. "We may need to revisit the model."
The model doesn't get revisited.
Here is what that number actually represents: 87% of MQLs never become sales opportunities. The model has been running for years. The conversion rate has been roughly the same. What changes, quarter to quarter, is how confidently the deck gets presented.
There are two structural reasons the MQL model is broken, and neither of them is "you configured it wrong." The weights going in are guesswork. The number coming out is a headcount limit. This piece breaks down both, and explains why no amount of tuning fixes a model built on those foundations.
A Quick Anatomy of How MQL Scores Get Built
Lead scoring isn't complicated in theory. Marketing assigns point values to actions: a pricing page visit might be worth 20 points, a webinar attendance worth 15, an email click worth 5. When a lead crosses a set threshold (say, 70 points), they get routed to sales as an MQL.
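The whole mechanism fits in a dozen lines. Here's a minimal sketch in Python, using the illustrative point values and threshold above (hypothetical numbers, not a recommendation, just the shape of the thing):

```python
# Illustrative only: hypothetical point values and threshold,
# not calibrated against any real conversion data.
POINTS = {
    "pricing_page_visit": 20,
    "webinar_attendance": 15,
    "blog_post_view": 10,
    "email_click": 5,
}
MQL_THRESHOLD = 70

def score_lead(events: list[str]) -> int:
    """Sum the point value of every tracked action; unknown actions score zero."""
    return sum(POINTS.get(event, 0) for event in events)

def is_mql(events: list[str]) -> bool:
    """A lead becomes an MQL the moment its accumulated score crosses the threshold."""
    return score_lead(events) >= MQL_THRESHOLD
```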
Clean. Logical. Completely reasonable-looking on a slide.
The problem is what happens in the room where those numbers get decided.
Problem 1: The Points Are Made Up
The scoring meeting nobody talks about
Here's what that meeting looks like.
Someone opens a blank spreadsheet. Someone else suggests that a pricing page visit should be worth more than a blog post. A third person mentions they read that webinar attendance is a strong intent signal. Agreement happens. Numbers get typed in. The model goes live.
RevOps practitioner Jeff Ignacio describes what comes next:
"Somebody built the model eighteen months ago. They assigned points based on a mix of intuition, sales feedback, and whatever the marketing team believed about their ideal customer profile at the time. The scores went live, MQL thresholds were set, and then everyone moved on to the next fire."
That's not a cautionary tale about one bad team. That's the modal experience.
No closed-won data was analyzed. No correlation between these actions and actual revenue was validated. The weights are the team's collective opinion about what should matter — dressed up in a spreadsheet that looks like math.
Industry benchmarks make this worse, not better. Borrowing scoring weights from vendor blog posts or "best practice" guides means applying values calibrated to someone else's buyer population, product category, and sales cycle. A webinar might be a genuine buying signal for one company and a content-consumption habit for another. The model doesn't know the difference. It just adds points.
The result is manufactured precision. Here's what it looks like in practice:
- Buyer A: 3 email clicks (+15) + 1 webinar attendance (+15) + 1 pricing page visit (+20) + 1 blog post (+10) = 60 points → not an MQL
- Buyer B: 4 email clicks (+20) + 2 webinar attendances (+30) + 1 blog post (+10) = 60 points → two email clicks away from being an MQL
Both buyers did roughly the same things. Neither of them told you what they were trying to solve, whether they had a budget, or what would make them walk away.
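Run those two histories through the sketch above and they land on the identical score, equally far from the threshold, for entirely different reasons (same hypothetical weights as before):

```python
# Reuses score_lead and is_mql from the sketch above; weights remain hypothetical.
buyer_a = ["email_click"] * 3 + ["webinar_attendance", "pricing_page_visit", "blog_post_view"]
buyer_b = ["email_click"] * 4 + ["webinar_attendance"] * 2 + ["blog_post_view"]

print(score_lead(buyer_a), is_mql(buyer_a))  # 60 False
print(score_lead(buyer_b), is_mql(buyer_b))  # 60 False
```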
More data-mature teams reverse-engineer scoring weights from closed-won data. But most don't, and even the teams that do still face the staleness problem, plus a threshold set by a headcount constraint rather than a buyer readiness signal.
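For what it's worth, that reverse-engineering isn't exotic. Here's one way it might look, sketched with pandas and scikit-learn; the file name, column names, and model choice are assumptions for illustration, not a prescription:

```python
# Hypothetical sketch: fit action weights to actual outcomes instead of guessing them.
# Assumes a historical export of leads with per-action counts and a closed-won flag.
import pandas as pd
from sklearn.linear_model import LogisticRegression

leads = pd.read_csv("historical_leads.csv")  # hypothetical CRM export
actions = ["pricing_page_visits", "webinar_attendances", "email_clicks", "blog_post_views"]

model = LogisticRegression(max_iter=1000)
model.fit(leads[actions], leads["closed_won"])

# The fitted coefficients are the evidence-based version of the scoring weights:
# a coefficient near zero means that action says little about whether a deal closes.
for action, coef in zip(actions, model.coef_[0]):
    print(f"{action}: {coef:+.2f}")
```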
When the signal doesn't match the outcome
Here's the practical consequence. Webinar-sourced leads, which most models score generously, convert to opportunities at just 17.8% on average. Event leads convert at 4.2%. Email campaign leads, which accumulate points for every click, convert at 0.9%.
The model is adding points to actions that don't predict buying. Nobody audits this, because the model already went live and the threshold got set and the meeting moved on.
Asking a lot of questions looks like buying intent. It isn't, necessarily. Docket's conversion data shows that prospects who don't convert actually ask more questions than those who do. Curiosity and commitment aren't the same thing. A scoring model can't tell the difference — it just sees activity and adds points.
The model that stopped learning the day you built it
Rule-based scoring has one more flaw that rarely gets its own slide: it reflects buyer behavior at the moment it was built, and then it stops.
ICP shifts. Products expand. Competitive dynamics change. The personas who bought 18 months ago may not look like the ones buying today. None of this updates the model automatically. Someone has to go back in and rebuild the weights.
Most teams don't. Best practice guidance recommends reviewing scoring models every three to six months. The fact that this has to be recommended implies the obvious — most teams set the model and leave it. The score your sales team is trusting today may be calibrated to win patterns from two or three years ago.
ZoomInfo puts it plainly: rule-based scoring "requires ongoing manual refinement." In most organizations, that refinement is perpetually on the roadmap and never quite on the calendar.
Problem 2: The Threshold Is a Capacity Decision
Where the number actually comes from
Let's say the scoring weights are sorted. The harder question is: where does 70 come from?
When a company decides "a lead needs to score 70+ points to be an MQL," that number almost never comes from studying which buyers actually close. It comes from a simpler question: how many leads can our SDR team realistically call this month?
Say your SDR team can handle 350 leads a month. You run the numbers:
- At a threshold of 60 → your model produces 500 MQLs/month
- At a threshold of 70 → your model produces 350 MQLs/month
So you set it at 70. Not because 70-point leads are meaningfully more ready to buy than 65-point leads. But because 350 is the number your team can physically handle.
The threshold is doing workload management, not quality filtering.
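The decision logic is easy to sketch. With hypothetical capacity numbers, it amounts to: pick the lowest threshold whose monthly MQL volume still fits on the SDR calendar.

```python
# Hypothetical sketch: the threshold is chosen to fit SDR capacity,
# not derived from which score bands actually convert.
def pick_threshold(monthly_scores: list[int], sdr_capacity: int) -> int:
    """Return the lowest threshold that keeps MQL volume within what SDRs can call."""
    for threshold in range(0, 101, 5):
        mqls = sum(score >= threshold for score in monthly_scores)
        if mqls <= sdr_capacity:
            return threshold
    return 100

# Hire two more SDRs and the capacity argument rises, so the threshold drops.
# Nothing about the buyers changed; only the second argument did.
```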
This is the tell: the threshold moves when your team changes, not when your buyers change.
- Hire more SDRs → threshold drops (more capacity, let more through)
- Downsize the team → threshold goes up (less capacity, filter more out)
The buyers behaved exactly the same way throughout. Nothing about the quality of leads changed. Only the team's bandwidth did. If the threshold were genuinely measuring buyer readiness, it wouldn't need to shift every time someone leaves or joins the SDR team.
Only 27% of marketing-generated leads ever get contacted at all. That's not a quality problem — it's a volume management problem. The threshold was set to protect the team's calendar, and even then, three-quarters of the leads it lets through never get a call.
A quick objection worth addressing: some teams respond to this by pushing for faster follow-up: better SLAs, more SDR capacity, tighter routing. That's a reasonable operational fix. But speed of follow-up doesn't change what the score actually measured. It just means someone calls an approximately qualified lead more quickly.
The incentive this creates for marketing
Once marketing knows the threshold, the game changes.
The objective shifts from "attract better-fit buyers" to "generate enough activity to cross the line." Campaigns get optimized to produce scoring volume. Webinars get promoted not because they qualify buyers, but because attendance adds 15 points. Email sequences get designed to accumulate clicks.
The model creates exactly the behavior it was supposed to filter. Forty-three percent of sales professionals say they need higher-quality leads from marketing. That number has stayed roughly the same for years. The models have been running the whole time.
What real signal actually looks like
Docket's Conversion Patterns Report, drawn from 4,736 real buyer conversations, found the widest behavioral gap in the dataset here: in conversations that end with a qualified email capture, 91% include a concrete next step. In conversations that don't convert, that number drops to 13%.
The difference isn't how much activity happened before the conversation. It's whether the conversation produced forward motion. That's something a scoring model cannot measure by design. It counts events. It doesn't understand what happened inside them.
The same dataset found that the deepest 12% of conversations — those lasting five minutes or longer — generate 30% of all captured pipeline. Volume and quality don't move together. A scoring model that tries to sort the top-of-funnel by volume of accumulated activity is fishing in the wrong dimension.
What Real Qualification Actually Requires
Qualification isn't a score. It's a conclusion drawn from a conversation.
The structural difference matters. A scoring model measures events — clicks, visits, downloads — and infers intent from their accumulation. A conversation-based qualification mechanism asks and adapts. It can surface what the buyer is actually trying to solve, identify the constraints that will determine whether a deal closes, and handle the objection that would otherwise send them quietly to a competitor. Those are fundamentally different operations. Tuning the scoring model doesn't close the gap between them.
Claravine, an enterprise data governance company, found this out firsthand. Once their AI Marketing Agent started qualifying buyers through real conversations, AEM integration emerged as their strongest buying signal. Not a page visit, not a webinar, but a specific product integration that surfaced repeatedly in the questions serious buyers asked. Their scoring model would never have found that. It would have assigned the same generic points to every pricing page visit, regardless of what the visitor actually wanted to know. The result: a meeting-booking rate 5.6x above baseline, and visibility into where prospects were stalling in the funnel that no other analytics tool could surface.
Factors.ai found that 77% of their qualified meetings were booked outside business hours — pipeline that would have evaporated on a form and gone unscored entirely. The threshold wouldn't have caught them that night. Most of them wouldn't have come back.
This is the practical definition of the shift. An MQL gives a rep a name, a score, and a list of pages visited. An Agent-Qualified Lead (AQL) gives a rep a context card: use case confirmed, constraints surfaced, objections identified, next step explicitly requested by the buyer.
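To make the handoff difference concrete, here's what each artifact might look like as data. The field names and values are purely illustrative, not Docket's actual schema:

```python
# Illustrative only: hypothetical field names and values, not a real product schema.
mql_handoff = {
    "name": "Jordan Smith",
    "score": 72,
    "pages_visited": ["/pricing", "/blog/lead-scoring", "/webinar-replay"],
}

aql_handoff = {
    "name": "Jordan Smith",
    "use_case": "confirmed in conversation",
    "constraints": ["security review required", "decision needed by end of quarter"],
    "objections": ["unclear migration path from the current tool"],
    "next_step": "buyer asked for a demo on Thursday",
}
```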
As Arjun Pillai framed it: it's like receiving a dossier, not just a business card.
An MQL tells you a lead crossed a threshold. An AQL tells you a buyer is ready to talk — and your rep already knows why.
The Fix Isn't a Better Spreadsheet
The MQL model isn't broken because someone set the weights wrong. It's broken because the math was always approximate, and the threshold was always about capacity. Both problems are structural, not matters of configuration.
Most demand gen leaders already know this. The model persists not because it works, but because it's the thing everyone agreed on — and changing it requires a harder conversation than optimizing it. The CFO built budget models around MQL volume. The CRM is wired to it. The QBR slide is ready to go.
But the cost of not changing shows up in the same place, every quarter: a pipeline that looks like it's working until sales opens the queue.
A more useful definition of qualification — one that holds up under scrutiny — is this: a lead is qualified when you've reduced uncertainty enough for the buyer to take the next step. That's not a score. It's an outcome. And it requires a conversation to produce.
See how Docket's AI Marketing Agent qualifies buyers in real conversation — and hands your rep a context card, not a contact name. Book a demo → https://docket.io

