Most teams that try to automate lead qualification start with the prompt.
They sit down, open ChatGPT or their agent framework of choice, and start writing: “You are a lead qualification specialist. When a lead comes in, check if they match the following criteria…”
This almost always fails. Not because the AI can’t do the job — but because the prompt describes what the team thinks the process is, not what actually happens.
Vercel took a different approach. Before writing a single line of code, they put an engineer next to their best SDR for several weeks. Just watching. Taking notes. Asking “why did you do that?” after every decision.
Six weeks later, they went from 10 SDRs on inbound qualification to 1 — with the same conversion rate.
Here’s why shadowing works, and how to do it yourself.
Every sales team has a documented qualification process. It usually lives in a Notion page or a Google Doc that nobody’s updated in 6 months. It says things like “check company size” and “verify budget authority.”
But when you sit next to the person who’s actually qualifying leads every day, you see something different.
You see them glance at a LinkedIn profile and immediately skip the lead — not because of any documented criteria, but because the profile picture is a logo, which usually means it’s a marketing account. You see them check the email domain and make a split-second judgment about company size before even looking at the company page. You see them read a demo request form and know from the phrasing alone whether this person is a decision-maker or a researcher.
These are the decisions that matter. And they’re invisible until you watch them happen.
A European B2B company we worked with had 6 people spending half their time on lead qualification. When we asked them to describe their process, they gave us a clean 5-step workflow. When we actually shadowed their top qualifier, we counted 23 distinct micro-decisions — most of which weren’t in any documentation.

The approach that worked at Vercel — and that we’ve since replicated with multiple clients — follows a specific pattern:
Step 1: Pick the right person. Not your most senior person. Not your manager. Your best individual contributor at the specific task you’re automating.
This matters more than people think. Managers often describe an idealized process. The IC does the actual work and has developed shortcuts and heuristics that make them fast and accurate.
At Vercel, this was one specific SDR who consistently had the highest qualification accuracy — meaning the leads they passed to AEs converted at the highest rate.
Step 2: Watch, don’t interview. An engineer (or whoever will build the automation) sits with the SDR for 1-2 weeks. Not interviewing them: watching them. There’s a critical difference.
When you interview someone about their process, they give you the rational, post-hoc explanation. “I check the company size, then the industry, then the role.” Clean and logical.
When you watch them, you see: they actually check the email domain first, then scan the form message for specific phrases (“looking to scale,” “evaluating tools,” “our CEO wants”), then check company size — but only if the message didn’t already give them enough signal.
The order matters. The shortcuts matter. The things they skip matter.
Step 3: Ask “why” after every decision. After each lead, the engineer asks: “What made you decide X?”
This surfaces the implicit knowledge. Things like:
• “If they mention a specific competitor by name, they’re usually serious — they’ve already done research.”
• “Personal email addresses for B2B are almost always students or consultants. I skip them unless the message is really specific.”
• “When someone says ‘just exploring,’ they’re 6-12 months out. I still qualify them, but I flag them as nurture.”
These “why” answers become the scoring rules for your AI agent.
Step 4: Structure the observations as decision trees. Take everything you noted and turn it into if-then logic:
IF email domain is personal (gmail, yahoo, hotmail)
AND message does not mention company name
→ DISQUALIFY (likely student/consultant)
IF message mentions specific competitor
→ +15 intent points (active buyer)
IF message says "exploring" or "just researching"
→ Set timeline to 6+ months
→ Route to nurture, not immediate follow-up
This isn’t production code — it’s the decision logic that will inform your prompts and scoring model.
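To make that concrete, here’s a minimal sketch of the same rules as a scoring function. The field names, domains, and weights are illustrative placeholders, not anyone’s production values:

```python
# Placeholder lists; pull the real values from your shadowing notes.
PERSONAL_DOMAINS = {"gmail.com", "yahoo.com", "hotmail.com"}
COMPETITOR_NAMES = {"competitor_a", "competitor_b"}

def score_lead(lead: dict) -> dict:
    """Apply the shadowing-derived rules to one inbound lead."""
    message = lead.get("message", "").lower()
    domain = lead.get("email", "").rsplit("@", 1)[-1].lower()
    result = {"score": 0, "timeline": None, "route": "standard", "disqualified": False}

    # Rule 1: personal email + no company name -> likely student/consultant
    if domain in PERSONAL_DOMAINS and not lead.get("company"):
        result.update(disqualified=True, route="disqualify")
        return result

    # Rule 2: naming a specific competitor signals an active buyer
    if any(name in message for name in COMPETITOR_NAMES):
        result["score"] += 15

    # Rule 3: exploratory language -> long timeline, nurture track
    if "exploring" in message or "just researching" in message:
        result["timeline"] = "6+ months"
        result["route"] = "nurture"

    return result
```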
Step 5: Build, then test in parallel. Only now do you build the automation. Using the decision trees from Step 4, you create a scoring model and an AI agent that follows the same logic your best SDR uses.
Then you test it. Run both in parallel for 2-4 weeks: the AI qualifies every lead, and the SDR qualifies the same leads independently. Compare results.
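The comparison itself can stay lightweight. A minimal sketch, assuming you log both verdicts for every lead (the record shape here is illustrative):

```python
def agreement_report(records: list[dict]) -> dict:
    """Compare AI and SDR verdicts from a parallel run.

    Each record is assumed to look like:
    {"lead_id": "L-001", "ai_label": "hot", "sdr_label": "warm"}
    """
    disagreements = [r for r in records if r["ai_label"] != r["sdr_label"]]
    rate = 1 - len(disagreements) / len(records) if records else 0.0
    # The disagreements are the valuable part: feed them back into tuning.
    return {"agreement_rate": round(rate, 3), "disagreements": disagreements}
```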
At Vercel, the AI matched the SDR’s conversion rate within the first month. The remaining gap closed as they fed disagreements back into the model.

Across every shadowing engagement we’ve done, the same three-dimensional scoring framework keeps emerging. Different companies weight the dimensions differently, but the structure is universal:
Dimension 1: Fit. This is the ICP check. Company size, industry, geography, buyer role. The SDR’s version of “does this look like someone we sell to?”
But shadowing reveals nuances that ICP documents miss:
• One client discovered their best customers always had a VP of Revenue Ops, not just a VP of Sales.
• Another found that Series B companies converted 3x better than Series A, despite both being “in ICP.”
• A third realized that companies using a specific CRM were 4x more likely to close, because their product integrated with it.
These are the “amplifier” signals that only emerge from watching real qualification happen.
Dimension 2: Intent. The strongest signal from shadowing wasn’t company data; it was the language in the demo request form.
SDRs unconsciously parse request messages for buying signals:
• “Looking to replace [competitor]” → +15 (active evaluation, budget exists)
• “Demo for my team” → +12 (multi-stakeholder, serious evaluation)
• “Need to solve [specific problem]” → +10 (clear pain, not just browsing)
• “My boss asked me to look into this” → +8 (delegated research, real need)
• “Just curious” or “exploring options” → +3 (low intent, long timeline)
• Pricing page visited 3+ times → +10 (behavioral signal, strong interest)
The AI can process these signals faster than any human — and it never forgets to check.
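In code, this is little more than a weighted phrase lookup. A sketch; the phrases and points mirror the list above, but your own weights should come from your own shadowing:

```python
# Phrase weights drawn from the signals above; extend with whatever
# your own SDRs actually look for.
INTENT_SIGNALS = [
    ("looking to replace", 15),   # active evaluation, budget exists
    ("demo for my team", 12),     # multi-stakeholder evaluation
    ("need to solve", 10),        # clear pain, not just browsing
    ("my boss asked me", 8),      # delegated research, real need
    ("just curious", 3),          # low intent, long timeline
    ("exploring options", 3),
]

def intent_score(message: str, pricing_page_visits: int = 0) -> int:
    msg = message.lower()
    score = sum(points for phrase, points in INTENT_SIGNALS if phrase in msg)
    if pricing_page_visits >= 3:  # behavioral signal from the list above
        score += 10
    return score
```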
Dimension 3: Timing. Timing signals are the hardest to automate because they often come from external data:
• Recent funding round (budget unlocked)
• Q1/Q4 budget planning cycles
• New leadership hire (new initiatives)
• Job postings for relevant roles (growing the function)
• Competitor mentioned in the news (market awareness)
The best SDRs check these signals manually. An AI agent with the right tools (LinkedIn data, Crunchbase, job board APIs) can check them instantly for every lead.
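A sketch of what those checks might look like as agent tools. fetch_funding_rounds and fetch_job_postings are hypothetical stubs standing in for whatever enrichment clients you actually wire up:

```python
from datetime import date, timedelta

def fetch_funding_rounds(company: str) -> list[dict]:
    """Stub: replace with your Crunchbase (or similar) client.
    Assumed to return records with a 'date' field (datetime.date)."""
    return []

def fetch_job_postings(company: str, keywords: list[str]) -> list[dict]:
    """Stub: replace with your job-board API client."""
    return []

def timing_signals(company: str) -> list[str]:
    """Collect the timing signals listed above for one company."""
    signals = []
    six_months_ago = date.today() - timedelta(days=180)

    if any(r["date"] >= six_months_ago for r in fetch_funding_rounds(company)):
        signals.append("recent funding round (budget unlocked)")

    if fetch_job_postings(company, keywords=["revenue ops", "sales ops"]):
        signals.append("hiring for relevant roles (growing the function)")

    if date.today().month in (1, 10, 11, 12):  # rough Q1/Q4 planning window
        signals.append("budget planning cycle")

    return signals
```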
One of the biggest mistakes we see: teams try to go from fully manual to fully automated in one step.
Vercel didn’t do this. They followed a gradient:
Weeks 1-2: AI qualifies, human reviews every decision. Agreement rate: ~70%.
Weeks 3-4: AI qualifies, human reviews only disagreements. Agreement rate: ~85%.
Weeks 5-6: AI qualifies autonomously, human reviews random samples and edge cases. Agreement rate: ~95%.
Week 6+: AI qualifies autonomously. One SDR stays on in a QA role, reviewing weekly samples and flagging drift.
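The gradient can also live in the system as explicit configuration rather than tribal knowledge. A sketch, with illustrative names:

```python
# Review policy per rollout phase; values are illustrative.
ROLLOUT_PHASES = [
    {"weeks": "1-2", "human_review": "every decision"},
    {"weeks": "3-4", "human_review": "disagreements only"},
    {"weeks": "5-6", "human_review": "random samples + edge cases"},
    {"weeks": "6+",  "human_review": "weekly QA samples"},
]
```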
The key insight: the SDR who stayed wasn’t doing qualification anymore. They were teaching the system — reviewing its decisions, identifying patterns it missed, and feeding corrections back into the model.
This is the same gradient we saw with the European client: 6 people went from spending half their time on qualification to spending zero time on it. All 6 now do outbound exclusively.

The system doesn’t stay static after launch. Every correction becomes training data:
1. AI qualifies a lead as “Hot” (score 82)
2. SDR disagrees — marks it “Warm” (should be 65)
3. System logs the disagreement with reasons
4. After 50+ disagreements accumulate, a reasoning model analyzes the patterns
5. It proposes prompt adjustments (e.g., “reduce intent score for leads from agencies”)
6. Changes deploy to a small percentage of leads first
7. If approval rate improves, they roll out to 100%
This is Level 4 on the autonomy ladder — the system doesn’t just execute, it self-improves. The SDR who used to qualify leads has become the feedback loop that makes the AI better over time.
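A minimal sketch of that loop’s plumbing, assuming disagreements are stored as simple records and prompt changes are gated behind a deterministic canary split (all names here are illustrative):

```python
import zlib

DISAGREEMENT_LOG: list[dict] = []
REVIEW_THRESHOLD = 50  # matches step 4 above

def propose_prompt_adjustments(disagreements: list[dict]) -> None:
    """Stub: in practice, send the batch to a reasoning model for analysis."""
    ...

def log_disagreement(lead_id: str, ai_score: int, sdr_score: int, reason: str) -> None:
    """Record one human override; trigger a review once enough accumulate."""
    DISAGREEMENT_LOG.append(
        {"lead_id": lead_id, "ai": ai_score, "sdr": sdr_score, "reason": reason}
    )
    if len(DISAGREEMENT_LOG) >= REVIEW_THRESHOLD:
        propose_prompt_adjustments(list(DISAGREEMENT_LOG))
        DISAGREEMENT_LOG.clear()

def use_candidate_prompt(lead_id: str, canary_pct: int = 10) -> bool:
    """Route a small, deterministic fraction of leads to the candidate prompt."""
    return zlib.crc32(lead_id.encode()) % 100 < canary_pct
```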
Let’s break down the economics, because they’re dramatic:
• Headcount: 10 SDRs on inbound qualification → 1 SDR (QA role)
• Cost: $800K-$1.2M/year in SDR salaries → ~$1,000/year in AI costs
• Average response time: 20-40 min → ~2 min
• Coverage: business hours only → 24/7/365
• Ramp: 6 months to ramp a new SDR → 6 weeks to build the system
The 9 SDRs who moved to outbound didn’t lose their jobs — they moved to higher-value work where human judgment, creativity, and relationship-building actually matter.
This is the pattern we see everywhere. Inbound lead qualification is one of the clearest ROI cases for AI in sales — not because the AI is smarter than your SDRs, but because the task is structured enough that a well-built system can match their performance at a fraction of the cost and 10x the speed.
Not every shadowing engagement ends with automation. Sometimes it reveals the opposite.
We’ve had cases where shadowing showed that the qualification process was actually more nuanced than anyone realized — involving judgment calls about market timing, competitive dynamics, and relationship history that current AI can’t reliably handle.
The honest answer: if your best SDR makes decisions that they can’t explain logically (pure gut feel that consistently works), that process probably isn’t ready for automation yet. Come back in 6 months when the models are better.
But for most inbound qualification? The decisions are structured. The data is available. The logic is explainable. It’s the lowest-hanging fruit in sales automation.
You don’t need 6 weeks or a dedicated engineer. Start with a lightweight version:
1. This week: Ask your top SDR to narrate their next 10 qualifications out loud. Record it (with their permission). Write down every decision that isn’t in your documented process.
2. Next week: Turn those decisions into scoring criteria. Weight them based on how often they change the outcome.
3. Week 3: Build a simple scoring model — even a spreadsheet works. Run it against your last 50 leads and compare to actual outcomes.
4. Week 4: If the model’s accuracy is above 80%, you have enough signal to build an AI qualifier. If it’s below 60%, you need more shadowing — you’re missing key decision criteria.
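The week-4 check fits in a notebook. A sketch, assuming each historical lead carries your rule-based score and its actual outcome:

```python
def backtest(leads: list[dict], threshold: int = 50) -> float:
    """Fraction of leads where the score agreed with the actual outcome.

    Each lead is assumed to look like {"score": 72, "converted": True};
    the qualify/pass threshold is yours to tune.
    """
    hits = sum(1 for l in leads if (l["score"] >= threshold) == l["converted"])
    return hits / len(leads)

# Dummy data; swap in your real last-50-leads export.
sample = [{"score": 72, "converted": True}, {"score": 31, "converted": False}]
print(f"backtest accuracy: {backtest(sample):.0%}")
```

Per the rule of thumb above: over 0.80 means you have enough signal to build, under 0.60 means go back and shadow more.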
The companies that automate qualification fastest aren’t the ones with the best AI tools. They’re the ones who take the time to understand what their best people actually do — before trying to replicate it.
This article is based on real deployment patterns across multiple teams. The Vercel case is public (shared by their COO). Other examples are anonymized. Want to automate your lead qualification? Try Onsa.ai — our AI agents follow the same shadow-then-automate methodology.
How long does the shadowing phase typically take?
1-2 weeks of active observation, plus another week to document and structure the decision trees. Most teams try to skip this and regret it. The 2-3 weeks of shadowing saves months of iteration later.
What if our SDRs qualify leads differently from each other?
That’s actually common — and it’s one of the biggest reasons to shadow your best performer specifically. Inconsistency in qualification is exactly what automation fixes. You standardize on the approach that produces the best conversion rates.
Can this work if we don’t have 10 SDRs?
Absolutely. Even a 2-person team benefits. If your founder or first SDR is spending 30% of their time on inbound qualification, automating that frees up a third of their capacity for outbound, product feedback, or strategic work. The economics scale down proportionally.
What tools do I need?
At minimum: an LLM API, access to your CRM data, and an enrichment source (LinkedIn via Sales Navigator or API). Tools like Onsa.ai bundle these together with pre-built qualification workflows. The “shadow methodology” applies regardless of which tools you use — it’s about understanding the process before building.
How does this differ from traditional lead scoring in my CRM?
Traditional lead scoring uses static rules (company size + job title + page visits = score). AI qualification uses the full context — reading the actual message, researching the company, checking timing signals, and making nuanced judgments like “this person mentions a specific competitor, so they’re in active evaluation.” It’s the difference between a form and a conversation.
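To make the contrast concrete, here’s a minimal sketch of the “conversation” side, using the OpenAI Python client as one example of an LLM API. The prompt content is illustrative and should come from your own shadowing notes:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def qualify(lead: dict, rules: str) -> str:
    """Ask the model to apply shadowing-derived rules to the full lead context."""
    response = client.chat.completions.create(
        model="gpt-4o",  # any capable chat model works
        messages=[
            {"role": "system",
             "content": f"You qualify inbound B2B leads. Apply these rules:\n{rules}"},
            {"role": "user",
             "content": (f"Email: {lead['email']}\n"
                         f"Company: {lead['company']}\n"
                         f"Message: {lead['message']}\n"
                         "Return hot/warm/nurture/disqualify and a one-line reason.")},
        ],
    )
    return response.choices[0].message.content
```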