Most AI sales case studies are fake. “10x pipeline!” with no details. “Doubled meetings!” without mentioning they were junk meetings.

This one is real. An enterprise e-commerce platform — B2B, selling to online merchants — had 6 SDRs splitting time between inbound qualification and outbound prospecting. We automated the qualification half. Five of those six moved to full-time outbound. One stayed to handle the edge cases.
No one was fired. Revenue capacity went up. Here’s exactly what happened and why.
The company sells an enterprise e-commerce platform to online merchants — some running €50K shops, some running €50M operations. The qualification question isn’t “are they interested?” (they filled out a form, so yes). The question is: “are they big enough and complex enough to need an enterprise platform?”
That question takes about 20 minutes to answer. Not because anyone is slow — because the research is genuinely deep.
Before we came in, each of their 6 SDRs spent roughly half their week qualifying inbound leads. The other half was cold outbound. But here’s the thing most people don’t talk about: they didn’t have a structured scoring system. Every SDR qualified leads their own way. Some checked the website thoroughly, some relied on gut feel from the form submission, some spent 10 minutes, some spent 30. The criteria lived in people’s heads, not in a document.
The typical research flow — when someone was being thorough — looked like this:
Step 1: Check the basics (2 minutes). Name, company, role, what they said in the intake form. Basic CRM hygiene.
Step 2: Research the company (10 minutes). Visit their website. Check their current tech stack — what platform are they on today? What plugins, integrations, payment providers? Some of this is visible in the HTML source. Their SDRs literally inspected page source code to understand the prospect’s e-commerce setup.
Step 3: Estimate fit and size (5-8 minutes). Cross-reference with whatever data they could find. How much traffic does the site get? What’s the estimated revenue? How many SKUs? Are they B2B, B2C, or both? Are they in a region where the platform has strong support?
Step 4: Make a judgment call. No formal rubric. Each SDR had their own mental model of what “good” looked like. This meant two SDRs could look at the same lead and reach different conclusions — and neither would be wrong, exactly, because there was no agreed-upon standard.
The total inbound volume was roughly 30-35 qualifications per week — not evenly distributed, so on any given day one or two SDRs would handle whatever came in while the others focused on outbound. The individual quality was decent — these were experienced people who knew the product. But inconsistency was the real problem. One SDR’s “medium priority” was another’s “high priority.” Leads were slipping through the cracks not because of negligence, but because there was no shared definition of what a good lead looked like.
We didn’t drop in a chatbot or a scoring spreadsheet. We built an AI research agent that does what a thorough SDR does — but does it the same way every time, querying 6+ data sources per lead, in under 3 minutes.

But before we could automate anything, we had to solve the harder problem first: there was no process to automate.
Phase 1: Building the scoring rubric that didn’t exist. We spent the first week interviewing their SDRs and reverse-engineering how each one actually qualified leads. What did they check? What mattered most? What made them say “this one’s a pass” vs. “send it to an AE”? The answers varied — which was the whole problem.
From those interviews, we designed a 155-point scoring rubric across three dimensions: Company Fit (headcount, industry, business model, revenue), E-Commerce Readiness (catalog size, monthly traffic, countries served, average order value), and Technical Fit (what platform they’re currently on and how likely they are to migrate). Each criterion has weighted tiers — not binary yes/no, but graduated scores that reflect real-world nuance.
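To make “weighted tiers” concrete, here is a minimal sketch of how such a rubric can be represented in code. The three dimensions are the real ones from the engagement; every criterion, threshold, and point value below is illustrative, not the client’s actual 155-point rubric.

```python
# Sketch of a graduated rubric. The three dimensions match the case study;
# all criteria, thresholds, and point values are illustrative stand-ins.
# Each criterion is a list of (minimum_value, points) tiers, checked top-down.
RUBRIC = {
    "company_fit": {
        "headcount":   [(500, 20), (100, 15), (25, 8), (0, 0)],
        "revenue_eur": [(50_000_000, 20), (5_000_000, 12), (0, 0)],
    },
    "ecommerce_readiness": {
        "catalog_size":    [(10_000, 15), (1_000, 10), (100, 4), (0, 0)],
        "monthly_traffic": [(500_000, 15), (50_000, 8), (0, 0)],
    },
    "technical_fit": {
        # Graduated, not binary: a shop on an aging platform that is easy
        # to migrate from scores higher than one on a recent contract.
        "migration_likelihood_pct": [(80, 20), (40, 10), (0, 0)],
    },
}

def score_criterion(tiers: list[tuple[int, int]], value: int) -> int:
    """Return the points for the highest tier this value clears."""
    for minimum, points in tiers:
        if value >= minimum:
            return points
    return 0
```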
Then we iterated. We ran the rubric past their team, scored 20 leads together, argued about the weights, adjusted. Two rounds of calibration before everyone agreed on the criteria. This phase took about a week — and it was the most valuable part of the engagement. For the first time, the team had a shared, written definition of what “qualified” actually means.
Phase 2: The intake trigger. When a prospect fills out the website form, a record is created in HubSpot. That creation event triggers our agent automatically.
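Mechanically, that trigger can be a HubSpot webhook subscription on contact creation pointed at a small HTTP endpoint. A minimal sketch, assuming FastAPI; the endpoint path and the `enqueue_research` helper are hypothetical, and HubSpot’s request-signature verification is omitted for brevity.

```python
# Sketch of the intake trigger: a HubSpot webhook subscription on contact
# creation POSTs a JSON array of events to this endpoint.
from fastapi import FastAPI, Request

app = FastAPI()

async def enqueue_research(contact_id: int) -> None:
    """Hypothetical helper: hand the new contact to the research agent."""
    ...

@app.post("/hubspot/webhook")
async def hubspot_webhook(request: Request):
    events = await request.json()
    for event in events:
        # "contact.creation" is the webhook subscription type for new records
        if event.get("subscriptionType") == "contact.creation":
            await enqueue_research(contact_id=event["objectId"])
    return {"status": "accepted"}
```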
Phase 3: The deep research layer. This is where most “AI SDR” tools fall short. They check a database, match some firmographics, and call it qualification. Our agent queries 6 distinct data sources per lead: company profile databases for headcount and history, tech stack detection services that identify their current e-commerce platform from HTML analysis, traffic estimation tools, domain intelligence for product catalog size and estimated sales, web search for revenue signals (3 queries, top 3 results each analyzed), and a website scraper that reads the homepage for business model clues.
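To stay inside the 3-minute budget, the six source queries can run concurrently rather than in sequence. A sketch with asyncio; each client function below is a hypothetical stand-in for a vendor-specific integration.

```python
import asyncio

# Hypothetical stubs -- each stands in for a vendor-specific API client.
async def fetch_company_profile(name: str) -> dict: ...        # headcount, history
async def detect_tech_stack(domain: str) -> dict: ...          # platform from HTML analysis
async def estimate_traffic(domain: str) -> dict: ...
async def fetch_domain_intelligence(domain: str) -> dict: ...  # catalog size, est. sales
async def search_revenue_signals(name: str) -> dict: ...       # 3 queries, top 3 results each
async def scrape_homepage(domain: str) -> dict: ...            # business-model clues

async def research_lead(domain: str, company: str) -> dict:
    """Query all six sources concurrently; drop any source that errors out."""
    keys = ["profile", "tech_stack", "traffic",
            "domain_intel", "revenue_signals", "homepage"]
    results = await asyncio.gather(
        fetch_company_profile(company),
        detect_tech_stack(domain),
        estimate_traffic(domain),
        fetch_domain_intelligence(domain),
        search_revenue_signals(company),
        scrape_homepage(domain),
        return_exceptions=True,  # one flaky source shouldn't kill the run
    )
    return {k: r for k, r in zip(keys, results) if not isinstance(r, Exception)}
```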
On top of that, 8 specialized AI extraction agents each focus on one dimension — headcount, industry classification, business model, catalog size, countries served, current platform, annual revenue, and website status. Each agent is purpose-built and independently validated.
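In code, the extraction layer can be as simple as one narrowly scoped task per dimension, each validated on its own. A sketch; `llm_extract` is a hypothetical wrapper around whichever model API you use.

```python
# One narrow task per dimension from the text. Eight focused prompts, each
# independently validated, rather than one giant "analyze this company" prompt.
EXTRACTION_TASKS = {
    "headcount":        "Estimate employee count; name the source you relied on.",
    "industry":         "Classify the industry against our taxonomy.",
    "business_model":   "B2B, B2C, or hybrid? Justify from the homepage copy.",
    "catalog_size":     "Estimate the number of SKUs.",
    "countries_served": "List the countries the shop serves.",
    "current_platform": "Name the current e-commerce platform.",
    "annual_revenue":   "Estimate annual revenue; flag low confidence.",
    "website_status":   "Is the site live, under maintenance, or parked?",
}

async def llm_extract(prompt: str, context: dict) -> dict:
    """Hypothetical: one focused model call plus output validation."""
    ...

async def run_extractors(research: dict) -> dict:
    return {field: await llm_extract(task, research)
            for field, task in EXTRACTION_TASKS.items()}
```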
This isn’t a lookup. It’s 8-10 external API calls, multiple web crawls, and 8 focused AI analyses — per lead.
Phase 4: The scoring engine. The agent applies the 155-point rubric we co-designed with their team. Three tiers: High (70+), Medium (40-69), Low (below 40). But it does something humans struggle with: it’s perfectly consistent. The third qualification on a Friday afternoon gets the same rigor as the first one on Monday morning. Every field is checked. Every threshold is applied exactly. No more “depends which SDR you get.”
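Continuing the rubric sketch from earlier: the scoring step reduces to a deterministic sum plus fixed thresholds, which is exactly why it cannot vary by day or by person. The tier cut-offs are the ones from the text; everything else is illustrative.

```python
def total_score(extracted: dict) -> int:
    """Sum rubric points across all dimensions. RUBRIC and score_criterion
    come from the rubric sketch earlier; `extracted` maps criterion names
    to validated numeric values."""
    return sum(
        score_criterion(tiers, extracted.get(criterion, 0))
        for dimension in RUBRIC.values()
        for criterion, tiers in dimension.items()
    )

def tier(score: int) -> str:
    # Cut-offs from the case study: High 70+, Medium 40-69, Low below 40.
    if score >= 70:
        return "high"
    if score >= 40:
        return "medium"
    return "low"
```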
Phase 5: The routing. Based on the score, the agent either drafts a personalized email for the Account Executive to review and send, adds the lead to a nurture sequence, or flags it as not a fit with a written explanation of why. The AE never sees a lead without context — they get a detailed research brief covering tech stack, estimated revenue, traffic, catalog complexity, and specific talking points for the first call. All written back to HubSpot automatically.
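The routing itself is a small switch on the tier, with every branch writing its reasoning back to the CRM. A sketch; the helpers are hypothetical placeholders for HubSpot writes and the team’s email and nurture tooling.

```python
# Hypothetical placeholders for CRM writes and outreach tooling.
async def draft_ae_email(brief: str) -> str: ...
async def add_to_nurture_sequence(contact_id: int) -> None: ...
async def write_to_hubspot(contact_id: int, **fields) -> None: ...

async def route(contact_id: int, score: int, brief: str) -> None:
    """Route a scored lead. `brief` is the agent's written research summary."""
    t = tier(score)  # tier() from the scoring sketch above
    if t == "high":
        draft = await draft_ae_email(brief)  # the AE reviews before sending
        await write_to_hubspot(contact_id, brief=brief, email_draft=draft)
    elif t == "medium":
        await add_to_nurture_sequence(contact_id)
        await write_to_hubspot(contact_id, brief=brief)
    else:
        # "Not a fit" still gets a written explanation on the record.
        await write_to_hubspot(contact_id, brief=brief, disposition="not_a_fit")
```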
Before: 6 SDRs, each spending ~50% on qualification, ~50% on outbound. Roughly 30-35 qualifications per week.
After: 1 SDR handling edge cases and quality checks. The same 30-35 qualifications per week, running automatically. The other 5 SDRs moved to full-time outbound prospecting.
The math that matters:
The company didn’t save headcount costs — nobody was laid off. Instead, they unlocked capacity. Five people who were spending half their time on repetitive research suddenly had 100% of their time for outbound. That’s 2.5 FTEs of net new outbound capacity, created without hiring anyone.
Denis Kalyshkin from I2BF Global Ventures, who interviewed us on his podcast about this approach, pushed back: “So there’s no cost savings?” Correct. There are no cost savings in the headcount line. The value is in what those people do with their freed-up time.
This is a distinction most AI sales pitches get wrong. They promise “replace your SDRs.” The reality — at least for enterprise B2B — is “free your SDRs to do what humans are actually good at.”
Humans are good at building relationships. They’re good at reading between the lines on a discovery call. They’re good at knowing when a prospect says “we’re happy with our current solution” but their tone says “convince me.” AI is not good at any of that. AI is good at querying 6 data sources, reading HTML source code, running 8 specialized analyses, and applying a 155-point scoring rubric consistently at 2 AM on a Saturday.
We’ve now run this playbook across multiple customers in different industries — e-commerce, immigration law, SaaS. A clear pattern has emerged: the ROI of AI qualification scales with the complexity of the research step.
If your qualification is “check if they’re in our target industry and have more than 50 employees” — that takes 5 minutes, and AI saves you 5 minutes. Helpful, but not transformative.
If your qualification requires visiting their website, checking their tech stack, cross-referencing with traffic and financial data, and applying a multi-factor scoring rubric — that takes 15-30 minutes. AI saves you 15-30 minutes per lead. At 30 leads per week, that’s 7.5-15 hours of SDR time. That’s when you start moving people off the process entirely.
The threshold we’ve observed: if your qualification process takes more than 15 minutes per lead and requires multiple data sources, AI automation pays for itself in the first month. Below that, it’s still useful but the economics are less dramatic.
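That threshold is easy to sanity-check against your own funnel. A back-of-envelope sketch using the numbers from this section:

```python
def weekly_hours_saved(minutes_per_lead: float, leads_per_week: int) -> float:
    """Hours of SDR research time recovered per week."""
    return minutes_per_lead * leads_per_week / 60

# Shallow qualification: helpful, not transformative.
print(weekly_hours_saved(5, 30))   # 2.5 hours/week
# Deep, multi-source qualification: enough to move people off the process.
print(weekly_hours_saved(15, 30))  # 7.5 hours/week
print(weekly_hours_saved(30, 30))  # 15.0 hours/week
```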
The phrase “AI SDR” has become meaningless. Most products with that label do one of three things:
1. Database lookup + template email. They match your lead against a firmographic database and send a pre-written sequence. This works for outbound at scale but fails completely for qualification because it can’t assess fit beyond surface-level attributes.
2. Chatbot on your website. Engages visitors in real-time, asks qualifying questions, routes to sales. Useful for a different problem (inbound chat), but doesn’t help when the lead already submitted a form and you need deep asynchronous research.
3. LLM wrapper around email. Uses GPT to write personalized emails. The personalization is shallow — “I noticed you’re based in Berlin and work in e-commerce” — because the underlying research layer is just a database lookup dressed up with natural language.
None of these could have handled this qualification workflow. The question at this enterprise wasn’t “does the lead match 3 firmographic criteria?” It was “analyze their website architecture, estimate their transaction volume from technical signals, assess their tech stack compatibility, and predict whether they’re sophisticated enough to need an enterprise platform.”
That requires an agent — not a lookup tool, not a chatbot, not an email writer. An agent that can browse the web, read HTML, query APIs, synthesize information from multiple sources, and apply a nuanced scoring rubric.
After running this system for several months and processing hundreds of qualifications, here’s what we’ve learned:
Lesson 1: Building the scoring rubric is the real work. The company didn’t have a formal scoring system — every SDR qualified differently. We had to interview the team, extract their implicit criteria, design a structured rubric, and iterate together until everyone agreed on the weights. Within two weeks of running the rubric, the team realized some criteria needed adjustment — not because anyone was wrong, but because formalizing an intuitive process reveals where gut feel was compensating for missing data. The rubric is now on its third version. It keeps evolving.
Lesson 2: Edge cases need a human escape valve. About 10-15% of leads don’t fit neatly into the scoring rubric. They’re too big (enterprise deals that need custom scoping), too unusual (non-standard use cases), or too ambiguous (signals pointing in both directions). The one remaining SDR handles these. Trying to automate 100% of qualifications is a mistake — the last 10-15% is where human judgment actually matters.
Lesson 3: The AEs loved it more than the SDRs. Surprising finding: the Account Executives were more enthusiastic about the system than the SDRs it freed up. Why? Because every lead now came with a detailed research brief. Before, they’d sometimes get a lead with just a name and “looks good” from a busy SDR. Now they get a full analysis — tech stack, estimated size, competitive position, specific talking points for the first call. Their close rate on qualified leads went up because they were better prepared.
Lesson 4: Speed is its own qualification signal. When the research takes 20 minutes and someone has to get around to it, leads wait hours or days for a response. When it takes 3 minutes and runs automatically, they get a response while they’re still thinking about you. Multiple prospects have mentioned that the fast, detailed response influenced their perception of the company. Speed doesn’t just improve conversion — it changes how prospects perceive you before the first conversation even starts.
This case study is from enterprise e-commerce, but the pattern works anywhere the qualification research is deep. We’ve seen similar results in:
- Immigration law — qualifying visa applicants requires checking Google Scholar for publications, patent databases, press mentions, LinkedIn for work history, and applying multi-criteria legal tests. 40+ minutes per applicant manually.
- SaaS selling to SMBs — qualifying inbound from a high-volume funnel where each lead needs company size verification, tech stack assessment, and fit scoring against a multi-factor ICP.
- Professional services — qualifying RFP responses where you need to research the requesting company’s existing vendor relationships, budget signals, and decision-making timeline.
The qualifying questions:
1. Does your qualification process take more than 15 minutes per lead?
2. Does it require visiting external websites or querying multiple data sources?
3. Is your scoring inconsistent across team members — or does no formal scoring exist at all?
4. Are your SDRs spending more than 30% of their time on qualification instead of selling?
If you answered yes to 3 or more: you’re a candidate for this approach.
Did anyone get fired?
No. Five of the six SDRs moved to full-time outbound prospecting; the sixth stayed to handle edge cases. The company gained 2.5 FTEs of outbound capacity without hiring. In practice, this is the more common outcome for enterprise teams — you’re not eliminating roles, you’re reallocating from research to revenue-generating activities.
How accurate is the AI scoring compared to the human BDRs?
After the initial calibration period (about 2 weeks of co-designing and tuning the rubric with the team), the AI scores agree with human judgment roughly 85-90% of the time. The remaining 10-15% are edge cases that the SDR handles. The key difference: the AI applies the same 155-point rubric identically every time. No Friday afternoon shortcuts, no inconsistency between team members.
How long did it take to set up?
About 2-3 weeks. The first week was the hardest — they didn’t have a formal scoring system, so we spent it interviewing the SDRs, designing the 155-point rubric from scratch, and iterating with the team until everyone agreed on the criteria and weights. Week two was building the research agent and connecting the HubSpot trigger. Week three was calibration — running real leads through both the human and AI process in parallel and adjusting.
What happens when the prospect’s website changes?
The research is done at the time of qualification, and the results are stored. If a prospect comes back 6 months later, they go through the process again with fresh data. The agent always works with current information — it doesn’t rely on cached database records that might be months old.
Can this work for outbound prospecting too?
Different problem. Outbound requires identifying who to reach out to (targeting), crafting personalized messages (copywriting), and managing multi-touch sequences (orchestration). The qualification automation we built here is for processing inbound leads that have already expressed interest. We handle outbound differently — see our guide on automating outbound sales with AI.
What’s the cost compared to the SDR time it replaced?
We can’t share exact pricing from this engagement, but the general principle: AI qualification costs 5-10% of the equivalent human time. If a qualification takes 20 minutes of SDR time at $30/hour, that’s $10 per lead. AI does it for $1-2 per lead. At 30 leads per week, that’s $240-270/week saved on the qualification step alone — before you count the value of reallocated SDR time on outbound.
This case study is based on a real customer engagement. Company details are anonymized at their request. The podcast conversation with Denis Kalyshkin where we discussed this case is available on YouTube (Russian). Want to see if this approach fits your team? Talk to us.