There's a version of the B2B data story that I think most people, even people who work in this industry, have not fully reckoned with. The version goes like this. For the better part of two decades, the dominant theory of how you sell technology to a large enterprise was that you bought a list. You bought a list of accounts. You bought a list of contacts. You layered intent data over the top. And the assumption — the very deep, very rarely interrogated assumption — was that the people on that list were, in some meaningful sense, the people you needed to reach.
I want to suggest that assumption was wrong. Not slightly wrong. Wrong in a way that has quietly broken a lot of go-to-market motions, and that is especially broken right now in one specific category: companies like AWS, Snowflake, and EnterpriseDB trying to win business away from Databricks. And I think the reason it's broken there, more than anywhere else, is worth slowing down on. Because what's happening in the data cloud category is, I think, a preview of what's coming for everyone else.
The problem with knowing which company uses what
So let me put this concretely. If you are a Snowflake account executive — and I've talked to people in roughly this position — you already know which Fortune 500 accounts run Databricks. You know it. Your competitor knows it. Every vendor in the category knows it. That information used to be a competitive advantage. It is now, functionally, a commodity. The install base is public. The firmographics are public. The org charts on LinkedIn are public. What's true about the company is true on a hundred sales floors at the same time.
And here's the thing I want to draw out, because I think it gets lost. When you pull a contact list from ZoomInfo for "Data Engineer at Capital One," you get something like 800 names. Eight hundred. And neither the SDR sending the email, nor the VP of Demand Gen approving the campaign, nor honestly anyone in the building has any idea which of those 800 people has actually written a line of PySpark in the last six months. The title is a category. The category contains backend engineers, ML architects, transferred QA leads, people who took the title because it came with a raise. It's not a buying signal. It's a vibe.
Static databases tell you what was true last quarter. They don't tell you who pushed to a Spark repo yesterday. For a data cloud sale, that delta is the whole deal.
— Demand Gen lead, Cloud SaaS engagement
Now, I want to be fair to the legacy stack here, because I think it's important to say what these tools actually do well. HG Insights tells you, with real accuracy, which companies have installed which products. ZoomInfo gives you verified contact records — the email works, the phone number rings. BuyerCaddy and the intent aggregators tell you, in aggregate, which accounts are showing surge behavior on certain topics. None of that is wrong. None of it is fake. It's just that all of it operates at the wrong level of resolution for what data cloud vendors are actually trying to do.
Because the thing about a Snowflake-versus-Databricks deal — and I think this is the part the industry has been slow to internalize — is that it doesn't close top-down. It closes bottom-up, then top-down. A Director of Data Platform does not sign a seven-figure contract because a vendor sent her a clever email. She signs because three of her engineers have spent the last two months quietly benchmarking the alternative on their own time. The buyer is not one person. The buyer is a coalition. And the coalition starts with practitioners — the people whose actual work tells you whether they could ever, in principle, become a champion. If you don't know who those people are, you are selling to the wrong layer of the org.
That's the problem the customers in this report were trying to solve. Their account lists were correct. Their contact records were verified. Their reply rates were terrible. And the diagnosis, every time, was the same: they were optimizing for the wrong unit of analysis. They needed to descend from the account to the contact, and from the contact to the contact's actual technical fingerprint — the GitHub commits, the Stack Overflow tags, the Kaggle notebooks. The stuff that reveals what someone does, as opposed to what their title says they do.
A four-part architecture, and why each part has to exist
Okay, so what does the alternative look like? I want to walk through it carefully, because I think the structure matters more than any individual piece. LeadGenius builds these engagements around four data pillars, and the thing that's interesting about the architecture is not that any single pillar is novel. It's that none of them works in isolation. They have to interlock. And the reason a static database can't replicate this — even if it tried — is that it can't refresh fast enough to keep the interlocking pieces aligned.
Contact Technographics
Identify the engineers, SREs, and architects inside each target account with demonstrable skills critical to deploying the competitor's tool. The unit isn't the title — it's the GitHub commit log, the Stack Overflow tag, the Kaggle notebook.
Contact Activation
Run TOFU, MOFU, and BOFU syndication and lead-qualification campaigns against the named practitioners and influencers. Drive MQLs into the funnel with custom questions that filter signal from noise.
Buying Committees & Monitoring
Map the full buying center — practitioner, influencer, budget owner — and monitor for role changes, hiring events, and migrations that shift the deal's center of gravity.
Competitor Social Monitoring
Track which accounts the competitor's AEs are engaging with on LinkedIn each month. Spot takeout attempts in motion and intercept them before the deal closes the wrong way.
Let me put this differently. The first two pillars — technographics and activation — are where most of the lift comes from in the first ninety days. That's the part that shows up in the dashboard. But the second two, the monitoring pillars, are where the compounding happens. And I think this is the piece that's genuinely underrated. When a Director of Data Engineering changes companies, she takes a buying disposition with her. She has opinions about which tools are good. She has scar tissue from the last platform migration. The vendor who knows about her move first — who sees it the week it happens, not the quarter it shows up in ZoomInfo — has a six-week head start on the rest of the market. Six weeks, in this category, is the entire deal.
What the Databricks footprint actually looks like inside an account
I want to walk through a specific case, because I think it makes the abstract concrete in a useful way. The customer in this engagement gave LeadGenius 3,700 suspect accounts — companies where there was reason to believe Databricks was deployed, but no clear picture of how deep the deployment ran. And what came back, and I want to sit with this number for a second, is that 70 percent of those accounts matched against confirmed Databricks practitioners. Out of 190,000+ data engineers identified across the list, 9,750+ contacts had recent, demonstrable expertise across the 32 specific Databricks skills tracked. Spark. MLflow. Delta Lake. Databricks SQL. Unity Catalog. The whole toolchain.
And what that produced — and this is the move I think is genuinely new — is something I'd call a usage-density map. Not "Capital One uses Databricks." Every static database in America says that. The question is: how many practitioners, as a share of total engineering headcount, are actually touching the tool day-to-day? That ratio, it turns out, is the difference between an account that's ripe for a displacement motion and an account that's only nominally on the install base. Look at what the data shows.
There are two things to notice here, and I think they cut against intuition in interesting ways. The first is that the highest-density accounts — Capgemini, Deloitte at 4 percent — are also the hardest to win. These are organizations where Databricks is genuinely embedded in how the work gets done. Displacing it requires more than a better demo. But they're also, almost by definition, the largest prize. The second thing, and this is the more important one, is that a flat firmographic view would tell you AT&T and Deloitte are roughly comparable buyers. They have similar Databricks headcounts. They're both Fortune-anything companies. The density layer says something completely different: Deloitte has four times the concentration. You sell into the 4 percent account with a competitive-displacement motion. You sell into the 1 percent account with a nurture-and-educate motion. Same install base. Completely different playbook.
From accounts to named champions
Density tells you which accounts to attack. But density alone doesn't tell you what to do on Tuesday morning. For that, you need the contact graph — and I think this is where the work gets really interesting. Inside an account like Capital One, the CLT output didn't hand back a flat list of "data engineers." It returned three layers of named individuals, each with a confidence score derived from observable code activity and public technical signal. Let me walk through them.
Practitioners — engineers whose recent commits, Stack Overflow contributions, and Kaggle notebooks demonstrate active hands-on Databricks work. These are the people who will actually run a Snowflake or EnterpriseDB proof-of-concept. Not the people who say they will. The people who, based on the public evidence of their work, plausibly could. Influencers — staff and principal engineers, platform leads, and ML architects who oversee the toolchain and carry credibility with the team. Budget owners — VPs and Directors of Data Engineering or Data Platform whose organizational relationships to the practitioners can be inferred from public sources. Three names. Three roles. One mapped buying center per priority account. And, crucially, you know which name is which.
Turning identified contacts into pipeline
Now, identification is one thing. Activation is another. And I think there's a temptation, when you talk about data quality, to focus entirely on the discovery side — to assume that if you know who the right person is, the rest takes care of itself. It doesn't. The rest is its own discipline. In the same engagement I've been describing, somewhere between 60 and 70 percent of the suspect accounts — call it 2,000 to 2,500 of the 3,700 — had at least one ICP-fit contact who could be activated through a structured lead-qualification motion. Thirty days. US-only. Manager and up. At least one lead per company, up to five. The waterfall ran in four tiers.
Each tier represents a different depth of engagement, and I think the gradient is worth understanding because it captures something real about how technical buyers actually behave. A Market Lead downloaded the content. That's it. A Market Intel Lead downloaded and answered one custom qualifying question. A Market Qualified Intel Lead answered two. A Sales Lead downloaded, answered three custom questions, and was live-qualified and recorded with consent. So you have this kind of escalating ladder where the cost per lead goes up as you ascend, but so does the conversion probability. The waterfall lets marketing run a volume play at the top and hand sales a pipeline of pre-qualified technical buyers at the bottom — without making either team compromise on what they actually need.
And I want to draw your attention to something specific about the three role tiers, because I think it's where the genuine craft of this work shows up. The technical and engineering tier — Data Engineer, Cloud Data Engineer, Senior Data Engineer, ELT Developer, Solution Architect, DBA, Big Data Engineer — is the smallest waterfall by volume. Only 115 leads total. But it's the highest-conversion tier, because these are the hands-on-keyboard practitioners CLT surfaced in the first place. The product and leadership tier — VP of Data and Analytics, CDO, CISO, Director of Data Engineering, Head of Data Strategy — carries the budget. And the data and analytics tier — BI Analyst, Data Governance Lead, Cloud Data Operations Manager — sits between them, and is where most of the volume converts. Three different audiences. Three different message strategies. One unified waterfall.
The deeper reason this category responds to CLT
I want to step back here and try to name what's actually going on, because I think it's structural, not tactical. The reason the data cloud category responds to Contact-Level Technographics in a way that, say, legacy ERP doesn't — and this is, I think, the real argument of the report — is that these deals close in two directions at once. They are simultaneously top-down and bottom-up. The VP of Data Platform signs the contract. But she signs it because, and only because, her engineers have already validated the alternative. Either motion run on its own underperforms. The motions have to compound. And what CLT enables — really, the whole reason it exists — is the ability to run both motions, with the same data asset, against the same target list, at the same time.
Personalized display to a tightly-scoped audience of VPs and Directors. Physical gifting. Thought leadership and panelist opportunities. Matched-audience syndication with MOFU and BOFU custom questions. Any direct outreach conducted by equivalent-level executives — extra points if those execs are technical themselves.
Personalized display playing off known skills, driving into Data Cloud trials. Feed the ML lead-scoring model with measured self-serve signups and credit consumption. Run cheaper CPL TOFU syndication to measure contact-level intent through download frequency. The goal isn't a meeting yet — it's evidence of evaluation behavior the AE can pick up later.
Both motions converge, eventually, on the same set of outcomes — meetings, MQLs, and, most importantly for AWS, Snowflake, and EnterpriseDB, Data Cloud trials. And I want to spend a moment on that last category, because I think it's the most undervalued signal in this entire architecture. When a practitioner whose CLT profile already shows deep competitive-tool expertise signs up for your free trial, what you have is not a marketing lead. What you have is a buyer who has chosen, on their own time, without anyone asking, to begin evaluating you. That is — I don't know how else to say it — one of the strongest pieces of intent data that exists in B2B. And the vendor who knows about it first wins.
The Gmail problem, and what CLT does with it
Here's a problem the legacy stack genuinely can't solve, and I think it's worth understanding why, because it tells you something about the limits of the old architecture. Practitioners sign up for trials with personal Gmail addresses. Sometimes through a GitHub-authenticated flow. Sometimes through a content download form they filled out at 11pm. Static databases see those signups and discard them. They look unqualified — there's no work domain, no firmographic context, no way to route them. Marketing throws them away.
But the same data asset that mapped the practitioner inside the target account — the asset that knows Maria has been committing to Spark repos at Company X — can match the anonymous Gmail signup back to a verified work identity, a company domain, and an existing account in the CRM. What was an unqualified lead becomes a named, contextualized buyer inside a priority account. And the bottom-up motion, which used to leak the warmest leads out the back of the funnel, suddenly retains them. I want to be careful not to oversell this — it doesn't work every time, and it doesn't resolve every signup. But when it does work, it converts what was noise into the warmest signal in the funnel. Because by definition, you already know that person belongs to a target account, and you know what they've been building with.
The bottom-up signups stopped being noise. They started being the warmest leads in the funnel — because we already knew exactly which target account they belonged to, and what they'd been building.
— RevOps lead, Data Cloud engagement



