Solving business identity resolution from incomplete data.

A phone number, a website, a DBA, an address — and a question that sounds simple until you try to answer it at scale. Here is why business identity resolution is harder than it looks, and how LeadGenius approaches it.

Guide
June 3, 2026

A recurring question in revenue ops, sales engineering, and data engineering communities goes something like this: "I have a phone number, a website, a DBA, or an address, and I need to know what company is actually behind it. What's the best way to do this?"

It sounds like a niche frustration, but anyone who has worked on lead routing, CRM hygiene, KYC/KYB, or fraud and risk has hit the same wall. The inputs are messy, the entities are slippery, and the "ground truth" is fragmented across hundreds of independent sources: every state's Secretary of State, every country's company registry, every Google Maps listing, every domain WHOIS record, every social profile.

This post walks through why business identity resolution is hard, the moves that actually work, and how LeadGenius approaches it.

01 · Foundations
Why this problem is so much harder than people expect

Person-level identity resolution is well understood. A name plus a few attributes (email, address, phone, DOB) narrows quickly to a single human. Companies don't behave that way.

A single operating business can be:

  • A registered legal entity (LLC, Inc., GmbH, Ltd., Pty.)
  • Doing business under one or more DBAs that are not the legal name
  • Registered in multiple jurisdictions as separate legal entities ("foreign corporations" or branches)
  • Owned by a holding company that sits between it and the ultimate parent
  • Sharing a phone number, address, or website with a sibling entity in the same corporate family
  • Renamed multiple times across its life, sometimes with the same company number, sometimes with a brand-new one

So when someone hands you "415-555-0142, acmewidgets.com, Acme Widgets" and asks "who is this?", the honest answer is that it might be any of three or four different things depending on what you need the answer for. Lead routing wants the operating brand. KYB wants the registered legal entity. Account hierarchy wants the ultimate parent.

That's the first reason this is hard: the question itself is underspecified. The second is that the source data is fragmented across jurisdictions, formats, and update cadences. The third (the one that quietly defeats most internal tooling) is that the changes matter. Companies dissolve. They get reinstated. They change names. They get acquired. A snapshot from six months ago will quietly route deals to entities that no longer exist.

02 · Reality check
The inputs you actually have

In practice, the inputs that show up at the front of an entity resolution problem tend to be some combination of:

  • A website or domain, often without a www, sometimes with a redirect chain
  • A phone number, often a main line shared across departments or franchise locations
  • A DBA or trade name that doesn't match anything in any registry
  • A postal address, which might be a registered agent, a virtual office, a co-working space, or a real HQ
  • An officer or contact name: a person, often without a clear role
  • A legal-name fragment: "Acme Widgets" when the registered name is "Acme Widgets Holdings, LLC"

The hardest records to match (the ones that break most tooling) share a few signatures: small private companies with common names, recently renamed entities, foreign branches of multinationals, sole proprietors operating under a DBA, and franchisees whose phone and address point to the corporate parent rather than the operating location.
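
Before any of these inputs can be compared against a registry, they usually need to be reduced to a canonical form. Here is a minimal Python sketch of that step for domains and phone numbers. It assumes US-style (NANP) phone numbers by default and does not follow redirect chains; the function names `normalize_domain` and `normalize_phone` are hypothetical and are not part of any LeadGenius API.

```python
from urllib.parse import urlparse


def normalize_domain(raw: str) -> str:
    """Reduce a URL or bare domain to a lowercase host without 'www.'.

    Assumption: redirects are not followed; that would need a network call.
    """
    raw = raw.strip().lower()
    # urlparse only fills in the host when a scheme (or leading //) is present.
    parsed = urlparse(raw if "//" in raw else "//" + raw)
    host = parsed.hostname or ""
    return host[4:] if host.startswith("www.") else host


def normalize_phone(raw: str, default_country_code: str = "1") -> str:
    """Keep digits only and prefix a country code.

    Assumption: a bare 10-digit number is NANP (US/Canada).
    """
    digits = "".join(ch for ch in raw if ch.isdigit())
    if not digits:
        return ""
    if len(digits) == 10:
        digits = default_country_code + digits
    return "+" + digits
```

With this sketch, "https://www.acmewidgets.com/about" and "AcmeWidgets.com" reduce to the same host, and "415-555-0142" and "+1 (415) 555-0142" reduce to the same digit string, which is what makes them usable as join keys.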

03 · Method
What actually works

There is no single magic input. What works is a layered approach where each layer narrows the candidate set and raises confidence:

  1. Normalize aggressively before you match. Strip legal suffixes, punctuation, casing, and common abbreviations. "Acme Widgets, LLC" and "ACME WIDGETS L.L.C." and "Acme Widgets Limited Liability Company" need to collapse to the same key before any join happens. This single step recovers more matches than any clever algorithm downstream.
  2. Resolve to a registered legal entity first, then map outward. The Secretary of State (or equivalent registry) is the only authoritative source of "this entity exists, here is its company number, here is its current status." Start there. Everything else (the website, the phone, the DBA) becomes an attribute attached to that anchor.
  3. Carry alternative names as first-class data. A registry will give you the legal name. Real-world inputs almost never use it. You need historical names, DBAs, fictitious business names, trade names, and translated names all indexed and searchable against the same entity.
  4. Use cross-registry identifiers. A single company often has a Secretary of State number, a federal EIN, a D-U-N-S number, an LEI, a VAT number, and a registry-issued business number. Cross-walking between them is how you confirm "the Delaware entity and the California foreign registration are the same business."
  5. Distinguish registered address from operating address. The registered address is often a law firm or a registered agent: useful for legal identity, useless for "where does this company actually operate." You need both, and you need to know which is which.
  6. Walk the control graph. Once you have the entity, you usually want to know what's above it (parents, ultimate beneficial owner) and what's beside it (subsidiaries, branches, sister entities). A flat entity lookup that ignores ownership structure will repeatedly mis-route the same accounts.
  7. Watch the lifecycle. Status, dissolution date, liquidation history, and insolvency flags should be part of every match response, not an afterthought. A "match" against a dissolved entity is a worse outcome than no match at all.
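
The first step in the list, aggressive normalization, is the easiest to make concrete. The following is a rough Python sketch, not a production normalizer: the suffix list is deliberately short and illustrative, the function name `name_key` is hypothetical, and it assumes ASCII names (translated or accented names would need their own handling).

```python
import re

# Illustrative and incomplete; a real list is longer and jurisdiction-specific.
LEGAL_SUFFIXES = [
    "limited liability company",
    "incorporated",
    "corporation",
    "limited",
    "llc",
    "inc",
    "corp",
    "ltd",
    "gmbh",
    "pty",
]


def name_key(raw: str) -> str:
    """Collapse a company name to a comparison key (assumes ASCII input)."""
    text = raw.lower()
    # Drop dots first so "L.L.C." becomes "llc" rather than three tokens.
    text = text.replace(".", "")
    # Replace remaining punctuation with spaces, then collapse whitespace.
    text = re.sub(r"[^a-z0-9 ]+", " ", text)
    text = " ".join(text.split())
    # Strip trailing legal suffixes repeatedly (handles "Pty Ltd").
    changed = True
    while changed:
        changed = False
        for suffix in LEGAL_SUFFIXES:
            if text.endswith(" " + suffix):
                text = text[: -len(suffix) - 1]
                changed = True
                break
    return text
```

Under these assumptions, the three spellings from step 1 ("Acme Widgets, LLC", "ACME WIDGETS L.L.C.", and "Acme Widgets Limited Liability Company") all produce the same key. Words such as "Holdings" are left in place on purpose, since stripping them would merge distinct entities in the same corporate family.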

04 · Architecture
How LeadGenius approaches it

LeadGenius runs entity resolution as a data problem first and an API problem second. The underlying corporate database is structured exactly around the layered approach above, with seven related datasets that work together.

01 · Anchor
Companies
Every entity is keyed by company_number plus jurisdiction_code, a composite primary key. Each record also carries a normalized name, lifecycle flags, a parsed registered address, a business phone, SIC codes, and home-jurisdiction linkage for foreign branches.
normalized_name, current_status, inactive, incorporation_date, has_been_liquidated, business_phone, home_jurisdiction_code
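
To illustrate why the key is composite, here is a small Python sketch. It reuses the field names listed above, but the `Company` class, the helper functions, and the "us_de"-style jurisdiction code format are assumptions made for the example, not the LeadGenius schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Company:
    jurisdiction_code: str  # code format such as "us_de" is an assumption
    company_number: str
    normalized_name: str
    current_status: str
    inactive: bool


def index_companies(rows: list[Company]) -> dict[tuple[str, str], Company]:
    """Index by the composite key; company_number alone is not unique."""
    return {(c.jurisdiction_code, c.company_number): c for c in rows}


def lookup(
    index: dict[tuple[str, str], Company],
    jurisdiction_code: str,
    company_number: str,
) -> Optional[Company]:
    """Return the record for this jurisdiction and number, or None."""
    return index.get((jurisdiction_code, company_number))
```

Two registries can issue the same company number to unrelated businesses, so a lookup on the number alone can silently return the wrong entity. Returning the lifecycle fields with every hit also lets the caller decide what to do with an inactive match.
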
02 · DBA layer
Alternative Names
Every DBA, trade name, historical name, and translated name observed for an entity, each with start and end dates. This is the layer that handles brand names that match no registry.
name, type, start_date, end_date
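
The start and end dates are what make a name lookup answerable "as of" a given day. A minimal Python sketch follows, assuming that a missing date means the window is open on that side; the `AlternativeName` class, the `names_in_use` helper, and the example type values are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class AlternativeName:
    name: str
    type: str  # e.g. "trading" or "previous"; these values are assumptions
    start_date: Optional[date]
    end_date: Optional[date]


def names_in_use(names: list[AlternativeName], on: date) -> list[str]:
    """Return the names whose validity window covers the given day.

    Assumption: a missing start or end date means the window is open-ended.
    """
    result = []
    for n in names:
        started = n.start_date is None or n.start_date <= on
        not_ended = n.end_date is None or on <= n.end_date
        if started and not_ended:
            result.append(n.name)
    return result
```

Matching against all names regardless of date is still useful for recall. The window matters when you need to say which name was current at the time of a contract, invoice, or filing.
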
03 · Operating locations
Non-Registered Addresses
Mailing addresses, company addresses, and operating locations distinct from the registered address. Each carries its own type and validity window, so a street address that doesn't match the registered agent still resolves.
address_type, street_address, locality, postal_code, start_date
04 · People to entity
Officers
Names, roles, positions, residency, partial DOB, and contact addresses. For UK data, a person_uid asserts the same human across multiple companies, useful when the only input is a person's name.
name, position, person_uid, country_of_residence, nationality
05 · Cross-walk
Additional Identifiers
External identifiers from other registries and identifier systems, each tagged with an identifier_system_code. This is how a Delaware company number gets reconciled to a California foreign-corporation registration, and to global identifiers.
uid, identifier_system_code
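
One way to picture the cross-walk: any two registry records that carry the same external identifier are treated as the same business. The Python sketch below groups records that way with a small union-find. The row shape, the function name `link_by_identifier`, and the identifier values are assumptions for illustration; real matching would also weigh how trustworthy each identifier system is.

```python
from collections import defaultdict


def link_by_identifier(rows):
    """Group entity keys that share an (identifier_system_code, uid) pair.

    rows: iterable of (entity_key, identifier_system_code, uid).
    Assumption: a shared identifier is enough to link two records.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(a, b):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    first_seen = {}
    for entity_key, system, uid in rows:
        find(entity_key)  # register the entity even if it links to nothing
        ident = (system, uid)
        if ident in first_seen:
            union(first_seen[ident], entity_key)
        else:
            first_seen[ident] = entity_key

    groups = defaultdict(set)
    for entity_key in list(parent):
        groups[find(entity_key)].add(entity_key)
    return list(groups.values())
```

Given a Delaware record and a California foreign registration that both carry the same identifier, the function returns them in one group, while an unrelated record stays in its own.
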
06 · Proof of life
Filings
Statutory filings with dates, types, IDs, and source URLs. A company filing regularly is operationally real; one in "good standing" with no filings for five years is worth a second look.
title, description, date, url
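
The "second look" rule above can be written as a simple check. This is a sketch under stated assumptions: the five-year threshold comes from the text, a year is approximated as 365 days, and the function name `needs_second_look` is hypothetical.

```python
from datetime import date, timedelta
from typing import Iterable


def needs_second_look(
    filing_dates: Iterable[date],
    inactive: bool,
    today: date,
    max_gap_years: int = 5,
) -> bool:
    """Flag entities that look active on paper but have gone quiet.

    Assumption: a year is approximated as 365 days.
    """
    if inactive:
        # Already known to be inactive; nothing to double-check here.
        return False
    latest = max(filing_dates, default=None)
    if latest is None:
        # Nominally active but no filings on record at all.
        return True
    cutoff = today - timedelta(days=365 * max_gap_years)
    return latest < cutoff
```

A flag like this is a prompt for review rather than a verdict, because filing cadence varies by jurisdiction and entity type.
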
07 · Control graph
Relationships
Four relationship types: control statements, subsidiaries, branches, and share parcels. Each row links a subject entity to an object entity with percentage ranges for share ownership and voting rights, plus start and end dates. This is what makes "find the ultimate parent" or "give me every entity in this corporate family" tractable rather than aspirational.
relationship_type, subject_entity_company_number, object_entity_company_number, percentage_min_share_ownership, percentage_max_voting_rights, start_date
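
As a rough illustration of "find the ultimate parent", the Python sketch below walks upward through relationship rows until no further parent is found. It assumes that the subject entity is the controlling party and the object entity is the controlled one, which the text above does not specify, and that each entity has at most one controlling parent. The function name `ultimate_parent` and the entity labels are hypothetical.

```python
def ultimate_parent(entity, edges):
    """Walk up the control graph from an entity to its top-level parent.

    edges: iterable of (subject, object) pairs.
    Assumption: the subject controls the object.
    """
    parent_of = {}
    for subject, obj in edges:
        # Assumption: keep the first parent seen. Real data can have several
        # controllers (joint ventures), which needs an explicit policy.
        parent_of.setdefault(obj, subject)

    seen = {entity}
    current = entity
    while current in parent_of:
        candidate = parent_of[current]
        if candidate in seen:
            # Circular ownership; stop rather than loop forever.
            break
        seen.add(candidate)
        current = candidate
    return current
```

A fuller version would filter rows by relationship type, date window, and ownership percentage before walking, but the cycle guard is worth keeping in any variant because circular holdings do appear in registry data.
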

Together, these datasets mean that an incomplete input (a phone number, a DBA, an officer's name) can be resolved against any of several indexed attributes and then expanded into the full legal, operational, and ownership picture of the underlying business.

05 · Closing
The honest take

If you're building this internally and it's working: congratulations, you've solved a genuinely hard problem and you should keep going. To the Reddit poster who asked whether this is "a me problem or a real problem": it's a real problem, and the fact that you've built tooling for it is a sign you've been burned by it more than once.

But the reason teams eventually stop maintaining their own version isn't that the matching logic is too hard. It's the long tail of source maintenance: every Secretary of State changes its export format, every foreign registry has its own quirks, dissolution data lags, DBAs get filed and never updated, and the data is only as useful as it is current. That's the part that compounds, and it's the part LeadGenius is built to absorb so that your team can spend its time on the routing and decisioning logic that actually differentiates your product.

The easiest way to know whether this solves your version of the problem is to see what business identity resolution looks like against the LeadGenius dataset for your specific inputs.

Resolve your hardest company records against the LeadGenius graph.

Bring a list of phones, domains, DBAs, or addresses. We'll show you what gets matched, what gets enriched, and how the control graph closes out the rest.
