A recurring question in revenue ops, sales engineering, and data engineering communities goes something like this: "I have a phone number, a website, a DBA, or an address, and I need to know what company is actually behind it. What's the best way to do this?"
It sounds like a niche frustration, but anyone who has worked on lead routing, CRM hygiene, KYC/KYB, or fraud and risk has hit the same wall. The inputs are messy, the entities are slippery, and the "ground truth" is fragmented across hundreds of independent sources: every state's Secretary of State, every country's company registry, every Google Maps listing, every domain WHOIS record, every social profile.
This post walks through why business identity resolution is hard, the moves that actually work, and how LeadGenius approaches it.
01 · Foundations: Why this problem is so much harder than people expect
Person-level identity resolution is well understood. A name plus a few attributes (email, address, phone, DOB) narrows quickly to a single human. Companies don't behave that way.
A single operating business can be:
- A registered legal entity (LLC, Inc., GmbH, Ltd., Pty.)
- Doing business under one or more DBAs that are not the legal name
- Registered in multiple jurisdictions as separate legal entities ("foreign corporations" or branches)
- Owned by a holding company that sits between it and the ultimate parent
- Sharing a phone number, address, or website with a sibling entity in the same corporate family
- Renamed multiple times across its life, sometimes with the same company number, sometimes with a brand-new one
So when someone hands you "415-555-0142, acmewidgets.com, Acme Widgets" and asks "who is this?", the honest answer is that it might be any of three or four different things depending on what you need the answer for. Lead routing wants the operating brand. KYB wants the registered legal entity. Account hierarchy wants the ultimate parent.
That's the first reason this is hard: the question itself is underspecified. The second is that the source data is fragmented across jurisdictions, formats, and update cadences. The third (the one that quietly defeats most internal tooling) is that the changes matter. Companies dissolve. They get reinstated. They change names. They get acquired. A snapshot from six months ago will quietly route deals to entities that no longer exist.
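One way to make the underspecification concrete is to force the caller to state what the answer is for. The following is a minimal sketch, not LeadGenius's implementation: the Purpose enum, the Entity shape, and the sample corporate family are all hypothetical and exist only to illustrate that one input can have three correct answers.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Purpose(Enum):
    LEAD_ROUTING = "lead_routing"            # wants the operating brand
    KYB = "kyb"                              # wants the registered legal entity
    ACCOUNT_HIERARCHY = "account_hierarchy"  # wants the ultimate parent


@dataclass(frozen=True)
class Entity:
    legal_name: str                     # name on the registry record
    brand: str                          # trade name / DBA the market sees
    parent: Optional["Entity"] = None   # direct owner, if any


def ultimate_parent(entity: Entity) -> Entity:
    """Follow parent links until an entity with no parent is reached."""
    while entity.parent is not None:
        entity = entity.parent
    return entity


def answer(entity: Entity, purpose: Purpose) -> str:
    """Return the name that answers 'who is this?' for a given purpose."""
    if purpose is Purpose.LEAD_ROUTING:
        return entity.brand
    if purpose is Purpose.KYB:
        return entity.legal_name
    return ultimate_parent(entity).legal_name


# Hypothetical corporate family, invented for illustration.
group = Entity("Acme Group Inc.", "Acme Group")
holdings = Entity("Acme Widgets Holdings, LLC", "Acme Widgets", parent=group)

for purpose in Purpose:
    print(purpose.value, "->", answer(holdings, purpose))
```

The same resolved record yields a different name for each purpose, which is why a match API that returns a single "company name" field tends to disappoint at least one of its consumers.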
02 · Reality check: The inputs you actually have
In practice, the inputs that show up at the front of an entity resolution problem tend to be some combination of:
- A website or domain, often without a www, sometimes with a redirect chain
- A phone number, often a main line shared across departments or franchise locations
- A DBA or trade name that doesn't match anything in any registry
- A postal address, which might be a registered agent, a virtual office, a co-working space, or a real HQ
- An officer or contact name: a person, often without a clear role
- A legal-name fragment: "Acme Widgets" when the registered name is "Acme Widgets Holdings, LLC"
The hardest records to match (the ones that break most tooling) share a few signatures: small private companies with common names, recently renamed entities, foreign branches of multinationals, sole proprietors operating under a DBA, and franchisees whose phone and address point to the corporate parent rather than the operating location.
03 · Method: What actually works
There is no single magic input. What works is a layered approach where each layer narrows the candidate set and raises confidence:
- Normalize aggressively before you match. Strip legal suffixes, punctuation, casing, and common abbreviations. "Acme Widgets, LLC" and "ACME WIDGETS L.L.C." and "Acme Widgets Limited Liability Company" need to collapse to the same key before any join happens. This single step recovers more matches than any clever algorithm downstream.
- Resolve to a registered legal entity first, then map outward. The Secretary of State (or equivalent registry) is the only authoritative source of "this entity exists, here is its company number, here is its current status." Start there. Everything else (the website, the phone, the DBA) becomes an attribute attached to that anchor.
- Carry alternative names as first-class data. A registry will give you the legal name. Real-world inputs almost never use it. You need historical names, DBAs, fictitious business names, trade names, and translated names all indexed and searchable against the same entity.
- Use cross-registry identifiers. A single company often has a Secretary of State number, a federal EIN, a D-U-N-S number, an LEI, a VAT number, and a registry-issued business number. Cross-walking between them is how you confirm "the Delaware entity and the California foreign registration are the same business."
- Distinguish registered address from operating address. The registered address is often a law firm or a registered agent: useful for legal identity, useless for "where does this company actually operate." You need both, and you need to know which is which.
- Walk the control graph. Once you have the entity, you usually want to know what's above it (parents, ultimate beneficial owner) and what's beside it (subsidiaries, branches, sister entities). A flat entity lookup that ignores ownership structure will repeatedly mis-route the same accounts.
- Watch the lifecycle. Status, dissolution date, liquidation history, and insolvency flags should be part of every match response, not an afterthought. A "match" against a dissolved entity is a worse outcome than no match at all.
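The normalization and lifecycle steps above can be sketched in a few lines. This is an illustration under stated assumptions, not a production matcher: the suffix list is deliberately short and incomplete, and the registry rows (dicts with name and inactive keys) are a hypothetical shape chosen for the example.

```python
import re
import unicodedata

# Assumed, incomplete suffix list. Longer phrases come first so that
# "limited liability company" is removed as a unit.
LEGAL_SUFFIXES = [
    "limited liability company",
    "incorporated",
    "corporation",
    "limited",
    "company",
    "gmbh",
    "corp",
    "llc",
    "inc",
    "ltd",
    "pty",
    "co",
]
SUFFIX_TOKENS = [suffix.split() for suffix in LEGAL_SUFFIXES]


def normalize_name(raw: str) -> str:
    """Collapse casing, punctuation, accents, and trailing legal suffixes."""
    text = unicodedata.normalize("NFKD", raw)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().replace("&", " and ")
    text = text.replace(".", "")              # "l.l.c." -> "llc"
    text = re.sub(r"[^a-z0-9 ]+", " ", text)  # other punctuation -> space
    tokens = text.split()

    # Strip trailing suffixes repeatedly, but never strip the whole name.
    stripped = True
    while stripped:
        stripped = False
        for suffix in SUFFIX_TOKENS:
            n = len(suffix)
            if len(tokens) > n and tokens[-n:] == suffix:
                tokens = tokens[:-n]
                stripped = True
                break
    return " ".join(tokens)


def match_active(raw_name: str, registry: list) -> list:
    """Return registry rows with the same normalized key that are not inactive."""
    key = normalize_name(raw_name)
    return [
        row for row in registry
        if normalize_name(row["name"]) == key and not row["inactive"]
    ]


# Hypothetical registry rows, invented for illustration.
registry = [
    {"name": "Acme Widgets, LLC", "inactive": True},
    {"name": "ACME WIDGETS L.L.C.", "inactive": False},
]
print(match_active("Acme Widgets Limited Liability Company", registry))
```

Stripping suffixes only from the end of the name is a conservative choice: it avoids mangling names where a word like "Company" appears in the middle, at the cost of missing suffixes that precede a jurisdiction tag. Filtering on the lifecycle flag inside the match function keeps dissolved entities from ever being returned as a successful match.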
04 · Architecture: How LeadGenius approaches it
LeadGenius runs entity resolution as a data problem first and an API problem second. The underlying corporate database is structured exactly around the layered approach above, with seven related datasets that work together.
- company_number plus jurisdiction_code, a composite primary key. Plus normalized name, lifecycle flags, parsed registered address, business phone, SIC codes, and home-jurisdiction linkage for foreign branches. Fields: normalized_name, current_status, inactive, incorporation_date, has_been_liquidated, business_phone, home_jurisdiction_code.
- Fields: name, type, start_date, end_date.
- Fields: address_type, street_address, locality, postal_code, start_date.
- person_uid asserts the same human across multiple companies, useful when the only input is a person's name. Fields: name, position, person_uid, country_of_residence, nationality.
- identifier_system_code. This is how a Delaware company number gets reconciled to a California foreign-corporation registration, and to global identifiers. Fields: uid, identifier_system_code.
- Fields: title, description, date, url.
- Fields: relationship_type, subject_entity_company_number, object_entity_company_number, percentage_min_share_ownership, percentage_max_voting_rights, start_date.
Together, these datasets mean that an incomplete input (a phone number, a DBA, an officer's name) can be resolved against any of several indexed attributes and then expanded into the full legal, operational, and ownership picture of the underlying business.
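To illustrate how an incomplete input can be resolved through an indexed attribute and then expanded, here is a small in-memory sketch. It borrows a few field names from the datasets above (company_number, jurisdiction_code, normalized_name, current_status, inactive, business_phone), but everything else is an assumption made for the example: the EntityIndex class, the parent_of mapping, the jurisdiction code format, and the sample records are hypothetical and do not describe the LeadGenius API.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple

EntityKey = Tuple[str, str]  # (jurisdiction_code, company_number)


@dataclass
class Company:
    company_number: str
    jurisdiction_code: str
    normalized_name: str
    current_status: str
    inactive: bool
    business_phone: Optional[str] = None
    alternative_names: List[str] = field(default_factory=list)

    @property
    def key(self) -> EntityKey:
        return (self.jurisdiction_code, self.company_number)


def digits_only(phone: str) -> str:
    return "".join(ch for ch in phone if ch.isdigit())


class EntityIndex:
    def __init__(self) -> None:
        self.companies: Dict[EntityKey, Company] = {}
        self.by_phone: Dict[str, Set[EntityKey]] = defaultdict(set)
        self.by_name: Dict[str, Set[EntityKey]] = defaultdict(set)
        self.parent_of: Dict[EntityKey, EntityKey] = {}  # child -> parent

    def add(self, company: Company) -> None:
        self.companies[company.key] = company
        if company.business_phone:
            self.by_phone[digits_only(company.business_phone)].add(company.key)
        # Legal name and every alternative name point at the same entity.
        for name in [company.normalized_name, *company.alternative_names]:
            self.by_name[name.strip().lower()].add(company.key)

    def add_control(self, parent: EntityKey, child: EntityKey) -> None:
        self.parent_of[child] = parent

    def resolve(self, phone: Optional[str] = None,
                name: Optional[str] = None) -> List[Company]:
        """Intersect the candidate sets of whichever inputs were supplied."""
        candidate_sets = []
        if phone is not None:
            candidate_sets.append(self.by_phone.get(digits_only(phone), set()))
        if name is not None:
            candidate_sets.append(self.by_name.get(name.strip().lower(), set()))
        if not candidate_sets:
            return []
        keys = set.intersection(*candidate_sets)
        return [self.companies[k] for k in sorted(keys)]

    def ultimate_parent(self, key: EntityKey) -> Company:
        """Walk child -> parent links to the top, guarding against cycles."""
        seen = {key}
        while key in self.parent_of:
            key = self.parent_of[key]
            if key in seen:
                raise ValueError("cycle in control graph")
            seen.add(key)
        return self.companies[key]


# Hypothetical records, invented for illustration.
index = EntityIndex()
parent = Company("0000001", "us_de", "acme group", "Active", False)
opco = Company("0000002", "us_ca", "acme widgets holdings", "Active", False,
               business_phone="415-555-0142",
               alternative_names=["Acme Widgets"])
index.add(parent)
index.add(opco)
index.add_control(parent=parent.key, child=opco.key)

for match in index.resolve(phone="(415) 555-0142"):
    top = index.ultimate_parent(match.key)
    print(match.key, match.current_status, "->", top.normalized_name)
```

Keying everything on the composite of jurisdiction and company number means the phone, the DBA, and the ownership links all hang off the same anchor, so any one of them is enough to reach the others.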
05 · Closing: The honest take
If you're building this internally and it's working: congratulations, you've solved a genuinely hard problem and you should keep going. To the Reddit poster who asked whether this is "a me problem or a real problem": it's a real problem, and the fact that you've built tooling for it is a sign you've been burned by it more than once.
But the reason teams eventually stop maintaining their own version isn't that the matching logic is too hard. It's the long tail of source maintenance: every Secretary of State changes its export format, every foreign registry has its own quirks, dissolution data lags, DBAs get filed and never updated, and the data is only as useful as it is current. That's the part that compounds, and it's the part LeadGenius is built to absorb so that your team can spend its time on the routing and decisioning logic that actually differentiates your product.
The easiest way to know whether it solves your version of the problem is to see what business identity resolution looks like against the LeadGenius dataset for your specific inputs.
Resolve your hardest company records against the LeadGenius graph.
Bring a list of phones, domains, DBAs, or addresses. We'll show you what gets matched, what gets enriched, and how the control graph closes out the rest.



