Enrichment has many use cases. At Metabase, we have dabbled with it from time to time for different analyses, but we had never continuously enriched our customer contacts. This post walks through why we decided to do it and the steps we took, so that it might save you some time if you want to do something similar.
This is probably something that every growth company wants at some point. We wanted to enrich our customer contacts for several reasons:
There are surprisingly many service providers in this space, each with its own pros and cons. While we did look into various LinkedIn data dump providers, we decided not to use them, as we shouldn't have that kind of data in our data warehouse. We did not evaluate Clearbit because it is no longer available as a standalone service.
Using a few evaluation criteria, we chose the provider that best fit our needs: Apollo.io. It offered the best coverage, pricing, and features for our use cases.
Criteria / Provider | LinkedIn Sales Navigator | Crunchbase | Lusha | Apollo | CommonRoom |
---|---|---|---|---|---|
Job history / LinkedIn profile | Best available, but UI access only (API access is limited, restricted, and hard to get). | Not available | Not available | Good; looks recent. | Mostly good, but missing some recent job changes. |
Firmographic (company size, industry, etc.) | Good: everything on the Company page. Search is UI only (API access is limited, restricted, and hard to get). | Good, but about 70% (high/medium confidence) to 80% (low confidence) coverage for 100 recent contacts. | OK. Coverage is about 26% for 100 recent contacts. | Great: 80% coverage using domain match for 100 recent contacts. | Mostly available for company size, but industry is spotty; coverage is about 60% for 100 recent contacts. |
Demographic (job title, etc.) | Great: everything on the LinkedIn profile. Search is UI only (API access is limited, restricted, and hard to get). | Limited to select key people, such as execs. | OK. Coverage is about 26% for 100 recent contacts. | Good: 60% coverage for name and 50% for title/history for 100 recent contacts. | Yes, but: |
API / export | | API supports exact domain/name and fuzzy search. 200 calls per minute / 1000 limit. | CSV upload/download via UI or API. | API or CSV (UI). Contact enrichment is slow at 0.5 seconds per call, which is about 1.4 hours per 10k records. A bulk API handles 10 records per call and is likely much faster. | Recurring/custom export on the Enterprise plan only; otherwise manual export via UI. Any field visible on the filter/browse screen can be exported via UI. Custom export could potentially do more. |
Cost | Core: $960 per person/year. Advanced with CRM integration: $1,600 per person/year. | API: $10k per year with a 30% buy-now discount. CSV export of all 3M+ companies: $50k with a 50% buy-now discount. | About $20k to $25k per year for 100k contacts. | $400 per month for 10k enrichments via API (4¢ per record). $3k per month for 100k enrichments (3¢ per record). Many other plan options are available. | Many plans, from free to Enterprise, based on number of contacts and features. |
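The throughput and pricing figures in the table come down to simple arithmetic. The sketch below reproduces them, assuming calls run one after another. For the bulk case it additionally assumes that a 10-record bulk call takes about as long as a single call; that is an assumption for illustration, since the table only says the bulk API is "likely much faster". The function names are hypothetical.

```python
import math

SECONDS_PER_CALL = 0.5  # single-call latency observed for Apollo contact enrichment
BATCH_SIZE = 10         # records per call on Apollo's bulk API


def enrichment_hours(records: int, seconds_per_call: float = SECONDS_PER_CALL,
                     batch_size: int = 1) -> float:
    """Wall-clock hours to enrich `records` contacts sequentially, one call per batch."""
    calls = math.ceil(records / batch_size)
    return calls * seconds_per_call / 3600


def cost_per_record(monthly_price: float, monthly_records: int) -> float:
    """Dollar price per enriched record for a given monthly plan."""
    return monthly_price / monthly_records


# Single calls: 10,000 * 0.5 s = 5,000 s, roughly 1.4 hours
print(f"{enrichment_hours(10_000):.2f} h")
# Bulk calls, ASSUMING a bulk call has the same latency as a single call
print(f"{enrichment_hours(10_000, batch_size=BATCH_SIZE):.2f} h")
# Apollo plan prices from the table
print(f"${cost_per_record(400, 10_000):.2f} per record")
print(f"${cost_per_record(3_000, 100_000):.2f} per record")
```

Under that latency assumption, batching cuts the number of calls, and therefore the wall-clock time, by a factor of 10; real gains depend on how long Apollo actually takes to answer a bulk request.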
With the provider chosen, it was easy to use dlt and Apollo.io's API to enrich contacts hourly in priority order (new contacts before updates to existing ones, and so on).
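```python
import dlt


def enrich_contact(self, postgres_connect_string, to_schema):
    # Load enriched records into Postgres; dataset_name is the target schema
    pipeline = dlt.pipeline(pipeline_name='enrich_contact', destination='postgres', dataset_name=to_schema,
                            credentials=postgres_connect_string)

    # Merge on email so re-enriching a contact updates its row instead of duplicating it
    @dlt.resource(write_disposition='merge', primary_key='email')
    def enriched_contact():
        # Fetch the emails to enrich, then release the connection before the slow API calls
        with pipeline.sql_client() as psql:
            with psql.execute_query("select email from prioritized_contact") as cursor:
                emails = cursor.fetchall()
        for (email,) in emails:
            enriched = self.people_match(email)  # Call Apollo's "people/match" API
            # Skip contacts Apollo could not match (no person in the response)
            if enriched.get('person'):
                yield enriched['person']

    print(pipeline.run(enriched_contact))
```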
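The loader reads its work list from `prioritized_contact`, whose definition isn't shown here. As a minimal sketch of one possible ordering, assuming each contact has a creation time and a last-enrichment time, new contacts could come first (newest first), followed by previously enriched contacts whose data has gone stale (stalest first), capped by a per-run budget. The `Contact` class, the `prioritize` function, the `created_at`/`enriched_at` fields, the 90-day refresh window, and the budget are all hypothetical; in practice this logic would more likely live in the SQL that builds `prioritized_contact`.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Contact:
    email: str
    created_at: datetime
    enriched_at: Optional[datetime] = None  # None means never enriched (hypothetical field)


def prioritize(contacts: list[Contact], now: datetime, budget: int,
               refresh_after: timedelta = timedelta(days=90)) -> list[str]:
    """Return up to `budget` emails: new contacts first, then stale enrichments."""
    # Never-enriched contacts, newest first
    new = sorted((c for c in contacts if c.enriched_at is None),
                 key=lambda c: c.created_at, reverse=True)
    # Already-enriched contacts older than refresh_after, stalest first
    stale = sorted((c for c in contacts
                    if c.enriched_at is not None and now - c.enriched_at >= refresh_after),
                   key=lambda c: c.enriched_at)
    # The budget caps how many paid enrichment calls a single run can make
    return [c.email for c in (new + stale)[:budget]]


now = datetime(2024, 6, 1)
contacts = [
    Contact("old@example.com", datetime(2023, 1, 1), enriched_at=datetime(2024, 1, 1)),
    Contact("fresh@example.com", datetime(2024, 5, 1), enriched_at=datetime(2024, 5, 30)),
    Contact("new1@example.com", datetime(2024, 5, 31)),
    Contact("new2@example.com", datetime(2024, 5, 20)),
]
print(prioritize(contacts, now, budget=3))
```

Here the two never-enriched contacts come first, the contact enriched in January is refreshed next, and the one enriched two days ago is skipped. A per-run budget is one simple way to keep spend predictable when every enrichment call costs money.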
Since our mission is to enable self-service, we added the enriched data, such as Organization Size and Industry, to our Customer and Contact models so every team can use it for their own use cases. For Organization Size, we grouped the employee counts into a few categories to simplify analysis for our end users:
```sql
, case
    when estimated_employees <= 50 then 'Micro'
    when estimated_employees <= 200 then 'Small'
    when estimated_employees <= 1000 then 'Medium'
    when estimated_employees <= 10000 then 'Enterprise'
    when estimated_employees > 10000 then 'Mega Enterprise'
  end as organization_size
```
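To check the bucket boundaries outside the warehouse (for example in a unit test), the same CASE logic can be mirrored in Python. This is a sketch with a hypothetical function name: each upper bound is inclusive, so 50 employees is still "Micro", and a missing count stays unbucketed, just as a NULL `estimated_employees` matches no WHEN branch and the CASE returns NULL.

```python
from typing import Optional


def organization_size(estimated_employees: Optional[int]) -> Optional[str]:
    """Python mirror of the SQL CASE above; upper bounds are inclusive."""
    if estimated_employees is None:
        return None  # SQL: no WHEN matches NULL and there is no ELSE, so the result is NULL
    if estimated_employees <= 50:
        return "Micro"
    if estimated_employees <= 200:
        return "Small"
    if estimated_employees <= 1000:
        return "Medium"
    if estimated_employees <= 10000:
        return "Enterprise"
    return "Mega Enterprise"


for n in (50, 51, 10000, 10001, None):
    print(n, organization_size(n))
```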
One interesting insight from the enriched data: most of our customers come from micro- and small-sized organizations.
Various teams have already used the data to better understand our customers and keep tabs on job changes, so it's a win, and there are already talks about using it for more use cases. Because enrichment has a monetary cost, we will keep evaluating the value we get for that cost and iterate from here.