Summary

Enrichment has many use cases. We, at Metabase, have dabbled with it from time to time for different analyses, but never did continuous enrichment of our customer contacts. This post walks through why we decided to do it and the detailed steps that we took, so that it might save you some time if you want to do something similar.

Why We Did It

This is probably something that every growth company wants at one point or another. For us, we wanted to enrich our customer contacts for several reasons:

  1. To be alerted on job changes for contacts that may be material to our relationship. For example, if a customer contact has moved to another job inside or outside of the company, we may want to reach out to congratulate them on the change and see how we can help make sure everything continues to work smoothly with their Metabase service.
  2. To get a better sense of organization size and industry for various analyses. You probably know what they are, so I won’t bore you with the details.

Evaluation of Service Providers

There are surprisingly many service providers in this space, each with its own pros and cons. We looked into various LinkedIn data dump providers, but decided not to use them as we shouldn’t have that kind of data in our data warehouse. We did not evaluate Clearbit since it is no longer available as a standalone service.

Using a few evaluation criteria, we were able to decide on the best provider for our needs: Apollo.io. They provided the best coverage, pricing, and features for our use cases.

| Criteria / Provider | LinkedIn Sales Navigator | Crunchbase | Lusha | Apollo | CommonRoom |
| --- | --- | --- | --- | --- | --- |
| Job History / LinkedIn Profile | Best available, but UI access only (API is limited/restricted/hard to get access). | Not available | Not available | Good / Looks recent | Mostly good, but some missing recent job changes. |
| Firmographic, such as company size, industry, etc. | Good / Everything on Company page. Search is UI only (API is limited/restricted/hard to get access). | Good, but about 70% (high/med) to 80% (low confidence) coverage for 100 recent contacts. | Ok. Coverage is about 26% for 100 recent contacts. | Great at 80% coverage using domain match for 100 recent contacts. | Seems to be available mostly for company size, but industry is spotty and coverage is about 60% for 100 recent contacts. |
| Demographic, such as job title, etc. | Great / Everything on LinkedIn Profile. Search is UI only (API is limited/restricted/hard to get access). | Limited to select key people, like execs. | Ok. Coverage is about 26% for 100 recent contacts. | Good at 60% coverage for name, 50% for title / history for 100 recent contacts. | Yes, but coverage is limited to ~60% for org and ~30% for job title based on 100 recent contacts. |
| Export to data warehouse | Only integration with CRM, such as Salesforce/HubSpot, with very limited capabilities (differs per CRM), such as new lead/account sync or embedding profile. | Enterprise plan supports dataset download / API. API can do exact domain/name and fuzzy search. 200 calls per minute / 1000 limit. | CSV upload / download via UI or API. | API or CSV (UI). Contact enrichment is slow at 0.5 secs per call (that’s 1.4 hours per 10k records). There is a bulk API that can do 10 at a time / likely much faster. | Recurring/custom export for Enterprise plan only, otherwise manually via UI. Any field visible on the filter/browse screen can be exported via UI. Custom export could potentially do more. |
| Cost | Core: $960 per person/year. Advanced with CRM integration: $1600 per person/year. | API: $10k per year with 30% buy-now discount. CSV export of all 3m+ companies: $50k with 50% buy-now discount. | About $20k to $25k per year for 100k contacts. | $400 per month for 10k enrichments via API (4¢ per record). $3k per month for 100k enrichments (3¢ per record). And many other plan options. | Many plans from free to Enterprise based on # of contacts and features: (1) Free up to 500 contacts / 50 orgs; (2) Starter at $625/mo up to 35k contacts; (3) Team at $1250/mo up to 100k contacts; (4) Enterprise at custom pricing with export to data warehouse feature, $50k+ per year. |
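The Apollo throughput and per-record cost figures in the comparison are simple arithmetic, and it can help to have them as a small calculator when sizing a backfill. The sketch below is only illustrative: the function names are made up for this post, and the bulk estimate assumes a bulk call takes the same 0.5 seconds as a single call, which we did not measure.

```python
import math


def enrichment_hours(records: int, secs_per_call: float, batch_size: int = 1) -> float:
    """Estimated wall-clock hours when API calls are made one after another."""
    calls = math.ceil(records / batch_size)
    return calls * secs_per_call / 3600


def cents_per_record(monthly_price_usd: float, enrichments: int) -> float:
    """Plan price divided by the number of enrichments it includes, in cents."""
    return monthly_price_usd * 100 / enrichments


# Single-record endpoint: 10k records at 0.5 secs per call.
single_hours = enrichment_hours(10_000, 0.5)

# Bulk endpoint, 10 records per call.
# ASSUMPTION: a bulk call takes the same 0.5 secs as a single call.
bulk_hours = enrichment_hours(10_000, 0.5, batch_size=10)

print(f"single: {single_hours:.1f} hours")      # single: 1.4 hours
print(f"bulk:   {bulk_hours * 60:.1f} minutes")  # bulk:   8.3 minutes
print(cents_per_record(400, 10_000))            # 4.0
print(cents_per_record(3_000, 100_000))         # 3.0
```

Under that assumption, the bulk endpoint would turn a 10k backfill from over an hour into a few minutes, which is why it is worth testing before committing to an hourly schedule.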

Continuous Enrichment

With the service provider decided, it was easy to use dlt and Apollo.io’s API to enrich contacts hourly in priority order: new contacts first, then updates to existing ones, and so on.

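import dlt


def enrich_contact(self, postgres_connect_string, to_schema):
    # Pipeline that loads enriched contacts into the given Postgres schema
    pipeline = dlt.pipeline(pipeline_name='enrich_contact', destination='postgres', dataset_name=to_schema,
                            credentials=postgres_connect_string)

    # Merge on email so re-enriching a contact updates its row instead of duplicating it
    @dlt.resource(write_disposition='merge', primary_key='email')
    def enriched_contact():
        # Read the prioritized emails up front, then release the connection
        with pipeline.sql_client() as psql:
            with psql.execute_query("select email from prioritized_contact") as cursor:
                emails = cursor.fetchall()

        for (email,) in emails:
            enriched = self.people_match(email)  # Call Apollo's "people/match" API
            person = enriched.get('person')
            if person:  # Skip emails that come back without a match
                yield person

    print(pipeline.run(enriched_contact))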
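The `prioritized_contact` query above is where the "new before existing" ordering lives. As a rough illustration of that idea only, here is a minimal Python sketch of one possible ranking. The `Contact` fields, the `prioritize` function, and the tie-breaking rules (newest contact first, then the stalest enrichment) are all assumptions made for this example, not a description of our actual model.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Contact:
    email: str
    created_at: datetime
    enriched_at: Optional[datetime] = None  # None means never enriched


def prioritize(contacts: List[Contact], limit: int) -> List[str]:
    """Return up to `limit` emails: never-enriched contacts first, then stalest."""
    # Never-enriched contacts, most recently created first (assumed tie-break).
    new = sorted(
        (c for c in contacts if c.enriched_at is None),
        key=lambda c: c.created_at,
        reverse=True,
    )
    # Already-enriched contacts, oldest enrichment first (assumed tie-break).
    stale = sorted(
        (c for c in contacts if c.enriched_at is not None),
        key=lambda c: c.enriched_at,
    )
    return [c.email for c in (new + stale)[:limit]]


contacts = [
    Contact("old@example.com", datetime(2023, 1, 1), datetime(2024, 1, 1)),
    Contact("new@example.com", datetime(2024, 5, 1)),
    Contact("stale@example.com", datetime(2022, 1, 1), datetime(2023, 6, 1)),
    Contact("newer@example.com", datetime(2024, 6, 1)),
]
batch = prioritize(contacts, limit=3)
print(batch)
```

The `limit` keeps each hourly run within whatever enrichment budget the plan allows, so the backlog of existing contacts is refreshed gradually behind the new ones.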

Modeling for Self-Service

As our mission is to enable self-service, we added the enriched data, such as Organization Size & Industry, to our Customer and Contact models, so every team can use them for various use cases. For Organization Size, we grouped the values into a few categories to simplify analysis for our end-users:

, case
    when estimated_employees <= 50 then 'Micro'
    when estimated_employees <= 200 then 'Small'
    when estimated_employees <= 1000 then 'Medium'
    when estimated_employees <= 10000 then 'Enterprise'
    when estimated_employees > 10000 then 'Mega Enterprise'
  end as organization_size
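To sanity-check the bucket boundaries outside the warehouse, the same thresholds can be mirrored in a few lines of Python. This is just a sketch for testing edge cases: the `organization_size` function and `THRESHOLDS` list are names invented for this example, and returning `None` for a missing estimate mirrors how a SQL `case` without an `else` yields NULL.

```python
from typing import Optional

# Upper bounds (inclusive) copied from the SQL above, in ascending order.
THRESHOLDS = [
    (50, "Micro"),
    (200, "Small"),
    (1000, "Medium"),
    (10000, "Enterprise"),
]


def organization_size(estimated_employees: Optional[int]) -> Optional[str]:
    """Map an employee estimate to the same label the SQL case expression gives."""
    if estimated_employees is None:
        return None  # No estimate: the SQL case falls through to NULL
    for upper_bound, label in THRESHOLDS:
        if estimated_employees <= upper_bound:
            return label
    return "Mega Enterprise"


for n in (50, 51, 1000, 10001, None):
    print(n, organization_size(n))
```

Note that each boundary value belongs to the smaller bucket, so an organization with exactly 50 employees is "Micro" and one with 51 is "Small".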

Final Thoughts

An interesting insight from the enriched data is that most of our customers are micro or small organizations.

Various teams have already used the enriched data to better understand our customers and keep tabs on job changes, so it’s a win. There is already talk of using it for more use cases. As there is a monetary cost to enrichment, we will continue to evaluate the value that we get for the cost and iterate from here.

About