Deduplication

Duplicate data is painful. 
Anyone who has discovered a teammate unknowingly working with the same customer, or has spent hours on an analysis that was later deemed invalid because of poor data quality, knows this.
Beyond the frustration, each of these situations also costs time and resources.
In this case study, I share the design process I followed with the team to address this problem.
The Challenge
The Effectiveness of Sales is Affected by Duplicate Data
In a sales organization, Leads and Contacts are two objects essential to running the business successfully. A Lead is a person or company that has never worked with your organization but may facilitate a future sale. A Contact is a person or company that you have qualified or have done business with in the past.
These two objects may enter a Customer Relationship Management (CRM) system like Base automatically from a web form, come in from a Marketing Automation system, be entered manually by a Sales Rep, or be imported in bulk. Without proper deduplication strategies, each of these entry points could introduce duplicates into the system.
At the time, Base did not provide enough feedback or mechanisms to help customers address duplication, especially when they were manually entering Leads/Contacts or importing them in bulk. This was identified as the #1 cause of churn among our major customers, and our mission was to address it within 3 months.
My role
Growing with the Product
I actively participated in all parts of the project lifecycle, alongside 1 Product Manager, 6 Engineers, and 1 Product Marketing Manager.
Specifically, I led the design efforts from research & exploring the problem space to delivering the final design spec & QA.
The Discovery
Understanding Customers’ Contexts and Problems
At the beginning of this project, we conducted both qualitative and quantitative research to uncover potential opportunities.
Types of Users
From our interviews with 13 major customers, we identified 3 types of users that this project needed to serve:
  1. Sales reps who are responsible for daily communications with potential customers.
  2. Sales managers who are responsible for ensuring their sales reps are gathering complete and accurate data.
  3. Sales ops who are responsible for evaluating sales processes and managing tools used by the sales team.
Faced Problems
We then summarized the problems these types of users were facing:
The Scope
Focusing on Key Problems with Insights from Data
We analyzed each problem further with the support of data. Weighing other projects on the roadmap and the likely effectiveness of potential solutions, we decided to focus on only 5 of them.
Problems we are going to address
Problems we are not going to address for now
The Objective
Defining Goals and Success Metrics
After confirming the specific problems we were going to address in this project, we defined the goals we wanted to achieve and the corresponding success metrics.
Business Goal
Product Goals
Deeper Insights
Capturing Situations, Motivations and Outcomes
During our discovery phase, we uncovered lots of insights from research and analysis of problems. But how might we boil these insights down into something actionable for the whole team? Something concise that could facilitate both communication and execution.
Inspired by the Job Story framework introduced by Intercom, we translated each problem into a Job Story based on the type (role) of the user:
These Job Stories showed us that there were 3 scenarios to consider:
  1. Manually entering a Lead/Contact (Problems 1 and 2)
  2. Importing Leads/Contacts in bulk (Problems 1, 3, and 4)
  3. Configuring the deduplication strategy (Problem 5)
Explorations & Iterations
Giving Shapes to Ideas
Working with Remote Teams
One of the challenges we face at Base is that most of our engineers and PMs are in Krakow, Poland. With a 9-hour difference between Krakow and the Bay Area, our overlapping work time is very limited.
So how do we make things work?
1. Be organized for every sync
We use Dropbox Paper to organize our sync agenda every day. Before each call, all parties list everything that needs to be discussed (related questions, designs from the previous day, feedback, new ideas, etc.) in a single Paper doc, so that everyone can come to the discussion of each issue with adequate context.
2. Share your thinking process proactively and invite collaboration
Physical paper and pen are indispensable tools for me to externalize my ideas. That’s why I always keep a sketchbook at hand. But they also have some limitations when collaborating with remote teams.
So I usually start by sketching ideas on paper, mock up directions in Sketch, and then share flows on InVision with detailed comments that explain the thinking behind each decision. InVision often becomes the platform for intense discussions in which everybody participates.
1. Manually entering a Lead/Contact
How might we show matching objects and help people make quick informed decisions?
The first scenario, which addresses Problems 1 and 2, is a sales rep or manager manually adding a Lead/Contact. I went through several rounds of iteration based on feedback from the team and test results from customers.
Iterations
Final
The following key insights and considerations informed the final design:
  1. Adding a new object in Base will always be done in a modal.
  2. Besides Name, other fields such as Phone Number, Email, Address, and Social Links will also be used to identify duplicates.
  3. Matched objects should be shown in real time, as early as possible.
  4. Sufficient context should be given to help people make quick decisions.
  5. Fuzzy search will be applied, since we assume entries may contain typos.
  6. For a typical customer, the number of potential duplicates found would range from 0 to 10.
  7. Potential duplicates will be ordered by similarity score.
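The considerations above (matching on several fields, fuzzy matching to tolerate typos, at most around 10 results, ordered by similarity) can be illustrated with a minimal Python sketch. It is not how Base implements matching: the `Record` type, the field weights, the threshold, and the phone normalization are hypothetical assumptions, and `difflib.SequenceMatcher` stands in for whatever fuzzy search the real system used.

```python
import re
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import List, Tuple


@dataclass
class Record:
    """Hypothetical Lead/Contact shape; real records have many more fields."""
    name: str
    email: str = ""
    phone: str = ""


# Hypothetical values: the case study does not specify weights or thresholds.
WEIGHTS = {"name": 0.5, "email": 0.3, "phone": 0.2}
THRESHOLD = 0.6   # minimum score to surface a record as a potential duplicate
MAX_RESULTS = 10  # typical customers saw 0 to 10 potential duplicates


def normalize_phone(phone: str) -> str:
    # Keep digits only, so "(415) 555-0100" and "415-555-0100" compare equal.
    return re.sub(r"\D", "", phone)


def similarity(new: Record, existing: Record) -> float:
    """Weighted average over only the fields filled in on the new record."""
    scores = {}
    if new.name:
        # Fuzzy ratio in [0, 1] tolerates typos such as "Jon" vs "John".
        scores["name"] = SequenceMatcher(
            None, new.name.lower(), existing.name.lower()
        ).ratio()
    if new.email:
        scores["email"] = float(new.email.lower() == existing.email.lower())
    phone = normalize_phone(new.phone)
    if phone:
        scores["phone"] = float(phone == normalize_phone(existing.phone))
    if not scores:
        return 0.0
    total = sum(WEIGHTS[field] for field in scores)
    return sum(WEIGHTS[field] * s for field, s in scores.items()) / total


def find_potential_duplicates(
    new: Record, existing: List[Record]
) -> List[Tuple[float, Record]]:
    """Return up to MAX_RESULTS matches, highest similarity first."""
    scored = [(similarity(new, record), record) for record in existing]
    matches = [pair for pair in scored if pair[0] >= THRESHOLD]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return matches[:MAX_RESULTS]


# Usage example: in a UI this could be re-run as the rep types in the modal.
existing = [
    Record("John Smith", "john@acme.com", "415-555-0100"),
    Record("Jane Doe", "jane@example.com"),
    Record("Jon Smith"),
]
draft = Record("Jon Smith", phone="(415) 555-0100")
for score, record in find_potential_duplicates(draft, existing):
    print(f"{score:.2f}  {record.name}")
```

Averaging only over the fields the rep has already filled in means a strong name match can surface a suggestion before an email or phone number is typed, which fits the goal of showing matches as early as possible.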
2. Import Leads/Contacts in bulk
How might we provide a friendly and reliable import experience?
The second scenario, which addresses Problems 1, 3, and 4, is import. For many people, this might be their very first experience using Base. To help us explore ideas and drive decision making, we defined the following principles for building an optimal import experience:
The Structure and Layout of Import
Import was one of the existing experiences within Settings, so before jumping into its design, I first analyzed how it related to the overall Settings experience and to the other individual experiences in Settings.
I found that Import is a focused, standalone experience that rarely needs to reference other experiences. Besides Settings, other places such as Global Add could also serve as entry points for Import.
Based on this understanding, I started to explore different layouts of Import. The final version not only established a universal pattern for Import, but also envisioned a new Settings experience.
Iterations
Final
Mapping Data
One of the steps during an import is about mapping data from the import file to the correct fields in Base. 
Since the person doing the import may not be the same person who configured the deduplication strategy in Settings, it is very important to make clear, at this step, which key fields are used for deduplication.
Choosing Import Strategy
This is another step during an import. I proposed two options:
Based on several rounds of internal discussion and testing with actual users, we chose Option 1 as the final version for the following reasons:
  1. Although Option 2 aimed to provide a smoother import experience by collapsing groups of options into dropdowns, we found during testing that people always opened the dropdowns to check the available options and make sure they had selected the correct one, which actually made the experience more cumbersome.
  2. The purpose of each import may differ, so it is unrealistic to set defaults that reliably ensure a smooth experience.
  3. Option 1 better expressed the principles we defined at the very beginning.
3. Configuring deduplication strategy
How might we translate complicated deduplication strategy into a simple configuration experience?
Last but not least, the third scenario, which addresses Problem 5, is configuring the deduplication strategy in Settings.
People rarely go through this experience; in most cases, they encounter it only once, while setting up their account. With this in mind, it also went through several rounds of iteration.
Iterations
Final
The final design provided both flexibility and a clear record of the current configuration, so that sales ops would always feel confident whenever they came back to this section of Settings.
The Launch
Delivering Value to Customers
Suggested Leads/Contacts
Instead of conducting manual searches, a rep simply starts entering new lead or contact information and Base suggests similar records. Once prompted with similar records and potential duplicates, reps can quickly edit an existing record or create a new one if it is truly unique.
Flexible Import Duplicate Management
Of course reps need to have a reliable data foundation to build on. Instead of Base doing the guesswork for you during a mass contact, lead or account import, you can now tell Base how you want to handle duplicate records.
Customizable Duplicate Detection Rules
If the default rules do not fit your sales process, you can create custom duplicate detection rules. Base’s new custom duplicate management allows you to configure your own duplicate “definition” using a combination of name, email, phone, address and even Custom Fields.
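A duplicate "definition" built from a combination of fields can be sketched as a list of rules, where each rule is a set of fields that must all match and two records count as duplicates if any one rule matches. This is a hypothetical Python illustration, not Base's actual rule engine; the rule format, the default rules, the `custom:` key prefix for Custom Fields, and the normalization steps are all assumptions.

```python
from typing import Mapping, Optional, Sequence, Tuple

# Hypothetical rule format: each rule is a tuple of field keys that must ALL
# match for two records to be duplicates; matching ANY one rule is enough.
Rule = Tuple[str, ...]

# Hypothetical defaults, not Base's real ones.
DEFAULT_RULES: Sequence[Rule] = [
    ("email",),
    ("name", "phone"),
]


def normalize(field: str, value: Optional[str]) -> str:
    # Case- and whitespace-insensitive comparison; phones keep digits only.
    text = (value or "").strip().lower()
    if field == "phone":
        return "".join(ch for ch in text if ch.isdigit())
    return " ".join(text.split())


def rule_matches(rule: Rule, a: Mapping[str, str], b: Mapping[str, str]) -> bool:
    for field in rule:
        va, vb = normalize(field, a.get(field)), normalize(field, b.get(field))
        # An empty value never counts as a match, so blank fields
        # cannot make two unrelated records look identical.
        if not va or va != vb:
            return False
    return True


def is_duplicate(
    a: Mapping[str, str],
    b: Mapping[str, str],
    rules: Sequence[Rule] = DEFAULT_RULES,
) -> bool:
    return any(rule_matches(rule, a, b) for rule in rules)


# Usage example: a sales ops admin defines custom rules, including a
# hypothetical Custom Field that holds a company VAT ID.
custom_rules: Sequence[Rule] = [("name", "address"), ("email",), ("custom:vat_id",)]
acme = {"name": "Acme Corp", "address": "1 Main St", "custom:vat_id": "PL123"}
acme_typed = {"name": "acme  corp", "address": "1 main st"}
acme_vat = {"name": "Acme Corp", "custom:vat_id": "PL123"}

print(is_duplicate(acme, acme_typed, custom_rules))  # True: name + address match
print(is_duplicate(acme, acme_vat, custom_rules))    # True: VAT ID matches
print(is_duplicate(acme, acme_typed))                # False: no email or phone
```

Treating empty values as non-matching is one design choice among several; it keeps sparse rows, such as partially filled imports, from being flagged as duplicates purely because they share blank fields.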