Salesforce duplicate detector apex

12/25/2023

Setting a high threshold when using fuzzy matching makes sure you don’t get too many false positives. Note: A different letter in the last name leads to a lower score.

Special matching methods

Almost all deduplication solutions offer more specialized matching methods. Most of them are based on either exact or fuzzy matching and include some additional logic. When matching telephone numbers, you will get much better results if they are in the same format. A specialized phone number matching method will ignore spaces and dashes and standardize prefixes for a valid comparison. A matching method specific to company names may ignore legal entities (such as Inc., Ltd., LLC, etcetera). My advice is to always apply a special matching method when one is available for a field you want to include in your matching. These matching methods will give you fewer false positives when looking for duplicates.

Based on our years of experience building Duplicate Check and consulting clients, we share some best practices with you.

A scenario consists of a number of fields with corresponding matching methods and aims to find duplicates for a specific Object. You include fields that are (almost) unique for a single person, such as first name, last name, phone number, email address, birth date, social security number and so on.

Example of a scenario to find duplicates in the Lead object:

In a scenario, you combine different matching methods on different fields to evaluate if records are duplicates. In a lot of cases, you are comparing an empty field with a field containing a value.
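To make the idea of a scenario concrete, here is a minimal Python sketch (not Apex, and not how Duplicate Check or Salesforce implements it). It combines exact, fuzzy, phone and company matching on a few Lead fields and compares the average score with a threshold. Everything in it is an assumption for illustration: the function names, the list of legal entities, the rule that a leading "00" prefix is dropped from phone numbers, the choice to skip a field when either value is empty, the plain average, and the 0.85 threshold.

```python
import re

# Hypothetical list of legal-entity words that the company method ignores.
LEGAL_ENTITIES = {"inc", "ltd", "llc"}


def edit_distance(a: str, b: str) -> int:
    """Number of single-character inserts, deletes or changes to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete
                               current[j - 1] + 1,       # insert
                               previous[j - 1] + cost))  # change or keep
        previous = current
    return previous[-1]


def exact(a: str, b: str) -> float:
    return 1.0 if a == b else 0.0


def fuzzy(a: str, b: str) -> float:
    # Assumption: the maximum edit distance is the length of the longest string.
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - edit_distance(a, b) / longest


def phone(a: str, b: str) -> float:
    def normalize(value: str) -> str:
        digits = re.sub(r"\D", "", value)  # drops spaces, dashes, "+" and so on
        # Simplistic prefix handling (assumption): treat "00" like "+".
        return digits[2:] if digits.startswith("00") else digits
    return exact(normalize(a), normalize(b))


def company(a: str, b: str) -> float:
    def normalize(value: str) -> str:
        words = re.sub(r"[^\w\s]", "", value.lower()).split()
        return " ".join(w for w in words if w not in LEGAL_ENTITIES)
    return fuzzy(normalize(a), normalize(b))


# A hypothetical scenario for the Lead object: field name -> matching method.
LEAD_SCENARIO = {"FirstName": fuzzy, "LastName": fuzzy, "Email": exact,
                 "Phone": phone, "Company": company}


def scenario_score(record_a: dict, record_b: dict, scenario: dict) -> float:
    scores = []
    for field, method in scenario.items():
        value_a, value_b = record_a.get(field, ""), record_b.get(field, "")
        if not value_a or not value_b:
            continue  # assumption: an empty field is skipped, not scored
        scores.append(method(value_a, value_b))
    return sum(scores) / len(scores) if scores else 0.0


def is_duplicate(record_a: dict, record_b: dict, threshold: float = 0.85) -> bool:
    return scenario_score(record_a, record_b, LEAD_SCENARIO) >= threshold
```

With two leads that differ only in the spelling of the first name, the phone format and the legal entity, only the first name scores below 1, so the pair stays above the threshold. How empty fields should count is a design decision; skipping them, as done here, is just one option.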

Purely using edit distance for this goal is not ideal, especially for shorter strings (names, words). Shorter strings often have entirely different meanings with one or two edits. The longer the string, the less the impact of an edit on the meaning. To combat this problem, most deduplication solutions use a matching score based on multiple fields and a threshold to determine duplicate records. The matching score is generally calculated by dividing the found edit distance by the maximum edit distance of the two values and subtracting the result from 1: matching score = 1 - (edit distance / maximum edit distance). The process to calculate the maximum edit distance is too complex to show here. However, it is based on the length of the longest string.

As you can see, the score is much higher for longer strings with the same edit distance.
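The score calculation can be sketched in a few lines of Python. This is only an illustration: it assumes that the maximum edit distance is simply the length of the longest string, which is a simplification of the process described above, and the function names are made up for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Number of single-character inserts, deletes or changes to turn a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # delete
                               current[j - 1] + 1,       # insert
                               previous[j - 1] + cost))  # change or keep
        previous = current
    return previous[-1]


def matching_score(a: str, b: str) -> float:
    # Assumption: maximum edit distance = length of the longest string.
    max_distance = max(len(a), len(b))
    if max_distance == 0:
        return 1.0  # two empty strings are treated as identical here
    return 1 - edit_distance(a, b) / max_distance


# Same edit distance (1), different string lengths:
print(matching_score("Jon", "John"))                              # 0.75
print(round(matching_score("Jonathan Doe", "Johnathan Doe"), 2))  # 0.92
```

Both pairs are one edit apart, but the longer pair scores higher because the single edit is divided by a larger maximum.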

Tip: note that different vendors use different names for the same thing (matching method and matching algorithm are the same).

For an exact match method to evaluate two fields as duplicates, they have to match…exactly. Some solutions offer variations on exact match, such as ‘Exact (Random Order)’:

As you can see, ‘Exact (Random Order)’ means the individual words have to match exactly, but not necessarily in the same order.

Fuzzy matching will return a match when two fields are alike (similar). It’s like looking through almost closed eyelids: your vision becomes fuzzy and it’s hard to distinguish small differences between words. Similarity scoring often involves a combination of different algorithms. One of the most used algorithms is based on the concept of ‘Edit distance’. This is sometimes also called ‘Levenshtein distance’ after the Soviet mathematician Vladimir Levenshtein, who did extensive research on the subject.

Edit distance is the number of single character edits (insert, delete or change) needed to change one string into another. For example, ‘Jon Doe’ and ‘John Doe’ have an edit distance of 1. In this case only the insertion of the letter ‘h’ in Jon will make the two strings equal. The goal of matching is to return similar results (with the same meaning).
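As an illustration of the two ideas above, the following Python sketch computes the edit distance with the standard dynamic-programming approach and checks an ‘Exact (Random Order)’ match by comparing sorted word lists. The function names are hypothetical, and the word-order check is one plausible reading of that method, not a vendor's implementation.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: the minimum number of single-character
    inserts, deletes or changes needed to turn a into b."""
    # previous[j] = distance between the processed prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into an empty string takes i deletes
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # change char_a into char_b (or keep it)
            ))
        previous = current
    return previous[-1]


def exact_random_order(a: str, b: str) -> bool:
    """True when both values contain exactly the same words, in any order."""
    return sorted(a.split()) == sorted(b.split())


print(edit_distance("Jon Doe", "John Doe"))        # 1
print(exact_random_order("John Doe", "Doe John"))  # True
print(exact_random_order("Jon Doe", "Doe John"))   # False
```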

We are currently writing an integration and I am not completely clear on the handling of duplicate records created via REST API. I know about the various matching rules I can configure in SF but I am not sure what to do with the response. Basically, what I want to do:

Call POST on sobject/Contacts with my contact details. If SF does not recognize any duplicates everything is peachy and I can get on with the rest of the process. However, if SF does detect a duplicate I get: So now I would actually like to use one of the duplicates detected by SF instead. However, I do not seem to get any ObjectIDs back from this. Is there a SF setting that controls the response values I will receive which I don't know yet, or am I misunderstanding how this mechanism is supposed to work? Thanks a lot for any advice you can provide.

I don't think you can do it with the standard resource URL sobject/Contacts

But if it's a custom one.

In this blog, we will explore the most important matching methods and when to use them, followed by some best practices in combining matching methods in a matching rule or scenario. Since all matching methods can be divided into two main groups, exact and fuzzy matching, that is where we’ll start.
