How do we compare two sets of categorical values?

Banana Chip Tech
5 min read · Oct 5, 2024

An overview of Jaccard and Cosine similarity

Photo by Maja Petric on Unsplash

An example

Let’s start with a real-world example to illustrate the different types of similarity measures. We will use the following scenario for the rest of our discussion.

John has just gone to the grocery store to buy fruit for a fruit salad. He bought bananas, pineapple, apples, and blueberries. After coming home, he discovers that his wife, Susan, has a family recipe for fruit salad that calls for apples, oranges, and blueberries. Can John make a reasonable substitute for Susan’s fruit salad recipe?

In this example, there are two sets that we need to compare. The first set is the fruit that John bought at the store: J = {bananas, pineapple, apples, blueberries}. The second set is the fruit that is needed to make Susan’s recipe: S = {apples, oranges, blueberries}. Since the two sets are of different sizes, how can we compare them?
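Before picking a measure, it can help to see the two sets side by side. The snippet below is a minimal sketch that writes them as Python sets; the variable names (john, susan, shared, combined, missing) are our own and are not part of the scenario.

```python
# John's purchases and Susan's recipe, written as Python sets.
# The variable names are illustrative assumptions.
john = {"bananas", "pineapple", "apples", "blueberries"}
susan = {"apples", "oranges", "blueberries"}

shared = john & susan      # fruit that appears in both sets
combined = john | susan    # every distinct fruit across both sets
missing = susan - john     # fruit the recipe needs that John does not have

print(sorted(shared))    # ['apples', 'blueberries']
print(sorted(combined))  # ['apples', 'bananas', 'blueberries', 'oranges', 'pineapple']
print(sorted(missing))   # ['oranges']
```

The sets overlap on two fruits, and John is missing only the oranges.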

Similarity Measures

This is where similarity measures come into play! A similarity measure turns a comparison like this one into a single number, and different measures suit different kinds of data.

Jaccard Similarity

One of the simplest similarity measures to implement is Jaccard similarity. Its value, the Jaccard index, is calculated by taking the size of the intersection of the two sets (|A ∩ B|) and dividing it by the size of their union (|A ∪ B|), as indicated below.

J(A, B) = |A ∩ B| / |A ∪ B|
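The formula translates almost directly into code. Here is a minimal sketch in Python; the function name jaccard_index is our own, and returning 1.0 for two empty sets is an assumption we make to avoid dividing by zero, not something the formula itself defines.

```python
def jaccard_index(a: set, b: set) -> float:
    """Return |a ∩ b| / |a ∪ b| for two sets."""
    union = a | b
    if not union:
        # Assumption: two empty sets are treated as identical (index 1.0).
        return 1.0
    return len(a & b) / len(union)


print(jaccard_index({1, 2, 3}, {2, 3, 4}))  # 0.5 (2 shared out of 4 distinct)
print(jaccard_index({1, 2}, {1, 2}))        # 1.0 (identical sets)
print(jaccard_index({1, 2}, {3, 4}))        # 0.0 (nothing in common)
```

The index always falls between 0 (no overlap) and 1 (identical sets), regardless of how large either set is.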

Let’s apply Jaccard similarity to our earlier example to get the Jaccard index. Since we want a numeric representation of the similarity between John’s and Susan’s fruit, we first convert the sets into binary values: 1 if a fruit is in the set and 0 if it is not. This results in the following table:

----------------------------------------------------------------
|       | Apples | Oranges | Blueberries | Pineapple | Bananas |
----------------------------------------------------------------
| John  |   1    |    0    |      1      |     1     |    1    |
----------------------------------------------------------------
| Susan |   1    |    1    |      1      |     0     |    0    |
----------------------------------------------------------------
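With the table in hand, the Jaccard index can be read off by counting columns: those where both rows are 1 form the intersection, and those where at least one row is 1 form the union. The sketch below does that count in Python; the list layout and variable names are our own illustration of the table, not code from a library.

```python
# Columns follow the table: apples, oranges, blueberries, pineapple, bananas.
john = [1, 0, 1, 1, 1]
susan = [1, 1, 1, 0, 0]

# Intersection: columns where both John and Susan have a 1.
both = sum(1 for j, s in zip(john, susan) if j == 1 and s == 1)
# Union: columns where at least one of them has a 1.
either = sum(1 for j, s in zip(john, susan) if j == 1 or s == 1)

jaccard = both / either
print(both, either, jaccard)  # 2 5 0.4
```

Two fruits are shared out of five distinct fruits, so the Jaccard index for John’s and Susan’s sets is 2/5 = 0.4.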

Written by Banana Chip Tech

Banana Chip Tech is focused on optimizing healthcare through computation! We build apps, create websites, and develop tech courses for healthcare professionals!
