Member-only story

How do we compare two sets of categorical values?

Banana Chip Tech
5 min readOct 5, 2024

--

An overview of Jaccard and Cosine similarity

Photo by Maja Petric on Unsplash

An example

Let’s start with a real world example to illustrate the different types of similarity measures. We will use the following scenario for the rest of our discussion.

John has just gone to the grocery store to buy fruit to make fruit salad. He bought bananas, pineapple, apples, and blueberries. After coming home, he discovers that his wife, Susan, has a family recipe for fruit salad that calls for apples, oranges, and blueberries. Can John make a reasonable substitute to Susan’s fruit salad recipe?

In this example, we can see that there are two sets that we need to compare. The first set is the fruit that John brought from the store: J = {bananas, pineapple, apples, blueberries}. The second set is the fruit that is needed to make Susan’s recipe: S = {apples, oranges, blueberries}. Since the two sets are of different sizes, how can we compare them?

Similarity Measures

This is where similarity measures come into play! There are multiple types of similarity measures that are optimized for different things.

Jaccard Similarity

One of the simplest similarity measures to implement is Jaccard similarity. The measure of Jaccard similarity…

--

--

Banana Chip Tech
Banana Chip Tech

Written by Banana Chip Tech

Banana Chip Tech is focused on optimizing healthcare through computation! We build apps, create websites, and develop tech courses for healthcare professionals!

No responses yet