With this article, I intend to give a simple and concise explanation of the Jaccard index, a measure of the similarity between two sets. The coefficient was first developed by Grove Karl Gilbert in 1884 and later, independently, by Paul Jaccard, whose name it now carries. Since then it has been applied in fields as varied as behavioral research and natural language processing (NLP).
# libraries
import matplotlib.pyplot as plt
import matplotlib_venn as venn
GroupA = {1, 2, 3}
GroupB = {3, 4, 5}
To draw the Venn diagram of the two groups, we use the matplotlib_venn library:
venn.venn2([GroupA, GroupB], set_labels=('Group A', 'Group B'))
plt.show()
# Intersection method
Intersection = GroupA.intersection(GroupB)
print("Intersection of GroupA and GroupB:", Intersection)
Intersection of GroupA and GroupB: {3}
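The Jaccard index divides the size of the intersection by the size of the union, and the union can be counted without building it: its size is the two set sizes added together, minus the elements counted twice. A minimal sketch, reusing the same example values and introducing a `Union` variable that is not part of the original code, checks that identity:

```python
# Same example sets as used in the article
GroupA = {1, 2, 3}
GroupB = {3, 4, 5}

Intersection = GroupA & GroupB   # elements present in both sets
Union = GroupA | GroupB          # elements present in at least one set

# Inclusion-exclusion: |A ∪ B| = |A| + |B| - |A ∩ B|
assert len(Union) == len(GroupA) + len(GroupB) - len(Intersection)

print("Union of GroupA and GroupB:", Union)
print("Size of the union:", len(Union))
```

Here the union has 5 elements, which is the same as 3 + 3 - 1.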
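Writing |X| for the number of elements in a set X, the Jaccard index is the size of the intersection divided by the size of the union:
Jaccard = |Intersection| / ( |GroupA| + |GroupB| - |Intersection| )
Jaccard = 1 / ( 3 + 3 - 1 )
Jaccard = 1 / 5
Jaccard = 0.2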
# Jaccard index in code
Jaccard = len(Intersection) / (len(GroupA) + len(GroupB) - len(Intersection))
print("Jaccard index:", Jaccard)
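The same calculation can be wrapped in a small reusable function. The sketch below is only one way to do it: the name `jaccard_index` is hypothetical, and returning 1.0 when both inputs are empty is a convention assumed here, since the ratio is undefined in that case. The two sentences are made-up examples that show how the index applies to text when each sentence is treated as a set of words.

```python
def jaccard_index(a, b):
    """Return |a ∩ b| / |a ∪ b| for two iterables, each treated as a set."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Both sets are empty, so the ratio would be 0/0.
        # Returning 1.0 is an assumed convention, not part of the definition.
        return 1.0
    return len(a & b) / len(union)


# The groups from the article
print(jaccard_index({1, 2, 3}, {3, 4, 5}))  # 0.2

# Hypothetical sentences, compared as sets of whitespace-separated words
s1 = "the cat sat on the mat".split()
s2 = "the cat lay on the rug".split()
print(jaccard_index(s1, s2))
```

The two sentences share 3 distinct words out of 7 overall, so their index is 3/7.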