Skip to content

Address Similiarity applying different types of Fuzzy String Matching in Python

Notifications You must be signed in to change notification settings

dintellect/Address_Similiarity_Using_FuzzyWuzzy

Repository files navigation

Extraction of State, District, City and Zip Code using NLTK and Address Similiarity Using FuzzyWuzzy

Fuzzy Wuzzy is a Python library uses Levenshtien Distance to calculate the differences between the sequences.

In order to compare and find similiarity between two addresses, different functions of fuzzy string matching are implemented.

Below is the list of fuzzy string matching function used :

1.Fuzz Ratio: The ratio function computes the standard Levenshtein distance similarity ratio between two sequences.

2.Partial Fuzz Ratio: This is similiar to fuzz ratio but will neglect all the small details like stop words, punctuations, capital letters.

3.Token_Sort_Ratio: Tokenize the strings and preprocess them by turning them to lower case and getting rid of punctuation ignoring word order.

4.Token_Set_Ratio: It is similar to token sort ratio, but little more flexible as it ignores duplicated words too.

5.W Ratio: A simple ratio function but handles lower and upper cases and some other parameters too.

Check jupyternotebook for final code

Please feel free to fork and contribute.

About

Address Similiarity applying different types of Fuzzy String Matching in Python

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published