python - How to match strings with possible typos?
I have multiple text files converted from PDF, and I want to search them for a phrase that might appear in them. The problem is that the conversion from PDF to text is not perfect, so errors appear in the text (such as missing spaces between words; mix-ups between i, l and 1; etc.).
I'm wondering if there is a common technique that would give me a "soft" search, one that looks at the Hamming distance between two terms, for example. In other words, something like this:
```python
if 'word' in sentence:
```

versus

```python
if my_search('word', sentence, tolerance):
```
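The Hamming-distance idea from the question can be sketched roughly as follows. This is a minimal sketch, not a standard API: `my_search` is the hypothetical name from the question, `tolerance` is assumed to mean the maximum number of mismatched characters, and the `CONFUSABLE` table (folding `l` and `1` onto `i`) is an assumption based on the conversion errors described above. Whitespace is stripped from both strings before comparing, so missing spaces in the converted text do not prevent a match.

```python
# Assumption: characters that the PDF-to-text conversion tends to confuse
# are folded onto one canonical character. Extend this table as needed.
CONFUSABLE = str.maketrans({"l": "i", "1": "i"})


def normalize(s):
    """Lower-case, fold confusable characters and drop all whitespace."""
    return "".join(s.lower().translate(CONFUSABLE).split())


def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def my_search(word, sentence, tolerance):
    """Hypothetical helper: True if some window of `sentence` is within
    `tolerance` substitutions of `word` after normalization."""
    pattern = normalize(word)
    haystack = normalize(sentence)
    n = len(pattern)
    # Slide a window of the pattern's length across the normalized text.
    for start in range(len(haystack) - n + 1):
        if hamming(pattern, haystack[start:start + n]) <= tolerance:
            return True
    return False
```

For example, `my_search('errors', 'There are 3rrors in my text', 1)` returns `True`, and `my_search('filter', 'Apply thefi1ter now', 0)` also returns `True` because `1` and `l` are folded together and the spaces are removed. Hamming distance only counts substitutions, though: a dropped or extra character shifts every later position and usually breaks the match, which is why an edit distance such as Levenshtein, or the `SequenceMatcher` ratio, is often preferred for OCR-style errors.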
You can use `SequenceMatcher` from Python's standard `difflib` module:
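```python
from difflib import SequenceMatcher

text = """There are
3rrors in my text
but I cannot find them"""


def fuzzy_search(search_key, text, strictness):
    """Return a description of the first word in `text` whose similarity
    ratio to `search_key` exceeds `strictness` (0.0 to 1.0), else None."""
    lines = text.split("\n")
    for i, line in enumerate(lines):
        words = line.split()
        for word in words:
            similarity = SequenceMatcher(None, word, search_key)
            if similarity.ratio() > strictness:
                return "'{}' matches: '{}' in line {}".format(search_key, word, i + 1)
    return None  # no word was similar enough


print(fuzzy_search('errors', text, 0.8))
```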
This should output:
'errors' matches: '3rrors' in line 2
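If you want every match rather than only the first one, the same ratio can be applied through `difflib.get_close_matches`, which keeps candidates whose `SequenceMatcher` ratio is at least `cutoff`. This is a sketch under that assumption; `fuzzy_search_all` is a hypothetical helper name, and note that `cutoff` is compared with `>=`, whereas the code above uses a strict `>`.

```python
from difflib import get_close_matches


def fuzzy_search_all(search_key, text, strictness):
    """Hypothetical helper: return (line_number, word) for every word in
    `text` whose similarity ratio to `search_key` is >= `strictness`."""
    matches = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        words = line.split()
        # n must be > 0; using the word count keeps every qualifying word.
        # Results within one line come back best-score first, not by position.
        for word in get_close_matches(search_key, words,
                                      n=max(len(words), 1), cutoff=strictness):
            matches.append((line_no, word))
    return matches
```

For example, with `text = "There are\n3rrors in my text\nand more err0rs here"`, `fuzzy_search_all('errors', text, 0.8)` returns `[(2, '3rrors'), (3, 'err0rs')]`. Like the answer's version, it compares whole words, so a phrase whose spaces were lost in conversion will not be found unless you also compare against sliding windows of the raw text.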