python - How to match strings with possible typos?

March 15, 2012

I have multiple PDF-converted text files and want to search for a phrase that might be in them. The problem is that the conversion from PDF to text is not perfect, so errors appear in the text (such as missing spaces between words; mix-ups between i, l and 1; etc.).
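Because these errors come from a known source, one cheap option is to normalize both strings before an ordinary exact search: drop all whitespace so missing spaces stop mattering, and fold the confusable characters the question lists (i, l, 1) into one. This is a minimal sketch under assumptions: the mapping and the helper names `normalize` and `contains` are hypothetical, and it only catches the error types you map, not arbitrary typos.

```python
# Assumed set of confusables, taken from the question (i, l and 1);
# extend the mapping for whatever errors your converter actually makes.
CONFUSABLES = str.maketrans({"l": "i", "1": "i"})


def normalize(s):
    """Drop whitespace, lower-case, and fold confusable characters together."""
    s = "".join(s.split())  # missing spaces in the PDF text no longer matter
    return s.lower().translate(CONFUSABLES)


def contains(text, phrase):
    """Exact substring test on the normalized forms of both strings."""
    return normalize(phrase) in normalize(text)


print(contains("The resu1ts are inthe fi1e", "results are in the file"))  # True
print(contains("The resu1ts are inthe fi1e", "results are in a file"))    # False
```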

I was wondering if there is a common technique that gives me a "soft" search, one that looks at the Hamming distance between two terms, for example.
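For reference, Hamming distance counts the positions at which two equal-length strings differ, so it can be applied by sliding a window the length of the search term along the sentence. The sketch below (the helper name `hamming_search` is hypothetical) shows that this catches substitutions such as 1 for l, but not a dropped letter or a missing space, because a deletion shifts every following character out of alignment.

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance is only defined for equal-length strings")
    return sum(ca != cb for ca, cb in zip(a, b))


def hamming_search(word, sentence, tolerance):
    """True if some window of sentence, len(word) characters long, differs
    from word in at most `tolerance` positions."""
    n = len(word)
    return any(hamming(word, sentence[i:i + n]) <= tolerance
               for i in range(len(sentence) - n + 1))


print(hamming("file", "fi1e"))                         # 1
print(hamming_search("file", "open the fi1e now", 1))  # True
print(hamming_search("file", "open the fie now", 1))   # False: the dropped letter shifts the window
```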

So instead of:

```python
if 'word' in sentence:
```

I would like something like:

```python
if my_search('word', sentence, tolerance):
```
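Since missing spaces and dropped letters are insertions and deletions, edit (Levenshtein) distance usually fits this problem better than Hamming distance. Here is a minimal sketch of the hypothetical `my_search` from the question, assuming `tolerance` means the maximum number of single-character edits. It uses approximate substring matching: the usual Levenshtein dynamic program, but with the first row set to zero and the minimum taken over the last row, so the match may start and end anywhere in `sentence`. It runs in O(len(word) × len(sentence)) time, which can get slow on large files; the third-party `regex` package also offers fuzzy patterns such as `(?:errors){e<=1}` if you prefer a library.

```python
def my_search(word, sentence, tolerance):
    """True if some substring of sentence is within `tolerance` edits of word."""
    # prev[j]: fewest edits turning the current prefix of word into some
    # substring of sentence that ends at position j. Row 0 is all zeros
    # because a match may start anywhere in sentence for free.
    prev = [0] * (len(sentence) + 1)
    for i in range(1, len(word) + 1):
        curr = [i] + [0] * len(sentence)
        for j in range(1, len(sentence) + 1):
            cost = 0 if word[i - 1] == sentence[j - 1] else 1
            curr[j] = min(prev[j - 1] + cost,  # match or substitution
                          prev[j] + 1,         # character of word missing from sentence
                          curr[j - 1] + 1)     # extra character in sentence
        prev = curr
    # The match may also end anywhere, so take the best value in the last row.
    return min(prev) <= tolerance


print(my_search("errors", "there are 3rrors in my text", 1))  # True (one substitution)
print(my_search("in the", "it is inthe text", 1))             # True (one missing space)
print(my_search("file", "nothing here", 1))                   # False
```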

You can use SequenceMatcher from the standard library's difflib module:

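```python
from difflib import SequenceMatcher

text = """There are
3rrors in my text
but I cannot find them"""


def fuzzy_search(search_key, text, strictness):
    """Report the first word in text whose similarity to search_key exceeds strictness."""
    lines = text.split("\n")
    for i, line in enumerate(lines):
        words = line.split()
        for word in words:
            # ratio() is a similarity score between 0.0 and 1.0
            similarity = SequenceMatcher(None, word, search_key)
            if similarity.ratio() > strictness:
                return "'{}' matches: '{}' in line {}".format(search_key, word, i + 1)
    return None  # nothing was similar enough


print(fuzzy_search('errors', text, 0.8))
```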

Which should output:

'errors' matches: '3rrors' in line 2
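Why '3rrors' passes the 0.8 threshold: SequenceMatcher.ratio() returns 2*M/T, where M is the number of matched characters and T is the combined length of both strings. '3rrors' and 'errors' share the block 'rrors', so M = 5, T = 12 and the ratio is 10/12, about 0.83. If you want every match rather than just the first, difflib.get_close_matches(word, possibilities, n, cutoff) does the ratio filtering for you; note that it keeps matches with ratio >= cutoff (fuzzy_search uses a strict >) and returns at most n of them per call. The helper name `fuzzy_search_all` below is hypothetical.

```python
from difflib import SequenceMatcher, get_close_matches

text = """There are
3rrors in my text
but I cannot find them"""

# ratio() = 2*M/T: the shared block 'rrors' gives M = 5 out of T = 12 characters.
print(round(SequenceMatcher(None, "3rrors", "errors").ratio(), 3))  # 0.833


def fuzzy_search_all(search_key, text, cutoff=0.8):
    """Return (line_number, word) for every close match, not just the first."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        # At most n=3 best-scoring words per line, each with ratio >= cutoff.
        for word in get_close_matches(search_key, line.split(), n=3, cutoff=cutoff):
            hits.append((line_no, word))
    return hits


print(fuzzy_search_all("errors", text))  # [(2, '3rrors')]
```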
