Python NLTK :: Intersecting words and sentences


I'm using NLTK, a toolkit for manipulating corpus texts, and I've defined a function to intersect the user's input with Shakespeare's words.

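    import random
    from nltk.corpus import gutenberg

    def shakespeareoutput(userinput):
        # pick three distinct words from the user's input
        user = userinput.split()
        user = random.sample(set(user), 3)

        # here is nltk's method: the play as a list of tokenised sentences
        play = gutenberg.sents('shakespeare-hamlet.txt')

        # all lowercase
        hamlet = map(lambda sublist: map(str.lower, sublist), play)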

Printing hamlet returns:

[ ['[', 'the', 'tragedie', 'of', 'hamlet', 'by', 'william', 'shakespeare', '1599', ']'], ['actus', 'primus', '.'], ['scoena', 'prima', '.'], ['enter', 'barnardo', 'and', 'francisco', 'two', 'centinels', '.'], ['barnardo', '.'], ['who', "'", 's', 'there', '?']...['finis', '.'], ['the', 'tragedie', 'of', 'hamlet', ',', 'prince', 'of', 'denmarke', '.']] 

I want to find the sentence that contains the most occurrences of the user's words and return that sentence. I'm trying:

        bestcount = 0
        for sent in hamlet:
            currentcount = len(set(user).intersection(sent))
            if currentcount > bestcount:
                bestcount = currentcount
                answer = ' '.join(sent)
                return ''.join(answer).lower(), bestcount

Calling the function:

   shakespeareoutput("the actus primus") 

returns:

['the', 'actus', 'primus'] None

What am I doing wrong?

Thanks in advance.

Your way of evaluating currentcount is wrong. Set intersection returns only the distinct elements that matched, so its length is the number of distinct matches, not the total count of matched elements.

    >>> s = [1,1,2,3,3,4]
    >>> u = set([1,4])
    >>> u.intersection(s)
    set([1, 4])    # len is 2, but the total number of matched elements is 3
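The same difference shows up on a tokenised sentence. This is a minimal sketch, assuming a made-up sentence and word list (neither comes from the question), that computes both quantities side by side:

```python
# Hypothetical tokenised sentence and user words, for illustration only.
sent = ['to', 'be', 'or', 'not', 'to', 'be']
user = ['to', 'be', 'question']

# Number of distinct user words that appear in the sentence.
distinct_matches = len(set(user).intersection(sent))

# Total occurrences of the user words in the sentence.
total_matches = sum(sent.count(word) for word in set(user))
```

Here distinct_matches counts 'to' and 'be' once each, while total_matches counts every occurrence of them.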

Use the following code.

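    bestcount = 0

    for sent in hamlet:
        # total occurrences of each distinct user word in this sentence
        currentcount = sum([sent.count(i) for i in set(user)])
        if currentcount > bestcount:
            bestcount = currentcount
            answer = ' '.join(sent)

    # return only after every sentence has been checked
    return answer.lower(), bestcount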
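For reference, the whole search could be sketched as one self-contained function. The name best_sentence and the sample list are assumptions for illustration only; the sample stands in for the lowercased output of gutenberg.sents so that the snippet runs without NLTK or the corpus installed. It also initialises the answer to an empty string, which is an addition here, so that a call with no matching words does not fail.

```python
def best_sentence(user_words, sentences):
    """Return (sentence, count) for the sentence with the most
    occurrences of the given words."""
    wanted = set(user_words)
    best_count = 0
    answer = ''  # stays empty if no sentence contains any of the words
    for sent in sentences:
        # Total occurrences of the wanted words in this sentence.
        current_count = sum(sent.count(word) for word in wanted)
        if current_count > best_count:
            best_count = current_count
            answer = ' '.join(sent)
    # Return only after every sentence has been checked.
    return answer, best_count


# Hypothetical stand-in for the lowercased gutenberg.sents(...) output.
sample = [
    ['actus', 'primus', '.'],
    ['the', 'tragedie', 'of', 'hamlet', ',', 'prince', 'of', 'denmarke', '.'],
    ['who', "'", 's', 'there', '?'],
]

result = best_sentence(['the', 'actus', 'primus'], sample)
```

With the real corpus, the second argument would be the lowercased sentence list built in the question.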
