Python NLTK: Intersecting words and sentences
I'm using NLTK, a toolkit for manipulating corpus texts, and I've defined a function that intersects the user's input with Shakespeare's words.
import random
from nltk.corpus import gutenberg

def shakespeareoutput(userinput):
    user = userinput.split()
    user = random.sample(set(user), 3)
    # here is NLTK's method
    play = gutenberg.sents('shakespeare-hamlet.txt')
    # lowercase every token of every sentence
    hamlet = map(lambda sublist: map(str.lower, sublist), play)
    print hamlet
This prints:
[ ['[', 'the', 'tragedie', 'of', 'hamlet', 'by', 'william', 'shakespeare', '1599', ']'], ['actus', 'primus', '.'], ['scoena', 'prima', '.'], ['enter', 'barnardo', 'and', 'francisco', 'two', 'centinels', '.'], ['barnardo', '.'], ['who', "'", 's', 'there', '?']...['finis', '.'], ['the', 'tragedie', 'of', 'hamlet', ',', 'prince', 'of', 'denmarke', '.']]
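The code above is Python 2. In Python 3, map returns a lazy, single-use iterator rather than a list, so printing hamlet would show something like <map object at 0x...> instead of the nested lists, and a second loop over hamlet would see nothing. A minimal sketch of the same lowercasing step, using two hard-coded token lists as hypothetical sample data standing in for gutenberg.sents(...):

```python
# Hypothetical sample data shaped like gutenberg.sents(): a list of token lists.
play = [
    ['[', 'The', 'Tragedie', 'of', 'Hamlet', 'by', 'William', 'Shakespeare', '1599', ']'],
    ['Actus', 'Primus', '.'],
]

# Python 3: map() is lazy and single-use, so a second pass sees nothing.
lazy = map(lambda sublist: list(map(str.lower, sublist)), play)
first_pass = list(lazy)
second_pass = list(lazy)  # [] because the iterator is already exhausted

# Nested list comprehensions build real lists that can be looped over repeatedly.
hamlet = [[word.lower() for word in sent] for sent in play]
print(hamlet[1])  # ['actus', 'primus', '.']
```

The list-comprehension form behaves the same in Python 2 and Python 3.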
I want to find the sentence that contains the most occurrences of the user's words and return that sentence. Inside the same function, I'm trying:
    bestcount = 0
    for sent in hamlet:
        currentcount = len(set(user).intersection(sent))
        if currentcount > bestcount:
            bestcount = currentcount
            answer = ' '.join(sent)
    return ''.join(answer).lower(), bestcount
Calling the function:
shakespeareoutput("the actus primus")
produces:
['the', 'actus', 'primus']
None
What am I doing wrong?
Thanks in advance.
Your way of evaluating currentcount is wrong. A set intersection returns the distinct elements that matched, so its len is the number of distinct matches, not the total number of matching tokens in the sentence.
>>> s = [1, 1, 2, 3, 3, 4]
>>> u = set([1, 4])
>>> u.intersection(s)
set([1, 4])    # len 2, but the total number of matched elements is 3
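The difference can be checked directly. A small sketch that scores a token list both ways; the helper names distinct_matches and total_matches and the sample sentence are hypothetical, chosen only for illustration. A sentence that repeats a user word scores higher only under the total count:

```python
def distinct_matches(user_words, sent):
    """Number of *different* user words present in sent (the question's metric)."""
    return len(set(user_words).intersection(sent))


def total_matches(user_words, sent):
    """Number of tokens in sent that are user words, counting repeats (the answer's metric)."""
    return sum(sent.count(w) for w in set(user_words))


s = [1, 1, 2, 3, 3, 4]
u = [1, 4]
print(distinct_matches(u, s))  # 2
print(total_matches(u, s))     # 3

# Hypothetical word example: 'the' appears twice, so only the total count sees it twice.
sent = ['the', 'king', 'and', 'the', 'queen']
user = ['the', 'queen']
print(distinct_matches(user, sent), total_matches(user, sent))  # 2 3
```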
Use the following code instead:
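    bestcount = 0
    answer = ''  # stays empty if no sentence contains any user word
    for sent in hamlet:
        # count every occurrence of each user word, not just distinct matches
        currentcount = sum([sent.count(i) for i in set(user)])
        if currentcount > bestcount:
            bestcount = currentcount
            answer = ' '.join(sent)
    return answer.lower(), bestcount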
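For reference, a self-contained sketch of the whole corrected approach that runs without downloading the NLTK corpus. The function name best_sentence and the sample sentences are hypothetical stand-ins for shakespeareoutput and gutenberg.sents('shakespeare-hamlet.txt'); unlike the original, it also lowercases the user's words, assuming matching should be case-insensitive:

```python
def best_sentence(sentences, user_words):
    """Return (sentence, score) for the sentence containing the most user-word tokens.

    The score counts repeated occurrences, so a word appearing twice counts 2.
    Ties keep the earliest sentence, because only a strictly higher score replaces it.
    """
    wanted = {w.lower() for w in user_words}
    best_count = 0
    answer = ''  # stays empty if no sentence matches any user word
    for sent in sentences:
        tokens = [t.lower() for t in sent]
        current = sum(tokens.count(w) for w in wanted)
        if current > best_count:
            best_count = current
            answer = ' '.join(tokens)
    return answer, best_count


# Hypothetical sample sentences shaped like gutenberg.sents() output.
sentences = [
    ['Actus', 'Primus', '.'],
    ['Scoena', 'Prima', '.'],
    ['The', 'Tragedie', 'of', 'Hamlet', ',', 'Prince', 'of', 'Denmarke', '.'],
]
print(best_sentence(sentences, 'the actus primus'.split()))
# ('actus primus .', 2)
```

A collections.Counter built once per sentence would avoid rescanning the sentence for each user word, but with only a handful of user words the list.count version is simple and adequate.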