python - How to get the Document Vector from Doc2Vec in gensim 0.11.1?
Is there a way to get the document vectors of unseen and seen documents from Doc2Vec in gensim version 0.11.1?
- For example, suppose I trained the model on 1000 documents. Can I get the doc vectors for those 1000 docs?
- Is there a way to get document vectors for unseen documents composed of the same vocabulary?
For the first bullet point, you can do it in gensim 0.11.1:
    from gensim.models import Doc2Vec
    from gensim.models.doc2vec import LabeledSentence

    # Each training document is a list of words plus a label.
    documents = []
    documents.append(LabeledSentence(words=[u'some', u'words', u'here'], labels=[u'sent_1']))
    documents.append(LabeledSentence(words=[u'some', u'people', u'words', u'like'], labels=[u'sent_2']))
    documents.append(LabeledSentence(words=[u'people', u'like', u'words'], labels=[u'sent_3']))

    model = Doc2Vec(size=10, window=8, min_count=0, workers=4)
    model.build_vocab(documents)
    model.train(documents)

    # Look up the trained vector by its label.
    print(model[u'sent_3'])
Here sent_3 is a known sentence, that is, one the model was trained on.
For the second bullet point, you cannot do it in gensim 0.11.1; you have to update to 0.12.4. That version has an infer_vector function that can generate a vector for an unseen document:
    from gensim.models import Doc2Vec
    from gensim.models.doc2vec import LabeledSentence

    documents = []
    documents.append(LabeledSentence([u'some', u'words', u'here'], [u'sent_1']))
    documents.append(LabeledSentence([u'some', u'people', u'words', u'like'], [u'sent_2']))
    documents.append(LabeledSentence([u'people', u'like', u'words'], [u'sent_3']))

    model = Doc2Vec(size=10, window=8, min_count=0, workers=4)
    model.build_vocab(documents)
    model.train(documents)

    print(model.docvecs[u'sent_3'])  # vector for a known sentence
    print(model.infer_vector([u'people', u'like', u'words']))  # vector for an unseen sentence
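If the same code has to run against both versions, the two cases can be wrapped in one small function. The sketch below is only an illustration: get_doc_vector is a hypothetical helper, not part of gensim, and it assumes nothing beyond the three access patterns shown in this answer (model[label] in 0.11.1, model.docvecs[tag] and model.infer_vector(words) in 0.12.4).

```python
def get_doc_vector(model, tag=None, words=None):
    """Return the vector for a known tag, or infer one for unseen words.

    Hypothetical helper (not a gensim function). It relies only on the
    access patterns shown in the answer above.
    """
    if tag is not None:
        # 0.12.4 keeps trained document vectors in model.docvecs;
        # 0.11.1 has no such attribute, so fall back to model[label].
        store = getattr(model, "docvecs", model)
        return store[tag]
    if words is not None:
        if not hasattr(model, "infer_vector"):
            # The answer states that 0.11.1 cannot handle unseen documents.
            raise NotImplementedError(
                "this gensim version has no infer_vector; update to 0.12.4")
        return model.infer_vector(words)
    raise ValueError("pass either tag or words")


# Usage with a trained model (not executed here):
#   get_doc_vector(model, tag=u'sent_3')                        # known sentence
#   get_doc_vector(model, words=[u'people', u'like', u'words'])  # unseen sentence
```

The helper checks for attributes rather than parsing a version string, so it stays independent of how gensim reports its version.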