python - How to get the Document Vector from Doc2Vec in gensim 0.11.1? -


Is there a way to get the document vectors of unseen and seen documents from Doc2Vec in gensim 0.11.1?

  • For example, suppose I trained the model on 1000 documents: can I get the doc vectors for those 1000 docs?

  • Is there a way to get document vectors of unseen documents composed
    from the same vocabulary?

For the first bullet point, you can do it in gensim 0.11.1:

    from gensim.models import Doc2Vec
    from gensim.models.doc2vec import LabeledSentence

    # Each training document is a list of words plus a label that identifies it
    documents = []
    documents.append(LabeledSentence(words=[u'some', u'words', u'here'], labels=[u'sent_1']))
    documents.append(LabeledSentence(words=[u'some', u'people', u'words', u'like'], labels=[u'sent_2']))
    documents.append(LabeledSentence(words=[u'people', u'like', u'words'], labels=[u'sent_3']))

    model = Doc2Vec(size=10, window=8, min_count=0, workers=4)
    model.build_vocab(documents)
    model.train(documents)

    # Look up the trained vector by its label
    print(model[u'sent_3'])

Here sent_3 is a known sentence.

For the second bullet point, you cannot do it in gensim 0.11.1; you have to update to 0.12.4. The newer version has an infer_vector function that can generate a vector for an unseen document.

    from gensim.models import Doc2Vec
    from gensim.models.doc2vec import LabeledSentence

    documents = []
    documents.append(LabeledSentence([u'some', u'words', u'here'], [u'sent_1']))
    documents.append(LabeledSentence([u'some', u'people', u'words', u'like'], [u'sent_2']))
    documents.append(LabeledSentence([u'people', u'like', u'words'], [u'sent_3']))

    model = Doc2Vec(size=10, window=8, min_count=0, workers=4)
    model.build_vocab(documents)
    model.train(documents)

    # Generate the vector for a known sentence
    print(model.docvecs[u'sent_3'])
    # Generate the vector for an unseen sentence
    print(model.infer_vector([u'people', u'like', u'words']))
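
An inferred vector is usually not identical to the vector stored during training, even for the same words, so the two are typically compared by similarity rather than by equality. The following is a minimal sketch of such a comparison, not part of gensim: the helper name cosine_similarity and the sample values are assumptions made up for illustration.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length numeric sequences."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical stand-in values, not output from a real model
stored = [0.1, -0.2, 0.3]
inferred = [0.2, -0.4, 0.6]
print(cosine_similarity(stored, inferred))
```

With the model from the snippet above, the same helper could be called as cosine_similarity(model.docvecs[u'sent_3'], model.infer_vector([u'people', u'like', u'words'])). A value close to 1 means the two vectors point in nearly the same direction.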
