Tracking loss and embeddings in Gensim word2vec model
I'm fairly new to Gensim, and I'm trying to train my first word2vec model. All the parameters look straightforward to me, but I don't know how to track the model's loss to check training progress. I'd also like to get the embeddings after each epoch, so I can show that the predictions become more reasonable epoch by epoch. How do I do this?
Or would it be better to train with iter = 1 each time and save the loss and embeddings after every epoch? That doesn't sound very efficient.
The code doesn't show much, but here it is anyway:
```python
model = Word2Vec(sentences=trainset,
                 iter=5,       # epochs
                 min_count=10,
                 size=150,
                 workers=4,
                 sg=1,
                 hs=1,
                 negative=0,
                 window=9999)
```
Example:
```python
from gensim.models import Word2Vec
from gensim.models.callbacks import CallbackAny2Vec

class MonitorCallback(CallbackAny2Vec):
    def __init__(self, test_words):
        self._test_words = test_words

    def on_epoch_end(self, model):
        print("Model loss:", model.get_latest_training_loss())  # print loss
        for word in self._test_words:  # show wv logic changes
            print(model.wv.most_similar(word))

# prepare datasets etc.
# ...

monitor = MonitorCallback(["word", "I", "less"])  # monitor with demo words
model = Word2Vec(sentences=trainset,
                 iter=5,  # epochs
                 min_count=10,
                 size=150,
                 workers=4,
                 sg=1,
                 hs=1,
                 negative=0,
                 window=9999,
                 compute_loss=True,  # required, otherwise the reported loss stays 0
                 callbacks=[monitor])
```
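To also keep the embeddings from every epoch, as the question asks, a second callback can snapshot the vectors. A minimal sketch, with one caveat: the class name `EpochSaver` and its structure are mine; it defines the same hook names that gensim's `CallbackAny2Vec` provides as no-ops (assuming gensim 3.x hook names), so it can be passed in the same `callbacks` list:

```python
import copy

class EpochSaver:
    """Stores a deep copy of the word vectors after every epoch, so the
    evolution of the embeddings can be inspected once training finishes."""

    def __init__(self):
        self.snapshots = []

    # hooks gensim invokes on callback objects; only on_epoch_end does work
    def on_train_begin(self, model):
        pass

    def on_train_end(self, model):
        pass

    def on_epoch_begin(self, model):
        pass

    def on_epoch_end(self, model):
        # copy, not reference: model.wv keeps mutating during training
        self.snapshots.append(copy.deepcopy(model.wv))

    def on_batch_begin(self, model):
        pass

    def on_batch_end(self, model):
        pass
```

Usage would be `saver = EpochSaver()`, pass `callbacks=[monitor, saver]` to `Word2Vec`, and afterwards query any epoch's vectors, e.g. `saver.snapshots[0].most_similar("word")`.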
A couple of notes:

- There are some open issues around get_latest_training_loss, so the value may be incorrect (bad luck, GitHub is down at the moment, so I can't check). I tested this code and the loss kept increasing, which looks strange: the reported value is a running total over the whole training run rather than a per-epoch figure. Also note that the loss is only tracked at all when the model is created with compute_loss=True.
- You may prefer plain logging instead; gensim works well with Python's standard logging module.
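Since the cumulative readings keep growing, a per-epoch loss can be recovered by differencing consecutive values. A small self-contained sketch (the helper name `per_epoch_losses` is mine, and the numbers are made-up readings, one per epoch end):

```python
def per_epoch_losses(cumulative_losses):
    """Convert cumulative loss readings (one per epoch, as returned by
    get_latest_training_loss at each epoch end) into per-epoch deltas."""
    previous = 0.0
    deltas = []
    for total in cumulative_losses:
        deltas.append(total - previous)
        previous = total
    return deltas

# Example: readings taken at the end of each of four epochs.
print(per_epoch_losses([120.0, 210.0, 280.0, 330.0]))
# -> [120.0, 90.0, 70.0, 50.0]
```

Here the per-epoch values actually decrease, which is the progress signal the raw running total hides.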