We want to try out the loss-reporting functionality discussed in gensim issue #999 and (partially) implemented in gensim PR #1201. We use the develop branch of gensim as of 2017-07-23 (commit da383bf4a4046b134d95d9085eedb163dd5e0c46) and the first 100k posts (> 3M words) from the One Million Posts Corpus as data.
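PR #1201 adds a `compute_loss` flag and a `get_latest_training_loss()` getter to `Word2Vec`. A minimal sketch of the kind of per-epoch loss logging the test script performs (the toy corpus and the epoch loop are assumptions; only the gensim calls come from the PR):

```python
from gensim.models import Word2Vec

# Toy stand-in corpus; the real run uses the first 100k posts of the
# One Million Posts Corpus (reading that corpus is outside this sketch).
sentences = [["this", "is", "a", "post"], ["another", "short", "post"]] * 1000

max_alpha = 0.1  # corresponds to the script's --maxalpha argument (an assumption)

# compute_loss=True enables the loss tracking added in PR #1201.
model = Word2Vec(min_count=1, alpha=max_alpha, compute_loss=True)
model.build_vocab(sentences)

for epoch in range(1, 11):
    # Train one pass per call; how the real script schedules alpha across
    # epochs is an assumption. Depending on the gensim version, the loss
    # counter may accumulate across train() calls rather than reset.
    model.train(sentences, total_examples=model.corpus_count, epochs=1)
    print("After epoch %d: latest training loss is %f"
          % (epoch, model.get_latest_training_loss()))
```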

The test script is here; it was run as follows:

```bash
MAXALPHA=0.1
python test_doc2vec_loss.py --maxalpha "$MAXALPHA" 2>&1 | tee log_maxalpha_"$MAXALPHA".txt
mv word2vec_loss.png word2vec_loss_maxalpha_"$MAXALPHA".png
```

The log file is here.

```
$ grep '^After epoch' log_maxalpha_"$MAXALPHA".txt
After epoch 1: latest training loss is 10294905.000000
After epoch 2: latest training loss is 1203828.125000
After epoch 3: latest training loss is 245310.640625
After epoch 4: latest training loss is 166148.406250
After epoch 5: latest training loss is 136867.781250
After epoch 6: latest training loss is 121953.609375
After epoch 7: latest training loss is 111192.812500
After epoch 8: latest training loss is 109865.835938
After epoch 9: latest training loss is 107571.671875
After epoch 10: latest training loss is 107939.546875
```

This is nice: it looks like a typical epoch-vs.-loss curve, nearly monotonically decreasing, with large changes at first that taper off over later epochs.
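The script also writes an epoch-vs.-loss plot to word2vec_loss.png. A minimal sketch of how such a plot can be produced from the logged values above (the script's actual plotting code is an assumption):

```python
import matplotlib
matplotlib.use("Agg")  # render to file without a display
import matplotlib.pyplot as plt

# Per-epoch losses copied from the max_alpha=0.1 log above.
losses = [10294905.0, 1203828.125, 245310.640625, 166148.40625,
          136867.78125, 121953.609375, 111192.8125, 109865.835938,
          107571.671875, 107939.546875]

plt.plot(range(1, len(losses) + 1), losses, marker="o")
plt.xlabel("epoch")
plt.ylabel("latest training loss")
plt.title("word2vec training loss, max_alpha=0.1")
plt.savefig("word2vec_loss_maxalpha_0.1.png")
```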

With a smaller `max_alpha`, however, the picture looks problematic:

```bash
MAXALPHA=0.05
python test_doc2vec_loss.py --maxalpha "$MAXALPHA" 2>&1 | tee log_maxalpha_"$MAXALPHA".txt
mv word2vec_loss.png word2vec_loss_maxalpha_"$MAXALPHA".png
```

The log file is here.

```
$ grep '^After epoch' log_maxalpha_"$MAXALPHA".txt
After epoch 1: latest training loss is 8382980.000000
After epoch 2: latest training loss is 9314768.000000
After epoch 3: latest training loss is 9695628.000000
After epoch 4: latest training loss is 9800970.000000
After epoch 5: latest training loss is 9851732.000000
After epoch 6: latest training loss is 10716296.000000
After epoch 7: latest training loss is 10940231.000000
After epoch 8: latest training loss is 10656052.000000
After epoch 9: latest training loss is 10707312.000000
After epoch 10: latest training loss is 10796318.000000
```

With this smaller starting learning rate, the loss increases over time!?