microembeddings

Word2Vec skip-gram from scratch — train, explore, and play with word vectors

Companion to the blog post: microembeddings: Understanding Word Vectors from Scratch

The preloaded vectors (9995 words x 50 dims) were trained with gensim's Word2Vec on the full 17M-word text8 corpus. The Train tab reruns the NumPy implementation on a 500k-word subset so it stays interactive.
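To make the NumPy implementation concrete, here is a minimal sketch of skip-gram with negative sampling on a toy corpus. All hyperparameters, the corpus, and the sampling scheme here are illustrative assumptions, not the repo's actual defaults (a real sampler draws negatives from a unigram^0.75 table, for instance).

```python
import numpy as np

# Toy skip-gram with negative sampling in plain NumPy (illustrative sketch).
rng = np.random.default_rng(0)

corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
word2id = {w: i for i, w in enumerate(vocab)}
ids = np.array([word2id[w] for w in corpus])

V, D = len(vocab), 8                 # vocabulary size, embedding dimension
W_in = rng.normal(0, 0.1, (V, D))    # center-word ("input") vectors
W_out = rng.normal(0, 0.1, (V, D))   # context-word ("output") vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

lr, window, k_neg = 0.05, 2, 3       # illustrative hyperparameters
losses = []
for epoch in range(50):
    total = 0.0
    for pos, center in enumerate(ids):
        lo, hi = max(0, pos - window), min(len(ids), pos + window + 1)
        for j in range(lo, hi):
            if j == pos:
                continue
            # One positive pair plus k_neg uniform-random negatives
            # (collisions with the true context are ignored in this sketch).
            targets = np.concatenate(([ids[j]], rng.integers(0, V, k_neg)))
            labels = np.zeros(len(targets))
            labels[0] = 1.0
            v = W_in[center]           # (D,)
            u = W_out[targets]         # (1 + k_neg, D)
            p = sigmoid(u @ v)         # predicted "is real context" probs
            total += -np.log(p[0] + 1e-10) - np.log(1 - p[1:] + 1e-10).sum()
            g = p - labels             # gradient of binary cross-entropy
            W_in[center] -= lr * (g @ u)
            W_out[targets] -= lr * np.outer(g, v)
    losses.append(total)
```

After training, `W_in` holds the word vectors; keeping two matrices and updating only the sampled rows per pair is what makes negative sampling cheap compared with a full softmax over the vocabulary.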

Train word embeddings from scratch on text8 (cleaned Wikipedia).

text8 is not bundled; the Train tab downloads it on the first run.
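A sketch of what that first-run download and cleanup can look like, assuming the canonical text8 mirror at mattmahoney.net; the repo's actual download and caching logic may differ. Since text8 is already lowercased a-z text with spaces, "cleaning" reduces to splitting on whitespace:

```python
import io
import urllib.request
import zipfile

TEXT8_URL = "http://mattmahoney.net/dc/text8.zip"  # canonical text8 mirror (assumed)

def download_text8(url=TEXT8_URL):
    """Fetch the text8 zip (~31 MB) and return the raw corpus string."""
    with urllib.request.urlopen(url) as resp:
        data = resp.read()
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        return zf.read("text8").decode("ascii")

def tokenize(text, limit=None):
    """text8 is one long line of space-separated lowercase words;
    split it and optionally truncate (e.g. limit=500_000 for the Train tab)."""
    tokens = text.split()
    return tokens[:limit] if limit else tokens
```

Calling `download_text8()` pulls about 31 MB, so a real implementation would cache the file to disk after the first run.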
