microembeddings
Word2Vec skip-gram from scratch — train, explore, and play with word vectors
Companion to the blog post "microembeddings: Understanding Word Vectors from Scratch".
The app ships with pre-trained vectors (9,995 words × 50 dims) built with gensim Word2Vec on the full 17M-word text8 corpus. The Train tab reruns the NumPy implementation on a 500k-word subset so it stays interactive.
Train word embeddings from scratch on text8 (cleaned Wikipedia text). The corpus is not bundled; the Train tab downloads it on first run.
Training controls and their slider ranges:

- embedding dimensions: 25–100
- context window: 1–10
- learning rate: 0.001–0.05
- epochs: 1–15
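The core of skip-gram training is one SGD step per (center, context) pair with negative sampling. A minimal NumPy sketch under assumed names (`W_in`/`W_out` for the two embedding matrices; the real implementation's shapes and sampling strategy may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 50, 25
W_in = rng.normal(0, 0.01, (vocab_size, dim))   # center-word vectors
W_out = np.zeros((vocab_size, dim))             # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(center, context, negatives, lr=0.025):
    """One skip-gram negative-sampling update; returns the pair's loss."""
    v = W_in[center]
    targets = np.array([context] + list(negatives))
    labels = np.zeros(len(targets))
    labels[0] = 1.0                    # push the true context toward 1,
    u = W_out[targets]                 # the negative samples toward 0
    scores = sigmoid(u @ v)
    g = scores - labels                # gradient of the log loss w.r.t. scores
    W_in[center] -= lr * (g @ u)
    W_out[targets] -= lr * np.outer(g, v)
    return float(-np.log(scores[0] + 1e-10)
                 - np.log(1.0 - scores[1:] + 1e-10).sum())

# Repeating the same pair drives its loss down.
losses = [sgd_step(center=0, context=1, negatives=[2, 3]) for _ in range(100)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In real training the negatives are drawn from a unigram distribution raised to the 3/4 power, and the learning rate decays over epochs.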
Visualize the embedding space in 2D; similar words cluster together. Choose how many words to plot (100–500) and optionally highlight a category.
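One common way to get such a 2D view is PCA over the embedding matrix. A sketch via SVD (the app's actual projection method isn't stated, so this is illustrative):

```python
import numpy as np

def pca_2d(vectors):
    """Project word vectors to 2D using the top-2 principal components."""
    X = vectors - vectors.mean(axis=0)           # center each dimension
    # Rows of Vt are the principal directions, sorted by variance explained
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T                          # (n_words, 2) coordinates

rng = np.random.default_rng(0)
vecs = rng.normal(size=(300, 50))                # stand-in for 300 word vectors
coords = pca_2d(vecs)
print(coords.shape)  # → (300, 2)
```

t-SNE or UMAP would give tighter clusters but are nonlinear and slower; PCA is deterministic and fast enough to rerun on every slider change.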
Word vector arithmetic: A is to B as C is to ?
Computed as: B - A + C ≈ ?
Examples
| A | B | C |
|---|---|---|
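The analogy query above can be sketched directly in NumPy: compute B − A + C, then return the vocabulary word with the highest cosine similarity, excluding the three inputs (function and variable names here are illustrative, not the app's):

```python
import numpy as np

def analogy(words, matrix, a, b, c):
    """Answer 'a is to b as c is to ?' via B - A + C vector arithmetic.

    words  : list of vocabulary words, row-aligned with `matrix`
    matrix : (vocab, dim) array of word vectors
    """
    idx = {w: i for i, w in enumerate(words)}
    target = matrix[idx[b]] - matrix[idx[a]] + matrix[idx[c]]
    # Cosine similarity of the target against every word vector
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(target)
    sims = matrix @ target / np.maximum(norms, 1e-10)
    for i in (idx[a], idx[b], idx[c]):
        sims[i] = -np.inf              # never return one of the input words
    return words[int(np.argmax(sims))]

# Hand-built toy vectors where the arithmetic works exactly
words = ["man", "woman", "king", "queen", "apple"]
M = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, -1]], float)
print(analogy(words, M, "man", "king", "woman"))  # → queen
```

Excluding the inputs matters: B itself is usually the nearest vector to B − A + C, so without the exclusion most queries would echo back B.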
Find the most similar words by cosine similarity.
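Nearest-neighbor lookup reduces to one matrix-vector product plus a sort. A self-contained sketch with assumed names:

```python
import numpy as np

def most_similar(words, matrix, query, topn=5):
    """Rank the vocabulary by cosine similarity to `query`."""
    idx = {w: i for i, w in enumerate(words)}
    q = matrix[idx[query]]
    sims = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q) + 1e-10)
    sims[idx[query]] = -np.inf          # skip the query word itself
    order = np.argsort(-sims)[:topn]
    return [(words[i], float(sims[i])) for i in order]

words = ["cat", "dog", "car"]
M = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(most_similar(words, M, "cat", topn=2))
```

Pre-normalizing the matrix rows once would let repeated queries skip the norm computation; at ~10k words either way is instant.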