microembeddings
Word2Vec skip-gram from scratch — train, explore, and play with word vectors
Companion to the blog post "microembeddings: Understanding Word Vectors from Scratch".
The app ships with pre-trained vectors (9,995 words × 50 dims) built with gensim Word2Vec on the full 17M-word text8 corpus. The Train tab reruns the NumPy implementation on a 500k-word subset so it stays interactive.
Train word embeddings from scratch on text8 (cleaned Wikipedia text). The corpus is not bundled; the Train tab downloads it on first run.
Training controls and their slider ranges:

- embedding dimensions: 25–100
- context window: 1–10
- learning rate: 0.001–0.05
- epochs: 1–15
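The core of skip-gram training is one SGD step per (center, context) pair with negative sampling. A minimal NumPy sketch under assumed names (`W_in`/`W_out` for the two embedding matrices; the real implementation's shapes and sampling strategy may differ):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, dim = 50, 25
W_in = rng.normal(0, 0.01, (vocab_size, dim))   # center-word vectors
W_out = np.zeros((vocab_size, dim))             # context-word vectors

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgd_step(center, context, negatives, lr=0.025):
    """One skip-gram negative-sampling update; returns the pair's loss."""
    v = W_in[center]
    targets = np.array([context] + list(negatives))
    labels = np.zeros(len(targets))
    labels[0] = 1.0                    # push the true context toward 1,
    u = W_out[targets]                 # the negative samples toward 0
    scores = sigmoid(u @ v)
    g = scores - labels                # gradient of the log loss w.r.t. scores
    W_in[center] -= lr * (g @ u)
    W_out[targets] -= lr * np.outer(g, v)
    return float(-np.log(scores[0] + 1e-10)
                 - np.log(1.0 - scores[1:] + 1e-10).sum())

# Repeating the same pair drives its loss down.
losses = [sgd_step(center=0, context=1, negatives=[2, 3]) for _ in range(100)]
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

In real training the negatives are drawn from a unigram distribution raised to the 3/4 power, and the learning rate decays over epochs.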
Visualize the embedding space in 2D; similar words cluster together. Choose how many words to plot (100–500) and optionally highlight a category.
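One common way to get such a 2D view is PCA over the embedding matrix. A sketch via SVD (the app's actual projection method isn't stated, so this is illustrative):

```python
import numpy as np

def pca_2d(vectors):
    """Project word vectors to 2D using the top-2 principal components."""
    X = vectors - vectors.mean(axis=0)           # center each dimension
    # Rows of Vt are the principal directions, sorted by variance explained
    _, _, Vt = np.linalg.svd(X, full_matrices=False)
    return X @ Vt[:2].T                          # (n_words, 2) coordinates

rng = np.random.default_rng(0)
vecs = rng.normal(size=(300, 50))                # stand-in for 300 word vectors
coords = pca_2d(vecs)
print(coords.shape)  # → (300, 2)
```

t-SNE or UMAP would give tighter clusters but are nonlinear and slower; PCA is deterministic and fast enough to rerun on every slider change.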
Word vector arithmetic: A is to B as C is to ?
Computed as: B - A + C ≈ ?
Examples
| A | B | C |
|---|---|---|
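The analogy query above can be sketched directly in NumPy: compute B − A + C, then return the vocabulary word with the highest cosine similarity, excluding the three inputs (function and variable names here are illustrative, not the app's):

```python
import numpy as np

def analogy(words, matrix, a, b, c):
    """Answer 'a is to b as c is to ?' via B - A + C vector arithmetic.

    words  : list of vocabulary words, row-aligned with `matrix`
    matrix : (vocab, dim) array of word vectors
    """
    idx = {w: i for i, w in enumerate(words)}
    target = matrix[idx[b]] - matrix[idx[a]] + matrix[idx[c]]
    # Cosine similarity of the target against every word vector
    norms = np.linalg.norm(matrix, axis=1) * np.linalg.norm(target)
    sims = matrix @ target / np.maximum(norms, 1e-10)
    for i in (idx[a], idx[b], idx[c]):
        sims[i] = -np.inf              # never return one of the input words
    return words[int(np.argmax(sims))]

# Hand-built toy vectors where the arithmetic works exactly
words = ["man", "woman", "king", "queen", "apple"]
M = np.array([[1, 0, 0], [0, 1, 0], [1, 0, 1], [0, 1, 1], [0, 0, -1]], float)
print(analogy(words, M, "man", "king", "woman"))  # → queen
```

Excluding the inputs matters: B itself is usually the nearest vector to B − A + C, so without the exclusion most queries would echo back B.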
Find the most similar words by cosine similarity.
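Nearest-neighbor lookup reduces to one matrix-vector product plus a sort. A self-contained sketch with assumed names:

```python
import numpy as np

def most_similar(words, matrix, query, topn=5):
    """Rank the vocabulary by cosine similarity to `query`."""
    idx = {w: i for i, w in enumerate(words)}
    q = matrix[idx[query]]
    sims = matrix @ q / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(q) + 1e-10)
    sims[idx[query]] = -np.inf          # skip the query word itself
    order = np.argsort(-sims)[:topn]
    return [(words[i], float(sims[i])) for i in order]

words = ["cat", "dog", "car"]
M = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]])
print(most_similar(words, M, "cat", topn=2))
```

Pre-normalizing the matrix rows once would let repeated queries skip the norm computation; at ~10k words either way is instant.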