Char/Word/Sent/Doc Embedding Models

Comparison of a few embedding algorithms used in natural language processing (NLP) tasks.
| Name | Vectorizes... | Derived from | Description | Can be trained with... |
| --- | --- | --- | --- | --- |
| word2vec | word | | given the neighboring words, guess the pivot word (CBOW); or the other way around (skip-gram) | gensim |
| doc2vec | paragraph | word2vec | basically adds a paragraph vector to the neighboring words while training | gensim |
| fastText | sub-word n-grams | word2vec | can generate sentence vectors too, but simply via sum & average of the word vectors | gensim |
| GloVe | word | | works on a co-occurrence matrix instead of training prediction models; results are surprisingly similar to word2vec | glove-python |
| BERT | sub-word tokens (WordPiece) | | Transformer with self-attention; takes the whole document at once. Pre-trained by (1) randomly masking out ~15% of the tokens and trying to guess the originals, and (2) trying to predict whether one sentence follows another. VERY resource-hungry. | |
| ELMo | word | | context-dependent; bidirectional LSTM | |
| flair | char | | char-level, context-dependent, LSTM. Looks cool, but why so few mentions? | |
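
The three gensim-trainable rows above share essentially the same training interface. A minimal sketch, assuming gensim >= 4.0 and a made-up two-sentence toy corpus:

```python
# Minimal sketch: training the three gensim-backed models from the table.
# Assumes gensim >= 4.0; the toy corpus is made up for illustration only.
from gensim.models import FastText, Word2Vec
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["dogs", "chase", "cats"],
]

# word2vec: sg=0 is CBOW (neighbors -> pivot), sg=1 is skip-gram (pivot -> neighbors).
w2v = Word2Vec(corpus, vector_size=50, window=2, min_count=1, sg=0)
print(w2v.wv["cat"].shape)  # (50,)

# fastText: same interface, but a word vector is built from sub-word n-gram
# vectors, so even out-of-vocabulary words get an embedding.
ft = FastText(corpus, vector_size=50, window=2, min_count=1)
print(ft.wv["catz"].shape)  # OOV word, still gets a vector

# doc2vec: tag each document; its paragraph vector is trained alongside the words.
tagged = [TaggedDocument(words, [i]) for i, words in enumerate(corpus)]
d2v = Doc2Vec(tagged, vector_size=50, window=2, min_count=1)
print(d2v.dv[0].shape)  # paragraph vector of document 0
```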
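
GloVe fits in a similar way through glove-python. The sketch below follows that library's documented interface; the `Corpus`/`Glove` class names and the `no_components` parameter are assumptions from its README, not from this table, so double-check before relying on it:

```python
# Rough sketch of glove-python usage; class and parameter names follow
# that library's README and are not verified here.
from glove import Corpus, Glove

# Build the word-word co-occurrence matrix from tokenized sentences.
corpus = Corpus()
corpus.fit([["the", "cat", "sat"], ["dogs", "chase", "cats"]], window=5)

# Factorize the co-occurrence matrix instead of training a prediction model.
glove = Glove(no_components=50, learning_rate=0.05)
glove.fit(corpus.matrix, epochs=30, no_threads=2)
glove.add_dictionary(corpus.dictionary)

print(glove.word_vectors[glove.dictionary["cat"]].shape)  # (50,)
```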
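
The table names no training library for BERT, and given how resource-hungry pre-training is, you usually wouldn't. One common way to just pull contextual vectors out of a pretrained checkpoint (my assumption, not something the table prescribes) is Hugging Face's transformers:

```python
# Sketch: contextual token vectors from a pretrained BERT via the
# transformers library (pip install torch transformers).
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased")

batch = tok("dogs chase cats", return_tensors="pt")
with torch.no_grad():
    out = bert(**batch)

# One 768-dim vector per WordPiece token, [CLS]/[SEP] included.
print(out.last_hidden_state.shape)  # (1, number_of_tokens, 768)
```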
There are also many sentence embedding algorithms that are worth looking at: https://medium.com/huggingface/universal-word-sentence-embeddings-ce48ddc8fc3a.
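
As a baseline before reading that post: the "sum & average" sentence vector mentioned in the fastText row is only a few lines of numpy. This sketch reuses the `w2v` model from the gensim example above:

```python
import numpy as np

def sentence_vector(tokens, kv):
    """Average the word vectors of the in-vocabulary tokens.

    Order-insensitive and crude, but a surprisingly strong baseline.
    """
    vecs = [kv[t] for t in tokens if t in kv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(kv.vector_size)

print(sentence_vector(["dogs", "chase", "cats"], w2v.wv).shape)  # (50,)
```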