Introduction to tokenization methods, including subword, BPE, WordPiece and SentencePiece

This article is an overview of tokenization algorithms, covering word-level, character-level and subword-level tokenization, with emphasis on BPE, Unigram LM, WordPiece and SentencePiece. It is meant to be readable by experts and beginners alike. …