Motivation
-Embeddings have never been more en vogue - they are the core of Multimodal AI that fuses together the vision, speech, and text domains. The most prominent example is probably the recent OpenAI announcement that ChatGPT can now see, hear, and speak.
-An embedding is a compressed, dense representation of the raw data, be it text, images, or sound. This is in contrast to a sparse vector representation like, e.g., TF-IDF for text data. Dense embeddings are usually not beyond 2k dimensions, whereas sparse embeddings tend to be much higher dimensional. Dense embeddings thus offer a more efficient downstream processing.
-Figure: MNIST 3D
-