Department of Mathematics,
University of California San Diego

****************************

Math 278B: Mathematics of Information, Data, and Signals

Tony Chiang

University of Washington

A peek through the looking glass: understanding latent embedding spaces and how we can use them

Abstract:

Intuitively, an embedding space arises from a mapping of real-world objects, such as images, sound, or text, into a vector space, mimicking the definition of a mathematical representation (a map from an abstract structure such as an algebra into GL(V)). In deep learning, each hidden layer can be viewed as an embedding of the inputs that is learned by optimizing a loss on the training data. In this talk, we will focus on two embeddings of a trained model: the initial and the final. In particular, we show that the initial token embeddings for several LLMs do not seem to form a smooth manifold as assumed. This violation might explain the instability of LLM outputs in the neighbourhood of singular tokens. We also show that embeddings viewed as feature extractors -- especially the final embeddings -- can serve as a very useful experimental tool for understanding data distributions, e.g. synthetic vs. real. This talk will be fairly informal, so questions are welcome throughout.

December 5, 2025

11:00 AM

APM 6402

Research Areas

Mathematics of Information, Data, and Signals

****************************