Are LLMs Stuck In-Distribution?
Machine learning models rest on an IID assumption. That is, the data on which they are trained must be representative of the data on which they will later make predictions. The big question in AI is: are generative models capable of generating data out of distribution? Naively, I think no. But their data distribution is so vast that this limitation is hard to see at first. For example, an image generation model can interpolate within the space of almost all images on the net. ...