LLM Meta-Cognition and Exploring the Adjacent Possible

Andrej Karpathy has a wonderful tweet on what he calls learned “cognitive strategies”, though I think the idea is more generally referred to as “meta-cognition”. The piece I like is: …The models discover, in the process of trying to solve many diverse math/code/etc. problems, strategies that resemble the internal monologue of humans, which are very hard (/impossible) to directly program into the models. I call these “cognitive strategies” - things like approaching a problem from different angles, trying out different ideas, finding analogies, backtracking, re-examining, etc. Weird as it sounds, it’s plausible that LLMs can discover better ways of thinking, of solving problems, of connecting ideas across disciplines, and do so in a way we will find surprising, puzzling, but creative and brilliant in retrospect… ...

January 29, 2025 · 8 min · Jason Brownlee

How to Learn Machine Learning Algorithms (for Programmers)

I’ve written a ton of tutorials and books to help developers learn machine learning algorithms over the years. It’s not my area any longer, but if asked, my suggestion for a programmer (who learns via programming) is to code machine learning algorithms from scratch. Is this you? It is me. It’s how I learn best. Here, I really mean that we learn best by: reading about the thing, writing code for the thing, running the thing, watching it not work (it never works first go), and iterating until the implementation works (and you really-actually-deeply learn the thing). This is how we programmers learned a ton of algorithms and data structures in our CS or SWE degree, or whatever. ...
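The read-code-run-iterate loop above can be sketched with a classic from-scratch exercise. This is an illustrative example of mine, not code from the post: plain linear regression fit by gradient descent, using nothing beyond the standard library.

```python
# Illustrative "from scratch" exercise: simple linear regression
# fit by gradient descent, no libraries at all.

def fit_linear(xs, ys, lr=0.01, epochs=2000):
    """Fit y = w*x + b by minimizing mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        # Gradients of MSE with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs = [0, 1, 2, 3, 4]
ys = [1, 3, 5, 7, 9]  # generated by y = 2x + 1
w, b = fit_linear(xs, ys)  # converges near w=2, b=1
```

The first version of something like this never works first go, which is exactly the point: debugging the gradient signs and the learning rate is where the learning happens.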

January 29, 2025 · 3 min · Jason Brownlee

Are LLMs Stuck In-Distribution?

Machine learning models have an IID assumption. That is, the data on which they are trained must be representative of the data on which they will later make predictions. The big question in AI is: Are generative models capable of generating data out of distribution? Naively, I think no. But their data distribution is so vast that it’s hard to see at first. For example, an image generation model can interpolate within the space of almost all images on the net. ...
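To make the IID point concrete, here is a toy sketch (my own illustration, not from the post): a model fit on one region of input space does fine there but collapses far outside it.

```python
# Toy illustration of the IID assumption: a line fit to quadratic
# data over x in [0, 3] is tolerable in-range but fails at x = 10.

def fit_line(xs, ys):
    """Ordinary least squares for y = w*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    return w, my - w * mx

# True function is quadratic; training only sees x in [0, 3].
xs = [0, 1, 2, 3]
ys = [x * x for x in xs]
w, b = fit_line(xs, ys)  # best-fit line is y = 3x - 1

in_dist_err = abs((w * 2 + b) - 4)      # error 1.0 inside the training range
out_dist_err = abs((w * 10 + b) - 100)  # error 71.0 out of distribution
```

Within the training range the model looks fine; the failure mode only appears once you query it outside the distribution it was fit on.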

January 28, 2025 · 6 min · Jason Brownlee

AI Intuitive Physics

In the last two episodes of “the cognitive revolution” podcast, the host (Nathan Labenz) has mentioned AIs developing an intuition for the physics of a domain. Specifically: “Material Progress: Developing AI’s Scientific Intuition, with Orbital Materials’ Jonathan & Tim” and “Emergency Pod: Reinforcement Learning Works! Reflecting on Chinese Models DeepSeek-R1 and Kimi k1.5”. By “physics”, he means the actual rules that limit physical domains, but we can generalize and say any domain. ...

January 28, 2025 · 4 min · Jason Brownlee

Selfish Software

I just read a new post by Edmar Ferreira titled: Selfish Software. It’s his take on what we previously called “chat-driven programming”, but perhaps broader. I thought the idea was user-focused, but his description is also engineer-focused, just more personal. His journey. Selfish software refers to writing code for yourself without any external customers in mind. I like the name, I guess. Perhaps we can call the process of creating software this way “chat-driven programming”, and the artifacts that result “selfish software”, or what I have previously been calling “disposable software”. ...

January 28, 2025 · 2 min · Jason Brownlee

Ambergris

I’ve read Jeff VanderMeer’s Ambergris trilogy of books maybe a dozen times. I will probably keep reading it annually for years to come. But why? Why has it captured me? I first read the books on paper, one by one: City of Saints and Madmen, Shriek: An Afterword, and Finch. Later, I got the compendium of all three books, Ambergris. Here’s the cool cover: It’s edited down slightly and easier to handle. To sit with. ...

January 27, 2025 · 5 min · Jason Brownlee

First Serious PC

This is a nostalgia piece. I worked at a pizza shop in high school. I used the money to buy my first PC in 1996. I was 15, in year 9, obsessed with Quake (Quake 1), and the PC was an Intel 133MHz. We had “family” PCs before this, but this one was mine alone. Purchased with money I earned through hard labor. I also purchased a 33.6k dial-up modem and used my brand new PC to get on the internet with an ISP named Alphalink. ...

January 27, 2025 · 3 min · Jason Brownlee

Breaking Through

I love scenes in stories of the main character “breaking through”. Doing something so extreme they tear through the fabric of reality. Dislocate. One I love is in Alan Moore’s graphic novel “From Hell”. The main character, the killer, experiences moments of “temporal dislocation” while he is killing. While reading, these are shocking moments. As a reader, you don’t know what is going on as we jump from Victorian-era England to modern cityscapes for seemingly no reason. We piece together that he’s dislocating, probably because of the occult stuff he’s doing coupled with extreme acts of violence. ...

January 26, 2025 · 3 min · Jason Brownlee

The Bitter Lesson Leads To Evolutionary Computation

I saw a note about the bitter lesson go by and I left a comment. It’s an idea I’ve had for a long time but not really said out loud. Recall the bitter lesson (via Claude): The Bitter Lesson, articulated by Rich Sutton in 2019, argues that in artificial intelligence research, methods that leverage computation and large amounts of data have historically outperformed approaches based on human knowledge and hand-crafted rules. The “bitter” part is that our human intuitions about how to solve problems often turn out to be less effective than simple methods that scale well with computing power. This has been demonstrated repeatedly in areas like computer chess, speech recognition, and machine translation, where brute-force computational approaches ultimately surpassed carefully designed human-engineered solutions. ...
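As one concrete instance of compute-over-knowledge search, here is a minimal (1+1) evolutionary algorithm on the classic OneMax problem. This is my own illustrative sketch, not code from the post: it encodes no domain knowledge beyond the fitness function, so more iterations (more compute) simply buy better solutions.

```python
import random

# (1+1) evolutionary algorithm on OneMax: maximize the number of
# 1-bits in a bitstring. Pure mutate-and-select, no hand-crafted
# rules; solution quality scales with the evaluation budget.

def one_plus_one_ea(n_bits=32, max_evals=5000, seed=1):
    rng = random.Random(seed)
    parent = [rng.randint(0, 1) for _ in range(n_bits)]
    best = sum(parent)  # fitness = count of 1-bits
    for _ in range(max_evals):
        # Flip each bit independently with probability 1/n.
        child = [1 - bit if rng.random() < 1.0 / n_bits else bit
                 for bit in parent]
        score = sum(child)
        if score >= best:  # accept ties so the search can drift
            parent, best = child, score
    return parent, best

parent, best = one_plus_one_ea()
```

The budget (`max_evals`) is the only real knob: the algorithm knows nothing about the structure of the problem, yet with enough evaluations it reliably reaches or nears the optimum.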

January 25, 2025 · 3 min · Jason Brownlee

Ergodicity and Path Dependence in Machine Learning

I was thinking about ergodicity yesterday in the context of machine learning. Specifically, the path-dependency of training the “best” model, and our typical solution to this challenge: train a Monte Carlo ensemble of final models (e.g. same learning algorithm + varied random seeds, same training data) and combine their predictions. Could we do better? Are we really stuck, and is this really the best way out? My introduction to ergodicity came initially from Taleb’s books. I think it was Antifragile that dug into the related idea of Jensen’s inequality for the payoff in games, and his book Skin in the Game that dug into ergodicity in games with ruin. ...
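The Monte Carlo ensemble described above can be sketched as follows (my own illustration; the SGD learner, learning rate, and seed range are arbitrary choices, not from the post): the same learner runs on the same data with varied seeds, each taking a different path to a similar-but-not-identical model, and their predictions are averaged.

```python
import random

def sgd_fit(data, seed, lr=0.05, epochs=200):
    """Fit y = w*x + b by per-sample SGD; the seed controls the
    random initial weights and the per-epoch sample order, so each
    run follows a different training path."""
    rng = random.Random(seed)
    w, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    order = list(data)
    for _ in range(epochs):
        rng.shuffle(order)  # path-dependence: sample order varies
        for x, y in order:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

# Noiseless y = 2x + 1; every seed lands near (w=2, b=1) by a
# different route.
data = [(x, 2 * x + 1) for x in range(5)]
models = [sgd_fit(data, seed) for seed in range(10)]

# Combine by averaging predictions, here at x = 3 (true value 7).
ensemble_pred = sum(w * 3 + b for w, b in models) / len(models)
```

Averaging over seeds washes out the path-dependence of any single training run, which is exactly the ensemble trick the post is questioning.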

January 25, 2025 · 8 min · Jason Brownlee