AutoML Has A Marketing Problem

I think AutoML is great. I think most people should be using it. Even data scientists and machine learning engineers. And not to “get started” but all the time on all things. Why? It’s probably better than you are. I recall Nick Erickson (author of AutoGluon) commenting in one of his videos/interviews that AutoML/AutoGluon is as good as or better than the average data scientist. That the bar for AutoML to clear is low, much lower than most people think. I don’t have a quote at hand, sorry. ...

January 12, 2025 · 4 min · Jason Brownlee

Quake2 Bot Archive?!

I’m a hobby quake archivist. Over the last few years, I’ve been maintaining the Quake Bot Archive. It is/was a rewarding project filled with nostalgia and writing code to scrape+index+search the internet archive. Here’s a screenshot: I think I’ve taken the project pretty close to the edge. I emailed every single bot author I could find in all the old docs (think a massive spreadsheet of email addresses and current follow-up status, a client management system basically). I tracked down modern contact info for most bot authors and reached out. I carefully researched the history of most bots to ensure I knew exactly what files were released (e.g. quake bot essays, quake bot chronology, quake bot genealogy, and much more). I maintained wishlists of wanted files and wishlists of broken URLs where wanted files were known to exist at one time. I kept expanding the scope from bots, to mods that had bots, to proxy bots/aimbots/server-side bots, and so on. I posted to the community many times kindly asking for the old timers to check their old backup CDs. I searched usenet archives, mail archives, internet archives, warez archives, shovelware archives, etc. I indexed all files on all old quake addon CDs. I indexed the files on all of the old quake webpages on the internet archive. And more… I used the same methods to build other archives, like the official quake archive, which led me to many more helpful resources and generated many more ideas on how/where to search. ...

January 12, 2025 · 3 min · Jason Brownlee

Stacking Is Great

Stacking or Stacked Generalization is an ensemble machine learning algorithm. I’ve been obsessed with it since I discovered it as part of Weka in the late 1990s and read about it in the Weka “Data Mining” book at the same time. From the 2016 edition: Stacked generalization, or stacking for short, is a different way of combining multiple models. Although developed some years ago, it is less widely mentioned in the machine learning literature than bagging and boosting, partly because it is difficult to analyze theoretically and partly because there is no generally accepted best way of doing it—the basic idea can be applied in many different variations. ...

January 12, 2025 · 4 min · Jason Brownlee

Algorithm Skill vs Complexity Frontier

A thought bouncing around in the back of my head related to machine learning algorithm selection is skill vs complexity. Typically we want the simplest skillful model, i.e. Occam’s Razor. A way of thinking about this is to assign a complexity score to a suite of algorithms and then evaluate each in turn. From this table of data, we can imagine a frontier (or Pareto front) of skill vs complexity. Come to think of it, I probably got the idea from this AutoGluon slide: ...
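As a rough sketch of the frontier idea: given a table of (algorithm, complexity score, skill) rows, the Pareto front is the set of rows that no other row beats on both axes. The algorithm names, complexity scores, and accuracy values below are made up for illustration, and `pareto_front` is a hypothetical helper, not something taken from the post or from AutoGluon.

```python
def pareto_front(models):
    """Return the models that are not dominated by any other model.

    Each model is a (name, complexity, skill) tuple. Lower complexity is
    better and higher skill is better. A model is dominated if another model
    is at least as good on both axes and strictly better on at least one.
    """
    front = []
    for name, complexity, skill in models:
        dominated = any(
            (c <= complexity and s >= skill) and (c < complexity or s > skill)
            for _, c, s in models
        )
        if not dominated:
            front.append((name, complexity, skill))
    # Sort by complexity so the frontier reads from simplest to most complex.
    return sorted(front, key=lambda m: m[1])


# Hypothetical complexity scores and accuracies, for illustration only.
models = [
    ("dummy", 0, 0.50),
    ("logistic_regression", 1, 0.80),
    ("naive_bayes", 1, 0.75),
    ("decision_tree", 2, 0.78),
    ("random_forest", 3, 0.86),
    ("gradient_boosting", 4, 0.86),
]

front = pareto_front(models)
for name, complexity, skill in front:
    print(f"{name}: complexity={complexity}, skill={skill:.2f}")
```

With these made-up numbers, the simplest skillful choice is whichever model on the front first clears the skill you need; everything off the front is either more complex or less skillful than something on it.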

January 11, 2025 · 2 min · Jason Brownlee

Public vs Private Schools

I tripped over a discussion of public vs private schools in Melbourne (where I live with my family). Is private school for the kids worth it? We have two sons and went through this debate already a few years back. I read a ton of stuff at the time, including papers and books. I recall the following book helped a lot: Free Schools, David Gillespie, 2014. We got to: send them to public school, because: ...

January 11, 2025 · 5 min · Jason Brownlee

Excessive Cognitive Surplus

I re-read Clay Shirky’s 2010 book “Cognitive Surplus” yesterday. There are some good ideas in there. Here’s a summary by Claude: “Cognitive surplus” is a concept popularized by Clay Shirky in his 2010 book “Cognitive Surplus: Creativity and Generosity in a Connected Age.” The basic idea is that modern society has a vast untapped reservoir of human intellectual capacity and free time that could be directed toward creative and socially beneficial purposes. The concept emerged from Shirky’s observation that people in developed economies collectively have billions of hours of free time that, historically, was often spent on passive consumption like watching television. With the rise of the internet and digital technologies, this surplus of human attention and cognitive capacity can be channeled into collaborative projects and creative endeavors. ...

January 10, 2025 · 4 min · Jason Brownlee

Scikit-Learn Algorithm Elo Ratings

I’ve been reading up on AutoGluon on and off over the last few weeks. It’s amazing! I highly recommend the podcast episode AutoGluon: The Story with Nick Erickson from 2023. Anyway, one cool thing Nick touched on in his AutoML 2024 presentation was Elo ratings for AutoGluon and a suite of other AutoML frameworks. Here’s a still from the presentation that captured my imagination: Watch the whole presentation: ...
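For reference, here is a minimal sketch of the standard Elo update, treating each head-to-head comparison of two algorithms on a dataset as a "match". This is the textbook formula with an assumed K-factor of 32 and a starting rating of 1000. It is not necessarily how the ratings in the presentation were computed, and the algorithm names and result are made up.

```python
def expected_score(rating_a, rating_b):
    """Expected score of A against B under the standard Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b) after one match.

    score_a is 1.0 if A won, 0.5 for a draw, 0.0 if A lost.
    k is the K-factor (assumed value; larger means faster rating changes).
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    new_b = rating_b + k * ((1.0 - score_a) - (1.0 - exp_a))
    return new_a, new_b


# Hypothetical example: both algorithms start at 1000 and play one "match",
# i.e. one dataset where random_forest scored better.
ratings = {"logistic_regression": 1000.0, "random_forest": 1000.0}
results = [("random_forest", "logistic_regression", 1.0)]

for a, b, score_a in results:
    ratings[a], ratings[b] = update_elo(ratings[a], ratings[b], score_a)

print(ratings)
```

The update is zero-sum: whatever one algorithm gains, the other loses, and beating a higher-rated opponent moves the ratings more than beating a lower-rated one.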

January 10, 2025 · 2 min · Jason Brownlee

Local Temperature Forecast

A long time ago, I worked for the Australian Bureau of Meteorology. While there, I created many internal scripts/apps using weather data, for fun and internal personal use. One example was a tiny web app that plotted recent temperature observations, forecasts, and in-office temperature using a little USB temperature probe. It was very cool and a few members of our team kept it terminally open. Even today, I still leave the temperature forecast page for my suburb open all day long and refresh it before heading out to the gym, supermarket, whatever. ...

January 9, 2025 · 2 min · Jason Brownlee

StackOverflow New Questions

I saw this StackOverflow Dec 2024 stats pass by. It reports data on the number of new questions created on StackOverflow from July 2008 to December 2024. The raw data shows a dramatic drop in new questions in recent years. The data was collected and commented on by Theodore R. Smith, who is complaining that his reasonable new question posted to the site was closed and that aggressive closing of questions is the reason for the decrease in new questions on the site: ...

January 9, 2025 · 2 min · Jason Brownlee

Chat-Driven Programming

Over the last year, I have rarely sat down to write code. Instead, I have code generated and then iterate on it until we achieve the desired effect. I direct and collaborate. Similarly, to change existing code, I attach the file to a chat or paste the functions that require changes, summarize the change and iterate until it achieves the desired effect. My projects are almost always side projects. Whims. Ideas. Prototypes. Examples. Ad hoc’ery. Not production-grade code. ...

January 8, 2025 · 3 min · Jason Brownlee