This riff 1 derives from a recent "AI Programmer" story that's making people in my corner of the nerdiverse sit up and talk, at a time when hot new AI happenings have become mundane.
Recently HackerNews front-paged the announcement of a Devin AI. Earlier today m'colleague remarked that he is on the wait list to use the Devin to get a bunch of his side projects done.
It is yet another prompt for me to take the low-key counterfactual bet against the AI wave, in favour of good old flesh and blood humans, and our chaotic, messy systems. What follows will make no sense to an LLM (because it is not a sense-maker) and it will feel nothing (because it is not a sense-maker). So if it riles you up, congratulations, you are a living, breathing person.
Most line-of-business coding (all that grunt work of coding features, configurations, bug fixes, etc.) is "side project" sized and shaped. Big tasks are made of many little tasks. The bet is that AI Programmers will do this for us and, with sufficient task breakdown, some lightweight supervision, and clever prompting, do the labour of a hundred men 2.
Now, if we assume any programming team can get their hands on shiny new Dev-in tools (haha, see what I did there?) to make that work easy, then all of them ought to be able to level up against each other. If the Invisible Hand does its job, competition will be fierce (like, a prompt engineering arms race?) because only the undifferentiated have to compete. Assuming this happens (as the AI bettors definitely believe), what will set a person or team apart from any other similarly-abled AI tool user? I think it will be that the infusion of AI grunt-work tools will loft and liven the essential role of the wetware in our skulls, and the squishy feelings in our guts. It remains to be seen what kind of history we choose to create.
“Once men turned their thinking over to machines in the hope that this would set them free. But that only permitted other men with machines to enslave them.”
— Frank Herbert, Dune
Further (and not to pick on the Devin specifically), the numbers they publish are telling. Their claimed ~14% SWE-bench task completion looks much better than the field. But, one, it is within the confines of the SWE-bench game, and we all know how well mouse models generalise to human models. And, two, even if it gets to 95% on that artificial benchmark (which I highly doubt, because the complexity explosion would demand insane amounts of energy and horsepower), it will still not be good enough. An automated software system that "works 90% of the time according to mouse models" will require humans in the loop at all times to check the work. The more complex the work, the harder it will be to exercise reasonable oversight. If your error margins are forgiving enough, you could remove the humans. Maybe bog-standard SaaS products are a juicy target for AI programmers, if the economics and error/failure margins stay as big and forgiving as they are (~90% gross margins are nothing to sneeze at). Will that moat remain if the means of production become far cheaper than human pay cheques, thereby lowering the bar to participate, thereby spreading those juicy margins out thin?
Now I'm no analysis whiz, but aren't claims about accuracy suspect sans error bars and confidence intervals? As far as I can tell these systems can never provide that information, by construction. Maybe one can mitigate error probabilities with a form of statistical sampling, e.g. "solve this problem three different ways". But won't that only add to the madness, because one will never be able to get the same answer twice for the same input unless the AI's model state is frozen during those runs? And even then, a frozen snapshot says nothing about all the future states it will pass through once it is live. One will never know which 10% is bad. Maybe "99% good" is even worse in this kind of setup, because the downside of the 1% "not good" is unbounded. There will always be that gnawing doubt. Can you ever let the AI programmer loose by itself? And if you can't, what's the cost of supervising and managing the consequences of its work product?
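To make the "error bars" point concrete, here's a back-of-the-envelope sketch. The task and pass counts below are hypothetical stand-ins (not anyone's published evaluation); they just show the kind of interval a single headline pass rate hides:

```python
import math

def wilson_interval(successes: int, trials: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score interval for a binomial pass rate (z = 1.96)."""
    p = successes / trials
    denom = 1 + z ** 2 / trials
    centre = (p + z ** 2 / (2 * trials)) / denom
    half = (z / denom) * math.sqrt(p * (1 - p) / trials + z ** 2 / (4 * trials ** 2))
    return centre - half, centre + half

# Hypothetical stand-in numbers: ~14% of ~570 benchmark tasks marked "resolved".
passed, total = 79, 570
low, high = wilson_interval(passed, total)
print(f"pass rate ~{passed / total:.1%}, 95% CI roughly {low:.1%} to {high:.1%}")
```

Even with a few hundred tasks, a ~14% point estimate comes with an interval several percentage points wide, and that's before asking whether the benchmark says anything about your codebase.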
Along the lines of 100x-ing the influence of a mere mortal, I read some interesting and plausible-sounding speculation about the AI-powered billion-dollar "company of one". Maybe that comes to pass at some point. Software is extreme leverage, after all. However, leverage works both ways: "up and to the right", as well as "down and to the negative". The downside is totally unbounded. It can subtract far more than what is created, because that's how shit blows up. A billion-dollar company of one will be a wildly complex system of systems, and this entity had better price in all that risk, which I doubt it can, because, as posited, the risk is unbounded. I think scale demands more participants precisely because risk dislikes being concentrated. Whenever we force big risks into a small box, the variance of outcomes becomes wildly chaotic. This is just my messy intuition, not my scientifically-informed LLM talking.
In any case, whether the little guy gets superseded or becomes the big guy, shovel-makers and energy owners will make all the money in this new gold rush (insert "always have" meme). The big tech acronyms already are: ASML, NVIDIA, the FAANGs, etc. Energy companies too. At the core of it, this particular AI game is one of raw energy and silicon, same as crypto.
Also, I am wary of the slick demos that all these well-heeled AI players are publishing. Invariably, much behind-the-scenes production, cherry-picking, and airbrushing goes into making a demo that "works". Almost as invariably, slick demos of mouse models fail at delivering the future they promise. This is not for lack of trying, or (maybe) not for lack of genuine intention (e.g. who can tell what OpenAI intends any more). This is just how creativity works.
However sophisticated the output feels, a probabilistic next-token predictor is still exploring a search space. One where the question and the search universe are ill-specified, barely constrained, and resistant to all but the most cursory introspection.
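For what it's worth, here is the "probabilistic" part in miniature. The vocabulary and logits below are made up, but the mechanism is the standard one: softmax the scores, then sample, so the same prompt can legitimately yield a different continuation every run:

```python
import math
import random

# Toy next-token scores for a fixed prompt. The vocabulary and logits are
# made up; a real model emits scores like these over tens of thousands of tokens.
logits = {"return": 2.1, "raise": 1.3, "pass": 0.4, "assert": 0.1}

def sample_next_token(logits: dict[str, float], temperature: float = 0.8) -> str:
    """Softmax the scores, then draw one token at random in proportion to them."""
    weights = {tok: math.exp(score / temperature) for tok, score in logits.items()}
    total = sum(weights.values())
    r = random.random() * total
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # floating-point fallback: return the last token

# Same "prompt", five runs, no guarantee of the same answer twice.
print([sample_next_token(logits) for _ in range(5)])
```

Pin the sampler to greedy decoding (always take the max-logit token) and you get repeatability, but the point stands: the output is a draw from a distribution, not a checked derivation.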
Given all this positing and conjecture, and the fact that problem-solving complexity explodes with small increases in the ambiguity of a problem space, my gut says general-purpose programming ability is a literal black hole of energy consumption 3.
The halting problem will permit only a halting AI 4.
If you're a human reading this, please be advised that this comes from my armchair on the Internet, that I failed repeatedly at college math and physics, and that I am at best a pedestrian critic. Caveat emptor. Also, I previously armchair-riffed about Tools for Thought, a framing in which today's crop of AI tools falls into the "memory assistant" category, unflatteringly the most primitive of the three, which of course speaks to my bias. Caveat emptor too.↩︎
The Devin must be a man, yes? Siri is your secretary and Devin is your programmer.↩︎
I do believe we will likely have lots of agents that are world-beating at very narrow and specialised domains. Sort of in line with Chess or Go AIs that reduce grandmasters to tears, or WMD systems that reduce everything to rubble, or weather models, or long-haul autonomous transport drones on earmarked routes. And definitely a few colossal monopolies that can expend the small-European-country-scale energy required to make AI-as-a-service function, along with the political power to stay out of any serious trouble should their AI go very, very wrong in very, very consequential ways (a billion-dollar fine that you can contest forever is nothing if you fetch a hundred billion for your troubles). Some places will definitely be able to price that sort of risk. I'm not wearing a tin foil hat. I just feel history tells us that there will always be someone with the capacity to really push the boundaries far past the norm, and the smarts to figure out how to fly through the gaps between choice of actions and ownership of consequences.↩︎
Double-doffing my hat to the Halting Problem, and to Charlie Stross, who gave us the notion of the Corporation as a (slow) AI, and whose novel Halting State I thoroughly enjoyed.↩︎