content/brief-history-of-software.md
Whether you're new to agents or an ornery old veteran like me, I'm going to try to convince you to throw out most of what you think about AI Agents, take a step back, and rethink them from first principles. (spoiler alert if you didn't catch the OpenAI responses launch a few weeks back, but pushing MORE agent logic behind an API ain't it)
let's talk about how we got here
We're gonna talk a lot about Directed Graphs (DGs) and their Acyclic friends, DAGs. I'll start by pointing out that...well...software is a directed graph. There's a reason we used to represent programs as flow charts.
Around 20 years ago, we started to see DAG orchestrators become popular. We're talking classics like Airflow and Prefect, some predecessors, and some newer ones like Dagster, Inngest, and Windmill. These followed the same graph pattern, with the added benefit of observability, modularity, retries, administration, etc.
When ML models started to get good enough to be useful, we started to see DAGs with ML models sprinkled in. You might imagine steps like "summarize the text in this column into a new column" or "classify the support issues by severity or sentiment".
But at the end of the day, it's still mostly the same good old deterministic software.
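To make that concrete, here's a rough sketch of a tiny DAG with one "ML" step sprinkled in. Everything in it is made up for illustration: the step names, the ticket data, and the keyword check standing in for a real model call. The only real library call is Python's standard-library `graphlib.TopologicalSorter`, which orders the steps. A real orchestrator would add retries, observability, and so on.

```python
from graphlib import TopologicalSorter


def load_tickets(_inputs):
    # Hypothetical source step: in practice this would read from a database or queue.
    return [
        {"id": 1, "text": "Site is down, customers cannot pay!"},
        {"id": 2, "text": "Typo on the pricing page."},
    ]


def classify_severity(inputs):
    # Stand-in for an ML model call (assumption: a keyword check instead of a real model).
    return [
        {**t, "severity": "high" if "down" in t["text"].lower() else "low"}
        for t in inputs["load_tickets"]
    ]


def route(inputs):
    # Plain deterministic code that consumes the model's output.
    return {
        t["id"]: "page_oncall" if t["severity"] == "high" else "backlog"
        for t in inputs["classify_severity"]
    }


STEPS = {"load_tickets": load_tickets, "classify_severity": classify_severity, "route": route}

# Each node maps to the set of nodes it depends on.
DEPENDS_ON = {
    "load_tickets": set(),
    "classify_severity": {"load_tickets"},
    "route": {"classify_severity"},
}


def run_dag():
    results = {}
    # static_order() yields nodes so that dependencies always come first.
    for name in TopologicalSorter(DEPENDS_ON).static_order():
        inputs = {dep: results[dep] for dep in DEPENDS_ON[name]}
        results[name] = STEPS[name](inputs)
    return results


results = run_dag()
```

The path through the graph is fixed by the engineer up front. The model only fills in one node.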
I'm not the first person to say this, but my biggest takeaway when I started learning about agents was that you get to throw the DAG away. Instead of software engineers coding each step and edge case, you can give the agent a goal and a set of transitions, and let the LLM make decisions in real time to figure out the path.
The promise here is that you write less software, you just give the LLM the "edges" of the graph and let it figure out the nodes. You can recover from errors, you can write less code, and you may find that LLMs find novel solutions to problems.
Put another way, you've got this loop consisting of 3 steps:
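```python
# `llm` and `execute_step` are placeholders for your model client and tool runner
async def run_agent(initial_event):  # e.g. {"message": "..."}
    context = [initial_event]
    while True:
        # 1. the LLM picks the next step (a tool call) or decides we're done
        next_step = await llm.determine_next_step(context)
        context.append(next_step)

        if next_step.intent == "done":
            return next_step.final_answer

        # 2. deterministic code executes the chosen step
        result = await execute_step(next_step)
        # 3. the result is appended to the context for the next turn
        context.append(result)
```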
Our initial context is just the starting event (maybe a user message, maybe a cron firing, maybe a webhook, etc.), and we ask the LLM to choose the next step (tool) or to determine that we're done.
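If you want to run that loop end to end, here's a minimal sketch with the model swapped out for a scripted stand-in. `ScriptedLLM`, `Step`, and the `add` tool are hypothetical names invented for this example, not any real client library. A real LLM would look at `context` to decide what to do next, while this fake just replays a fixed plan.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Step:
    intent: str
    args: dict = field(default_factory=dict)
    final_answer: Optional[str] = None


class ScriptedLLM:
    """Hypothetical stand-in for a model client: replays a fixed plan."""

    def __init__(self, plan):
        self._plan = iter(plan)

    async def determine_next_step(self, context):
        # A real model would read `context` here; the fake ignores it.
        return next(self._plan)


async def execute_step(step):
    # Deterministic code that runs the tool the "LLM" selected.
    if step.intent == "add":
        return {"result": step.args["a"] + step.args["b"]}
    raise ValueError(f"unknown intent: {step.intent}")


async def run_agent(llm, initial_event):
    context = [initial_event]
    while True:
        next_step = await llm.determine_next_step(context)
        context.append(next_step)
        if next_step.intent == "done":
            return next_step.final_answer, context
        result = await execute_step(next_step)
        context.append(result)


llm = ScriptedLLM([
    Step("add", {"a": 2, "b": 3}),
    Step("done", final_answer="2 + 3 = 5"),
])
answer, context = asyncio.run(run_agent(llm, {"message": "what is 2 + 3?"}))
```

Notice that `context` only ever grows: every step and every result gets appended, and nothing is ever removed.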
Here's a multi-step example:
<details> <summary><a href="https://github.com/humanlayer/12-factor-agents/blob/main/img/027-agent-loop-animation.gif">GIF Version</a></summary> </details>
And the "materialized" DAG that was generated would look something like:
The biggest problem with this pattern is long context: as the context window grows, the agent gets lost.
Even if you haven't hand-rolled an agent, you've probably seen this long-context problem in working with agentic coding tools. They just get lost after a while and you need to start a new chat.
I'll even posit something I've heard in passing quite a bit, and that YOU have probably developed your own intuition around:
Even as models support longer and longer context windows, you'll ALWAYS get better results with a small, focused prompt and context.
Most builders I've talked to pushed the "tool calling loop" idea to the side when they realized that anything more than 10-20 turns becomes a big mess that the LLM can't recover from. Even if the agent gets it right 90% of the time, that's miles away from "good enough to put in customer hands". Can you imagine a web app that crashed on 10% of page loads?
Update 2025-06-09 - I really like how @swyx put this:
<a href="https://x.com/swyx/status/1932125643384455237">https://x.com/swyx/status/1932125643384455237</a>
One thing that I have seen in the wild quite a bit is taking the agent pattern and sprinkling it into a broader more deterministic DAG.
You might be asking - "why use agents at all in this case?" - we'll get into that shortly, but basically, having language models manage well-scoped sets of tasks makes it easy to incorporate live human feedback, translating it into workflow steps without spinning out into context error loops (factor 1, factor 3, factor 7).
Here's an example of how deterministic code might run one micro agent responsible for handling the human-in-the-loop steps for deployment.
The tool calls in this example, in order: `deploy_frontend_to_prod(4af9ec0)`, `deploy_backend_to_prod(4af9ec0)`, `deploy_frontend_to_prod(4af9ec0)`.
This example is based on a real-life OSS agent we've shipped to manage our deployments at Humanlayer - here is a real conversation I had with it last week:
We haven't given this agent a huge pile of tools or tasks. The primary value in the LLM is parsing the human's plaintext feedback and proposing an updated course of action. We isolate tasks and contexts as much as possible to keep the LLM focused on a small, 5-10 step workflow.
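Here's a rough sketch of what that shape could look like in code. To be clear, this is not the actual deploybot implementation: `propose_next_step` is a hard-coded stand-in for the LLM, `ask_human` is a hypothetical approval hook, the context event shapes are invented, and nothing actually deploys. The point is only that deterministic code owns the loop, the approval gate, and the turn budget, while the "model" turns human feedback into the next proposed step.

```python
from dataclasses import dataclass


@dataclass
class ToolCall:
    name: str
    sha: str


def propose_next_step(context):
    """Hypothetical stand-in for the LLM: reads the context, proposes one tool call."""
    sha = context[0]["sha"]
    deployed = [e["tool"] for e in context if e["type"] == "deployed"]
    feedback = [e["text"] for e in context if e["type"] == "human_feedback"]
    wants_backend_first = any("backend first" in text for text in feedback)
    if wants_backend_first and "deploy_backend_to_prod" not in deployed:
        return ToolCall("deploy_backend_to_prod", sha)
    if "deploy_frontend_to_prod" not in deployed:
        return ToolCall("deploy_frontend_to_prod", sha)
    return ToolCall("done", sha)


def run_deploy_agent(sha, ask_human, max_turns=10):
    # Deterministic code owns the loop, the context, and the turn budget.
    context = [{"type": "deploy_requested", "sha": sha}]
    proposed = []
    for _ in range(max_turns):
        call = propose_next_step(context)
        if call.name == "done":
            break
        proposed.append(f"{call.name}({call.sha})")
        approved, feedback = ask_human(call)
        if approved:
            # A real implementation would run the deploy here.
            context.append({"type": "deployed", "tool": call.name})
        else:
            # Rejections go back into the context as plain text feedback.
            context.append({"type": "human_feedback", "text": feedback})
    return proposed, context


# Scripted human: reject the first proposal with feedback, then approve the rest.
replies = iter([
    (False, "can you deploy the backend first?"),
    (True, None),
    (True, None),
])
proposed, context = run_deploy_agent("4af9ec0", lambda call: next(replies))
```

With that scripted human, the proposals come out frontend, backend, frontend, matching the sequence in the example above.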
Here's another more classic support / chatbot demo.
In the "deploybot" example, we gain a couple benefits from owning the control flow and context accumulation:
Part II will formalize these patterns so they can be applied to add impressive AI features to any software project, without needing to go all in on conventional implementations/definitions of "AI agent".