docs/content/en/blog/DevLog-2025.07.18/index.md
Hello, I'm @LemonNeko, one of the maintainers of AIRI.
Half a year ago, I first tried to write airi-factorio, an AI agent that can play Factorio, the famous automation and production simulation game. I did the following things:
- Used RCON to call `/c` commands that execute functions registered by the mod. Many thanks to @nekomeowww.
- Pointed the `tstl` output directory at the game directory, so we can see the compiled Lua code directly there, making debugging easier.

This taught me a lot (especially that Lua array indices start from 1).

However, I also encountered many problems. Since our main operations were written in the mod, debugging became very troublesome. We needed to exit the map, return to the game's main interface, and re-enter to apply mod changes. If the mod was slightly more complex and included a data.lua, we had to restart the whole game.
We let the LLM generate Lua code, then execute it by calling the game command /c through RCON. However, Factorio has a length limit for each command. If our code was too long, we needed to execute it multiple times.
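As a rough illustration of that workaround, here is a minimal sketch of packing Lua statements into as few `/c` commands as possible. The function name `batchLuaCommands` and the constant `ASSUMED_MAX_COMMAND_LENGTH` are made up for this example, and the limit value is only a placeholder, since I haven't pinned down the game's real cap here.

```typescript
// Placeholder value: the real per-command limit is not stated in this post.
const ASSUMED_MAX_COMMAND_LENGTH = 4000;

/**
 * Greedily packs Lua statements into `/c` commands so that each command
 * stays within maxLength characters (including the "/c " prefix).
 *
 * Each command runs as its own Lua chunk, so every statement must be
 * self-contained: a `local` declared in one batch is not visible in the next.
 */
function batchLuaCommands(
  statements: string[],
  maxLength: number = ASSUMED_MAX_COMMAND_LENGTH,
): string[] {
  const prefix = "/c ";
  const commands: string[] = [];
  let current = "";

  for (const raw of statements) {
    const stmt = raw.trim();
    if (stmt.length === 0) continue; // skip blank statements

    // A single statement that cannot fit on its own is an error,
    // because splitting it mid-statement would produce invalid Lua.
    if (prefix.length + stmt.length > maxLength) {
      throw new Error(`statement of ${stmt.length} chars cannot fit in one command`);
    }

    // Lua accepts ";" as a statement separator.
    const candidate = current === "" ? stmt : `${current}; ${stmt}`;
    if (prefix.length + candidate.length > maxLength) {
      commands.push(prefix + current); // flush the full batch
      current = stmt;
    } else {
      current = candidate;
    }
  }

  if (current !== "") commands.push(prefix + current);
  return commands;
}
```

With a limit of 40 characters, the three statements `game.print('a')`, `game.print('b')`, and `game.print('c')` end up in two commands: the first two share one `/c` call and the third gets its own. This only softens the problem; it doesn't help when the LLM writes one long function.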
The current code has poor robustness and maintainability. If new friends want to participate in development, or even just try it out, starting this project is very difficult.
Fast forward to now, I plan to properly organize this project, but I don't know where to start. Coincidentally, someone mentioned a paper called Factorio Learning Environment. Let me give you a simple read-through.
In this paper, the authors proposed a framework called Factorio Learning Environment (FLE), where they tested AI's capabilities in long-term planning, program synthesis, resource management, and spatial reasoning.
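FLE has two modes: Lab-play, a set of structured tasks with fixed resources, and Open-play, an open-ended task of building the largest possible factory on a procedurally generated map.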
They evaluated mainstream LLMs like Claude 3.5 Sonnet, GPT-4o, Deepseek-v3, Gemini-2, etc., but in Lab-play, even the strongest Claude 3.5 at the time only completed 7 levels.
Reading this, I became curious. Their evaluation was so complex, so they must have also ensured technical maintainability. How did they achieve this? Continuing to read, I found that their implementation method was very similar to airi-factorio, but had many advantages compared to airi-factorio:
- They use `/sc` commands instead of `/c` commands to execute Lua code. `/sc` doesn't echo the code to the console, which keeps the console clean, leaves only the necessary content, and makes the output easier to parse.

To better evaluate LLM capabilities, they also carefully analyzed the production process and difficulty of every required recipe and summarized some formulas, such as how to compute the cost of producing an item and how to calculate LLM scores.
They also posted their system prompt, which specifies the environment structure, response format, best practices, how to understand game output, etc.
Compared to FLE, our implementation seems quite naive. So how should we improve airi-factorio?
I don't want to write Python; I'm only familiar with TypeScript and Golang. Coincidentally, we recently made mcp-launcher, a builder suitable for all possible MCP servers. We can use it with Golang to implement an MCP server, then let the LLM call it.
With that, the structure diagram has changed:
<div class="flex flex-row gap-4"> </div>

Player chat content will no longer be pushed to the LLM directly. Instead, it is stored in the RconChat mod, and the LLM reads it through the MCP server. With the MCP server approach, we also no longer need the LLM to generate Lua code.
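To make the "no more generated Lua" idea concrete, here is a minimal sketch of how a tool call could be turned into a fixed command. It is written in TypeScript for brevity even though the server itself is planned in Golang, and everything named here is hypothetical: the `airi` remote interface, the tool names, and the assumption that each mod function returns a string. It does not use the mcp-launcher API. `remote.call` and `rcon.print` are Factorio's Lua APIs for calling a mod-registered interface and replying over RCON.

```typescript
// Hypothetical remote interface name the mod would register.
const REMOTE_INTERFACE = "airi";

type ParamType = "string" | "number";
type ToolArgs = Record<string, string | number>;

interface ToolSpec {
  // Ordered parameters, passed positionally to the mod function.
  params: Array<{ name: string; type: ParamType }>;
}

// Illustrative tools only; not the real airi-factorio tool list.
const TOOLS = new Map<string, ToolSpec>([
  ["read_chat", { params: [] }],
  ["move_to", { params: [{ name: "x", type: "number" }, { name: "y", type: "number" }] }],
  ["craft_item", { params: [{ name: "item", type: "string" }, { name: "count", type: "number" }] }],
]);

// Converts a validated argument into a Lua literal. Strings are limited to
// a conservative character set so nothing can break out of the quotes.
function luaLiteral(value: string | number): string {
  if (typeof value === "number") {
    if (!Number.isFinite(value)) throw new Error("number must be finite");
    return String(value);
  }
  if (!/^[A-Za-z0-9_-]+$/.test(value)) {
    throw new Error(`unsupported characters in string argument: ${value}`);
  }
  return `"${value}"`;
}

// Builds the silent command for one tool call. The LLM only picks a tool
// name and arguments; the Lua text itself is fixed by this function.
function buildToolCommand(tool: string, args: ToolArgs): string {
  const spec = TOOLS.get(tool);
  if (spec === undefined) throw new Error(`unknown tool: ${tool}`);

  const luaArgs = [`"${REMOTE_INTERFACE}"`, `"${tool}"`];
  for (const param of spec.params) {
    const value = args[param.name];
    if (typeof value !== param.type) {
      throw new Error(`argument ${param.name} must be a ${param.type}`);
    }
    luaArgs.push(luaLiteral(value));
  }
  return `/sc rcon.print(remote.call(${luaArgs.join(", ")}))`;
}
```

For example, `buildToolCommand("move_to", { x: 3, y: -4 })` yields `/sc rcon.print(remote.call("airi", "move_to", 3, -4))`. Because every command comes from a small, typed whitelist, the commands stay short, and a bad tool call fails in the server before it reaches the game.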
Regarding system prompts, currently our prompts are AI-generated, but they're still not clear enough, with unclear priorities. I plan to improve them by referencing FLE's system prompt.
Alright, we've basically overturned all the previous designs again. Time to start over.
Thank you for reading. If you're interested, you can read through FLE's paper and code. Maybe my understanding is incorrect; corrections are welcome! This read-through might not be deep enough. As I work on improving airi-factorio along these lines, I'll need to reread the paper, and I'll post updates when there's progress.
That's it for this DevLog. Have a great weekend!
Cover artwork by @anrew10