The local experience of building AI agents is critical to the success of a project. At the same time, there should be a clear path to a production-grade, scalable deployment of both the agents and all the required infrastructure.
The solution we landed on: k3s on a virtual machine managed by lima. Continue reading to learn how we got there, or jump straight to the solution. Before we dive in, let's briefly discuss the requirements.
## Agent Stack
If you have ever tried to build an AI agent, you have probably encountered one of these common concerns:
- how to connect to the LLM and manage its credentials
- how to process and search through documents from the user
- where to store data and generated artifacts
- how to deploy and share the agent with the world
Our goal is to lift these burdens and let developers focus on the agent logic instead of reinventing the infrastructure wheel. The result of our effort is an open-source AI platform for building, sharing and using AI agents, with a focus on the local developer experience.
Without diving too much into the details, let’s bluntly state a set of technical requirements that were slowly forming during the development of the Agent Stack:
- a way to download and run packaged agents from the internet
- a relational database for persisting users and domain entities
- object storage for large documents (e.g., PDFs)
- a service for extracting text from the documents
- a vector store for embedding and searching the documents
- tracing and observability tools for debugging the agents
- a connection to LLMs, including local Ollama models
## Cloud Deployment
For the cloud, the choice was pretty straightforward. We'll use kubernetes for its broad adoption and flexibility, with the typical choices for infrastructure: postgres with pgvector as the relational database and vector store, s3-compatible object storage, docling for text extraction, and an open-telemetry collector gathering traces, coupled with an observability tool of choice.
## Local stack
The challenge now is to provide the same experience locally on the desktop. But here is the catch: it must be so reliable that installation works seamlessly with a single command on Mac and Windows. You should be able to comfortably run a workshop with 100 participants where everyone installs Agent Stack with little or no assistance. This is our main benchmark for facing reality, and as we will see, it's not that easy to meet.
There are a few ways to achieve this, each with their own set of tradeoffs.
For example, you can use lightweight alternatives such as sqlite or duckdb and store data in the user's home directory, and the vector store can be replaced with an in-memory alternative like chroma. Or you can rely on docker with compose to set up a real postgres.
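To make this concrete, here is a minimal sketch of the lightweight route, assuming sqlite for relational data and chroma's in-memory client for vectors (the path and names are illustrative, not what Agent Stack ships):

```python
import sqlite3
from pathlib import Path

import chromadb

# Relational persistence: a sqlite file tucked into the user's home
# directory (the path is hypothetical).
db_path = Path.home() / ".agentstack" / "local.db"
db_path.parent.mkdir(parents=True, exist_ok=True)
conn = sqlite3.connect(db_path)

# Vector search: chroma's default client keeps everything in memory,
# so nothing survives a restart.
client = chromadb.Client()
collection = client.create_collection("documents")
```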
### Dual stack with adapters
Implementing an abstraction layer over your infrastructure is a good practice regardless of the use-case, if only for better testability. We could leverage the adapter pattern and packages that already support multiple backends, for instance sqlalchemy, which supports both sqlite and postgres.
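For illustration, a minimal sketch of the adapter idea with sqlalchemy: the backend is chosen by a connection URL, and dialect-specific types are bridged with variants (the entity and environment variable here are hypothetical):

```python
import os

from sqlalchemy import JSON, Column, Integer, create_engine
from sqlalchemy.dialects.postgresql import JSONB
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Artifact(Base):
    """Hypothetical domain entity, just to illustrate the pattern."""
    __tablename__ = "artifacts"
    id = Column(Integer, primary_key=True)
    # Plain JSON on sqlite, JSONB on postgres: a variant papering over
    # one of the many dialect differences discussed below.
    payload = Column(JSON().with_variant(JSONB(), "postgresql"))

# The backend is selected by the connection URL alone, e.g.
# sqlite:///local.db locally vs. postgresql://... in the cloud.
engine = create_engine(os.environ.get("DATABASE_URL", "sqlite:///local.db"))
Base.metadata.create_all(engine)
```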
Although this is certainly achievable, it's not easy to maintain and test dual implementations and backends. Even with sqlalchemy there are many differences between sqlite and postgres, such as specific data types and query features, and the only way to make sure the system works on both databases is through automated tests, which must include migrations. Some services, such as text extraction or vector store search, could even differ in quality between the cloud and local environments. If you write an agent locally and deploy it to the cloud, it should perform consistently regardless of the infrastructure.
When we imagined the test and quality headaches for our small team to manage, we quickly dismissed it.
And there is another problem: how can you run agents downloaded from the internet? At first we started with simple npx and uv run commands, both of which support a public github repository as a source. This is the approach local MCP servers took at the time, but as you might imagine, running arbitrary code from github on bare metal can be quite dangerous. This is what brought us to the next option.
### Docker compose
The great promise of docker? Standardize code execution across different machines and environments. Prepare to enter a rabbit hole.
Developers will agree that docker compose is perhaps the most common way to mimic production infrastructure locally. It works quite well, and if you run into issues and are familiar with the project, you can usually resolve them easily. So naturally, we tried it too, and we ran a workshop with tech-savvy participants.
And lo and behold, we spent most of the time debugging strange issues I had never even imagined. The days when docker was free, developer-friendly software are over, and for that reason many people are forced to choose an alternative. There are podman, rancher desktop, and colima, each with its own implementation of docker compose, its own opinions on how to install it, and its own set of supported features.
One example for all, and the most important piece of the puzzle: how do you access Ollama models running on the host machine? Have you seen this before?
```yaml
extra_hosts:
  - "host.docker.internal:host-gateway"
```
This is a piece of configuration you need to add for some compose implementations to recognise the host.docker.internal hostname, which you can use to reach the host network. Except for podman, which must not have it, otherwise it breaks.
“Just use the host network mode then!” you might say. Well, this does not work on Mac, where the containers are running in a virtual machine.
Most of all, though, it was simply too difficult to navigate all the nuances, and the resulting installation guide for users was very far from our ideal one-line script. Not knowing any better, can you guess the next thing we tried?
### Plain docker
Disillusioned by the docker compose experience, we went back to plain docker and its more reliable and standardized Docker Engine API.
This improved the install success rate dramatically for our users. Moreover, our agents could now be dockerized instead of running as bare metal processes, because we could spin up containers dynamically, which was a great step towards improved security. Overall the solution was pretty solid: we had a python process running natively which would find the docker socket (or display a friendly message asking the user to install docker) and then use aiodocker to manage containers for the infrastructure and agents and to stream their logs.
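A rough sketch of that loop with aiodocker, assuming a hypothetical agent image (an illustration, not our exact code):

```python
import asyncio

import aiodocker

async def run_agent(image: str, name: str) -> None:
    docker = aiodocker.Docker()  # connects to the local docker socket
    try:
        # Create (or replace) and start the agent container.
        container = await docker.containers.create_or_replace(
            name=name,
            config={"Image": image},
        )
        await container.start()
        # Stream the agent's logs back to the user.
        async for line in container.log(stdout=True, stderr=True, follow=True):
            print(line, end="")
    finally:
        await docker.close()

asyncio.run(run_agent("ghcr.io/example/agent:latest", "example-agent"))
```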
How did we solve the issue with host.docker.internal? You are not going to believe it: we literally ran a small alpine container during server startup, and if the host was not reachable from it, we added --extra-hosts to the configuration.
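Something along these lines, reconstructed from the idea above rather than copied from our codebase:

```python
import aiodocker

async def needs_extra_hosts(docker: aiodocker.Docker) -> bool:
    # Start a throwaway alpine container that tries to reach the host,
    # then check its exit code.
    container = await docker.containers.run(
        config={
            "Image": "alpine",
            "Cmd": ["ping", "-c", "1", "-W", "2", "host.docker.internal"],
        }
    )
    result = await container.wait()
    await container.delete()
    # Non-zero exit code: the hostname did not resolve, so we need to add
    # "host.docker.internal:host-gateway" to ExtraHosts ourselves.
    return result["StatusCode"] != 0
```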
Finally, we had a reliable single infrastructure stack: the same docker containers could be used locally and in the cloud.
At this point you may realize that there is no Docker Engine API in the cloud, and we still need to download and run agents dynamically at runtime. Naturally, we could use the kubernetes API instead, but that would again require a second implementation. Can we do better?
## Kubernetes
What’s better than running code? Running code in a container in kubernetes in a virtual machine on your laptop!
After so much struggle with local-first approaches, we found it! The ultimate single-stack solution with no compromises. Well, almost. Let's break down how it works.
### Virtual machine
Did you know that there is an open-source virtual machine manager for MacOS and linux packaged in a single small binary? It's called lima and it's magical. We can use it to completely standardize the deployment environment for the platform. No more dealing with infinite, unforeseeable differences between user devices: a virtual machine is a virtual machine. What a time to be alive!
### K3s
This project is just as impressive, if not more so. Did your intuition also tell you that kubernetes must be quite heavy to run locally? K3s is a certified kubernetes distribution that runs even on a raspberry pi!
### Helm chart
We found helm to be the most convenient way to define the manifests for the platform and the infrastructure. Our helm chart contains simple toggles to switch between self-hosted and cloud postgres or s3, or to enable oauth authentication for production. Most importantly, you can use helm install both locally and in production with just a few differences in configuration.
### Agent Stack CLI
Finally, we have arrived at the Zen moment: a single command to install the platform.
```sh
sh -c "$(curl -LsSf https://raw.githubusercontent.com/i-am-bee/agentstack/HEAD/install.sh)"
```
Well, it’s a script, but I promise we are not cheating! It works on the first try for most people thanks to the final piece of the puzzle, the CLI.
We wrote a python command line app to manage the platform lifecycle, and we baked the limactl binary into it, so we
can start VMs on MacOS and linux with zero extra dependencies! Here are the basic commands for managing the platform:
| Command | Steps |
|---|---|
| `agentstack platform start` (first use) | 1. create ubuntu VM using bundled limactl 2. install k3s 3. install helm 4. install platform helm chart 5. wait for deployments to start 6. profit |
| `agentstack platform start` (second use) | 1. start existing VM 2. update Agent Stack using helm upgrade |
| `agentstack platform stop` | stop VM |
| `agentstack platform delete` | delete VM |
| `agentstack self upgrade` | 1. upgrade the CLI using uv 2. run `agentstack platform start` (upgrades Agent Stack) |
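Under the hood, the first use conceptually boils down to shelling out to the bundled binary, roughly like this sketch (the package and template names are hypothetical):

```python
import subprocess
from importlib.resources import files

def limactl(*args: str) -> None:
    # The limactl binary ships inside the python package itself, so the
    # user needs nothing besides the CLI.
    binary = files("agentstack_cli") / "bin" / "limactl"
    subprocess.run([str(binary), *args], check=True)

# First use: create and boot the VM from a lima template; the CLI then
# installs k3s and the platform helm chart inside it.
limactl("start", "--name=agentstack", "agentstack.yaml")
```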
The installation script does three simple steps:
- ensures that uv, a brilliant modern python package manager, is installed
- installs the `agentstack-cli` package using `uv tool install` with `--managed-python`, which will also download the required python version
- asks the user if they want to run `agentstack platform start` immediately
Admittedly, the process to start the application has quite a few steps, and it can take a while, mostly depending on the speed of your internet connection, but this needs to be done only once.
## Conclusion
Thanks to relying on a carefully defined stack (the dependable Ubuntu OS, the certified k3s kubernetes distribution, and a precisely versioned helm chart), we are able to replicate the environment on any modern machine, 100% of the time.
And what’s even better is that we can manage upgrades and migrations the exact same way locally and in production,
using an industry-standard helm upgrade command.
I believe that what we have achieved is a truly innovative approach that greatly simplifies the development of complex cloud-native applications, especially in small teams. Our package `agentstack-cli` is specific to Agent Stack, but with enough imagination the same approach could be generalized into something bigger. Imagine a different take on a package manager which would let you install, configure, and upgrade any cloud-native application that ships a helm chart, locally, from some registry, for example artifacthub.
Of course, this wouldn't be possible without the amazing work of the smart people on our team, whom I am really proud to work with. Check out the project and give us a star if you like it!
This article covered only how we install and distribute the application to users. In an upcoming post, I will describe which tools we use for developing the platform and how to do it natively, directly in kubernetes, with a very similar stack. Stay tuned!