
The 8 problems that come back in every AI agent project

Laava Team

We have now built more than ten production AI agent systems. Not demos. Not proof-of-concepts that end up in a drawer. Systems that run daily, process real data, and make real decisions.

After all those projects, one thing surprised us the most: the hardest problems are almost never about the AI model.

The model is 25% of the work

There is a persistent belief that AI projects revolve around the model. The right prompt, the right embeddings, the right fine-tuning recipe. And yes, that matters. But it is maybe a quarter of the total work.

The other 75% is engineering. Boring, essential, get-it-right-or-it-breaks engineering. And these eight problems came up in every single project:

1. Channel routing

Email, Slack, Teams, chat widget, voice — every client has different channels. And every channel has its own quirks. Slack sometimes sends webhook events twice. Email via Microsoft 365 requires clientState validation. Teams has an entirely separate auth flow.

The problem is not "how do I connect a channel." The problem is: how do you build a system where channels are interchangeable, so the same agent logic can run across any channel?
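A minimal sketch of what that interchangeability can look like, assuming a shared message shape and per-channel adapters. The names (ChannelAdapter, InboundMessage) and payload fields are illustrative, not from any specific framework:

```typescript
// Each channel normalizes its raw payload into one shared Message shape,
// so the agent logic never has to know about channel quirks.

interface InboundMessage {
  channel: string;
  senderId: string;
  text: string;
  dedupeKey: string; // lets us drop duplicate webhook deliveries (e.g. Slack retries)
}

interface ChannelAdapter {
  parse(rawPayload: unknown): InboundMessage;
}

const slackAdapter: ChannelAdapter = {
  parse(raw: any): InboundMessage {
    return {
      channel: "slack",
      senderId: raw.event.user,
      text: raw.event.text,
      // Slack can deliver the same event twice; the event id makes handling idempotent
      dedupeKey: raw.event_id,
    };
  },
};

const emailAdapter: ChannelAdapter = {
  parse(raw: any): InboundMessage {
    return {
      channel: "email",
      senderId: raw.from,
      text: raw.body,
      dedupeKey: raw.messageId,
    };
  },
};

// The agent only ever sees InboundMessage, so channels are interchangeable.
function handle(msg: InboundMessage, seen: Set<string>): string | null {
  if (seen.has(msg.dedupeKey)) return null; // duplicate delivery, skip
  seen.add(msg.dedupeKey);
  return `agent reply to ${msg.senderId}: received "${msg.text}"`;
}
```

The dedupe key doubles as the answer to Slack's duplicate webhook deliveries: the agent layer drops repeats without knowing which channel caused them.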

2. Authentication and authorization

Every production API needs auth. Everyone knows that. But with agent systems, it is more complex than with a standard REST API. You need JWT for external endpoints. You need webhook signature validation for incoming messages. You need reviewer identity for human-in-the-loop flows. And you need rate limiting because a hallucinating agent making unlimited API calls is a very expensive mistake.

Most teams build auth as an afterthought. We have learned it needs to be the first layer.
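As an example of the webhook-signature layer, here is a sketch of Slack-style request verification: an HMAC-SHA256 over a versioned base string, following Slack's documented scheme. The secret and parameter names are illustrative:

```typescript
import { createHmac, timingSafeEqual } from "crypto";

// Verify an incoming webhook the way Slack's signing scheme works:
// HMAC-SHA256 over "v0:<timestamp>:<rawBody>", prefixed with "v0=".
function verifySlackSignature(
  signingSecret: string,
  timestamp: string,
  rawBody: string,
  receivedSignature: string,
): boolean {
  // Reject stale requests to prevent replay attacks (5-minute window)
  const age = Math.abs(Date.now() / 1000 - Number(timestamp));
  if (age > 60 * 5) return false;

  const base = `v0:${timestamp}:${rawBody}`;
  const expected =
    "v0=" + createHmac("sha256", signingSecret).update(base).digest("hex");

  // Constant-time comparison avoids timing side channels
  const a = Buffer.from(expected);
  const b = Buffer.from(receivedSignature);
  return a.length === b.length && timingSafeEqual(a, b);
}
```

The same shape (shared secret, raw body, constant-time compare) carries over to Microsoft 365's clientState validation, even though the exact fields differ per provider.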

3. Rate limiting and cost control

LLM APIs are expensive. A bug that causes a loop can cost hundreds of euros in an hour. Rate limiting is not optional — it is a safety mechanism. And it should be enforced not per instance but shared across instances, via Redis or a similar store, so that horizontal scaling does not turn into an unexpected cost explosion.
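A sketch of that idea with a pluggable counter store. In production the store would be Redis (INCR plus EXPIRE) so every instance draws from one shared budget; the in-memory store below exists only to keep the example self-contained:

```typescript
// Fixed-window rate limiter behind a store interface, so the same
// allowRequest() logic works whether the counters live in memory or in Redis.

interface CounterStore {
  increment(key: string): number; // returns the count within the current window
}

class InMemoryStore implements CounterStore {
  private counts = new Map<string, { count: number; windowStart: number }>();
  constructor(private windowMs: number) {}

  increment(key: string): number {
    const now = Date.now();
    const entry = this.counts.get(key);
    if (!entry || now - entry.windowStart >= this.windowMs) {
      this.counts.set(key, { count: 1, windowStart: now }); // new window
      return 1;
    }
    entry.count += 1;
    return entry.count;
  }
}

// Deny once the caller exceeds its budget for the current window
function allowRequest(store: CounterStore, clientId: string, limit: number): boolean {
  return store.increment(`rate:${clientId}`) <= limit;
}
```

Because the store is an interface, swapping the in-memory version for a Redis-backed one does not touch the limiting logic itself.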

4. Structured logging and observability

console.log is not a logging strategy. When an agent in production gives a wrong answer to a customer, you need to be able to see within five minutes which input came in, which context was retrieved, which model was called, and which output was generated.

That requires structured logging with trace IDs, request correlation, and a way to reconstruct the entire agent pipeline step by step.
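Sketched minimally, that looks like this: every step of one agent run emits a JSON line carrying the same trace ID, so the whole pipeline can be reconstructed by filtering on that one field. The step and field names are illustrative:

```typescript
import { randomUUID } from "crypto";

// Minimal structured logger: one JSON line per pipeline step, all sharing
// a single traceId for the whole agent run.
type LogFields = Record<string, unknown>;

function makeLogger(traceId: string, sink: (line: string) => void) {
  return (step: string, fields: LogFields) =>
    sink(JSON.stringify({ ts: new Date().toISOString(), traceId, step, ...fields }));
}

// One agent request, logged step by step under a single trace ID
const lines: string[] = [];
const log = makeLogger(randomUUID(), (l) => lines.push(l));
log("input_received", { channel: "slack", chars: 42 });
log("context_retrieved", { chunks: 5, topScore: 0.87 });
log("model_called", { model: "gpt-4o", promptTokens: 1200 });
log("output_generated", { completionTokens: 180 });
```

In a real system the sink would write to stdout or a log shipper instead of an array, and the trace ID would come in with the request; the reconstruction query stays the same: filter all lines on one traceId.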

5. Deployment and infrastructure

Docker, Kubernetes, Helm charts, CI/CD pipelines, secrets management, health checks that actually verify dependencies instead of just reporting that the process is running. None of these components is complicated on its own. But setting them all up correctly for a new project takes time. A lot of time.

And it is the same pattern every time — but just different enough per project that you cannot blindly copy it.
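To make the health-check point concrete, here is a sketch of a check that probes dependencies rather than reporting that the process is up. The probe names and timeout are illustrative assumptions:

```typescript
// A health check that actually exercises dependencies. Each probe is an
// async function that resolves if the dependency is healthy and rejects
// (or times out) otherwise; one failure marks the whole service unhealthy.
type Probe = () => Promise<void>;

async function checkHealth(
  probes: Record<string, Probe>,
  timeoutMs = 2000,
): Promise<{ healthy: boolean; checks: Record<string, string> }> {
  const checks: Record<string, string> = {};
  await Promise.all(
    Object.entries(probes).map(async ([name, probe]) => {
      let timer!: ReturnType<typeof setTimeout>;
      const timeout = new Promise<never>((_, reject) => {
        timer = setTimeout(() => reject(new Error("timeout")), timeoutMs);
      });
      try {
        await Promise.race([probe(), timeout]);
        checks[name] = "ok";
      } catch (e) {
        checks[name] = `fail: ${(e as Error).message}`;
      } finally {
        clearTimeout(timer); // don't leave timers running after the probe settles
      }
    }),
  );
  return { healthy: Object.values(checks).every((s) => s === "ok"), checks };
}
```

Kubernetes readiness probes can then hit an endpoint that calls this and returns 503 when any dependency fails, so traffic stops flowing to an instance whose vector store or database is unreachable.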

6. Vector store and retrieval plumbing

Everyone builds RAG. Almost nobody builds it well. Retrieving relevant context is not a matter of "throw everything into a vector database and do a similarity search." It is about chunking strategies, metadata filtering, citation tracking, and hybrid search that combines keyword and semantic approaches.

The retrieval pipeline is just as important as the model you choose. Maybe more important.
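One way to sketch the hybrid part is Reciprocal Rank Fusion (RRF): it merges a keyword ranking and a semantic ranking without having to calibrate their raw scores against each other. k = 60 is the conventional RRF constant; the document IDs are illustrative:

```typescript
// Reciprocal Rank Fusion: each ranking contributes 1 / (k + rank) per
// document, and documents are re-sorted by their summed score. A document
// ranked well by both systems beats one found by only one of them.
function reciprocalRankFusion(rankings: string[][], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const ranking of rankings) {
    ranking.forEach((docId, index) => {
      scores.set(docId, (scores.get(docId) ?? 0) + 1 / (k + index + 1));
    });
  }
  return [...scores.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([docId]) => docId);
}

// doc-3 is not first in either list, but appears high in both, so it wins
const keywordHits = ["doc-7", "doc-3", "doc-9"];
const semanticHits = ["doc-3", "doc-1", "doc-7"];
const fused = reciprocalRankFusion([keywordHits, semanticHits]);
```

Chunking strategy, metadata filters, and citation tracking still sit around this, but the fusion step is what keeps keyword and semantic retrieval from fighting over incompatible score scales.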

7. Human-in-the-loop and review flows

100% automation is a fantasy. In every production system, there are decisions that a human needs to validate. But how do you build that? How do you route agent output to a reviewer? How do you track who validated what? How do you ensure the review process does not become the bottleneck?

This is not a feature you "just add later." This is an architecture decision you need to make from the start.
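A sketch of what such a review flow can look like: outputs above a risk threshold are parked for a human, and every decision records who validated what and when. The threshold, statuses, and data shapes are illustrative assumptions:

```typescript
// Review queue with an audit trail: low-risk output is auto-approved,
// everything else waits for a human, and each decision records the reviewer.
type ReviewStatus = "pending" | "approved" | "rejected";

interface ReviewItem {
  id: string;
  agentOutput: string;
  risk: number; // 0..1, from whatever scoring the agent pipeline produces
  status: ReviewStatus;
  reviewer?: string;
  reviewedAt?: string;
}

class ReviewQueue {
  private items = new Map<string, ReviewItem>();

  // Auto-approve low-risk output; park the rest for a human
  submit(id: string, agentOutput: string, risk: number, threshold = 0.5): ReviewStatus {
    const status: ReviewStatus = risk < threshold ? "approved" : "pending";
    this.items.set(id, { id, agentOutput, risk, status });
    return status;
  }

  // Record the decision and who made it (the audit trail)
  decide(id: string, reviewer: string, approve: boolean): ReviewItem {
    const item = this.items.get(id);
    if (!item || item.status !== "pending") throw new Error(`no pending item ${id}`);
    item.status = approve ? "approved" : "rejected";
    item.reviewer = reviewer;
    item.reviewedAt = new Date().toISOString();
    return item;
  }

  pending(): ReviewItem[] {
    return [...this.items.values()].filter((i) => i.status === "pending");
  }
}
```

The threshold is the lever against the bottleneck problem: tune it so humans only see the decisions that genuinely need them, and the audit fields answer "who validated what" without a separate system.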

8. Repository structure that scales

Most AI projects start as a single script. Then it becomes a folder of scripts. Then a monorepo that nobody can navigate anymore. After three months, the codebase is unmanageable — not because the code is bad, but because there was no structure that anticipated growth.

How you organize an agent repo — where agents live, how shared logic is shared, how configs and secrets are separated — determines how much pain you will have six months from now.
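One illustrative layout that separates those concerns (the folder names are an assumption, not a prescription):

```
agents/            # one folder per agent, thin: prompts + orchestration only
  support-agent/
  triage-agent/
packages/
  channels/        # channel adapters (Slack, email, Teams)
  retrieval/       # vector store and RAG plumbing
  observability/   # structured logging, tracing
config/            # per-environment config, no secrets
  production.yaml
  staging.yaml
deploy/            # Dockerfiles, Helm charts, CI/CD definitions
.env.example       # documents which secrets exist; real values live in a secret manager
```

The point is not these exact names but the boundaries: agents stay thin, shared logic lives in packages every agent imports, and config is committed while secrets never are.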

The conclusion

None of these eight problems is unique to a specific client or industry. They are universal. And they are all infrastructure — not AI.

Yet most AI teams spend the majority of their time on these problems, over and over again, in every project.

We solved them once. In our own platform layer that every project uses as a starting point. Not because we are clever, but because we were tired of making the same mistakes for the eleventh time.
