# Why Coding Agents Love Layered Baklava Code

**Blog:** [vschroeder.blog](https://vschroeder.blog)  
**Author:** Victor Schroeder  
**Published:** 2026-05-17  
**Tags:** [ai](/tags/ai.md), [software-engineering](/tags/software-engineering.md), [architecture](/tags/architecture.md), [practices](/tags/practices.md)

> AI coding agents love wrapping everything into god-objects. Layered architecture fights that instinct at the structural level. Your code gets smaller files, better tests, and agents that follow the pattern instead of inventing their own.



---

Two developers walk into a bar, I mean, into a bi-weekly retrospective:

> **Vibe Coder:** "Dude, the first two weeks were _insane_. I was shipping
> features like a 10x machine. Claude just _got it_. I would describe a feature
> and boom, working code. I felt unstoppable. Except for when I ran out of
> tokens, of course."

> **Software Engineer:** "Yeah, yeah... why do you have that face then?"

> **Vibe Coder:** "I really don't know what happened. At some point everything
> started slowing down. I mean, A LOT. Every PR is a 2,000-line diff that
> touches half the codebase. I broke the freaking payment flow last Tuesday by
> adding a notification feature, WTF! Don't ask me how, I have no idea. The
> tests pass sometimes. I stopped reading the code honestly, I just follow the
> vibes and hope for the best. Why is AI acting so stupid lately? Must be that
> new model!"

> **Software Engineer:** _(chuckles softly)_ "Come on, dude, don't blame AI, it's
> doing wonders in my project. The new model is even better!"

> **Vibe Coder:** "What? How come?"

> **Software Engineer:** "Take a look here. My PRs are, what? 200-300 lines
> maybe? I shipped that new API integration last week, four files changed. When
> the bug showed up on Thursday, my AI assistant found it in two minutes because
> it was an obvious validation issue with one of the models, going from a
> handler to the service layer."

> **Vibe Coder:** "Model? Service _layer_? What are you even talking about?"

> **Software Engineer:** "Sit down and grab some baklava. I'll explain."

---

If you feel like our friend _Vibe Coder_ from the dialogue above, you should
take a few minutes and dig into my previous post, where I wrote about the
[baklava architecture](/posts/20260516-baklava-architecture-your-python-app-needs-layers/).

The TL;DR version: a layered application architecture gives you testability,
flexibility, and the ability to reason about your code, regardless of the size
of the app or who writes the code.

But here is a twist I did not anticipate. When I first adopted this
pattern, it was eons before any AI assistant became available. Now that they are
everywhere, one side-effect of clean architecture is crystal clear: layered
codebases are dramatically easier for AI agents to work with.

Not by accident, but as a structural consequence. And once you see why, you
will never want to go back to a flat codebase when working with AI.

## The wrapping problem

Every AI coding agent I have used, Claude, GPT, ~Copilot~, Cursor, all of them,
is just way too eager to start coding. They all share the same default
instinct: achieve the goal from the prompt as fast as possible. Give them a task
and they will create helper functions, classes, utility modules, wrappers, some
abstractions. It's what I call "semi-structured" code. It kinda makes sense in
the context of that diff, but does it fit with the rest of the codebase?

This is not a bug in one specific model. It is a pattern that emerges from
training on millions of repositories solving the same problems in wildly
different ways, with varying degrees of (questionable) quality.

The models learned how to code and the teachers were random people from the
internet. _What could possibly go wrong?_

Pet projects, code without tests, not a faint distant shade of good practices.
When the code is good, it belongs to mega OSS libraries that are hyper-flexible
to cover different architectures and hundreds of use-cases, which are not really
suitable for app development. Most importantly, the code used for training was
usually not supporting any real business under strict SLAs, real-life
constraints, or maintainability concerns.

I'm not saying that all the code publicly available is bad, but maybe 90% is,
sorry folks. It is simply really bad!

And while we have many excellent examples of OSS libraries, well structured and
covered by tests, the same is not true for complete applications. The design
principles of a library and an application are very different.

Without a clear architectural skeleton, the agent has no guidance on _where_
things should go. It invents its own organization. Every time. And every time it
invents something slightly different. Monday's task gets a `UserManager`.
Tuesday's gets a `UserHelper`. Wednesday's introduces a `UserFacade`. All of
them looked fine on the PR diff, but by the end of the week you have five
overlapping abstractions that nobody asked for.

Sound familiar? This is what happens in human-maintained codebases too, just
slower. AI agents do it at 10x speed or even faster, if you are just "following
vibes".

## Structure as a constraint

Here is what changes with a layered architecture: the agent does not need to
_decide_ where things go. The structure already tells it.

When you have a `handlers/` directory, a `services/` directory, a
`repositories/` directory, and clear examples in each, the agent pattern-matches
against the existing code. It sees that handlers are thin, that services contain
business logic, that repositories talk to the database. It follows the
convention because the convention is _right there in the codebase_.

```
app/
├── handlers/
│   └── http/
│       └── user_routes.py      ← thin, delegates to service
├── services/
│   └── user_service.py         ← business logic, DI
├── repositories/
│   └── user_repository.py      ← DB operations only
└── models/
    └── user.py                 ← data structures
```

Give an AI agent this structure plus a task like "add email verification to user
registration" and it will, almost always:

1. Add a method to `UserService`
2. Maybe add a repository method if needed
3. Update the handler to pass the new parameter
4. Create or update a model if the schema changes

That is exactly what you or I would do. The structure constrains the agent into
the correct behavior. No new abstractions. No invented patterns. Just following
what is already there.
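
As a rough illustration of those four steps, here is a minimal, framework-free
sketch. Every name in it (`InMemoryUserRepository`, `register_user_handler`, the
`verification_token` field) is hypothetical, and the handler is a plain function
standing in for a real route. It is synchronous for brevity and is not meant as
the one true shape of such a change.

```python
import secrets
from dataclasses import dataclass
from typing import Optional


@dataclass
class User:  # models/: data structure only (hypothetical fields)
    email: str
    verified: bool = False
    verification_token: str = ""


class InMemoryUserRepository:  # repositories/: storage only, no business rules
    def __init__(self) -> None:
        self._users = {}

    def save(self, user: User) -> None:
        self._users[user.email] = user

    def get(self, email: str) -> Optional[User]:
        return self._users.get(email)


class UserService:  # services/: business logic, dependencies injected
    def __init__(self, repo: InMemoryUserRepository) -> None:
        self._repo = repo

    def create(self, email: str) -> User:
        if self._repo.get(email) is not None:
            raise ValueError("email already registered")
        user = User(email=email, verification_token=secrets.token_urlsafe(16))
        self._repo.save(user)
        return user

    def verify(self, email: str, token: str) -> bool:
        user = self._repo.get(email)
        if user is None:
            return False
        # Constant-time comparison on bytes, so any input string is accepted.
        if not secrets.compare_digest(user.verification_token.encode(), token.encode()):
            return False
        user.verified = True
        self._repo.save(user)
        return True


def register_user_handler(service: UserService, payload: dict) -> tuple:
    # handlers/: validate input, delegate to the service, format output.
    email = str(payload.get("email", "")).strip()
    if "@" not in email:
        return 400, {"error": "invalid email"}
    try:
        user = service.create(email)
    except ValueError as exc:
        return 409, {"error": str(exc)}
    return 201, {"email": user.email, "verified": user.verified}
```

Note what the handler does not do: it never touches storage and never decides
whether a registration is allowed. Those decisions have exactly one home.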

**AI agents are excellent pattern followers but terrible pattern inventors.**
Give them a clear pattern and they will replicate it faithfully. Give them a
blank canvas and they will paint something... _creative_.

## The guidelines bootstrap

A directory structure alone is not enough. You also need a few lines of explicit
guidance. A good `README.md` or `AGENTS.md` (or equivalent instructions file for
whatever tool you use) that says something like:

```markdown
## Architecture

- Handlers: receive input, validate, delegate to services, format output
- Services: business logic, orchestration, DI via constructor
- Repositories: data access only, no business rules
- Never import from handlers into services
- Never put DB queries in handlers
```

That is about 200 tokens of context. Tiny, but you can certainly include more
details and refine the rules. Combined with the existing code as reference, it
gives the agent enough constraint to produce code that fits. I have been doing
this for months now and the results are remarkably consistent.

Even when the agent deviates (and it will, occasionally), the deviation is
_visible_. A service importing from a handler? That stands out like a broken
window in your imports. The dependency rule makes violations obvious. You catch
them in seconds during review.

## The reviewer catches the drift

This is where it gets really fun: specialized agents. I have been running a
pattern where a few agents code and others review. The coding agents do the
work. The reviewers have the same architectural guidelines and focus on
structural violations.

When a coding agent introduces something that breaks the layering, say, a
database call inside a handler, or a service that formats HTTP responses, the
reviewer catches it immediately. Not because it is smarter, but because the
rules are simple and unambiguous. Binary. Either the dependency points inward or
it does not.

```
                                              approve
                                            ↗         ↘
Coding Agents → produces code → Review Agents        human review
      ↑                                     ↘
      │                                      reject
      │                                         ↓
      │                          "Feedback: UserService imports from
      │                           handlers.http. This violates the
      │                           dependency rule. Services must not
      │                           depend on handlers. Changes rejected."
      └──── autoprompt to fix ──────────────────┘
```

Think about what would happen without clear boundaries. What would the reviewer
check against? "Does this code feel well-organized"? That is subjective. "Does
this import violate the dependency graph"? That is objective. The reviewer can
enforce it mechanically.
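
To make "mechanically" concrete, here is a small sketch of such a check built on
Python's standard `ast` module. The `FORBIDDEN` table is my own assumption about
how the rules could be encoded (only the services-to-handlers rule comes
straight from the guidelines above), and `layer_of` and `find_violations` are
hypothetical helper names. It only inspects the dotted module path of each
import, so something like `from app import handlers` would slip through.

```python
import ast
from typing import Optional

LAYERS = ("handlers", "services", "repositories", "models")

# Assumed rule table: layer -> layers it must never import from.
FORBIDDEN = {
    "services": {"handlers"},
    "repositories": {"handlers", "services"},
    "models": {"handlers", "services", "repositories"},
}


def layer_of(module: str) -> Optional[str]:
    """Return the first layer name found in a dotted path, if any."""
    for part in module.split("."):
        if part in LAYERS:
            return part
    return None


def find_violations(module: str, source: str) -> list:
    """List imports in `source` that break the dependency rule for `module`."""
    layer = layer_of(module)
    banned = FORBIDDEN.get(layer, set())
    violations = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            targets = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.module:
            targets = [node.module]
        else:
            continue
        for target in targets:
            target_layer = layer_of(target)
            if target_layer in banned:
                violations.append(
                    f"{module} imports from {target} (line {node.lineno}): "
                    f"{layer} must not depend on {target_layer}"
                )
    return violations
```

A review agent (or a plain CI step) could run something like this over the
changed files and paste any violations back to the coding agent as feedback.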

Multi-agent coding is something I have been exploring with great success lately
(it definitely deserves a separate post, or a whole series). The point here is
that layered architecture makes multi-agent workflows _tractable_. Without clear
rules, the reviewer has nothing concrete to enforce and becomes just one more
agent that drifts.

## Smaller files, better attention

There is a deeper, more technical reason why layered code works better with AI.
It produces smaller files.

A flat Flask application might have a single `routes.py` with 2,000 lines. The
layered equivalent splits that into maybe fifteen files averaging 80 to 150
lines each. Same total code, but each file is focused on one thing.

Why does this matter? Because of how LLMs actually use long contexts.
Research has shown that LLMs perform significantly worse when relevant
information is buried in the middle of a long context.

The paper
["Lost in the Middle"](https://cs.stanford.edu/~nfliu/papers/lost-in-the-middle.arxiv2023.pdf)
(Liu et al., 2024) demonstrated that models perform best when relevant content
appears at the beginning or end of the input, with accuracy dropping by as much
as 20 to 30 percentage points for content positioned in the middle of the
context window.

Think about that applied to code. When an AI agent reads a 2,000-line file to
understand the `create_user` logic buried at line 847, it is fighting against
that positional bias. Whatever the exact mechanism, the observed effect is that
the model _uses_ the beginning and end of the file more reliably than the
center.

[MindStudio's analysis](https://www.mindstudio.ai/blog/context-rot-ai-coding-agents-explained)
of this effect on code specifically found that extraction accuracy was around
89% when relevant code appeared in the first 20% or last 15% of the context,
dropping to roughly 61% when positioned in the middle range. That is not a
subtle difference. That is the difference between the agent understanding your
code and hallucinating something _plausible_ but completely wrong.

With layered architecture, the agent rarely faces this problem. It reads
`user_service.py` (120 lines), finds `create` at line 15, and has full attention
on the relevant logic. The file _is_ the context. No noise, no unrelated
functions competing for attention.

Small, focused files are not just "nice to have." They are structurally better
for how these models process information.
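
If you want a quick feel for where your own codebase stands, a few lines of
Python are enough to list the files an agent is most likely to get lost in. This
is only a sketch: `oversized_files` is a hypothetical helper, and the 300-line
default is an arbitrary threshold I picked for illustration, not a number from
any study.

```python
from pathlib import Path


def oversized_files(root: str, max_lines: int = 300) -> list:
    """Return (relative path, line count) for .py files over max_lines, longest first."""
    base = Path(root)
    results = []
    for path in base.rglob("*.py"):
        # Count lines lazily; undecodable bytes are replaced rather than raising.
        with path.open(encoding="utf-8", errors="replace") as handle:
            count = sum(1 for _ in handle)
        if count > max_lines:
            results.append((path.relative_to(base).as_posix(), count))
    return sorted(results, key=lambda item: item[1], reverse=True)
```

Calling `oversized_files("app")` on a layered project should ideally come back
empty or close to it; on a flat one, the usual suspects show up at the top.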

## The testing advantage

I covered this extensively in the
["Baklava Architecture"](/posts/20260516-baklava-architecture-your-python-app-needs-layers/)
post, but it is worth repeating in the AI context: layered code is dramatically
easier to test.

When you ask an AI agent to write tests for a service with injected
dependencies, it produces exactly the kind of focused unit tests you want:

```python
async def test_create_user_sends_verification_email():
    # Arrange: in-memory fakes replace the database and the email provider.
    user_repo = FakeUserRepository()
    email_client = FakeEmailClient()
    service = UserService(user_repo, email_client)

    # Act: exercise a single service method.
    await service.create(CreateUserRequest(
        name="Geordi", email="geordi@enterprise.fed"
    ))

    # Assert: the last recorded email is the verification message.
    assert email_client.sent[-1].to == "geordi@enterprise.fed"
    assert "verify" in email_client.sent[-1].subject.lower()
```

No HTTP server. No database. No Docker containers. The test is fast,
deterministic, and focused on one behavior. AI agents are _excellent_ at
generating these because the pattern is dead simple: inject fakes, call method,
assert result. Even a junior model can get this right.
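
The test above leans on fakes and a service that are not shown. One possible
shape for them is sketched below, purely as an assumption: the real
`UserService`, `CreateUserRequest`, and fakes may look different, and the
`add` and `send` method names, the `Email` record, and the subject line are all
invented for illustration.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class CreateUserRequest:
    name: str
    email: str


@dataclass
class Email:
    to: str
    subject: str


@dataclass
class FakeUserRepository:
    """In-memory stand-in for the real repository."""
    saved: list = field(default_factory=list)

    async def add(self, request: CreateUserRequest) -> None:
        self.saved.append(request)


@dataclass
class FakeEmailClient:
    """Records outgoing mail instead of sending it."""
    sent: list = field(default_factory=list)

    async def send(self, to: str, subject: str) -> None:
        self.sent.append(Email(to=to, subject=subject))


class UserService:
    def __init__(self, user_repo, email_client) -> None:
        # Dependencies arrive through the constructor, so tests can pass fakes.
        self._user_repo = user_repo
        self._email_client = email_client

    async def create(self, request: CreateUserRequest) -> None:
        await self._user_repo.add(request)
        await self._email_client.send(request.email, "Please verify your email")


async def demo() -> FakeEmailClient:
    # Same flow as the test, runnable without any test framework.
    email_client = FakeEmailClient()
    service = UserService(FakeUserRepository(), email_client)
    await service.create(
        CreateUserRequest(name="Geordi", email="geordi@enterprise.fed")
    )
    return email_client
```

Running `asyncio.run(demo())` exercises the whole path in memory. The fakes are
a handful of lines each, which is exactly why agents rarely get them wrong.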

Now compare that to testing a fat route handler that does everything. The agent
has to spin up a test client, somehow mock the database, deal with
authentication middleware, handle response parsing. The test becomes complex,
brittle, and often just wrong. Who is going to debug it? You? How will you even
notice it is wrong? We are talking about test cases sometimes longer and more
complicated than the implementation _itself_!

The layered version gives you better coverage with simpler tests. And since the
tests are simpler, the AI agent writes them correctly more often. It is a
positive feedback loop: good structure produces good tests, good tests catch bad
code, bad code gets rejected, structure stays clean.

## The size paradox

Here is something counterintuitive. A layered codebase has _more_ files and
_more_ directory structure than a flat one. At first glance, it looks bigger.
More boilerplate. More ceremony. More scrolling in the file tree.

But the total lines of code? Actually fewer.

Because layering forces you to think about what each piece does. No duplication
across handlers because the logic lives in the service. No copy-pasted queries
because the repository encapsulates them. No utility functions scattered in
twelve different files because they have one canonical home.

Structure makes redundancy visible. When you see two services doing the same
thing, you notice. In a flat codebase, the same duplication hides in 2,000-line
files where nobody scrolls past the first hundred lines. It is invisible until
it causes a bug. And then it causes two, because you fixed one copy and forgot
the other.

More directories, fewer lines. More files, less code per file. More structure,
less total complexity. It only looks like a paradox if you measure complexity by
counting files in the tree, which is in the same category as measuring
productivity by LOC.

## "But that's too much boilerplate!"

No, it is not. A monolithic file is not _simpler_, it is just a shorter, denser
pile of mess in a single place. The complexity is still there, compressed into
fewer files where it is harder to find and harder to reason about. A 120-line
service with one responsibility is genuinely simpler than a 2,000-line router
where that same logic hides at line 847 between unrelated functions. "Too many
indirections", "too many files", same spirit. These objections trade consistency
for the immediate-term dopamine reward of not having to think. You are not being
fast, you are not being productive. You are being lazy and hoping for the best.

Nobody reads a 2,000-line file top to bottom. You search, you scroll, you lose
your place. With layers, you open `user_service.py`, find what you need in
seconds. Done.

The one objection that deserves a real answer is "slows down prototyping." And
it is _almost_ true. Your first day is a bit slower. You create more files. You
think about where things go before you write them. That takes ten extra minutes.

But then week two happens. And week three. And you never spend thirty minutes
debugging a test failure caused by some unrelated function sharing the same
module. You never spend an hour tracing a bug through a 500-line handler that
validates, queries, decides, formats, and logs all in one breath. You give up
maybe 5% of your _initial velocity_. Your _sustained velocity_ can be 10x
higher, because you never hit the wall that our friend Vibe Coder crashed into
at week three.

The real overhead is in _not_ layering: the debugging sessions, the duplicated
logic, the tests that need full infrastructure, the fear of touching a file
because everything depends on everything else.

## Your codebase is your best prompt

People obsess over prompt engineering. They craft these elaborate system
prompts, tweak the temperature, try different models. And sure, that stuff
matters a little. But the single most influential thing your AI agent reads is
not your prompt. It is your code.

That is the real takeaway. Your `AGENTS.md` is a pamphlet. Your codebase is the
textbook. The agent will mimic what it sees, not what you told it to do in 200
tokens of instructions. If what it sees is layered, focused, and consistent,
that is what it produces. If what it sees is a swamp, well, you get more swamp.

Will it be perfect? No. Nothing is perfect with probabilistic systems. But it
will be _consistent enough_ that you spend your time directing instead of
correcting.

---

> **Vibe Coder:** "So you're saying the reason my AI is acting dumber is not
> the model, it's my codebase?"

> **Software Engineer:** "Yes, 100%! What's the surprise? You are basically
> swimming in a spaghetti pool, man. It was bad in codebases maintained by
> humans, it will be the same, actually worse, with AI. The model you use is the
> same one I'm using. Same capabilities, same intelligence. But mine has
> guardrails. Yours has a blank canvas and a loaded paintball gun."

> **Vibe Coder:** "And the guardrails are just... directories? And a few rules
> in a file?"

> **Software Engineer:** "Directories, a few rules, yes, but also a ruthless
> pair of engineer eyes that don't accept slop. And, most importantly, I provide
> _examples_. The agent reads your existing code to figure out how to write new
> code. If your existing code is a mess, the new code will be a mess too.
> Garbage in, garbage out. It has always been like that."

> **Vibe Coder:** "So what do I do now? My app is already..."

> **Software Engineer:** "A disaster? A complete mess? Yeah. But look, maybe it
> is not too late. Refactoring messy applications is something I did a few times
> in the past and can also be nicely done with AI assistance. You just need a
> clear target, semi-mechanical processes and a lot of patience. But you
> definitely need to stop 'following vibes' and get your sh\*t together. Give a
> proper blueprint to your agent and..."

> **Vibe Coder:** _(stares at baklava)_ "Wait, are these pistachios?!"

> **Software Engineer:** _(facepalm)_

---

Previous: [Baklava Architecture: Your Python App Needs Layers](/posts/20260516-baklava-architecture-your-python-app-needs-layers.md)  
