Hi! The title reflects my current view of AI code generation tools. For some time I have been intensively testing Claude Code. I tested other tools as well, but in my opinion Anthropic created the best one. I also spent 10 dollars on a tool that didn’t even run, but I don’t want to point out its exact name. The TL;DR would be: “Code generation tools don’t create software for you; they are support, like a junior developer, but faster and more accurate”. So AI won’t generate the code for you. Yet!
Explanation
It’s not that I am an opponent of AI or one of those people who claim AI is not going to take any jobs. On the contrary, I think that only people who use AI tools in their job will survive on the market. I don’t want to predict when companies will say, “We don’t need programmers anymore”, because no one knows, but I believe that people who constantly use and test new tools will be safe. Maybe in the future we won’t need programmers, but rather positions like “AI operator” or something similar. Now is a good time to get acquainted with AI tools (not only code generators): perhaps they are not yet perfect and autonomous, and AI won’t generate code without assistance, but the future can bring a “surprise” for some of us.
A wrong approach to using Claude Code
As I don’t like to talk too much and create big introductions, I will go straight to the point.
For now, AI won’t generate the code by itself if the instructions are poor or imprecise and there are no examples explaining your style.
I had always used Claude Code for simple tasks: reading code to help me understand it, fixing small bugs, or generating the shitty code that I don’t like to write on my own (DTOs, mappers, entities, etc.). A few days ago I wanted to check whether Claude Code can be autonomous. I prepared Markdown documentation for an example project with a frontend and a backend. I wrote down exactly what I wanted (in fact, not really exactly, as I found out later). I created two empty projects, one for the backend and one for the frontend, and added one old project as an example of how to do things. Furthermore, I even created an MCP server with programming rules that Claude Code should use. My MCP server contains rules for the backend, the frontend, the database, architecture, and configuration: rules that I learned during my 13 years of development career.
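As a side note, in case someone wants to reproduce a similar setup: a project-scoped MCP server can be declared for Claude Code in a `.mcp.json` file in the repository root. The server name and command below are invented for illustration; point them at your own rules server:

```json
{
  "mcpServers": {
    "coding-rules": {
      "command": "node",
      "args": ["./tools/rules-server/index.js"]
    }
  }
}
```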
What could go wrong if I was so well prepared?
Actually, everything. AI generation tools are not as smart as we may think. In fact, they are not smart at all. They are just statistical models: they don’t have memory and they often lose context. I ended up with a non-working system that contained serious compilation errors that even a beginner developer wouldn’t make. Furthermore, Claude Code didn’t follow the rules from my MCP server, even though its status was “connected”.
Where did I make a mistake?
The rule of thumb would be: “Don’t be too general; be concrete”. My instructions were too general. I expected AI to understand my poorly defined expectations, to be creative, and to follow the best programming rules ever known. That expectation was too broad for such a tool. Now I know that, for now, AI needs clear and concise instructions, like a junior dev who just does what you tell him. When you expect creativity, it will just copy code from the internet. But as we know, a lot of code published on the internet is outdated or just shitty. Additionally, after a few iterations Claude forgets to follow my rules 🙁
How I am using it from now on
Below is the list of things I have learned so far. I expect to keep extending it.
First and foremost, act like an architect. Clearly formulate what has to be done and define exact expectations regarding the architecture, tools, libraries, etc. Tasks should be concise and follow the rule “one task, one change, build green afterwards”.
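To make “concise” concrete, here is the kind of task I would give Claude now; the class names and the build command are invented for the example:

```text
Task: Create a CustomerDto record in the backend dto package,
mirroring the fields of the existing Customer entity.
Follow the pattern used in OrderDto. Do not touch any other files.
When done, run the build and report whether it is green.
```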
Below is the list I created (with the assistance of ChatGPT):
- Define a strict execution contract for Claude (what it may and may not do) and repeat it at the start of every task
- Split work into minimal, isolated tasks (1 feature or 1 class per task)
- Never ask Claude to scaffold frontend and backend together in one step
- Add a mandatory verification step after each task (compile, typecheck, tests)
- Require Claude to list affected files before writing code
- Force Claude to reuse existing code patterns explicitly (point to concrete classes, packages, commits)
- Add a rule: no new abstractions unless explicitly requested
- Add a rule: no guessing of configuration values or dependencies
- Enforce one commit per task with a clear commit message template
- Require Claude to run build commands mentally and list expected errors
- Add a rule: do not invent APIs or DTOs without existing references
- Introduce a “stop condition” where Claude must stop and ask if assumptions are unclear
- Make MCP rules task-scoped, not global (frontend-only or backend-only per task)
- Add a rule: Claude must restate which MCP rules it will follow before coding
- Add a rule: reject code if it violates naming or package structure
- Require compilation-first development (empty classes before logic)
- Delay integrations (Stripe, Mailgun, OpenAI) until the core domain compiles
- Add a validation checklist Claude must pass before marking a task FINISHED
- Add a rule: no copy-paste from example project without adaptation explanation
- Create a “KNOWN CONSTRAINTS.md” that Claude must always read first
- Create a “DO NOT TOUCH.md” for sensitive or stable parts of the codebase
- Require Claude to output only diffs or file contents, never explanations, during coding
- Add a final rule: if unsure, generate TODO instead of code
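Several of these rules can be distilled into a single file that Claude must read at the start of every task. A minimal sketch of such a contract (the file name and wording are my own) might look like this:

```markdown
# EXECUTION-CONTRACT.md

1. Before coding, list the affected files and the MCP rules you will follow.
2. One task = one change. Stop when the build is green.
3. Reuse existing patterns; name the concrete class or commit you copied from.
4. No new abstractions, no guessed configuration values, no invented APIs or DTOs.
5. If an assumption is unclear, stop and ask. If unsure, write a TODO instead of code.
```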
Summary
To summarize, I am not going to rephrase what I wrote above. I would just like to add that, in the current state of AI generation tools, AI won’t generate the code by itself, but with the assistance of a human it works well. The second approach that, in my opinion, can work well is code generation pipelines that generate code while constantly running builds, creating a feedback loop. In such a setup, the quality of the tests has to be high.
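The pipeline idea can be sketched in a few lines of Python. Here `generate_patch` stands for whatever calls the code generator and `build_cmd` for your real build command; both are placeholders, not a real tool’s API:

```python
import subprocess

def run_build(cmd: list[str]) -> tuple[bool, str]:
    """Run the build command and return (success, combined output)."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    return result.returncode == 0, result.stdout + result.stderr

def feedback_loop(generate_patch, build_cmd: list[str], max_iterations: int = 5) -> bool:
    """Ask the generator for code, build, and feed errors back until green."""
    errors = ""
    for _ in range(max_iterations):
        generate_patch(errors)           # generator sees the previous build errors
        ok, output = run_build(build_cmd)
        if ok:
            return True                  # build is green: stop the loop
        errors = output                  # otherwise loop with fresh feedback
    return False
```

The loop is only as good as its build step: if the tests don’t catch regressions, “green” means nothing, which is why test quality matters so much here.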
One more thing for people who are afraid of AI taking their jobs. This is a realistic scenario for some of us, but you can decide whether it is a threat or an opportunity. A few years ago I watched a movie, https://en.wikipedia.org/wiki/Hidden_Figures. There is one important lesson behind that story, but I will leave it to you to draw your own conclusions.
Read my previous posts as well.
