Claude Code usage for mpat2

Recently I announced an updated version of the moteus performance analysis tool. Its development was unique in that I used the Claude Code agent to do basically the entire project, modulo copying the simulation logic from the original mpat tool. It was my first attempt at using any of the LLM-based coding agents, and I have a few takeaways:

First, wow. From the start of this “rewrite” to writing this post, I spent maybe 20 hours of my time total, in 3-4 hour chunks over the course of one week. That is probably a quarter of the time that was spent on the original mpat, for something that is much more complicated. Granted, the actual simulation logic was ported over largely unchanged, so it isn’t a totally fair comparison. Also, I’m still getting a handle on the best ways to interact with CC on this project, and I’m sure the best practices will be different for other styles of project.

Second, confabulations are real, especially for UI-facing projects. Since the agent was unable to actually run and test the project itself, it would often think it understood a problem, but not actually know what was wrong, and instead make up a plausible-sounding root cause to fix. In this case, I found a relatively successful strategy of the form: “That didn’t work, add debug prints.” Then I would reproduce the problem, paste the console output, and 19 times out of 20 it would fix the issue immediately afterward. I think there was only one instance where it decided on its own to add further console.log calls beyond what it did the first time. I’m guessing for programs that it can execute locally and verify, this would be much less of an issue.
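To make that concrete, here is a hypothetical sketch of the kind of debug prints it would add on request; `updatePlot` and the field names are illustrative, not taken from the actual mpat2 source:

```js
// Hypothetical sketch of debug instrumentation added on request; the
// names here are illustrative, not real mpat2 code.
function updatePlot(traces) {
  console.log('updatePlot: received', traces.length, 'traces');
  for (const trace of traces) {
    const ys = trace.points.map(p => p.y);
    console.log('  trace', trace.name,
                'points:', trace.points.length,
                'y range:', Math.min(...ys), 'to', Math.max(...ys));
  }
  // ... existing rendering logic unchanged ...
}
```

Reproducing the failure with prints like these in place and pasting the console output back into the session was usually all it took for the next fix to land.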

Another thing that makes all the difference is context. When the project was small enough that Claude Code was willing to read the entire thing, all operations were both faster and more successful. Once the file grew beyond the 25k token length that CC imposes on files, things got much slower, as every operation involved lots of grepping through the file to identify functions and their extent, classes and their extent, etc. After operating in a session for a while, speed would return as the model remembered more of the context of the file.

Claude Code manages the overall maximum session context length by occasionally “compacting” the existing context, where presumably it asks for a summary of what has happened before and the current outstanding tasks. It presents this with a percentage marker in the bottom right of the terminal that decreases down to 0%, at which point the compaction occurs. I certainly came to fear the coming context compaction, as after every compaction, performance once again dropped significantly. It got to the point that when I saw the context was running out, I would switch to executing small backlog tasks that I knew could likely complete before it did.

In this particular case I had a specific goal of keeping the project in a single file, but I imagine that factoring it into separate files would have alleviated this problem a moderate amount, especially when working on problems or features that only touched one aspect. You could also imagine it using a lesser model to decide what to read into context and doing a better job; it wouldn’t take a Claude Opus 4 to read in an entire class and figure out where its dependencies are. Maybe all these limitations will drift away, though, as overall context limits grow with later models.

Another issue I ran into was that CC was really willing to duplicate logic, duplicate representations, and create parallel data structures with complex interlocking invariants. In many cases I had to remind it: “no, don’t just copy and paste this entire function just because the original was defined locally; move it somewhere that it can be accessed in both contexts.” Since this was a quick standalone effort, I wasn’t too concerned about the complexity of the data structure invariants, which you can certainly see if you look at the source! Partly, I suppose, it was just a different way of working for me. Part of me thought, “well, if it gets too bad, I’ll just ask CC to factor things more nicely later.” Granted, the few times I did try that, it wasn’t all that successful without detailed prompting, especially as by that time it was no longer able to keep the entire file in context.
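As a purely hypothetical illustration of the copy-and-paste pattern (these names are made up, not the actual mpat2 code), CC would tend to produce something like this:

```js
// Hypothetical illustration: CC copy-pasting a helper that was
// originally defined locally inside another function.
function renderChart(data) {
  function formatTorque(t) { return t.toFixed(2) + ' Nm'; }
  // ... chart code using formatTorque ...
}

function renderTable(data) {
  function formatTorque(t) { return t.toFixed(2) + ' Nm'; }  // copy-pasted
  // ... table code using formatTorque ...
}
```

when what I had to ask for instead was the hoisted version, with a single definition in a shared scope:

```js
// One shared definition that both callers can reach.
function formatTorque(t) { return t.toFixed(2) + ' Nm'; }

function renderChart(data) { /* ... uses formatTorque ... */ }
function renderTable(data) { /* ... uses formatTorque ... */ }
```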

As to the economics, this was all accomplished on the basic $20/mo “Pro” tier without running out of tokens at all, although I averaged maybe only 3-4 hours a day working on it. Given the increase in productivity, this seems like a runaway hit to me. A $1000/mo tier would likely still have been a big win for this project alone, although for mjbots specifically, dollar amounts in that range make you think at least a bit about it. While I’m sure there are scenarios where Claude Code isn’t the best fit, it seems to me like you’d be seriously hindering yourself if you weren’t willing to pay for it and figure out how to get it to work well in your environment.