Every January I do a tool audit. What am I actually using versus what I thought I would use? The gap is always larger than expected.

I started 2026 with nine AI-related tools installed across my machine. In May, I am daily-driving four. Here is what survived, what got cut, and why — without the breathless “this tool changed my life” framing that makes these posts useless.

What I kept

Claude Code — $20/month + API usage

The terminal agent is the biggest change to how I work in the last three years. I use it for:

  • Complex refactors that touch multiple files
  • Understanding unfamiliar code I have inherited
  • Writing first drafts of tests for code that has none
  • Anything that requires holding a lot of context at once

The April 2026 desktop app redesign made the multi-session workflow genuinely good. I run two or three sessions simultaneously — one working on a feature, one watching for failures in the test suite, one doing a background security scan on a PR. The Routines feature (currently in research preview) lets me schedule automations without babysitting them.

The cost is real. On heavy weeks with large-codebase refactors, the API usage adds up. I budget around $65/month total when I am using it heavily. For a freelancer or someone billing that time to clients, the ROI calculation is straightforward. For someone at a company that does not cover tools, it is a genuine decision.
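The budget above is simple arithmetic. Here is a rough sketch of it: the $20 subscription and the roughly $65 heavy-month total come from this post, while the function name and the implied API figure are illustrative only.

```python
SUBSCRIPTION_USD = 20.0        # flat monthly fee quoted above
HEAVY_MONTH_TOTAL_USD = 65.0   # rough heavy-month budget quoted above


def implied_api_spend(total_usd: float, subscription_usd: float = SUBSCRIPTION_USD) -> float:
    """Return the part of a monthly total that is metered API usage."""
    # Clamp at zero so a light month never shows negative API spend.
    return max(total_usd - subscription_usd, 0.0)


print(implied_api_spend(HEAVY_MONTH_TOTAL_USD))  # 45.0
```

If the metered part creeps well past that in a month with no large refactors, that is the signal to look at what the sessions are actually doing.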

The reported SWE-bench Verified score of 80.8% is the highest of any tool I use. A benchmark is only a proxy, but it matches my experience: Claude Code succeeds on complex tasks that other tools give up on or get wrong.

Cursor — $20/month

Cursor is where I write code. Claude Code is where I delegate. The distinction matters.

Cursor’s autocomplete, powered by Supermaven, has a reported 72% acceptance rate. That number surprised me when I first heard it; I would have guessed 50-60%. Looking back at my own sessions, it is consistent with what I see. The completions are fast and contextually correct in a way that makes writing code feel like working with the grain of the wood rather than against it.
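For anyone who wants to check a figure like that against their own sessions, acceptance rate is just accepted completions divided by completions shown. A minimal sketch follows; the counts are made-up sample numbers, not taken from any real log, and the function name is hypothetical.

```python
def acceptance_rate(accepted: int, shown: int) -> float:
    """Fraction of shown completions that were accepted."""
    if shown <= 0:
        raise ValueError("shown must be positive")
    return accepted / shown


# Hypothetical sample: 18 accepted out of 25 shown.
print(f"{acceptance_rate(18, 25):.0%}")  # 72%
```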

The Cursor 3 parallel agents feature is real and useful for the specific task of coordinating work across independent parts of a codebase. I do not use it daily, but when a task naturally splits into parallel workstreams — building a feature while writing tests while updating documentation — it is genuinely faster.

I am paying $20/month and it is worth it. Whether that holds for someone not working primarily in TypeScript/JavaScript, I do not know; I have heard more mixed reports from Python-heavy developers.

GitHub Copilot — included in my GitHub plan

I have not paid separately for Copilot since October. My GitHub organisation plan includes it. I would not add it at $10/month if I had to pay separately, because I already have Cursor doing the same thing better. But it is useful in contexts where Cursor does not reach: specifically, the VS Code terminals I have open for quick edits and the occasional JetBrains project I have not moved to Cursor yet.

I use Copilot’s coding agent (assign it an issue, get back a PR) selectively. For well-defined, small issues (updating a dependency, fixing a lint error, updating a date format) it is fast and reliable. For anything requiring judgment, I do not trust it to work unsupervised.

Pieces for Developers — free tier

Pieces is the tool nobody in the “AI stack” discourse talks about, and it quietly does something nothing else does well: it remembers your development context across sessions.

Every code snippet you work with, every conversation you have with an AI tool, gets saved, tagged, and made searchable. Six months later, when you are wondering “did I already solve this problem?”, you can actually find out. The on-device processing means nothing leaves your machine.

It is not glamorous. It does not benchmark well. But I have genuinely found things in Pieces that I had forgotten I did, which saved me from solving the same problem twice.

What I cut

Codeium: Good free tool, but once I was paying for Cursor the autocomplete was redundant. Cut in January.

GitHub Copilot standalone subscription: Rolled into the org plan. Not strictly a cut, since I still use Copilot, but it shows how much these tools overlap: you probably do not need three completion tools.

Tabnine: Tried the enterprise trial. The collaboration features are interesting for large teams but overkill for solo and small-team work.

Amazon CodeWhisperer (now Q Developer): The AWS integration is genuinely useful if you are deep in the AWS ecosystem. I am not. Cut in March.

Mintlify Doc Writer: I thought I would use AI documentation generation more than I do. Turns out I write documentation when it matters and skip it when it does not, and an AI tool does not change that calculus.

The total cost

Four tools. Roughly $85/month in a heavy month: about $65 for Claude Code and $20 for Cursor, with Copilot and Pieces adding nothing extra. That sounds like a lot until you compare it to what it would cost to hire the equivalent hours, or what it costs to not bill those hours because the work took longer than it needed to.

The 340% increase in job postings requiring AI coding tool experience between January 2025 and January 2026 suggests the market has already decided these tools are part of the job. The question is not whether to use them but which ones to use and how.

What I am watching

Windsurf (formerly Codeium’s IDE) is interesting. The Cascade agent handles multi-step tasks well and the pricing is competitive. I am not switching from Cursor, but if Cursor raises prices or the product stagnates, Windsurf is the clearest alternative.

Claude Code Routines: the scheduled automation feature is in research preview, and the ceiling on what it could do is high. Think nightly codebase health checks, PR reviews on a schedule, and dependency update PRs filed automatically. If it graduates to general availability at a reasonable usage price, it changes how I think about maintenance work.

Open source models for local inference — I have not found a local model that replaces any of the above for serious work. Ollama makes running models locally approachable, and the models are improving. But the gap between local inference quality and cloud inference quality is still significant enough that I am not making the switch on anything I care about getting right.
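For a sense of what "approachable" means, here is a minimal sketch of calling a local Ollama server from Python. It assumes Ollama is running on its default port (11434) and that a model has already been pulled; the model name `llama3` and the helper names are placeholders, not recommendations.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(prompt: str, model: str = "llama3") -> urllib.request.Request:
    """Build a non-streaming generate request; the model name is a placeholder."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "llama3") -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=120) as resp:
        return json.loads(resp.read())["response"]


# Usage (needs a running Ollama server with the model pulled):
#   print(generate("Explain what a mutex is in one sentence."))
```

The setup really is that small. The quality of what comes back is the part that still lags.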


The stack that works for you will not be identical to mine. The principle that transfers is this: buy the tools you use every day, cut the ones you use occasionally, and be honest about which category each tool actually falls into.
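That rule is easy to apply mechanically. A minimal sketch, assuming you have a rough count of the days each tool was used in the last 30; the 80% cutoff for "daily" and the tool names are arbitrary placeholders.

```python
def audit(
    days_used: dict[str, int],
    days_in_period: int = 30,
    daily_threshold: float = 0.8,  # arbitrary cutoff for "used every day"
) -> dict[str, list[str]]:
    """Split tools into keep/cut by the share of days they were actually used."""
    keep: list[str] = []
    cut: list[str] = []
    for tool, days in sorted(days_used.items()):
        bucket = keep if days / days_in_period >= daily_threshold else cut
        bucket.append(tool)
    return {"keep": keep, "cut": cut}


# Placeholder tool names and counts, for illustration only.
print(audit({"editor_assistant": 28, "doc_generator": 3}))
# {'keep': ['editor_assistant'], 'cut': ['doc_generator']}
```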