
Claude Code: From Agent to Useful Tool

Article by Kirill Dukhanin

June 22nd, 2026

19 min read

Claude Code is a terminal-based coding agent. It can edit files, run commands, inspect the output, and continue working based on what it sees.

At first, this feels almost magical. You describe a feature, and the agent builds it. It opens files, writes code, runs tests, fixes errors, and keeps going.

Then the problems start to appear:

the codebase fills up with quick fixes and questionable decisions;
new features break old functionality;
the agent repeats the same mistakes;
you spend more time cleaning up after it than you would have spent writing the code yourself.

The issue is usually not the model itself. It is the setup around it.

By default, the agent does not understand how your project fits together. It does not know which rules matter, which documentation is current, which commands are safe, or how your team expects work to be done. Without that context, it eventually hits a ceiling.

To get past that ceiling, you need to build an ecosystem around the agent: CLAUDE.md, hooks, plugins, documentation, skills, MCP servers, task trackers, and session workflows.

In this article, we’ll look at how to set up that ecosystem and make Claude Code useful not just for demos, but for real project work.

Context and workflow

Everything you build around Claude Code usually falls into two categories:

Context: what the agent knows about the project.

Workflow: how the agent acts inside the project.

Context tells the agent where the backend lives, how the database is structured, what commands to run, which documents to read, and which rules not to break.

Workflow tells it how to move through a task: when to research, when to plan, when to write tests, when to ask for confirmation, when to stop, and how to verify the result.

MCP servers, skills, hooks, plugins, and documentation are useful because they support these two layers.

MCP and skills: connecting Claude Code to the outside world

MCP and skills solve different problems.

MCP connects the agent to tools outside the local filesystem: databases, APIs, browsers, issue trackers, documentation sources, and other systems.

Skills describe how to perform a task: how to use those tools, which steps to follow, and what the expected workflow looks like.

Without MCP, the agent is locked inside the repository. Without skills, it may have access to the right tools but no clear idea when or how to use them.

MCP: giving the agent hands

MCP, or Model Context Protocol, lets Claude Code interact with external systems. Instead of only editing files, the agent can inspect databases, read tickets, use a browser, or fetch current documentation.

It is best to configure MCP separately for each project. A Go backend may need PostgreSQL access. A frontend project may need Playwright. A product-heavy project may need an issue tracker.
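As a minimal sketch of per-project configuration, the script below writes a project-scoped .mcp.json file with a top-level mcpServers map. The two launch commands (npx @playwright/mcp@latest and npx -y @upstash/context7-mcp) are assumptions about how those servers are commonly started, so check each server's own README before relying on them.

```python
import json
from pathlib import Path


def build_mcp_config(servers: dict[str, list[str]]) -> dict:
    """Turn {name: argv} into the mcpServers layout used by .mcp.json."""
    return {
        "mcpServers": {
            name: {"command": argv[0], "args": argv[1:]}
            for name, argv in servers.items()
        }
    }


def write_mcp_config(project_root: Path, servers: dict[str, list[str]]) -> Path:
    """Write .mcp.json at the project root and return its path."""
    path = project_root / ".mcp.json"
    path.write_text(json.dumps(build_mcp_config(servers), indent=2) + "\n")
    return path


# Assumed launch commands for a frontend project; verify before use.
FRONTEND_SERVERS = {
    "playwright": ["npx", "@playwright/mcp@latest"],
    "context7": ["npx", "-y", "@upstash/context7-mcp"],
}
```

Calling write_mcp_config(Path("."), FRONTEND_SERVERS) from the repository root would produce a file that can be committed, so every teammate and every session gets the same tool set for that project.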

Here are a few practical use cases.

Task tracking

With a task tracker connected through MCP, Claude Code can read a ticket, move it to “In Progress,” inspect the requirements, implement the changes, and update the task.

This is also useful during development. If the agent finds a bug or a missing follow-up, it can create a task immediately instead of leaving it buried in the chat history.

Safe analytics

You can give the agent a read-only database user. This lets it inspect real data before writing migrations or changing queries.

For example, before changing a schema, the agent can run SQL queries to understand existing rows, constraints, and edge cases. Since the user is read-only, exploration is much safer.
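The same idea can be illustrated with SQLite standing in for the real database. This is only a sketch: the article's setup uses a read-only database user, while here the read-only guarantee comes from SQLite's mode=ro URI parameter, and the function names are hypothetical.

```python
import sqlite3
from pathlib import Path


def open_read_only(db_path: Path) -> sqlite3.Connection:
    """Open a SQLite file so that any write attempt fails."""
    # mode=ro is enforced by SQLite itself, not by the calling code.
    return sqlite3.connect(f"file:{db_path}?mode=ro", uri=True)


def explore(conn: sqlite3.Connection, table: str) -> dict:
    """Collect basic facts worth knowing before a schema change."""
    # The table name is interpolated directly, so pass trusted names only.
    total = conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    return {"rows": total, "columns": columns}
```

The point of the design is that safety lives in the connection, not in the agent's good behavior: a DELETE or UPDATE on this connection raises sqlite3.OperationalError regardless of what the agent intended.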

Browser testing

With Playwright or a similar tool, Claude Code can run a local server, open the application, click through flows, inspect elements, and take screenshots.

This is especially useful for frontend work, where many issues only appear once the UI is actually used.

Up-to-date documentation

MCP servers such as Context7 can provide current library documentation. This helps reduce mistakes caused by outdated training data or hallucinated APIs.

For newer frameworks, fast-moving tools, and libraries with frequent breaking changes, this can make a large difference.

Skills: reusable project expertise

Skills are predefined instructions that Claude Code can load when needed. They live in .claude/skills/ and are the recommended way to encode repeatable workflows.

Each skill is a Markdown file, conventionally named SKILL.md and stored in its own subdirectory of .claude/skills/, with optional frontmatter that controls its behavior. The frontmatter can specify when the skill should be auto-invoked (for example, when certain file patterns are touched or certain keywords appear in a prompt) and what slash command it maps to. Every skill in .claude/skills/ automatically gets a corresponding slash command, so the boundary between “skills” and “custom commands” has effectively disappeared.

If you have older .claude/commands/ files, they still work. But the skills directory is now the canonical location, and the frontmatter system gives you more control over when and how a skill activates.
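To make the file layout concrete, here is a small scaffolding sketch. It assumes one directory per skill containing a SKILL.md file whose frontmatter has name and description fields; the function name and the template body are hypothetical, and the current documentation may list further frontmatter fields.

```python
from pathlib import Path

SKILL_TEMPLATE = """---
name: {name}
description: {description}
---

# {title}

{steps}
"""


def scaffold_skill(project_root: Path, name: str, description: str, steps: list[str]) -> Path:
    """Create .claude/skills/<name>/SKILL.md and return its path."""
    skill_dir = project_root / ".claude" / "skills" / name
    skill_dir.mkdir(parents=True, exist_ok=True)
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(steps, start=1))
    body = SKILL_TEMPLATE.format(
        name=name,
        description=description,
        title=name.replace("-", " ").capitalize(),
        steps=numbered,
    )
    path = skill_dir / "SKILL.md"
    path.write_text(body)
    return path
```

The description is worth writing carefully, since it is the part that tells the agent when the skill applies.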

A good rule of thumb is simple: if you have explained the same procedure to the agent three times, turn it into a skill.

Skills are useful for:

task management;
starting new projects;
testing workflows;
session handoff;
generating visuals;
code review;
release preparation;
using project-specific tools.

You do not have to write every skill from scratch. At the end of a useful session, you can ask Claude Code to turn the workflow into a skill. The agent has just gone through the process, so it already knows the steps, caveats, and commands worth preserving.

Over time, this becomes one of the easiest ways to improve your setup. Each project teaches the agent something, and the useful parts become reusable.

Context engineering: managing what the agent knows

After working with an AI coding agent for a while, you will notice a recurring pattern: sometimes it does the wrong thing over and over again.

In most cases, the reason is simple. It did not have the context it needed.

Prompt engineering tries to fix this by changing how you ask the question. Context engineering changes the environment instead. The goal is to make the right information available by default, so you do not have to repeat the same instructions in every prompt.

CLAUDE.md: the project constitution

CLAUDE.md is the file Claude Code reads when it starts. It is not regular documentation. It is closer to a map of the project and a rulebook for working inside it.

It should contain the information the agent needs every time it works on the repository.

Project overview

Start with a short description of the project and its high-level architecture.

For example:

where the backend lives;
where the frontend lives;
where shared packages are stored;
which services are part of the system;
which directories are important.

This should be brief. The goal is not to document every file, but to help the agent know where to look first.

Links to real documentation

Business rules, architecture decision records, API contracts, and process documentation should usually live in separate files.

CLAUDE.md should point to them.

This keeps the main file short and lets the agent pull in the right documents only when needed. However, this only works if the documentation is current. Stale docs are worse than no docs because the agent will trust them and write code against outdated assumptions.
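Staleness is hard to detect automatically, but a dead link is a cheap signal that CLAUDE.md and the docs have drifted apart. The sketch below, with a hypothetical function name, lists relative Markdown links in CLAUDE.md whose targets no longer exist. It assumes links are written in the usual [text](path) form.

```python
import re
from pathlib import Path

# Matches Markdown links such as [API contracts](docs/api.md).
LINK_RE = re.compile(r"\[[^\]]*\]\(([^)#\s]+)[^)]*\)")


def missing_doc_links(claude_md: Path) -> list[str]:
    """Return relative link targets in CLAUDE.md that do not exist on disk."""
    root = claude_md.parent
    missing = []
    for target in LINK_RE.findall(claude_md.read_text()):
        if "://" in target:  # skip external URLs
            continue
        if not (root / target).exists():
            missing.append(target)
    return missing
```

A check like this could run in CI or in a hook. It only proves that a document exists, not that it is current, so it complements review rather than replacing it.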

Workflow and commands

The agent should not have to guess how to run the project.

Include the exact commands for:

starting the app;
running unit tests;
running integration tests;
running the linter;
applying migrations;
generating code;
building the project.

Persistent rules also belong here. For example:

run integration tests before marking a task as done;
never commit directly to main;
run migrations only through the Makefile;
do not edit generated files manually.

Known gotchas

Every project has traps. The agent will eventually step into them unless you write them down.

Examples include:

port conflicts;
hot-reload issues;
cache problems;
flaky tests;
unusual build behavior;
commands that look safe but are not.

The rule is simple: write down what the agent cannot infer from the code itself.

“We use Go” is not useful if there is already a go.mod file. “Run migrations only through make migrate, never with raw SQL” is useful.

Keeping CLAUDE.md maintainable

CLAUDE.md should grow with the project, but it should not become a dumping ground.

At the end of a productive session, you can ask Claude Code to extract the lessons worth keeping and update the file. Over time, some rules will start to form a full process. When that happens, move them into a skill and reference that skill from CLAUDE.md.

In a monorepo, you can also place additional CLAUDE.md files in subdirectories such as backend/ or frontend/. Claude Code loads them on top of the root file when working in those directories. This keeps the root instructions short while allowing each part of the project to define its own rules.

Git as a safety net

When you work with an agent, Git becomes a safety harness.

Agents can delete files, break working code, revert their own changes, or get stuck in loops where each fix introduces a new issue. Without Git, recovering from this is painful.

There are several practices that help.

Make checkpoint commits

Before letting the agent do anything risky, create a checkpoint commit.

This is not necessarily for clean project history. It is so you can quickly return to a known good state when the agent goes in the wrong direction.
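A checkpoint can be as simple as staging everything and committing with a recognizable message. The sketch below is one way to script that; the message format and function names are assumptions, not a convention from Git or Claude Code.

```python
import subprocess
from datetime import datetime


def checkpoint_commands(label: str, now: datetime) -> list[list[str]]:
    """Build the git commands for a throwaway checkpoint commit."""
    stamp = now.strftime("%Y-%m-%d %H:%M")
    return [
        ["git", "add", "--all"],
        # --allow-empty keeps the checkpoint even when nothing changed.
        ["git", "commit", "--allow-empty", "-m", f"checkpoint: {label} ({stamp})"],
    ]


def make_checkpoint(label: str, repo_dir: str = ".") -> None:
    """Run the checkpoint commands in repo_dir, raising on failure."""
    for argv in checkpoint_commands(label, datetime.now()):
        subprocess.run(argv, cwd=repo_dir, check=True)
```

If the agent goes off course, git reset --hard to the checkpoint commit restores the known good state. That command discards uncommitted work, so it belongs in your hands rather than the agent's.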

Block pushes

Pushing is the Git operation you should treat most carefully.

A broken local branch is easy to reset. A bad push can break CI, confuse teammates, or pollute the remote history. In most setups, the agent should be blocked from pushing unless you explicitly approve it.

Let the agent read history

Commit history is also context.

Claude Code can inspect previous commits to understand why parts of the codebase look the way they do. This can be useful when it needs to modify code that has accumulated several layers of decisions over time.

Use worktrees for parallel work

git worktree add creates a separate directory for another branch.

This is useful when you want to run another agent session in parallel, test a risky refactor, or isolate work on a large feature. Each session gets its own workspace without touching the current one.
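As a sketch, the helper below builds a git worktree add command that creates a new branch in a sibling directory. The directory naming scheme and function names are assumptions chosen for illustration.

```python
import subprocess
from pathlib import Path


def worktree_command(repo_dir: Path, branch: str) -> list[str]:
    """Build `git worktree add` for a new branch in a sibling directory."""
    # Example: /work/shop + "refactor/auth" -> /work/shop-refactor-auth
    target = repo_dir.parent / f"{repo_dir.name}-{branch.replace('/', '-')}"
    return ["git", "worktree", "add", "-b", branch, str(target)]


def add_worktree(repo_dir: Path, branch: str) -> None:
    """Create the worktree, raising if git reports an error."""
    subprocess.run(worktree_command(repo_dir, branch), cwd=repo_dir, check=True)
```

Starting a second Claude Code session inside the new directory then gives that session its own checkout, while both share the same repository history.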

Task managers as structured memory

Chat history gets compressed. Markdown notes become stale. Important details disappear between sessions.

A task tracker gives the agent a structured memory that survives longer than the conversation.

The exact tool matters less than consistency. You can use a local tracker like Beads or a team tool like Linear. Most modern trackers can be connected through MCP, which means Claude Code can read tasks, create new ones, update statuses, and leave useful context behind.

This gives you a real history of what the project is doing, not just a record of what the last chat happened to remember.

Sessions and memory: keep context short

Context windows are large, but they are not infinite. More importantly, a large context window is not always better.

Long sessions often make the agent worse. It starts mixing old decisions with new ones, forgets important constraints, or keeps working from assumptions that are no longer true.

In practice, it is better to keep sessions short and clean.

For small projects, a simple PROGRESS.md or TODO.md file can be enough. Before ending a session, ask the agent to write down what it changed, what remains, and what the next session should read first.

However, keep temporary session memory separate from permanent project documentation. If you mix them, the agent may later treat an old session note as a current rule.

For larger projects, you need a more structured context system: specifications, architecture documents, decision records, and organized session histories that the agent can navigate on its own.

Small projects can survive with a few Markdown files. Complex projects need an environment the agent can explore.

Hooks: automatic guardrails

Hooks let you run logic automatically when certain events happen in Claude Code.

Common triggers include:

PreToolUse: before a tool call;
PostToolUse: after a tool call;
Notification: when Claude Code sends a notification, for example because it needs permission or input;
Stop: when the agent finishes responding.

Hooks are useful because they turn repeated instructions into automatic behavior.
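Hooks are registered in the project's settings file. The sketch below generates a .claude/settings.json with two hook entries, following the matcher and command layout from the hooks documentation. The script path and the make fmt target are hypothetical placeholders for your own commands.

```python
import json
from pathlib import Path


def hook_entry(matcher: str, command: str) -> dict:
    """One matcher group: run `command` when the tool name matches."""
    return {"matcher": matcher, "hooks": [{"type": "command", "command": command}]}


def build_settings() -> dict:
    """Return a settings dict with one PreToolUse and one PostToolUse hook."""
    return {
        "hooks": {
            # Inspect shell commands before they run.
            "PreToolUse": [hook_entry("Bash", "python3 .claude/hooks/check_command.py")],
            # Format files after the agent edits or writes them.
            "PostToolUse": [hook_entry("Edit|Write", "make fmt")],
        }
    }


def write_settings(project_root: Path) -> Path:
    """Write .claude/settings.json, replacing any existing file."""
    path = project_root / ".claude" / "settings.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(build_settings(), indent=2) + "\n")
    return path
```

Note that write_settings overwrites the file. In a project that already has settings, merge the hooks key into the existing JSON instead.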

Notifications

A simple notification hook can ping you when the agent needs confirmation or finishes a long-running task.

This reduces the need to watch the terminal constantly.

Auto-approving safe commands

You can use a PreToolUse hook to inspect commands before they run.

For example, you can send a command to a fast model and ask whether it is safe according to your rules. Harmless commands can be approved automatically, while dangerous ones are denied.

Hard denies are useful for commands such as:

git push;
deleting large directories;
wiping databases;
changing production configuration;
running destructive shell commands.
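The hard-deny half of this can be sketched without any model call. The script below assumes the PreToolUse hook contract in which the tool call arrives as JSON on standard input (with tool_name and tool_input fields) and exit code 2 blocks the call while standard error is reported back to the agent. The patterns are illustrative, not a complete safety policy.

```python
import json
import re
import sys

# Illustrative deny patterns; extend them to match your own rules.
DENY_PATTERNS = [
    (re.compile(r"\bgit\s+push\b"), "pushing requires explicit approval"),
    (re.compile(r"\brm\s+-(?=\w*r)(?=\w*f)\w+"), "recursive force delete"),
    (re.compile(r"\bdrop\s+(database|table)\b", re.IGNORECASE), "destructive SQL"),
]


def deny_reason(payload: dict):
    """Return a reason string if the Bash command must be blocked, else None."""
    if payload.get("tool_name") != "Bash":
        return None
    command = payload.get("tool_input", {}).get("command", "")
    for pattern, reason in DENY_PATTERNS:
        if pattern.search(command):
            return reason
    return None


def main(stdin=sys.stdin, stderr=sys.stderr) -> int:
    """Hook entry point: return 2 to block the call, 0 to let it proceed."""
    reason = deny_reason(json.load(stdin))
    if reason:
        print(f"Blocked by project hook: {reason}", file=stderr)
        return 2
    return 0
```

To use it as a hook script, end the file with a call such as sys.exit(main()) and register it under PreToolUse with a Bash matcher. Pattern matching like this is easy to evade with unusual spellings, so treat it as one guardrail among several rather than a guarantee.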

Claude Code also includes auto mode, which covers part of this workflow out of the box.

Context injection

Hooks can also inject context at important moments.

For example, you can automatically load workflow instructions at startup or before context compression. This reduces the chance that the agent “forgets” the rules halfway through a long task.
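One way to do the startup case is a script that prints the rules the agent should always see. This sketch assumes it is registered as a SessionStart hook and that the hook's standard output is added to the agent's context. The file names are hypothetical.

```python
import sys
from pathlib import Path

# Hypothetical files holding rules the agent should always see.
CONTEXT_FILES = ["docs/workflow.md", "docs/review-checklist.md"]


def collect_context(project_root: Path, files: list[str], limit: int = 4000) -> str:
    """Concatenate the files that exist, truncating each to `limit` characters."""
    parts = []
    for name in files:
        path = project_root / name
        if path.is_file():
            parts.append(f"## {name}\n{path.read_text()[:limit].strip()}")
    return "\n\n".join(parts)


def main() -> int:
    """Print the collected context for the hook to hand back to Claude Code."""
    sys.stdout.write(collect_context(Path.cwd(), CONTEXT_FILES))
    return 0
```

The per-file limit is deliberate: injected context competes with everything else in the window, so short, current rules work better than whole documents.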

You do not need to memorize every hook configuration format. A practical approach is to describe the behavior you want and ask Claude Code to look up the current docs and generate the config.

Subagents and Agent Teams

Claude Code can spawn subagents. The main session gives each subagent a task and then collects the result.

Each subagent runs with its own context. This is useful for parallel work, codebase exploration, and breaking large tasks into smaller pieces.

For example, you can ask the main agent to:

inspect several parts of the codebase in parallel;
review different modules independently;
summarize batches of previous sessions;
compare several implementation options;
run isolated research tasks.

This is especially useful when the main task would otherwise require writing a custom script or repeatedly calling an API. The main agent becomes an orchestrator, and subagents do the focused work.

Agent Teams

Agent Teams is an experimental feature that takes this further. Instead of handing individual tasks to subagents from a single session, you can dispatch multiple agents in parallel, each with its own role and working directory.

This is useful for tasks that naturally decompose into independent pieces: running tests across multiple services, applying a consistent refactor to several packages, or exploring different implementation approaches simultaneously.

Each agent in the team operates in isolation — its own context, its own file scope — and the results are collected and synthesized by the orchestrating session. The mental model is closer to a team of specialists than a single agent doing everything sequentially.

Agent Teams is still experimental, and it works best when the subtasks are genuinely independent. If the agents need to coordinate on shared state or make decisions that affect each other, the orchestration overhead can outweigh the parallelism benefit.

A practical workflow: research, plan, implement, verify

Not every task needs the same process.

A one-line bug fix does not need a design document. A new integration across several services does.

A useful way to split work is:

Complex tasks: research → decompose → plan → implement in stages → verify.
Medium tasks: plan → implement → verify.
Simple tasks: implement → verify.
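The split above can be written down as a small lookup, which is handy if you want a skill or script to pick the process for you. The classification thresholds below are assumptions made up for illustration; only the stage lists come from the text.

```python
from enum import Enum


class Size(Enum):
    SIMPLE = "simple"
    MEDIUM = "medium"
    COMPLEX = "complex"


# Stage lists as described in the text.
STAGES = {
    Size.COMPLEX: ["research", "decompose", "plan", "implement in stages", "verify"],
    Size.MEDIUM: ["plan", "implement", "verify"],
    Size.SIMPLE: ["implement", "verify"],
}


def classify(files_touched: int, services_touched: int) -> Size:
    """Rough heuristic; the thresholds are illustrative assumptions."""
    if services_touched > 1 or files_touched > 10:
        return Size.COMPLEX
    if files_touched > 2:
        return Size.MEDIUM
    return Size.SIMPLE
```

Whatever heuristic you choose, every path ends in verify. That is the one stage no task size gets to skip.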

The point is not to add process for its own sake. The point is to prevent the agent from making too many silent decisions on your behalf.

Research: global and local context

For larger tasks, research usually has two parts.

Global research

This means studying external information:

library documentation;
API behavior;
architectural approaches;
framework limitations;
implementation examples.

This is where MCP documentation servers and browser tools are useful.

Local research

This means studying the project itself:

how the current architecture works;
where dependencies live;
how deployment is configured;
where a new feature should be inserted;
which tests already exist;
which conventions the project follows.

Before starting a larger feature, it helps to ask questions like:

Explore the project and summarize how the authentication flow currently works.
Look at the data pipeline. Where can we add a new processing stage?
What would we need to implement real-time notifications in the current architecture?

At this stage, it is often better to describe the task in product terms rather than implementation terms.

For example, instead of saying:

Add an endpoint, connect it to this service, and wire it into this config.

You can start with:

There should be a button on the site that lets users do X.

This gives the agent room to inspect the existing architecture and identify the right integration points. If you already have a strong technical design, you can provide it directly. But for larger features, discussion usually catches missing assumptions earlier.

Planning: measure twice, cut once

The plan stage is where the agent decides what it will do before changing code.

Claude Code has a dedicated plan mode, which you can enter with /plan. In this mode, the agent inspects the codebase, asks clarifying questions, traces dependencies, and produces a concrete plan without modifying files. You can also toggle into plan mode with Shift+Tab. The slash command was added in early 2026 to make the entry point more discoverable, but the behavior is the same.

A good plan should include:

which files will change;
what behavior will be added or modified;
what tests will be written;
what commands will be used for verification;
which risks or edge cases matter.

This matters because, without a plan, the agent makes many small decisions silently: which component to use, where to put a handler, how to shape state, how to handle an edge case.

A plan moves those decisions to the front. You can review them before implementation starts.

/ultraplan for large-scale work

For larger migrations, cross-cutting refactors, and multi-service changes, Claude Code offers /ultraplan. This is a cloud-based planning feature (research preview, requires v2.1.91+) that works differently from local plan mode.

When you invoke /ultraplan, the planning work is offloaded to a cloud session where multiple Opus 4.6 agents work in parallel — exploring the codebase, tracing dependencies, mapping affected systems, and drafting a structured plan. The result is presented through a browser-based review surface where you can read the plan, leave inline comments, request changes, and approve sections before execution begins.

Once you approve the plan, execution can happen locally or in the cloud. The key difference from /plan is scale: /ultraplan is designed for work that touches dozens of files across multiple services, where a single agent working sequentially would either lose context or produce a shallow plan.

It is still a research preview, and it works best when the scope genuinely warrants it. For a feature that touches three files, /plan is faster and simpler. For a database migration that affects twelve services, /ultraplan earns its overhead.

Third-party plugins and skills provide similar structured planning workflows, including Superpowers, gstack, cc-sdd, and others.

In practice, planning becomes more valuable over time. Early on, it feels faster to let the agent “look around and start.” Later, you realize that ten minutes of planning can save hours of undoing poor implementation decisions.

Verification: do not take the agent’s word for it

The first rule of verification is simple: do not trust the agent when it says the task is done.

It may report that all tests pass even when the tests cover the wrong scenario. It may adjust behavior to satisfy tests while missing the original requirement. It may misunderstand the task and still produce a confident summary.

This is not malicious. The agent often genuinely believes the work is complete.

Verification should include:

reading the changed code;
checking the important user flow manually;
running relevant tests yourself;
inspecting edge cases;
making sure the implementation matches the original requirement.
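For the "run relevant tests yourself" item, a small runner that looks only at exit codes keeps you from relying on the agent's summary. This is a sketch with hypothetical function names; the commands you pass in are whatever your project actually uses.

```python
import subprocess


def run_check(name: str, argv: list[str]) -> tuple[str, bool]:
    """Run one verification command and report whether it exited with 0."""
    result = subprocess.run(argv, capture_output=True, text=True)
    return name, result.returncode == 0


def verify(checks: dict[str, list[str]]) -> list[str]:
    """Run every check and return the names of the ones that failed."""
    results = (run_check(name, argv) for name, argv in checks.items())
    return [name for name, ok in results if not ok]
```

For example, verify({"tests": ["make", "test"], "lint": ["make", "lint"]}) returns an empty list only when both commands succeed (the make targets here are placeholders). Passing checks still say nothing about whether the tests cover the right scenario, which is why reading the diff stays on the list.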

Some of this can be automated. With MCP, Claude Code can use Playwright, mobile emulators, end-to-end environments, and testing tools. With skills, you can define how verification should happen for your project.

However, automation does not fully replace human review. If you never inspect the result yourself, low-quality code can accumulate quietly.

Plugins: packaged workflows

You can build the full workflow yourself: research, planning, implementation, testing, subagents, reviews, and release steps.

You can also use plugins where much of this is already packaged.

The plugin ecosystem has grown significantly. Anthropic now maintains an official plugin directory, and Claude Code includes /plugin marketplace for browsing and installing plugins directly from the terminal. What started as a handful of community projects has become a rich catalog of specialized tools.

Superpowers

Superpowers is a Claude Code plugin that organizes work into a set of skills.

The workflow starts with brainstorming. The agent asks clarifying questions, explores edge cases, and turns the discussion into a concrete plan. Then it moves into implementation, where tasks are split into smaller chunks and handled with clean context.

One important part of Superpowers is its emphasis on TDD. The workflow expects a failing test before production code is written. The agent writes the test, checks that it fails for the right reason, and then implements the code to make it pass.

In many ways, this is the workflow you might arrive at after months of using Claude Code, packaged into a ready-made system.

gstack

gstack follows a similar idea but uses a different structure.

Instead of one general workflow, it splits the process into role-specific stages. A task may go through office hours, product review, engineering review, design review, QA, and release.

Each stage looks at the task from a different perspective:

product fit;
architecture;
tests;
UI behavior;
browser verification;
release readiness.

Conceptually, it solves the same problem as Superpowers: it turns a vague coding session into a staged process. The difference is the mental model. Superpowers feels more like a structured engineering workflow. gstack feels more like a set of specialized reviewers.

Both are worth trying. The better choice depends on how you prefer to work.

Other notable plugins

Beyond Superpowers and gstack, a few other tools are worth knowing about:

Cartographer generates and maintains a structural map of your codebase — module boundaries, dependency graphs, entry points — and keeps it updated as files change. This gives the agent (and you) a high-level view of the project without manually maintaining architecture docs.

Ensue-skill provides a library of pre-built skills for common development patterns: API scaffolding, database migrations, CI pipeline setup, and similar tasks. Instead of writing these skills from scratch, you can install them and customize the parts that differ in your project.

The plugin ecosystem is still moving fast. New tools appear regularly, and the quality varies. The /plugin marketplace command is the easiest way to see what is currently available and what other people are actually using.

Practical tips

Here are a few habits that make Claude Code more useful in everyday work.

Be specific in the first prompt

Give the agent relevant details, constraints, and examples. At the same time, do not try to front-load everything. For unclear tasks, it is often better to discuss the problem with the agent before asking it to implement anything.

Stop when the agent keeps failing

If the agent repeats the same mistake for several iterations, the problem is probably not the code. It is likely a misunderstanding.

Ask the agent to explain how it understands the task. Usually, either it misread your instructions, or the task description had gaps.

Roll back instead of patching bad work

When the agent produces a poor implementation, it is often better to revert and start again with a better prompt.

Trying to fix bad agent-written code with more agent-written patches can create a cascade of workarounds.

Remember that the agent trusts you

The agent usually will not challenge unclear or contradictory instructions. It will try to satisfy them.

If your instructions conflict, it may quietly build something that attempts to satisfy both. If the result feels strange, check the framing of the task and the context you provided.

Test manually

Even if automated tests pass, click through the critical flow yourself.

The agent may have tested the wrong thing, misunderstood the requirement, or produced a result that works technically but fails in actual use.

Keep sessions short

Long sessions accumulate noise. The agent becomes more likely to mix old assumptions with new instructions.

Capture progress, start a new session, and hand over only the context that matters.

Keep documentation current

When the agent behaves strangely despite a good prompt, check the documentation.

Outdated specs, stale architecture notes, and old session files can mislead the agent more effectively than missing context.

Extract value from every session

During a session, the agent learns a lot about the project. It inspects files, traces dependencies, identifies conventions, and discovers edge cases.

Do not let all of that disappear.

Before ending a productive session, ask it to save what matters:

update CLAUDE.md;
write down new project rules;
document the database schema;
add notes about non-obvious dependencies;
turn a repeated workflow into a skill.

This is how the setup improves over time.

Conclusion

Claude Code on its own can be useful, but it hits a ceiling quickly. With the right system around it, that ceiling becomes much higher. The model does not suddenly become smarter. Instead, it stops working blind.

A good setup gives the agent current context, safe tools, clear workflows, reliable documentation, and automatic guardrails. On the user side, you still need to understand the project, give clear instructions, review the result, and decide what actually matters.

The payoff is not that the agent replaces engineering judgment. It is that routine implementation, investigation, and verification become easier to delegate.

Once the system is in place, Claude Code becomes less of a demo and more of a tool: one that can handle repetitive work while you stay focused on architecture, product decisions, and direction.
