The Weight of Possibility
In traditional software, we learned a long time ago that you shouldn't load every single library into memory the moment an application starts. It kills performance. We use "lazy loading" to fetch resources only when they are actually needed.
Strangely, we forgot this lesson when we started building AI agents.
With the rise of the Model Context Protocol (MCP), it became easy to give an agent access to powerful external systems like web browsers, database connectors, and developer tools. But the standard way to do this is to dump every single tool definition into the agent's "context window" (its active memory) right at the start of the conversation.
This is the equivalent of a plumber walking into a house carrying every single tool they own—wrenches, pipes, saws, welding gear—just to check a leaky faucet. It’s exhausting, expensive, and unnecessary.
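To make the eager-loading pattern concrete, here is a minimal Python sketch. The tool names and schemas are illustrative, not taken from any real MCP server; the point is only that every definition gets serialized into the prompt before the conversation begins.

```python
# Hypothetical sketch of "eager loading": every tool schema is serialized
# into the system prompt up front, whether or not the agent will use it.
import json

# Assumed, simplified tool definitions (a real browser toolkit defines ~26).
BROWSER_TOOLS = [
    {
        "name": "browser_navigate",
        "description": "Navigate the browser to a URL.",
        "parameters": {"url": {"type": "string"}},
    },
    {
        "name": "browser_screenshot",
        "description": "Capture a screenshot of the current page.",
        "parameters": {},
    },
    # ...24 more definitions in a real toolkit
]

def build_eager_prompt(tools):
    """Dump every tool definition into the context at conversation start."""
    return "You can call these tools:\n" + "\n".join(
        json.dumps(t) for t in tools
    )

prompt = build_eager_prompt(BROWSER_TOOLS)
# Every message in the conversation now carries every definition, used or not.
```

Because the prompt is rebuilt this way on every turn, the per-message token cost scales with the total number of tools connected, not the number actually used.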
The Cost of Being Prepared
The numbers behind this "eager loading" strategy are shocking.
Take a standard browser integration. To give an agent the ability to interact with a website, you might connect a standard Chrome toolkit. This toolkit defines about 26 different actions (tools). Just describing these 26 tools to the AI consumes around 17,000 tokens.
That is roughly 10% of even a large model's context window, and nearly $0.10 per message on top-tier models, wasted on definitions the agent might never use. Worse, all this noise degrades the agent's reasoning: when the context is crowded with irrelevant tools, the model is more likely to hallucinate or pick the wrong one.
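The arithmetic behind these figures is easy to check. The snippet below uses the 17,000-token figure from the text plus two assumed values (a $5-per-million-token input price and a 200,000-token context window), which roughly reproduce the "nearly $0.10" and "roughly 10%" claims:

```python
# Back-of-the-envelope cost of eager tool loading.
# TOOL_DEF_TOKENS comes from the text; the price and context size
# are assumptions, chosen to match common top-tier model pricing.
TOOL_DEF_TOKENS = 17_000
PRICE_PER_MILLION = 5.00    # assumed input price, USD per 1M tokens
CONTEXT_WINDOW = 200_000    # assumed large-model context size

cost_per_message = TOOL_DEF_TOKENS / 1_000_000 * PRICE_PER_MILLION
context_share = TOOL_DEF_TOKENS / CONTEXT_WINDOW

print(f"${cost_per_message:.3f} per message")  # $0.085 per message
print(f"{context_share:.1%} of the context")   # 8.5% of the context
```

And because the definitions are resent with every message, a 50-turn session pays that overhead 50 times over.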
The "Skill" Pattern
The solution is to stop treating tools as a global list and start treating them as Skills.
A Skill is a smart wrapper around a set of tools. Instead of forcing the agent to memorize the technical definition of every function in the API, you simply give it a menu.
- The Menu Phase: At the start, the agent is given a short, human-readable description: "Web Browser: Use this to view pages and take screenshots." This costs almost nothing—maybe 30 tokens.
- The Order Phase: The agent converses normally. Only when it decides "I need to see what is on this website" does it "activate" the Skill.
- The Loading Phase: At that precise moment, the system injects the heavy technical definitions of the tools into the context.
This "Just-in-Time" delivery keeps the agent light and fast for 99% of the conversation, only adding weight when it is strictly necessary.
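The three phases above can be sketched as a small lazy-loading wrapper. This is a minimal illustration, not a real MCP client: the class name, the stand-in loader, and the single tool schema are all hypothetical.

```python
# Minimal sketch of the "Skill" pattern: a cheap menu entry up front,
# with heavy tool schemas injected only on activation.
import json

class Skill:
    def __init__(self, name, summary, load_tools):
        self.name = name
        self.summary = summary          # short menu description (~30 tokens)
        self._load_tools = load_tools   # deferred: not called until needed
        self._tools = None              # nothing loaded yet

    def menu_entry(self):
        """Menu phase: the one-line description the agent sees at start."""
        return f"{self.name}: {self.summary}"

    def activate(self):
        """Loading phase: fetch and inject full definitions on first use."""
        if self._tools is None:
            self._tools = self._load_tools()
        return "\n".join(json.dumps(t) for t in self._tools)

def load_browser_tools():
    # Stand-in for fetching the ~26 real schemas from an MCP server.
    return [{"name": "browser_navigate", "parameters": {"url": "string"}}]

browser = Skill(
    "Web Browser",
    "Use this to view pages and take screenshots.",
    load_browser_tools,
)

print(browser.menu_entry())   # Menu phase: costs a few dozen tokens
# ...agent decides "I need to see what is on this website"...
schemas = browser.activate()  # Loading phase: heavy schemas arrive now
```

The key design choice is that `load_tools` is a callable rather than a list: the expensive fetch and serialization simply never happen for Skills the agent never orders off the menu.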