Mastering Context Engineering: Best Practices for Reliable AI Performance in 2025

Amit Eyal Govrin

TL;DR
- Context engineering manages the precise information fed into AI models to ensure reliable and accurate outputs beyond basic prompt writing.
- It involves structuring three layers of context: instructional, knowledge, and tool-related data for effective AI performance.
- Key strategies include writing (external memory), selecting (relevant retrieval), compressing (summarization/trimming), and isolating (compartmentalized workflows).
- Proper context engineering prevents issues like misinformation buildup, distraction, confusion, and conflicting data inside AI agents.
- Platforms like Kubiya.ai showcase how disciplined context engineering enables deterministic, scalable, and trustworthy AI agents for complex real-world tasks.
Artificial Intelligence has made incredible strides in recent years, especially with the rise of Large Language Models (LLMs) like GPT, Claude, and Gemini. These models can generate text, answer questions, and even solve problems, but their intelligence is only as good as the information they receive. This is where context engineering comes in: the art and science of filling an agent’s context window with just the right information at each step.
In this article, we explore why context engineering is crucial for unlocking the full potential of AI systems and discuss best practices for implementing it effectively. Context engineering goes beyond simply writing prompts; it involves designing and managing dynamic input environments that integrate user data, interaction history, external APIs, and domain-specific knowledge. This carefully structured context enables AI to deliver accurate, relevant, and reliable outputs while understanding the nuance of user intent and task requirements.
We’ll also explore key strategies for context engineering and how they help LLMs generate smarter outputs by filling an agent’s context window with just the right information at each step:
[Write] → [Select] → [Compress] → [Isolate] → [Smart LLM Output]
Prompt Engineering: The Starting Point
Before diving into context engineering, it’s helpful to understand prompt engineering. Prompt engineering is about crafting the exact input to get the desired output from an AI model.
Prompt: “Summarize this article in 3 sentences.”
AI Output: “The article discusses the rise of renewable energy, highlights key technologies, and predicts global adoption trends.”
Prompt engineering is important, but it focuses mainly on the wording of a single input. Context engineering, on the other hand, looks at the bigger picture of how we feed the AI structured, relevant, and sequenced information over time.
Understanding Context Engineering
As Andrej Karpathy puts it, LLMs are like a new kind of operating system. The model itself is the CPU, and its context window acts like RAM, serving as working memory. Just as a computer has limited RAM, an LLM can only handle a finite amount of information at once. Context engineering is like curating what goes into this “RAM” — ensuring the model has exactly what it needs for the next step. Karpathy describes it as:
“…the delicate art and science of filling the context window with just the right information for the next step.”
Context engineering in LLM applications is about managing the layers of information a model relies on. These layers typically fall into three types:
Instructional context
How we guide the model, including prompts, examples, system messages, and tool/API descriptions.
Knowledge context
What the model knows in the moment, from built-in knowledge to retrieved facts or past memories.
Tool context
Information from external tools such as APIs, code execution, or web searches.
Together, these layers shape the environment in which the model operates. Effective context engineering means orchestrating them so the model delivers accurate and useful results.
Context Engineering for Agents
With the rapid improvements in reasoning and tool use, AI agents have become one of the hottest areas of development this year. Unlike single-shot LLM prompts, agents mix together multiple model calls and tool invocations, often chaining them over long-running tasks. Each tool’s feedback influences the agent’s next step.
But this flexibility comes at a cost: as tasks get longer, agents accumulate more context, leading to heavy token usage. That creates several issues — hitting context window limits, driving up cost and latency, and in some cases even reducing accuracy. As Drew Breunig pointed out, longer context can introduce challenges such as:
Context Poisoning:
This occurs when an error or hallucination (incorrect information generated by the model) makes its way into the context and is then reused by the model in later steps. This contaminates the model’s understanding, causing it to fixate on false or irrelevant information, which can lead to poor decision-making or incorrect responses over time.
Context Distraction:
When the context window becomes very large, the model may focus excessively on the accumulated context rather than relying on its learned knowledge. This can cause it to repeat previous actions or get “stuck” in past patterns instead of generating new, creative, or accurate outputs, which degrades overall effectiveness.
Context Confusion:
This happens when the context contains too much irrelevant or superfluous information, confusing the model. Even if the additional data isn’t contradictory, being overwhelmed with unnecessary content can cause the model to misinterpret the prompt or produce less relevant or erroneous results.
Context Clash:
This failure mode arises when different pieces of information within the context directly contradict each other, creating internal conflicts. Such contradictions can confuse the model, making it difficult to generate coherent or logical responses and often causing it to produce inconsistent or erroneous outputs.
These nuanced failures highlight the complexity of managing context effectively in LLM-based systems and the importance of careful context engineering to maintain reliable, high-quality AI performance.
Agent interactions accumulate context over multiple turns. Each step combines system instructions, human input, tool calls, and feedback. As turns progress, this layered context grows, highlighting the need for careful context engineering to manage cost, performance, and accuracy.
With this in mind, Cognition called out the importance of context engineering:
“Context engineering” … is effectively the #1 job of engineers building AI agents
Anthropic also laid it out clearly:
Agents often engage in conversations spanning hundreds of turns, requiring careful context management strategies.
Context Engineering Code Templates
Context engineering benefits greatly from structured templates that help developers build consistent and reusable context inputs for AI models. Here’s an example Python template for dynamic prompt construction:
Python
def build_dynamic_context(user_input, knowledge_base, tool_outputs):
    # Assemble the three context layers into one clearly labeled prompt string.
    context = f"""
You are an advanced AI assistant.
User input: {user_input}
Relevant Knowledge:
{knowledge_base}
Tool Outputs:
{tool_outputs}
Please provide a detailed and relevant response.
"""
    return context
# Example usage:
user_input = "Explain context engineering."
knowledge_base = "Key principles include writing, selecting, compressing, and isolating context."
tool_outputs = "API call results, logs, or code snippets."
prompt = build_dynamic_context(user_input, knowledge_base, tool_outputs)
This template is easily customizable to include various context layers like user data, memory, or external API info. It demonstrates practical context assembly for reliable AI prompting.
The code template and multi-agent context strategies draw inspiration from the open-source “ContextEng Intro”, which provides foundational templates and best practices for AI context workflows. Kubiya.ai builds upon and extends such principles into a robust, enterprise-grade platform for scalable and deterministic AI agent orchestration.
How to Build Dynamic Prompts
- Collect Relevant Data: Gather instructional context (prompts, system messages), knowledge context (documents, facts), and tool context (API responses).
- Structure Input: Use clear sections and labels in the prompt to separate contexts logically.
- Iterate and Refine: Test the prompt output, observe behavior, and iteratively refine prompt phrasing and content chunking.
- Automate Context Retrieval: Implement retrieval techniques like embeddings or keyword search to select relevant past interactions or documents dynamically, as in the sketch below.
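Here is a minimal sketch of these four steps in plain Python. The keyword-overlap scorer is an illustrative stand-in for a real embedding or vector-store retriever, and all names are hypothetical:
Python
import re

def tokens(text: str) -> set[str]:
    """Crude tokenizer: lowercase word sets stand in for embeddings."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def select_context(query: str, documents: list[str], top_k: int = 2) -> list[str]:
    """Step 4: retrieve only the documents most relevant to this turn."""
    ranked = sorted(documents, key=lambda d: len(tokens(query) & tokens(d)), reverse=True)
    return ranked[:top_k]

def build_prompt(query: str, documents: list[str], tool_output: str) -> str:
    """Steps 1-3: collect each context layer and label it clearly."""
    knowledge = "\n".join(f"- {d}" for d in select_context(query, documents))
    return (
        "Instructions: You are a helpful assistant.\n\n"
        f"Relevant Knowledge:\n{knowledge}\n\n"
        f"Tool Output:\n{tool_output}\n\n"
        f"User Input:\n{query}\n"
    )

docs = [
    "Context engineering structures the information fed to an LLM.",
    "Kubernetes schedules containers across a cluster.",
    "Prompt engineering focuses on the wording of a single input.",
]
print(build_prompt("What is context engineering?", docs, "(no tool calls yet)"))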
Managing Multi-Agent Context
Complex AI systems often use multiple agents specialized for planning, decision-making, or data gathering. Each agent should have:
- Partial isolated context to avoid confusion and context clash
- Inter-agent communication protocols to share summarized context efficiently
- Context checkpoints to manage token consumption and reuse relevant memory
Example: Agent A fetches web data → summarizes → sends summary to Agent B for decision logic → Agent B executes tool calls based on updated context.
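A minimal sketch of that handoff, with hypothetical stub functions standing in for real LLM and tool calls:
Python
def agent_a_fetch(url: str) -> str:
    # Agent A gathers raw data inside its own context (stubbed here).
    return f"(long raw page contents of {url}) ..."

def agent_a_summarize(raw: str) -> str:
    # Agent A compresses its findings before the handoff;
    # a real system would call an LLM for this step.
    return raw[:60] + " [summary]"

def agent_b_decide(summary: str) -> str:
    # Agent B sees only the summary, never Agent A's full context.
    return f"Decision based on: {summary}"

summary = agent_a_summarize(agent_a_fetch("https://example.com/report"))
print(agent_b_decide(summary))  # only the compact summary crossed the boundary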
Core Strategies and Challenges for Context Engineering in AI Agents
Agent context engineering relies on four strategies: write, select, compress, and isolate. These are commonly used across AI products and research to manage and optimize context.
1. Writing Context:
Persisting Information Beyond Immediate Interaction
Writing context refers to the practice of storing information externally, outside the agent’s immediate context window, so it can be accessed and reused in later steps or sessions. This approach helps manage limited token budgets and preserves important data over time without cluttering the active workspace.
Key Strategies for Writing Context
- Scratchpads: These serve as intermediate working memory where agents keep notes, plans, or calculations. Instead of regenerating or repeating the same information during each step, the agent queries the scratchpad — stored in files, databases, or runtime objects — to retrieve relevant details as needed. This reduces redundant processing and improves efficiency.
- Long-Term Memories: For knowledge that spans multiple interactions or sessions, long-term memories are essential. These can store user preferences, configuration settings, or even reflective insights gathered over time. Persisting this data outside the active context ensures continuity without overwhelming the immediate context window.
Practical Tips
To keep the active context focused, use checkpointing techniques that selectively maintain relevant state while filtering out irrelevant or stale details. By doing so, agents access the right information quickly without the overhead of sifting through noisy data.
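As a rough illustration, the sketch below pairs a file-backed scratchpad with a checkpoint filter; the file name and keys are illustrative assumptions, not a prescribed format:
Python
import json
from pathlib import Path

SCRATCHPAD = Path("scratchpad.json")  # external memory, outside the context window

def write_note(key: str, value: str) -> None:
    """Persist an intermediate result instead of re-deriving it each turn."""
    notes = json.loads(SCRATCHPAD.read_text()) if SCRATCHPAD.exists() else {}
    notes[key] = value
    SCRATCHPAD.write_text(json.dumps(notes))

def checkpoint(relevant_keys: set[str]) -> dict:
    """Load back only the entries the current step actually needs."""
    notes = json.loads(SCRATCHPAD.read_text()) if SCRATCHPAD.exists() else {}
    return {k: v for k, v in notes.items() if k in relevant_keys}

write_note("plan", "1) fetch logs 2) summarize 3) open ticket")
write_note("old_debug_dump", "... thousands of stale tokens ...")
print(checkpoint({"plan"}))  # the stale dump never re-enters the active context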
2. Selecting Context:
Retrieving What Matters Most
In AI systems and multi-agent workflows, selecting the right context to feed into an agent is critical for accuracy, efficiency, and relevance. Instead of loading all available data, the objective is to retrieve only the most pertinent information for the current task. This reduces token consumption and ensures that agents stay focused on what truly matters.
Techniques for Context Selection
Scratchpads: Working memory that stores intermediate results, exposing only relevant entries at each workflow step to keep context clear and focused.
Memories: Retrieval of episodic, procedural, or semantic data from larger stores like embeddings and knowledge graphs, with filtering and re-ranking to prioritize relevance. Popular agents often use fixed-file sets for easier selection, though dynamic collections like ChatGPT’s user-specific memories require advanced retrieval strategies.
Tools & Knowledge: To avoid confusion from overlapping tool descriptions, limiting active tools and applying Retrieval-Augmented Generation (RAG) can improve tool and knowledge snippet selection accuracy by up to threefold; see the sketch below.
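Here is a minimal sketch of that RAG-style tool selection. The tool catalog is hypothetical, and keyword overlap stands in for an embedding search over tool descriptions:
Python
import re

TOOLS = {
    "search_web": "Search the public web for a query.",
    "run_sql": "Execute a SQL query against the analytics warehouse.",
    "create_ticket": "Open a Jira ticket with a title and description.",
    "deploy": "Deploy a service version to Kubernetes.",
}

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def rank_tools(task: str, top_k: int = 2) -> list[str]:
    """Expose only the most task-relevant tool descriptions to the agent."""
    ranked = sorted(TOOLS, key=lambda name: len(tokens(task) & tokens(TOOLS[name])), reverse=True)
    return ranked[:top_k]

# Only the two most relevant tool descriptions enter the prompt:
print(rank_tools("open a jira ticket about the failed kubernetes deploy"))
# ['create_ticket', 'deploy']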
Challenges in Context Selection
Selecting context from large collections is complex. Techniques like embeddings and knowledge graphs help index memories, but imperfect matches or over-inclusion can lead to unexpected outputs. For example, Simon Willison highlighted how ChatGPT used unrelated personal data in image generation due to overzealous memory retrieval.
In code-heavy tasks, retrieval involves combining embedding searches, semantic chunking, file searches, and knowledge graph ranking. A final re-ranking step then prioritizes the most relevant snippets to fine-tune the agent’s input for better decision-making.
Types of Memory in AI Agents
In AI systems, agents rely on various memory types to retrieve just the right context for each task. The table below summarizes the three major forms of agent memory: semantic, episodic, and procedural, and how each stores different kinds of information:
| Memory Type | What Is Stored | Human Example | Agent Example |
|---|---|---|---|
| Semantic | Facts | Things I learned in school | Facts about a user |
| Episodic | Experiences | Things I did | Past agent actions |
| Procedural | Instructions | Instincts or motor skills | Agent system prompt |
- Semantic memory retains factual knowledge (e.g., facts about a user), just as humans remember things learned in school.
- Episodic memory preserves specific experiences (e.g., past agent actions), much like humans recall events they’ve lived through.
- Procedural memory manages instructions or system prompts, akin to human instincts or skills.
By selectively pulling from these distinct memory forms, agents avoid overwhelming themselves with irrelevant data and focus instead on the most pertinent information needed for their current decision or action. This targeted retrieval is fundamental to accurate, efficient, and scalable AI workflows.
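A minimal sketch of such a store, with each memory type kept in its own bucket so a step can recall only the kind it needs (all entries are illustrative):
Python
memories = {
    "semantic":   ["User prefers Python.", "Prod cluster runs Kubernetes."],
    "episodic":   ["Turn 3: restarted the payments pod.", "Turn 7: opened a Jira ticket."],
    "procedural": ["System prompt: act as an SRE assistant; confirm before deploys."],
}

def recall(kind: str, limit: int = 2) -> list[str]:
    """Pull one memory type only, keeping the active context narrow."""
    return memories.get(kind, [])[:limit]

# Choosing the next action needs past actions, not user trivia:
print(recall("episodic"))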
Best Practices for Developers
- Design modular context selection logic to dynamically fetch relevant information per step, minimizing token usage.
- Use layered retrieval combining embeddings, keyword search, knowledge graphs, and heuristic re-ranking for precision.
- Regularly audit retrieved context to filter out irrelevant data and refine ranking methods.
- Limit active tools used per step and apply Retrieval-Augmented Generation (RAG) to prevent overload and confusion.
Smart context selection enables agents to scale gracefully and perform accurately as memory and tool complexity grows.
3. Compressing Context:
Optimizing Agent Performance
In AI systems dealing with long interactions or large volumes of data, the context window can quickly become overloaded. Compressing context is an essential technique to reduce the amount of information passed into an agent without losing critical details. This not only improves token efficiency but also speeds up processing and helps maintain focus on relevant content.
Key Methods for Context Compression
- Summarization: This involves distilling multi-turn conversations or sizable tool outputs into shorter, concise representations. Hierarchical or recursive summarization approaches break down extensive interactions step-by-step to extract the core messages. For example, summarizing each phase of a task and then combining these summaries preserves important ideas while trimming excess detail.
- Trimming: Unlike summarization, trimming uses heuristic-driven filtering to remove messages or pieces of context deemed outdated or irrelevant. This can include dropping the oldest chat exchanges or pruning based on task-specific rules. Trimming is a lightweight, rule-based complement to summarization that helps keep the context size manageable.
Combining for Best Results
For optimal performance, developers often combine both summarization and trimming strategies. Summarization reduces large chunks of data into informative summaries, while trimming efficiently filters out stale or redundant content. This dual approach balances token usage and speed while ensuring essential knowledge remains accessible.
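A minimal sketch of that dual approach, assuming a hypothetical summarize() that would call an LLM in a real system: rule-based trimming drops the oldest turns, and a summary of the trimmed turns keeps their gist available.
Python
def summarize(messages: list[str]) -> str:
    # Stand-in for an LLM summarization call.
    return f"[summary of {len(messages)} earlier messages]"

def compress_history(messages: list[str], keep_last: int = 4) -> list[str]:
    """Trim old turns, but keep a one-line summary of them up front."""
    if len(messages) <= keep_last:
        return messages
    old, recent = messages[:-keep_last], messages[-keep_last:]
    return [summarize(old)] + recent

history = [f"turn {i}: ..." for i in range(1, 11)]
print(compress_history(history))
# ['[summary of 6 earlier messages]', 'turn 7: ...', ..., 'turn 10: ...']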
Practical Applications
Summarization can be integrated at crucial touchpoints in an agent’s workflow — for instance, immediately after a token-heavy tool call like a web search — to condense the results before passing them on. As noted by Cognition.ai, summarizing the handoff between agents significantly reduces the token count during knowledge transfer, a vital step for multi-agent systems operating in resource-constrained environments.
Because accurately capturing important events or decisions through summarization can be challenging, some systems employ fine-tuned models specifically trained to handle this task, underlining how much effort quality summarization can require.
Context Trimming in Action
While summarization generally leverages large language models to distill context, trimming can be a simpler, rule-based process. For example, it might automatically prune older messages or irrelevant data points to keep the context lean. Recent research such as Provence demonstrates trained context pruning specifically tailored for question-answering tasks, showing promising results beyond simple heuristics.
4. Isolating Context:
Keeping Agent Workflows Clean and Efficient
Isolation is a fundamental design principle that splits context into independent compartments to prevent interference, reduce noise, and improve system performance. In complex AI workflows, especially multi-agent systems, isolating context ensures that each component operates on precisely the data it needs without cross-contamination.
Isolation Techniques for Developers
Multi-Agent Systems:
Divide tasks across specialized sub-agents, each with its own context window, set of tools, and instruction set. This parallelism allows agents to work independently on focused tasks — such as data gathering, summarization, or decision-making — without overwhelming each other. However, this approach can increase total token usage and requires careful prompt coordination to manage information flow smoothly.
Sandboxed Environments:
Run heavy or potentially risky tool calls and external processes inside sandboxed executions. By doing so, only the necessary outputs — clean and concise — are returned to the main agent. This strategy prevents large or irrelevant data from cluttering the active context, enhances security, and keeps the runtime environment stable.
Sandboxed environments isolate code execution from the main AI agent. The agent sends code to run, but only essential outputs, such as return values or variable names, are sent back. Large generated objects, such as images or documents, remain safely inside the sandbox, preventing clutter and protecting the main context. This isolation improves security, reduces noise, and keeps workflows efficient.
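As a rough sketch, the pattern can be approximated with a subprocess: generated code runs in a separate interpreter and only its stdout returns to the agent. A production sandbox would add real isolation such as containers and resource limits.
Python
import subprocess
import sys

def run_in_sandbox(code: str, timeout: int = 5) -> str:
    """Execute code in a separate process; return only its stdout."""
    result = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return result.stdout.strip()  # only the essential output crosses back

# The large intermediate object lives and dies inside the subprocess;
# the main agent's context receives a single number.
print(run_in_sandbox("print(sum(range(1_000_000)))"))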
State Objects:
Structure the runtime state as modular objects with separate fields for messages, tool outputs, and metadata. At each step, expose only the relevant slices of this state to the agent, minimizing context size and helping the agent stay focused on immediate needs. This method simplifies context management and reduces memory bloat.
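A minimal sketch of a state object with per-step views; the field names are illustrative assumptions:
Python
from dataclasses import dataclass, field

@dataclass
class AgentState:
    messages: list = field(default_factory=list)
    tool_outputs: dict = field(default_factory=dict)
    metadata: dict = field(default_factory=dict)

    def view(self, *names: str) -> dict:
        """Expose only the named slices of state to the next step."""
        return {n: getattr(self, n) for n in names}

state = AgentState(
    messages=["user: check the deploy status"],
    tool_outputs={"kubectl": "payments-v2 rollout complete"},
    metadata={"trace_id": "abc123", "token_budget": 8000},
)
# The planning step sees messages and tool outputs, not internal metadata:
print(state.view("messages", "tool_outputs"))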
Why Isolation Matters
By isolating context, developers can:
- Reduce context contamination, avoiding confusing or contradictory inputs
- Improve performance by limiting token consumption and processing overhead
- Maintain agent focus on relevant information, leading to more accurate outputs
Effective isolation lays the groundwork for scalable, maintainable, and high-performing AI systems that can handle complex or parallel workflows gracefully.
Kubiya.ai Services Supporting Context Engineering
Kubiya.ai offers an enterprise-grade platform that embodies best practices of context engineering by providing:
Live Context Graph:
Real-time ingestion and synchronization of data across tools like AWS, Kubernetes, GitHub, Jira, and Slack. This ensures AI agents have comprehensive, up-to-date environment awareness without token limitations.
Agent Composer:
Enables multi-agent orchestration with specialized, isolated contexts for planning, validation, execution, and decision-making. This avoids context clash and information overload.
Model Context Protocol (MCP):
Embeds live organizational policies, settings, and system states directly into agent context, grounding AI decisions dynamically to reduce errors and enhance trust.
Deterministic Execution:
Kubiya guarantees consistent, repeatable AI outputs for the same inputs, crucial for auditability and production reliability.
Scalable Workflow Automation:
Kubernetes-native architecture supports flexible deployment of AI-powered workflows integrated with existing DevOps and infrastructure tools, accelerating time-to-production.
Self-Service Infrastructure as Code (IaC):
AI-assisted automation for provisioning and managing infrastructure through Terraform, JIRA, and more, embedded within context-rich workflows.
By integrating these services, Kubiya.ai transforms context engineering theory into practical, scalable solutions, helping organizations deploy reliable, production-ready AI agents for complex, real-world automation.
For an in-depth exploration of context engineering best practices and Kubiya.ai’s practical approach to building reliable AI agents, visit: Context Engineering: The Hidden Blueprint Behind Reliable AI Agents in 2025 | Kubiya.ai
This resource delves into disciplined strategies such as the 12-Factor Agent framework, multi-agent orchestration, deterministic workflows, and real-world applications, providing valuable insights for developers looking to master context engineering in 2025.
Conclusion
Mastering context engineering is essential for developers aiming to build reliable, scalable, and high-performing AI agents powered by Large Language Models. By effectively managing how information is structured and fed into an agent—through writing, selecting, compressing, and isolating context—developers can greatly improve AI accuracy, reduce costs, and enable complex multi-agent workflows.
Context engineering goes beyond simple prompt crafting. It requires a disciplined approach to designing dynamic input environments that integrate user data, memories, tools, and interaction history to deliver smarter, more reliable AI outputs. Platforms like Kubiya.ai demonstrate how applying these principles in practice leads to deterministic, trustworthy, and production-ready AI systems capable of sophisticated automation.
As AI continues to evolve, context engineering will remain a fundamental skill to unlock the full potential of AI, ensuring systems not only respond but collaborate intelligently within nuanced and dynamic contexts.
FAQs
What is a context engineer?
A context engineer designs dynamic systems that provide AI models with the right information and tools, in the right format and at the right time, to help them effectively complete tasks. The role goes beyond writing prompts: it involves carefully organizing all the context an AI needs to perform well.
What are the skills of context engineering?
Context engineering employs four key strategies to manage the context window effectively: writing, selecting, compressing, and isolating context. Each strategy addresses specific challenges in ensuring the AI has the right information at the right time.
Why is context engineering important?
Context engineering has emerged as a cornerstone of building AI systems, especially those powered by LLMs. It complements prompt engineering by focusing on the broader information ecosystem around the model, ensuring the model has the right data, memory, and guidance to perform optimally.
What is the goal of using context in prompt engineering?
Providing context and relevant examples within your prompt helps the AI understand the desired task and generate more accurate and relevant outputs. For instance, if you're looking for a creative story, including a few sentences describing the desired tone or theme can significantly improve the results.
About the author

Amit Eyal Govrin
Amit oversaw strategic DevOps partnerships at AWS, where he repeatedly encountered industry-leading DevOps companies struggling with similar pain points: the self-service developer platforms they had created are only as effective as their end-user experience. In other words, self-service is not a given.