
Three Ways to Poison Your Agent's Context Window

March 25, 2026 · Matt McKenna


We audited Surchin's retrieval pipeline and found three failure modes that apply to anyone injecting context into LLM workflows.


The conversation about AI coding assistants used to focus almost entirely on model capability. Can it write a correct SQL join? Does it understand Rust lifetimes? That framing made sense when context windows were small and models were unreliable — the bottleneck was the model itself.

That bottleneck has moved. Today's frontier models are competent at most coding tasks. The new constraint is what you put in front of them.

From Prompt Engineering to Context Engineering

Prompt engineering is demand-side: you control what the model is asked to do. Context engineering is supply-side: tool builders control what information the model has access to when answering. These are different problems with different failure modes.

In agent workflows, context injection happens constantly and invisibly. Every query_insights call in Surchin, for example, assembles and injects a structured block into the agent's context window before it reads a single file. A typical call returns something in the range of 6,500–7,500 characters: user preferences, applicable skills with their full instruction sets, ranked insight results, and system notices. The agent reads all of it. Most of it is relevant. Some of it, as we discovered during a recent audit, is actively harmful.

This is the context engineering problem: you are making editorial decisions on behalf of the model, at scale, without realizing it. Every design choice in your retrieval pipeline — similarity thresholds, scoring weights, truncation strategy, feedback loop mechanics — determines what the model knows and doesn't know when it makes decisions. Get it wrong and you don't get a wrong answer. You get a confident wrong answer, delivered without any indication that the supporting evidence was stale, irrelevant, or contradictory.

The Three Failure Modes

We found three distinct ways a retrieval pipeline can degrade context quality. They are not always obvious in outputs — that's what makes them dangerous.

Poisoning
  Definition: Stale or incorrect content that was once accurate.
  Symptom: The model applies advice that no longer applies to the current codebase.
  Example: An insight about getUser() persists after the function was renamed to authenticateUser().

Distraction
  Definition: Irrelevant content that displaces relevant content.
  Symptom: The model hedges or loses focus on the actual problem.
  Example: A Kotlin release workflow skill injected during TypeScript context engineering work.

Clashing
  Definition: Contradictory content about the same topic.
  Symptom: The model flip-flops or produces internally inconsistent output.
  Example: A SOLUTION insight says "wrap in try-catch"; a PITFALL insight says "never swallow this error with try-catch."

Each failure mode has a different root cause, enters the pipeline at a different stage, and requires a different fix. We'll map them in a moment.

Poisoning is the insidious one. Surchin's promoted insights — those with the most positive signals — decay at a rate of 0.0005 per hour, which corresponds to a roughly 58-day half-life. That's by design: stability matters, and we don't want short-lived signals to cause promoted insights to evaporate. But it means an insight deposited about a specific file path or function name can persist at high strength for two months after the code it describes has been refactored away. The model has no way to know the reference is stale — it just sees a high-strength, promoted insight saying something authoritative about a file that no longer looks like that.
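The decay arithmetic behind that half-life claim is just exponential decay. A minimal sketch, assuming strength follows strength(t) = strength₀ · e^(−rate·hours); the helper name is ours, only the 0.0005 constant comes from Surchin:

```typescript
// DECAY_BY_STATUS.promoted, from the audit: hourly exponential decay rate.
const PROMOTED_DECAY_PER_HOUR = 0.0005;

// Half-life in days for an hourly exponential decay rate:
// strength halves when exp(-rate * hours) = 0.5, i.e. hours = ln(2) / rate.
function halfLifeDays(ratePerHour: number): number {
  const halfLifeHours = Math.log(2) / ratePerHour;
  return halfLifeHours / 24;
}

halfLifeDays(PROMOTED_DECAY_PER_HOUR); // ≈ 57.8 days, the "roughly 58-day" figure above
```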

Distraction doesn't corrupt the model's reasoning — it crowds out useful content. Context windows are finite, and every token consumed by irrelevant material is a token unavailable to relevant material. Skills are particularly high-risk: a single skill can inject 2,000–3,000 characters of structured instructions. If the similarity threshold for skill retrieval is too permissive, you get full instruction blocks injected for loosely related topics. During our audit, a dejavu-release skill (a Kotlin/Android library release workflow) was being served to Claude Code sessions working on TypeScript web code. The semantic similarity was above threshold; the operational relevance was zero.

Clashing is the hardest to detect because it requires cross-insight awareness. Individual insights can be high quality in isolation — accurate, well-written, correctly tagged — but contradict each other when served together. This most commonly happens when two different agents, working in different contexts, deposit conflicting solutions to the same underlying error. One session's fix is another session's pitfall. Without contradiction detection at retrieval time, both get served.

Mapping the Pipeline

The retrieval pipeline in Surchin runs in roughly this order:

Query
  │
  ├─► Pool 1: Error signature exact match   (up to 50 candidates)
  ├─► Pool 2: Vector similarity search      (up to 50 candidates)
  ├─► Pool 3: Locality match (file/symbol)  (up to 20 candidates)
  └─► Pool 4: Recent deposits               (up to 10 candidates)
         │
         ▼
    Merge & deduplicate
         │
         ▼
    Re-rank (rankScore formula)
         │
         ▼
    Format → context window

Poisoning enters at Pool 2 and Pool 3. Vector similarity finds semantically similar content regardless of whether the code it references still exists. Locality matching amplifies this: insights anchored to specific file paths score higher when the query provides matching paths, even if the file has changed completely since the insight was written.

Distraction enters at the formatting stage, before the context window. Skills are retrieved separately from insights, matched by semantic similarity against the query, and injected in full before the insight results. A permissive threshold means large skill blocks arrive regardless of operational relevance.

Clashing enters at the merge stage. The pipeline deduplicates by embedding similarity (threshold: 0.75) but has no cross-insight semantic analysis for contradiction. Two insights about the same error with opposite advice will both survive deduplication if their surface text differs enough.
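The clash-survival path is easy to see in a sketch of that merge-stage dedup. This is illustrative, not Surchin's implementation; the `Insight` shape and `deduplicate` helper are hypothetical, and only the 0.75 threshold comes from the audit:

```typescript
// Hypothetical candidate shape for the merge stage.
interface Insight { id: string; embedding: number[] }

// From the audit: candidates within this cosine similarity are treated as duplicates.
const DEDUP_THRESHOLD = 0.75;

function cosine(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Keep a candidate only if no already-kept insight is embedding-similar to it.
function deduplicate(candidates: Insight[]): Insight[] {
  const kept: Insight[] = [];
  for (const c of candidates) {
    if (!kept.some((k) => cosine(k.embedding, c.embedding) >= DEDUP_THRESHOLD)) {
      kept.push(c);
    }
  }
  return kept;
}
```

Two contradictory insights about the same error pass this filter untouched whenever their surface phrasing pushes their embeddings below the 0.75 bar, which is exactly how both sides of a clash end up in one response.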

One structural factor makes all three failure modes worse on the most common query pattern. The first query in every agent session is mandatory — our CLAUDE.md checklist requires it before the agent reads any files. This means the first query has no file_context, no symbol_context, and no tag_context. When no locality context is present, the scoring function redistributes the locality weight into the semantic weight:

// apps/web/src/lib/insight/store.ts, lines 362-364
const weights: readonly [number, number, number, number] = hasLocalityContext
  ? sc.weights                                                  // [0.46, 0.31, 0.15, 0.08]
  : [sc.weights[0] + sc.weights[1], 0.0, sc.weights[2], sc.weights[3]]; // [0.77, 0.00, 0.15, 0.08]

The full scoring formula is:

rank_score = 0.46 × semantic + 0.31 × locality + 0.15 × strength + 0.08 × trust

Without locality context, this collapses to:

rank_score = 0.77 × semantic + 0.00 × locality + 0.15 × strength + 0.08 × trust

Pure semantic retrieval at 77% weight. This is the correct behavior when locality context is unavailable — but it means the first query, which is also the query with the broadest scope, is also the query least capable of filtering by relevance. The QUERY FIRST requirement creates a structural worst case.
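The weight redistribution above can be sketched end to end. The weights and the fold-into-semantic rule are from the snippet quoted earlier; the `ScoreComponents` shape and `rankScore` function are illustrative stand-ins for the real scoring code:

```typescript
// Illustrative component scores, each assumed normalized to [0, 1].
interface ScoreComponents { semantic: number; locality: number; strength: number; trust: number }

// DEFAULT_WEIGHTS from the audit: [semantic, locality, strength, trust].
const DEFAULT_WEIGHTS: readonly [number, number, number, number] = [0.46, 0.31, 0.15, 0.08];

function rankScore(c: ScoreComponents, hasLocalityContext: boolean): number {
  const [wSem, wLoc, wStr, wTru] = hasLocalityContext
    ? DEFAULT_WEIGHTS
    // No locality context: fold the locality weight into semantic (0.46 + 0.31 = 0.77).
    : [DEFAULT_WEIGHTS[0] + DEFAULT_WEIGHTS[1], 0, DEFAULT_WEIGHTS[2], DEFAULT_WEIGHTS[3]];
  return wSem * c.semantic + wLoc * c.locality + wStr * c.strength + wTru * c.trust;
}
```

A first-of-session query passes `hasLocalityContext = false`, so a perfect semantic match alone already scores 0.77, with nothing left to discriminate by file or symbol.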

What We Found

Finding 1: Skills at 0.30 — too permissive

The original skill retrieval threshold was MIN_SEMANTIC_SIMILARITY = 0.30, shared with general insight retrieval. Skills are not insights. An insight is a few sentences. A skill is a full instruction block, often 2,000–3,000 characters, covering a specific workflow in detail. The token cost of a false positive is an order of magnitude higher.

At 0.30 similarity, skills were serving consistently for queries in loosely related domains. The dejavu-release skill — covering Kotlin library releases, Maven Central publishing, and GitHub Actions monitoring — appeared regularly in TypeScript agent sessions because software release workflows share semantic space with software development workflows. Similarity was real. Relevance was not.

Finding 2: SENSE_RETURN as a runaway feedback loop

Every time an insight is returned in a query result, Surchin fires a SENSE_RETURN signal: +1.0 strength. This is by design — retrieval is implicit validation. But the signal fires on every retrieval regardless of frequency, which creates a compounding effect for popular insights. An insight retrieved 50 times per day accumulates +50.0 strength per day from SENSE_RETURN alone, the equivalent of a full human upvote (+50.0) every single day, and upvotes are rare. High-retrieval insights drift toward the top of rankings not because they are more useful, but because they are already at the top.

The signal also has no session awareness at the strength level. The same insight retrieved 10 times in a single coding session generates +10.0 strength, as if 10 independent agents had validated it.

Finding 3: 4KB truncation cuts mid-insight

The formatted context block is truncated at FORMATTED_CONTEXT_BUDGET = 4000 characters (see constants.ts). The original implementation used a simple .slice(0, 4000), which cuts the string at the character budget regardless of insight boundaries. An insight returned last in the ranking order could be injected with its content cut off mid-sentence. The model receives a partial insight with no indication that it is partial.

Finding 4: No contradiction detection

The pipeline had no mechanism to detect when two insights in the same result set made opposing claims about the same error or pattern. SOLUTION and PITFALL insights for the same error signature could both appear in a single response, each with high rank scores, each authoritative-looking, contradicting each other on the concrete recommendation.

The Fixes

Raise the skill threshold

We raised SKILL_MIN_SEMANTIC_SIMILARITY from 0.30 to 0.55. This is a separate constant from the general insight threshold — skills need a higher bar because their context cost is higher.

We also changed the default delivery format for skills. Instead of injecting full instruction blocks on match, query_insights now returns skill descriptions only (roughly 100–200 characters). Agents can fetch full instructions on demand via get_skill. This reduces the default context cost of a skill match from ~2,500 characters to ~150 characters, and means false positives for skill retrieval cost almost nothing.
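The lazy-loading shape is simple to sketch. The `Skill` type and `formatSkillMatch` helper below are illustrative, not Surchin's actual code; only the get_skill tool name and the rough character figures come from the text above:

```typescript
// Hypothetical skill record: a short description plus a full instruction block.
interface Skill { name: string; description: string; instructions: string }

// On a match, inject only the name and description (~150 chars), not the
// ~2,500-char instruction block; the agent fetches instructions on demand.
function formatSkillMatch(skill: Skill): string {
  return `[skill] ${skill.name}: ${skill.description} (fetch with get_skill("${skill.name}"))`;
}
```

With this shape, a false-positive skill match costs a sentence of context instead of a full workflow, so the threshold no longer has to do all the filtering work on its own.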

Dampen SENSE_RETURN

The fix is three constraints applied together:

  1. Cooldown: A SENSE_RETURN signal fires at most once per insight per 4-hour window per session (SENSE_RETURN_COOLDOWN_HOURS = 4).
  2. Daily cap: At most 3 SENSE_RETURN signals per insight per day (SENSE_RETURN_DAILY_CAP = 3).
  3. Diminishing returns: Each SENSE_RETURN beyond the first in a day applies a 0.5× multiplier (SENSE_RETURN_DIMINISHING_FACTOR = 0.5).

This preserves the feedback loop for genuinely popular insights while breaking the runaway compounding for high-frequency retrieval.
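The three constraints can be sketched in one function. The `SignalState` bookkeeping and `senseReturnDelta` helper are illustrative assumptions; the constants are the ones from the audit:

```typescript
const SENSE_RETURN_BASE = 1.0;
const SENSE_RETURN_COOLDOWN_HOURS = 4;
const SENSE_RETURN_DAILY_CAP = 3;
const SENSE_RETURN_DIMINISHING_FACTOR = 0.5;

// Hypothetical per-insight, per-session signal state; hours measured from day start.
interface SignalState { lastFiredHour: number | null; firedToday: number }

// Returns the strength delta for a retrieval at `nowHour`, or 0 when the
// cooldown or daily cap suppresses the signal. Mutates `state`.
function senseReturnDelta(state: SignalState, nowHour: number): number {
  if (state.firedToday >= SENSE_RETURN_DAILY_CAP) return 0;
  if (state.lastFiredHour !== null && nowHour - state.lastFiredHour < SENSE_RETURN_COOLDOWN_HOURS) return 0;
  // Diminishing returns: 1.0 for the first signal today, 0.5 for the second, 0.25 for the third.
  const delta = SENSE_RETURN_BASE * Math.pow(SENSE_RETURN_DIMINISHING_FACTOR, state.firedToday);
  state.lastFiredHour = nowHour;
  state.firedToday += 1;
  return delta;
}
```

Under these rules, the 50-retrievals-per-day insight earns at most 1.0 + 0.5 + 0.25 = +1.75 strength per day instead of +50.0.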

Insight-boundary truncation

We replaced the .slice(0, 4000) truncation with a boundary-aware version that finds the last complete insight before the budget and truncates there. The result may be slightly under 4,000 characters, but every insight in the response is complete. The last insight in the ranking is either fully included or fully excluded.
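A minimal sketch of that boundary-aware truncation, assuming each ranked insight arrives pre-formatted as its own string; the helper name is ours, only the 4,000-character budget is from the audit:

```typescript
const FORMATTED_CONTEXT_BUDGET = 4000;

// Include ranked blocks in order until the next whole block would exceed the
// budget, then stop: every block in the output is complete.
function truncateAtInsightBoundary(blocks: string[], budget = FORMATTED_CONTEXT_BUDGET): string {
  const kept: string[] = [];
  let used = 0;
  for (const block of blocks) {
    const cost = block.length + (kept.length > 0 ? 1 : 0); // +1 for the joining newline
    if (used + cost > budget) break; // drop this and all lower-ranked blocks whole
    kept.push(block);
    used += cost;
  }
  return kept.join("\n");
}
```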

Validation-driven decay

We added a mechanism to distinguish between insights that are retrieved in contexts where they remain valid and insights that are retrieved in contexts where their code references no longer match. Insights that reference specific files or symbols now start at 2× the standard decay rate (CODE_REFERENCE_DECAY_MULTIPLIER = 2.0). When a SENSE_RETURN signal fires with file overlap — meaning the querying agent is working in the same files the insight references — the decay rate resets to 1×. Insights that are retrieved without file overlap accumulate staleness faster; insights that are confirmed in-context stabilize.

Code references are refreshed on a 30-day cycle (CODE_VALIDATION_REFRESH_DAYS = 30). An insight about a specific function that isn't retrieved in a matching file context for 30 days degrades at 2× the standard rate.
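The rule reduces to a small decision function. The `DecayState` fields and `decayMultiplier` helper are illustrative; the two constants are from the audit:

```typescript
const CODE_REFERENCE_DECAY_MULTIPLIER = 2.0;
const CODE_VALIDATION_REFRESH_DAYS = 30;

// Hypothetical per-insight state: does it reference code, and how long since a
// retrieval whose file context overlapped the referenced files?
interface DecayState { referencesCode: boolean; daysSinceFileOverlap: number }

function decayMultiplier(s: DecayState): number {
  if (!s.referencesCode) return 1.0;
  // A recent file-overlap retrieval confirms the reference: standard decay.
  // No confirmation within the refresh window: decay at 2x.
  return s.daysSinceFileOverlap < CODE_VALIDATION_REFRESH_DAYS ? 1.0 : CODE_REFERENCE_DECAY_MULTIPLIER;
}
```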

Contradiction detection

We added a post-merge step that checks for SOLUTION/PITFALL pairs sharing the same error signature. When two such insights are both in the candidate set, they are flagged with a [!] notice in the formatted output, and the lower-ranked one is demoted in the result ordering. This doesn't suppress contradictions — suppression would hide information that might be useful — but it surfaces them explicitly rather than presenting both as equally authoritative.
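The flag-and-demote step can be sketched as follows. The `RankedInsight` shape and `flagContradictions` helper are illustrative assumptions, not Surchin's code; the SOLUTION/PITFALL pairing rule is the one described above:

```typescript
interface RankedInsight {
  id: string;
  kind: "SOLUTION" | "PITFALL";
  errorSignature: string | null;
  flagged?: boolean; // rendered as a [!] notice in the formatted output
}

// Group candidates by error signature; when a group contains both a SOLUTION
// and a PITFALL, flag every member and sink all but the highest-ranked one.
function flagContradictions(ranked: RankedInsight[]): RankedInsight[] {
  const bySignature = new Map<string, RankedInsight[]>();
  for (const ins of ranked) {
    if (ins.errorSignature === null) continue;
    const group = bySignature.get(ins.errorSignature) ?? [];
    group.push(ins); // groups preserve rank order
    bySignature.set(ins.errorSignature, group);
  }
  const demoted = new Set<string>();
  for (const group of bySignature.values()) {
    const kinds = new Set(group.map((i) => i.kind));
    if (kinds.has("SOLUTION") && kinds.has("PITFALL")) {
      group.forEach((i, idx) => {
        i.flagged = true;
        if (idx > 0) demoted.add(i.id); // keep the top-ranked member in place
      });
    }
  }
  // Stable partition: demoted insights sink to the end but are not suppressed.
  return [...ranked.filter((i) => !demoted.has(i.id)), ...ranked.filter((i) => demoted.has(i.id))];
}
```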

Lessons for MCP Server Authors

If you are building an MCP server that injects context into agent workflows, these findings generalize.

Context window is a shared budget — measure output in tokens. The natural unit for retrieval pipelines is characters or documents. The relevant unit for context quality is tokens. An insight might be 200 characters but 60 tokens. A skill might be 2,800 characters and 700 tokens. The 4,000-character budget in Surchin is approximately 1,000 tokens, or 5–10% of a typical Claude context window during active work. That sounds small. It is not small when it's injected on every query.

Relevance is not the same as similarity. Vector similarity finds semantically related content. Relevance requires locality (is this about the files I'm working in?), recency (is this still true?), and operational coherence (does this apply to what I'm doing right now?). A retrieval pipeline that optimizes only for semantic similarity will produce high-recall, low-precision results. Precision matters more — irrelevant content in a context window is worse than no content, because it competes with the relevant content the model already has.

Default queries are your worst case. The most common query pattern — "here's a vague description of what I'm starting to work on, no files yet" — is structurally the hardest query to retrieve well for. Design your thresholds and weights for this case, not for the ideal case where the agent provides rich file and symbol context.

Measure distraction, not just recall. Standard retrieval metrics (precision@k, NDCG) measure whether the right content is retrieved. They don't measure whether wrong content is also retrieved. Build separate metrics for distraction: what fraction of retrieved content is not relevant to the query? A system with 90% recall and 40% distraction is worse than a system with 80% recall and 10% distraction for most agent use cases.
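A distraction metric is cheap to compute once you have per-query relevance labels. A minimal sketch, with all names ours, assuming retrieved items and a labeled relevant set per query:

```typescript
// Hypothetical evaluation record: what was retrieved, and which of those
// items a labeler judged relevant to the query.
interface QueryResult { retrieved: string[]; relevant: Set<string> }

// Fraction of retrieved content that is NOT relevant to the query.
function distraction(r: QueryResult): number {
  if (r.retrieved.length === 0) return 0;
  const irrelevant = r.retrieved.filter((id) => !r.relevant.has(id)).length;
  return irrelevant / r.retrieved.length;
}
```

Tracked alongside recall, this makes the trade-off in the paragraph above explicit: a threshold change that lifts recall while pushing distraction up is usually a net loss for agent workloads.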

Skills are highest-risk injections. Any content that arrives with instructions — "do this," "follow this workflow" — carries higher behavioral weight than factual content. A wrong fact can be overridden by other evidence. A wrong instruction, delivered in authoritative MCP format, tends to be followed. Threshold skills more aggressively than factual content, and prefer lazy-loading patterns where full instructions are only fetched when explicitly requested.


Methodology: findings are based on an internal audit of Surchin's retrieval pipeline conducted in March 2026. Code references are from packages/shared/src/constants.ts, packages/shared/src/scoring.ts, and apps/web/src/lib/insight/store.ts at the time of publication. Constants cited: DECAY_BY_STATUS.promoted = 0.0005, DEFAULT_WEIGHTS = [0.46, 0.31, 0.15, 0.08], SKILL_MIN_SEMANTIC_SIMILARITY = 0.55 (raised from 0.30), SENSE_RETURN_COOLDOWN_HOURS = 4, SENSE_RETURN_DAILY_CAP = 3, SENSE_RETURN_DIMINISHING_FACTOR = 0.5, CODE_REFERENCE_DECAY_MULTIPLIER = 2.0, FORMATTED_CONTEXT_BUDGET = 4000.