---
id: general/llm-context-engineering
name: LLM Context Engineering
version: 1.0.0
ecosystem: multi
type: guide
time_sensitivity: evergreen
source: verified
confidence: medium
maintainer: AgentRel Community
last_updated: 2026-03-19
---

## Overview

Context engineering is the practice of carefully constructing the input to a language model to maximize output quality. As context windows grow (Claude: 200K tokens, GPT-4o: 128K), effectively managing what goes in — and what doesn't — is a core skill for building reliable AI applications.

## Context Window Fundamentals

```
Total context = system prompt + conversation history + retrieved docs + tools/schema + output
```

**Token budget example (200K window):**

- System prompt: ~1K tokens
- Tools/function schemas: ~2-5K tokens
- Retrieved context (RAG): ~20-50K tokens
- Conversation history: ~10-30K tokens
- Reserved for output: ~4-8K tokens

## System Prompt Best Practices

```markdown
You are [specific role]. Your job is to [specific task].

## Constraints
- Always [hard rule 1]
- Never [hard rule 2]
- When uncertain, [fallback behavior]

## Output Format
Respond in [format]. Example:
[concrete example of desired output]

## Context
[Background information that doesn't change]
```

**Key principles:**

1. **Be specific, not vague** — "Answer in 2-3 sentences" not "Be concise"
2. **Use examples** — Show, don't tell; one good example > 100 words of description
3. **Separate instructions from data** — Use XML tags or headers to delimit sections
4.
**Put critical instructions last** — Recency bias means later instructions are followed more reliably

## RAG vs Fine-Tuning Decision Matrix

| Scenario | Recommendation |
|----------|----------------|
| Dynamic / frequently updated data | RAG |
| Proprietary knowledge base | RAG |
| Style/tone/format changes | Fine-tuning |
| Domain-specific reasoning patterns | Fine-tuning |
| Both knowledge + behavior changes | RAG + Fine-tuning |
| Cost-sensitive production | Fine-tuning (smaller model) |

## RAG Implementation Pattern

```typescript
// 1. Chunk documents (overlap to preserve context)
const chunks = splitText(document, { chunkSize: 512, overlap: 50 })

// 2. Embed and store
const embeddings = await embed(chunks)
await vectorDB.upsert(embeddings)

// 3. Retrieve at query time
const query = userMessage
const relevant = await vectorDB.query(query, { topK: 5, minScore: 0.7 })

// 4. Construct context
const context = relevant.map(r => r.text).join('\n\n---\n\n')

// 5. Build prompt with retrieved context
const prompt = `
${context}

User question: ${query}

Answer based only on the context above. If not found, say so.
`
```

## Context Compression Techniques

**1. Summarization** — Compress old conversation turns:

```typescript
if (tokenCount(history) > 50_000) {
  const summary = await llm.summarize(history.slice(0, -10))
  history = [
    { role: 'system', content: `Previous context: ${summary}` },
    ...history.slice(-10),
  ]
}
```

**2. Selective retrieval** — Only include relevant chunks, not entire documents

**3. Structured extraction** — Pre-extract key facts into structured format before adding to context

## Prompt Injection Defense

```typescript
// Never interpolate untrusted user input directly into system prompts

// ❌ Bad
const systemPrompt = `You are a helpful assistant. User info: ${userInput}`

// ✅ Good — separate system from user data
const messages = [
  {
    role: 'system',
    content: 'You are a helpful assistant.'
  },
  { role: 'user', content: userInput }, // untrusted content in user turn
]
```

## Measuring Context Quality

- **Faithfulness**: Does the output match the provided context?
- **Answer relevance**: Does the output address the actual question?
- **Context recall**: Were the relevant chunks actually retrieved?
- Use [RAGAS](https://github.com/explodinggradients/ragas) for automated RAG evaluation

## References

- [Anthropic Prompt Engineering Guide](https://docs.anthropic.com/claude/docs/prompt-engineering)
- [OpenAI Prompt Engineering](https://platform.openai.com/docs/guides/prompt-engineering)
- [RAGAS evaluation framework](https://github.com/explodinggradients/ragas)
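The token-budget arithmetic from the fundamentals section can be sketched as a small history-trimming helper. This is a minimal sketch, not a library API: `fitHistoryToBudget` and `countTokens` are hypothetical names, and the chars-per-token estimate is a crude stand-in — production code should count tokens with the provider's actual tokenizer.

```typescript
// Illustrative message shape (names are assumptions, not a provider SDK).
type Message = { role: 'system' | 'user' | 'assistant'; content: string }

// Rough approximation: ~4 characters per token. Replace with a real
// tokenizer in production; this exists only to make the sketch runnable.
function countTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Keep the most recent turns that fit within the budget, dropping the
// oldest first — the same "reserve tokens for output" budgeting idea
// described in the fundamentals section.
function fitHistoryToBudget(history: Message[], budget: number): Message[] {
  const kept: Message[] = []
  let used = 0
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = countTokens(history[i].content)
    if (used + cost > budget) break
    kept.unshift(history[i]) // prepend to preserve chronological order
    used += cost
  }
  return kept
}
```

With a 25-token budget and three ~10-token turns, only the two most recent turns survive. Pairing this with the summarization technique above keeps the gist of the dropped turns in context instead of losing it outright.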