---
id: general/llm-context-engineering
name: LLM Context Engineering
version: 1.0.0
ecosystem: multi
type: guide
time_sensitivity: evergreen
source: verified
confidence: medium
maintainer: AgentRel Community
last_updated: 2026-03-19
---

## Overview

Context engineering is the practice of carefully constructing the input to a language model to maximize output quality. As context windows grow (Claude: 200K tokens, GPT-4o: 128K), effectively managing what goes in — and what doesn't — is a core skill for building reliable AI applications.

## Context Window Fundamentals

```
Total context = system prompt + conversation history + retrieved docs + tools/schema + output
```

**Token budget example (200K window):**

- System prompt: ~1K tokens
- Tools/function schemas: ~2-5K tokens
- Retrieved context (RAG): ~20-50K tokens
- Conversation history: ~10-30K tokens
- Reserved for output: ~4-8K tokens

## System Prompt Best Practices

```markdown
You are [specific role]. Your job is to [specific task].

## Constraints
- Always [hard rule 1]
- Never [hard rule 2]
- When uncertain, [fallback behavior]

## Output Format
Respond in [format]. Example:
[concrete example of desired output]

## Context
[Background information that doesn't change]
```

**Key principles:**

1. **Be specific, not vague** — "Answer in 2-3 sentences" not "Be concise"
2. **Use examples** — Show, don't tell; one good example > 100 words of description
3. **Separate instructions from data** — Use XML tags or headers to delimit sections
4.
**Put critical instructions last** — Recency bias means later instructions are followed more reliably

## RAG vs Fine-Tuning Decision Matrix

| Scenario | Recommendation |
|----------|----------------|
| Dynamic / frequently updated data | RAG |
| Proprietary knowledge base | RAG |
| Style/tone/format changes | Fine-tuning |
| Domain-specific reasoning patterns | Fine-tuning |
| Both knowledge + behavior changes | RAG + Fine-tuning |
| Cost-sensitive production | Fine-tuning (smaller model) |

## RAG Implementation Pattern

```typescript
// 1. Chunk documents (overlap to preserve context)
const chunks = splitText(document, { chunkSize: 512, overlap: 50 })

// 2. Embed and store
const embeddings = await embed(chunks)
await vectorDB.upsert(embeddings)

// 3. Retrieve at query time
const query = userMessage
const relevant = await vectorDB.query(query, { topK: 5, minScore: 0.7 })

// 4. Construct context
const context = relevant.map(r => r.text).join('\n\n---\n\n')

// 5. Build prompt with retrieved context
const prompt = `
${context}

User question: ${query}

Answer based only on the context above. If not found, say so.
`
```

## Context Compression Techniques

**1. Summarization** — Compress old conversation turns:

```typescript
if (tokenCount(history) > 50_000) {
  const summary = await llm.summarize(history.slice(0, -10))
  history = [
    { role: 'system', content: `Previous context: ${summary}` },
    ...history.slice(-10),
  ]
}
```

**2. Selective retrieval** — Only include relevant chunks, not entire documents

**3. Structured extraction** — Pre-extract key facts into structured format before adding to context

## Prompt Injection Defense

```typescript
// Never interpolate untrusted user input directly into system prompts

// ❌ Bad
const systemPrompt = `You are a helpful assistant. User info: ${userInput}`

// ✅ Good — separate system from user data
const messages = [
  {
    role: 'system',
    content: 'You are a helpful assistant.'
  },
  { role: 'user', content: userInput }, // untrusted content in user turn
]
```

## Measuring Context Quality

- **Faithfulness**: Does the output match the provided context?
- **Answer relevance**: Does the output address the actual question?
- **Context recall**: Were the relevant chunks actually retrieved?
- Use [RAGAS](https://github.com/explodinggradients/ragas) for automated RAG evaluation

## References

- [Anthropic Prompt Engineering Guide](https://docs.anthropic.com/claude/docs/prompt-engineering)
- [OpenAI Prompt Engineering](https://platform.openai.com/docs/guides/prompt-engineering)
- [RAGAS evaluation framework](https://github.com/explodinggradients/ragas)
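The token-budget arithmetic from the fundamentals section can be sketched as a small history-trimming helper. This is a minimal sketch, not a library API: `fitHistoryToBudget` and `countTokens` are hypothetical names, and the chars-per-token estimate is a crude stand-in — production code should count tokens with the provider's actual tokenizer.

```typescript
// Illustrative message shape (names are assumptions, not a provider SDK).
type Message = { role: 'system' | 'user' | 'assistant'; content: string }

// Rough approximation: ~4 characters per token. Replace with a real
// tokenizer in production; this exists only to make the sketch runnable.
function countTokens(text: string): number {
  return Math.ceil(text.length / 4)
}

// Keep the most recent turns that fit within the budget, dropping the
// oldest first — the same "reserve tokens for output" budgeting idea
// described in the fundamentals section.
function fitHistoryToBudget(history: Message[], budget: number): Message[] {
  const kept: Message[] = []
  let used = 0
  for (let i = history.length - 1; i >= 0; i--) {
    const cost = countTokens(history[i].content)
    if (used + cost > budget) break
    kept.unshift(history[i]) // prepend to preserve chronological order
    used += cost
  }
  return kept
}
```

With a 25-token budget and three ~10-token turns, only the two most recent turns survive. Pairing this with the summarization technique above keeps the gist of the dropped turns in context instead of losing it outright.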