AI Context Toolbox: Picking the Right Tool for the Job
And no, uploading your entire wiki is not one of them.
So you’ve got an AI model: maybe it’s Claude, maybe it’s Titan, and now you want to feed it your company’s Confluence space, including that onboarding doc that hasn’t been updated since 2019.
Here’s where many people get it wrong.
One of the biggest misconceptions I hear is that you can just “upload” a large amount of unstructured data to a model, like an entire wiki site, and boom… it just knows things now. But that’s not quite how it works. In reality, there are several distinct methods for providing context or additional data to models, and each comes with its own trade-offs and limitations.
Let’s walk through the main approaches, clear up some common confusion, and figure out which method actually works for each use case.
1. RAG: Retrieval-Augmented Generation
TL;DR: You fetch the data and append it to the prompt at runtime.
How it works:
- You chunk and store your data in a vector database (like Pinecone or Qdrant).
- When a user asks a question, you run a similarity search to find the most relevant chunks.
- Those chunks are added to the prompt before it’s sent to the model.
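As a rough sketch of those three steps, here's what an in-memory version might look like. The `embed()` function is a stand-in for whatever embedding model or API you actually use, and the plain Python list is a stand-in for a real vector database like Pinecone or Qdrant; the chunking is deliberately naive.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    # Stand-in for a real embedding model or API call.
    # In practice this returns a fixed-length vector (e.g. 384 or 1536 floats).
    raise NotImplementedError("call your embedding model here")

def chunk(document: str, size: int = 500) -> list[str]:
    # Deliberately naive chunking: fixed-size slices of the raw text.
    return [document[i:i + size] for i in range(0, len(document), size)]

# 1. Chunk and store your data (an in-memory stand-in for a vector database).
index: list[tuple[np.ndarray, dict]] = []

def add_document(doc_id: str, text: str) -> None:
    for piece in chunk(text):
        index.append((embed(piece), {"text": piece, "source": doc_id}))

# 2. Similarity search: rank every stored chunk by cosine similarity to the question.
def search(question: str, top_k: int = 3) -> list[dict]:
    q = embed(question)
    results = [
        {
            "score": float(np.dot(q, v) / (np.linalg.norm(q) * np.linalg.norm(v))),
            "payload": payload,
        }
        for v, payload in index
    ]
    results.sort(key=lambda r: r["score"], reverse=True)
    return results[:top_k]
```

Step 3, inserting the retrieved chunks into the prompt, is shown a bit further down once we've looked at what a search result actually contains.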
What exactly does a vector represent?
It’s a fixed-length list of numbers that captures the meaning of a piece of text. Sentences with similar meanings are turned into vectors that end up close to each other, so the system can find related ideas even if they don’t use the same keywords.
For example:
- “Reset my password” and “Forgot my login credentials” will have similar vectors because they mean roughly the same thing.
- “What’s the weather?” would have a very different vector because it’s about a completely different topic.
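If you want to see this for yourself, a small embedding model makes the point quickly. This sketch assumes the sentence-transformers package and the all-MiniLM-L6-v2 model, but any embedding model shows the same pattern.

```python
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
vectors = model.encode([
    "Reset my password",
    "Forgot my login credentials",
    "What's the weather?",
])

# Cosine similarity: values near 1.0 mean "roughly the same meaning".
print(util.cos_sim(vectors[0], vectors[1]))  # high: same intent, different words
print(util.cos_sim(vectors[0], vectors[2]))  # low: unrelated topic
```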
Common Fields in Vector Search Results:
Field | Description |
---|---|
id | Unique identifier of the stored data |
score | Similarity score (higher = more similar) |
payload | The main retrieved content (stored in custom fields like title, source, text, etc.) |
Example search result:
{
"id": "abc123",
"score": 0.91,
"payload": { "text": "...", "source": "doc_name" }
}
You would then collect the results, extract the text (and optionally other fields) from each payload, and insert them into the prompt.
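Putting it together, the glue code is mostly string assembly. This sketch assumes search results shaped like the example above (a payload with text and source fields); the prompt template is one reasonable layout, not a required format.

```python
def build_prompt(question: str, results: list[dict]) -> str:
    # Pull the stored text (and the source, for attribution) out of each payload.
    context = "\n\n".join(
        f"[{r['payload']['source']}]\n{r['payload']['text']}" for r in results
    )
    return (
        "Answer the question using only the context below. "
        "If the answer isn't there, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# results = search("How do I reset my password?")   # from your vector database
# prompt = build_prompt("How do I reset my password?", results)
# ...then send `prompt` to the model as usual.
```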
- Great for: Retrieving answers without retraining
- Not so great for: Long-term memory or consistent behavior changes
2. Fine-Tuning
TL;DR: You can teach AI new tricks, but it’ll cost you time, money, and a few mental breakdowns.
How it works:
- You prepare a dataset with many examples like: “When I ask X, respond with Y.”
- The model is further trained (fine-tuned) on this dataset to learn new behavior patterns.
You need clean, curated examples like “question = answer,” “instruction = completion,” or “input = formatted output.”
Example data:
- Customer support questions with ideal responses
- Product descriptions rewritten in your brand’s tone
- Code snippets with corresponding reviews or bug fixes
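The exact file format depends on the model and the service hosting it, but the training examples are usually some variation of prompt/completion pairs, often serialized as JSONL. A hypothetical customer-support dataset might look like this before serialization:

```python
import json

# Hypothetical training examples: each pair is "when asked X, respond with Y".
examples = [
    {
        "prompt": "Customer: My invoice shows the wrong billing address.",
        "completion": "I'm sorry about that! I've flagged the invoice for correction; "
                      "you'll receive an updated copy within one business day.",
    },
    {
        "prompt": "Customer: Can I downgrade my plan mid-cycle?",
        "completion": "Yes. Downgrades take effect at the next billing date, and any "
                      "unused credit is applied to your account automatically.",
    },
]

# Many fine-tuning services accept one JSON object per line (JSONL).
with open("training_data.jsonl", "w") as f:
    for example in examples:
        f.write(json.dumps(example) + "\n")
```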
This isn’t about cramming your entire SharePoint directory into the model; it’s about changing the model’s behavior. Also, depending on the model and the service hosting it, there may be quotas and additional costs associated with fine-tuning.
- Great for: Custom behaviors, domain-specific tone or formatting
- Not so great for: Real-time updates or frequently changing data
3. Tool Use
TL;DR: The model asks you for help before hallucinating. You respond with data. Everyone wins.
How it works:
- You define function declarations that are mapped to tools available for the model to use.
- The model requests a tool by returning a function call as a structured message in its response.
- You execute the function and return the result to the model.
In this scenario, the AI model isn’t the one actually retrieving the data. Instead, it relies on the list of function declarations you provide.
When it receives a request that requires information from an external tool, it asks for that information with a function call: a structured message in its response containing the function’s name and the required parameters, based on the declarations provided earlier.
You handle the function call by extracting the function name and parameters from the model’s response, then executing the action, which can be a REST API call to another service, an internal DB query, or a document search.
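The exact message format varies by provider, so treat the structures below as illustrative rather than any specific API. The flow is the same everywhere: you describe the function up front, the model responds with a name and arguments, and your code does the actual work.

```python
import json

# 1. Function declaration sent along with the request (shape is illustrative).
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def get_weather(city: str) -> dict:
    # Your real implementation: a REST call, a DB query, a document search, etc.
    return {"city": city, "temp_c": 21, "conditions": "partly cloudy"}

TOOLS = {"get_weather": get_weather}

# 2. The model's response contains a structured function call, something like:
model_response = '{"function_call": {"name": "get_weather", "arguments": {"city": "Seattle"}}}'

# 3. Extract the name and parameters, run the function, and send the result back.
call = json.loads(model_response)["function_call"]
result = TOOLS[call["name"]](**call["arguments"])
print(result)  # ...append this to the conversation and ask the model to continue
```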
MCP (Model Context Protocol) is an open standard that can integrate models with these tools using a client/server architecture. However, it is not required. You may already have mechanisms in place to interact with other tools.
MCP can be useful when your model interacts with many external tools, because you avoid building your own integration for each one. Instead of making the API call yourself, your MCP client sends the model’s structured request directly to the MCP server, which handles the tool invocation.
I won’t go over MCP’s implementation details here, but there’s a great explanation in this video.
- Great for: Real-time data, decision trees, complex tasks
- Not so great for: Simple tasks or low-latency responses
4. Prompt Augmentation
TL;DR: The “classic” method. Manually prepend relevant info to the user’s prompt.
This could be a set of instructions, rules, chat history, or whatever else the model might need to do its job. You’re not changing the model; you’re just giving it more context on the fly.
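In code, this is little more than string concatenation before the request goes out. A minimal sketch, where the instructions and history are whatever your application already has on hand (the Acme Corp rules here are purely hypothetical):

```python
SYSTEM_INSTRUCTIONS = (
    "You are a support assistant for Acme Corp. "  # hypothetical product rules
    "Answer in two sentences or fewer and never promise refunds."
)

def augment(user_message: str, chat_history: list[str]) -> str:
    # Prepend instructions and recent history so the model has context on the fly.
    history = "\n".join(chat_history[-5:])  # keep only the last few turns
    return f"{SYSTEM_INSTRUCTIONS}\n\nRecent conversation:\n{history}\n\nUser: {user_message}"

prompt = augment("Where's my order?", ["User: Hi", "Assistant: Hello! How can I help?"])
```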
- Great for: Fast iteration, quick instructions, static context
- Not so great for: Large number of documents or maintaining long-term memory
Bonus Round: Orchestration / Agents
TL;DR: You run a multi-step flow using multiple tools and models.
Maybe your app needs to fetch data from multiple sources before making a decision. You can either orchestrate this yourself or hand control to an AI agent that decides on the next steps and when to call tools for help.
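The difference mostly comes down to who decides the next step. Here's a hand-wavy sketch of the agent-style loop; call_model and the tools dictionary are placeholders, and real agent frameworks add planning, retries, and guardrails on top of this.

```python
def call_model(messages: list[dict]) -> dict:
    # Placeholder for your model API call; returns either a tool request or a final answer.
    raise NotImplementedError

def run_agent(task: str, tools: dict, max_steps: int = 5) -> str:
    messages = [{"role": "user", "content": task}]
    for _ in range(max_steps):
        reply = call_model(messages)
        if "function_call" in reply:
            # The agent decided it needs a tool; execute it and feed the result back.
            call = reply["function_call"]
            result = tools[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "content": str(result)})
        else:
            # No more tool calls: the agent considers the task done.
            return reply["content"]
    return "Stopped: step limit reached."
```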
- Great for: Workflows, pipelines, decision logic
- Not so great for: Predictable outcomes or tightly controlled behavior
Quick Comparison
Method | Runtime Use? | Persistent Learning? | Frequently Updated Data? | Good For… |
---|---|---|---|---|
RAG | ✅ | ❌ | ✅ | Docs, knowledge bases |
Fine-Tuning | ❌ | ✅ | ❌ | Domain tasks, behavior adaptation |
Tool Use | ✅ | ❌ | ✅ | API calls, external data |
Prompt Augmentation | ✅ | ❌ | ⚠️ Can be fragile | Quick rules, short-term context |
Some things to think about when choosing an approach:
- Where does the data live?
- How recent does it need to be?
- Do we need to fetch it, or can we provide it when making the request?
When one prompt closes, another context window opens
Newer models now support massive context windows, with some exceeding a million tokens, making it technically possible to include entire document repositories in a single prompt. However, dumping your company’s full collection of Quip docs into the input may not deliver the results you want.
While the larger context allows for broader recall, it doesn’t remove the need for relevance, structure, and signal-to-noise optimization. Building AI systems that return accurate and useful responses still requires choosing the right retrieval strategies and providing information in a way that improves the model’s response.
If I am completely off base about any of these, please let me know. I’m still pre-training my mental model of how all this works, and feedback is part of that training data.