Intro
Retrieval-Augmented Generation (RAG) is a powerful technique to supply data to a Large Language Model (LLM) and generate accurate responses based on your own content. It’s cost-effective to run and you don’t need to train your own LLM. Great.
In my previous blog post, I experimented with a locally running Llama 3.2 model to answer questions based on my content.
While it performed adequately, I wasn’t completely satisfied with the results. To improve the output, I optimized both the context I supplied and the prompt. The results were surprising.
This post is part of a series on LLMs and RAG. Check out the other articles as well.
The Importance of Good Context and a Good Prompt
I found two solutions that massively improved the quality of the answers my RAG-augmented Llama model provided.
Solution 1: Relevant Context, Relevant Context, Relevant Context
The similarity search in pgai successfully retrieved documents that were relevant to the question, but it also often returned documents that weren’t. This made sense, as only a few articles on my blog specifically address a question like QA.
To resolve this, I limited the search to return only one document (via SQL’s LIMIT 1), ensuring more relevant context. LIMIT 1 might be too strict of course. But it really solved the problems I had for the use cases I tested. Nice!
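To make this concrete, here is roughly what that retrieval step looks like in code. This is a minimal sketch, not the exact code from my setup: it embeds the question client-side via Ollama and queries a pgvector column directly, and the table, column, and model names are placeholders.

```python
import ollama
import psycopg

# Placeholder schema: blog_chunks(id, title, content, embedding vector(768))
QUESTION = "What does good QA look like?"

# Embed the question locally via Ollama (the embedding model is a placeholder).
embedding = ollama.embeddings(model="nomic-embed-text", prompt=QUESTION)["embedding"]

with psycopg.connect("dbname=blog") as conn:
    row = conn.execute(
        """
        SELECT title, content
        FROM blog_chunks
        ORDER BY embedding <=> %s::vector  -- cosine distance (pgvector)
        LIMIT 1                            -- only the single best match
        """,
        (str(embedding),),
    ).fetchone()

context = row[1] if row else ""
```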
Learning: Make sure to provide relevant content to the LLM.
Solution 2: A Better Prompt
Initially, my prompt was:
What does good QA look like?
Context: {relevant documents from my blog}
This resulted in overly generic answers, with the model summarizing the content rather than directly addressing the question. I refined the prompt to:
DOCUMENT:
{context}
QUESTION:
{query}
INSTRUCTIONS:
Answer the user's QUESTION using the DOCUMENT text above.
Keep your answer grounded in the facts of the DOCUMENT.
If the DOCUMENT doesn’t contain the facts to answer the QUESTION then please say so.
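Wiring this up is straightforward: the template gets filled with the retrieved context and the user’s question, and the result is sent to the locally running model. Again a minimal sketch, assuming the model is served by Ollama; the function name and example context are my own.

```python
import ollama

PROMPT_TEMPLATE = """DOCUMENT:
{context}

QUESTION:
{query}

INSTRUCTIONS:
Answer the user's QUESTION using the DOCUMENT text above.
Keep your answer grounded in the facts of the DOCUMENT.
If the DOCUMENT doesn't contain the facts to answer the QUESTION then please say so.
"""

def answer(query: str, context: str) -> str:
    # Fill the template and send it to the local Llama 3.2 model via Ollama.
    prompt = PROMPT_TEMPLATE.format(context=context, query=query)
    response = ollama.chat(
        model="llama3.2",
        messages=[{"role": "user", "content": prompt}],
    )
    return response["message"]["content"]

# Example: context would be the blog article returned by the similarity search.
print(answer("What does good QA look like?", "<retrieved blog article>"))
```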
This significantly improved the quality of the responses. An example answer from my locally running Llama 3.2 model looked like this:
Based on the document, here's an answer to the question:
Can you tell me how modern QA should look like?
According to the document, a modern QA approach should be the following:
* Go without a separate QA team by default
* Hire engineers that write tests on the proper level and own the quality aspect
* Use metrics, quality KPIs, and post-mortems to oversee the quality of software in the department
* Have support teams when large features go live and exploratory testing
* Automation for few critical paths
In other words, a modern QA approach focuses on ensuring that engineers write high-quality tests and own the quality aspect, rather than relying solely on a separate QA team. It also emphasizes the importance of metrics, post-mortems, and automation to improve software quality.
I think this answer is already pretty cool. It’s based mostly on an article I wrote some time ago on QA. With a little bit of fine-tuning I can see this becoming a chatbot on my homepage. The first step to automate myself away.
Key Takeaways
The contextual content you provide and the quality of the prompt are crucial for obtaining accurate answers.
- Provide the LLM only with content that relates closely to the question. Sometimes, less is more, as the model may struggle to disregard irrelevant information.
- Craft a good prompt and explicitly instruct the model not to return hallucinated or generic information.
More
- The prompt I used was recommended by Steve over here: Prompt Engineering for RAG.
- You might not like Meta. But the work they are doing for the open-source community, and for us as human beings in the field of AI, is amazing: https://ai.meta.com/blog/meta-llama-quantized-lightweight-models/ Time to buy some Meta stock, I guess.