My Failed (so far) AI RAG Code Experimentation

Michael Ruminer
3 min readSep 3, 2024

--

A yellow document over a network of nodes and lines with a dark blue background. Might make one think of a document being added to an LLM

I have been wanting to experiment with creating some code performing retrieval augmented generation (RAG) for an LLM. So far I am 0 for 2. In both examples I tried, that were professed to work, it fell far short. I blame embedding and retrieval. I also blame myself… who else is there to blame. I know RAG works in some form because I can go onto ChatGPT upload a PDF and prompt about it with expected and quality results; but, when trying to replicate similar outcomes from code and the OpenAI API I get disappointing results. Here’s the story.

Experiment one was part of a Udemy course on RAG and mastering some better approaches than naive RAG. Specifically RAG with query expansion. It was a very short course but had a couple of cohesive Python examples that built out over the course material. It included the document to use for the augmentation. Additionally, unlike experiment two, it largely didn’t rely on a ton of abstraction of the underlying basic concepts. After reading and parsing the text from the PDF I used RecursiveCharacterTextSplitter and
SentenceTransformersTokenTextSplitter from Langchain to create my chunks. Added my chunks to Chroma db in memory using their default embedder. Took my query and performed the retrieval of 5 chunks from the Chroma db. No reranking performed. The returned chunks were at best partial results and at worst just not as good as expected. What I hadn’t noticed about the code from the instruction when I went through the class is that it never passed the chunks back as context to the LLM the second time along with the query to get an answer. Kind of an important part to be missing. I can tell from the returned chunks it would not have produced a satisfactory answer had the closing action before performed. I tried with differing chunk sizes and overlaps and never received better results. I tried with my own document and faired no better. I chalked it up to a disappointing and poor example. Perhaps this is why it didn’t go the final step to pass it all back to the LLM for a response.

I moved on to a second experiment that used a bit more abstraction by relying on Langchain significantly more. It was also doing naive RAG, not augmenting the prompt from the LLM initially in any way. This time it did have the pass to the LLM in the end to get the LLM response. Looking at the chunks it retrieved I could tell I wasn’t going to get a satisfactory response. I had asked it for a list of 10 principles that were specifically outlined in the document in a single paragraph. The best in all my attempts was that I got 2, maybe 3 of the list. Again I played around with chunk size and overlap and generally only got worse results. The results in the demonstration had not seemed much better when I evaluated it more closely.

All in all I need a third+ experiment. I am trying to get a response even remotely similar to what I get from the ChatGPT. Obviously, my examples are not high enough quality.

Do you have such an example?

Time to ask ChatGPT, Claude, Github Copliot, or Cursor — sounds like a job for Cursor — to create a python program for me and see how it functions.

--

--

Michael Ruminer

My most recent posts are on AI especially from the perspective of a non-AI tech worker. did:web:manicprogrammer.github.io