Having AI Tools Generate Code For AI Tools

Michael Ruminer
4 min read · Sep 6, 2024

--

Black-and-white AI-generated image of a programmer in glasses and a hoodie, phone to ear, working at an open laptop, with a blurry blackboard of unreadable text in the background.

I recently posted about my experience trying a few Python examples of retrieval-augmented generation (RAG) with an LLM (the OpenAI API). They were underwhelming, though they did provide a lot of insight for me. This post is about my next experiment: getting AI tools to generate the examples for me. The results were mixed, but not for the reasons you might think. Here’s what I found using ChatGPT, Microsoft Copilot, and Cursor.

Note: you can find the generated code in my LLM_Generated_RAG GitHub repo. I will likely add to this repo as I try new prompts, LLMs, and approaches.

ChatGPT 4o

Let’s start with the best known, ChatGPT. I used a single, simple prompt to see what it would do; I didn’t provide separate prompts for different parts of the code, since the code shouldn’t be very extensive and I didn’t feel the need to divide it up. If you want to see my very elementary prompt, pull up the ChatGPT-related file in the repo.

The first thing I noticed is that it used PyPDF2, which was deprecated in December of 2022. Not surprising, since almost certainly much of what the model was trained on used PyPDF2. It actually created well-structured code, with functions for the different discrete operations. What I realized later, thanks to the output of the Cursor tool, is that it created a more granular set of code than it had to. This is exactly what I was looking for but didn’t specify in the prompt. What do I mean? It did the following (a rough sketch of the pipeline appears after the list):

  • read the PDF and extracted the text
  • split the document into chunks
  • created embeddings for the chunks using the OpenAI text-embedding-ada-002
  • created an embedding for the query
  • searched the chunks using faiss
  • generated a response using the query and the chunk(s) as context
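
For reference, here is a minimal sketch of that granular pipeline. This is my own illustration, not the generated code: it assumes the current openai (v1) Python client, pypdf in place of the deprecated PyPDF2, faiss-cpu, and a local file named example.pdf, none of which come from ChatGPT’s output.

```python
# Minimal sketch of the granular RAG pipeline described above.
# Assumptions (mine, not the generated code): openai>=1.0 client,
# pypdf instead of the deprecated PyPDF2, faiss-cpu, a local example.pdf.
import numpy as np
import faiss
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def extract_text(path: str) -> str:
    """Read the PDF and concatenate the text of every page."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


def chunk_text(text: str, size: int = 1000, overlap: int = 200) -> list[str]:
    """Split the document into overlapping character chunks."""
    return [text[i:i + size] for i in range(0, len(text), size - overlap)]


def embed(texts: list[str]) -> np.ndarray:
    """Create embeddings for the chunks (or for the query)."""
    resp = client.embeddings.create(model="text-embedding-ada-002", input=texts)
    return np.array([d.embedding for d in resp.data], dtype="float32")


def build_index(vectors: np.ndarray) -> faiss.IndexFlatL2:
    """Index the chunk embeddings with FAISS."""
    index = faiss.IndexFlatL2(vectors.shape[1])
    index.add(vectors)
    return index


def answer(query: str, chunks: list[str], index: faiss.IndexFlatL2, k: int = 3) -> str:
    """Embed the query, retrieve the top-k chunks, and generate a response."""
    _, ids = index.search(embed([query]), k)
    context = "\n\n".join(chunks[i] for i in ids[0])
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: {query}"},
        ],
    )
    return resp.choices[0].message.content


if __name__ == "__main__":
    chunks = chunk_text(extract_text("example.pdf"))
    index = build_index(embed(chunks))
    print(answer("What is this document about?", chunks, index))
```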

This is what I wanted. Turns out there was a very different way to interpret the prompt.

The downside: the code didn’t run out of the box. Also, I believe it sent only one chunk as context along with the query, though I need to look at the code more closely to be sure, and I still need to dig into the embedding-search function to understand it. I didn’t try to make the code run for now, since that wasn’t part of the experiment, but I expect to modify it into working shape soon and will report back the results.

This granular set of actions was very different from what I got from Cursor.

Cursor

The Cursor prompt was not exactly the same as the one I used for ChatGPT, which was sloppy of me. I got a little lazier, but the crux was the same. You can see that prompt here.

The results were very different. It did not produce the granular steps that ChatGPT did; it met the requirement more succinctly (sketched in the code after the list):

  • extract the text from the PDF, also using PyPDF2
  • pass the entire text to ChatGPT as context along with the prompt
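
By contrast, here is a minimal sketch of the Cursor-style approach. Again, this is my own illustration rather than the generated code, assuming the current openai (v1) client and pypdf rather than PyPDF2.

```python
# Minimal sketch of the Cursor approach: no chunking or embeddings,
# just the whole extracted text passed as context along with the prompt.
# Assumptions (mine): openai>=1.0 client, pypdf instead of PyPDF2.
from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Extract the full text of the PDF
reader = PdfReader("example.pdf")
full_text = "\n".join(page.extract_text() or "" for page in reader.pages)

# Send the entire document as context with the question
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "system", "content": "Answer using only the provided document."},
        {"role": "user", "content": f"Document:\n{full_text}\n\nQuestion: What is this document about?"},
    ],
)
print(resp.choices[0].message.content)
```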

When I first posted this I said it ran right out of the gate, but that was wrong. It suffered from the same issue all three code generations did: it tried to use an old openai chat-completion call. With that fixed, I believe it would, hypothetically, return the expected results, working much like pasting the document into the ChatGPT interface. Not what I wanted, but I hadn’t specified that it should break things down into creating embeddings and so on. Fair game.
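
I haven’t kept a record of the exact deprecated call each tool produced, but the fix is the standard migration from the pre-1.0 module-level call to the v1 client, roughly:

```python
# Old pre-1.0 style (likely what the tools emitted; removed in openai>=1.0):
#   response = openai.ChatCompletion.create(model="gpt-4o", messages=messages)
#   answer = response["choices"][0]["message"]["content"]
#
# Current openai>=1.0 equivalent:
from openai import OpenAI

client = OpenAI()
messages = [{"role": "user", "content": "Say hello."}]
response = client.chat.completions.create(model="gpt-4o", messages=messages)
print(response.choices[0].message.content)
```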

The downside: the context was too large (43,000+ tokens) for the tokens-per-minute limit of my OpenAI account tier with the gpt-4o model (see the attached note for an explanation). So I didn’t get to see the output, but I have no doubt it would have produced results similar to doing it through the ChatGPT user interface.

Microsoft Copilot

What can I say? I don’t know what I would have gotten here because Copilot blocked the output. This is apparently because “GitHub Copilot checks code completion suggestions with their surrounding code of about 150 characters against public code on GitHub. If there is a match, or a near match, the suggestion is not shown to you.”

Screenshot of what Copilot returned: “Sorry, the response matched public code so it was blocked. Please rephrase your prompt.” with a “Learn more” hyperlink.

There is a flag you can set in your GitHub settings to turn this behavior on or off. I checked mine and it is set to “Allowed,” but you can see the results.

Screenshot of the GitHub setting “Suggestions matching public code (duplication detection filter)” with the value set to “Allowed.”

I’ll continue to try to troubleshoot this issue and see if I can get something out of Copilot and update this post if I do.

In recap

ChatGPT provided the detail I wanted even though I hadn’t specified it, a happy coincidence, but its code did not run out of the gate. Cursor took a very different approach and would have provided the desired LLM response if my OpenAI account tier had supported a large enough tokens-per-minute limit for the 43,000+ tokens of context, but it wasn’t the code I was hoping for. Copilot just didn’t work for me, for reasons I don’t yet understand. More experimentation to come.

--


Michael Ruminer

My most recent posts are on AI, especially from the perspective of a, currently, non-AI tech worker. did:web:manicprogrammer.github.io