Using LM Studio and CrewAI with Llama

Michael Ruminer
3 min read1 day ago

--

Black and white AI generated image of a llama with fields and hills as a backdrop

I was recently experimenting with CrewAI, but in one experiment I wanted to use a local LLM hosted by LM Studio. Why LMStudio? Well… because that was the experiment. I wanted to see how I’d use LMStudio to host my model versus Ollama and then use it from CrewAI. Below is a very simple setup for doing this.

It’s good to point out that CrewAI uses LiteLLM under the hood as a sort of proxy. This provides CrewAI with the ability to talk to a range of LLMs without needing to really do much in terms of handling it.

What Doesn’t Work and Why

The first thing I found is what doesn’t work. The LLM class in CrewAI allows for the instantiation of an LLM that can then be used by agents. Spinning up an LLM instance on a local Ollama hosted model can look like below.

 ollama_31_8b = LLM(
model="ollama/llama3.1",
base_url="http://localhost:11434"
)

This works just fine if hosting the LLM inside Ollama, but you get response exceptions if, instead, you try to host inside LMStudio at the same server port.

ERROR: LiteLLM call failed: litellm.APIConnectionError: 'response'

First, you notice that the base_url doesn’t have a “/v1” at the end which LMStudio uses in their server setup. If you fix that, thinking it might work, you’ll find that you likely get the same error.

Secondly, you may realize that the model property in your LLM instantiation uses a [PROVIDER]/[MODEL] format. I tried removing the provider portion to see what would happen. The results were:

llm.py-llm:161 - ERROR: LiteLLM call failed: litellm.BadRequestError: LLM Provider NOT provided.

That’s a reasonable outcome.

What Does Work

Lastly, I remembered that LM Studio is using OpenAI endpoints.

A screenshot that shows part of the LMStudio UI where it states “Start a local HTTP server that mimics select OpenAI API endpoints.”

A quick look at the LiteLLM docs provided the answer I needed; set the provider as “openai”. This results in a final outcome of:

ollama_31_8b = LLM(model="openai/llama3.1", base_url="http://localhost:11434/v1")

Now, if you try running it with the agent using the LLM instantiated it will work. Below is example code of the LLM and agent creation where I had Llama 3.1:8b model hosted in LM Studio on port 11434.

@CrewBase
class MyCrew():

llama_31_8b = LLM(
model="openai/llama3.1",
base_url="http://localhost:11434/v1"
)

@agent
def joke_generator(self) -> Agent:
return Agent(
config=self.agents_config['joke_generator'],
verbose=True,
llm=self.llama_31_8b
)

Note

Note that on LMStudio I had my server port set to 11434 versus the default of 1234. It made it easier as I switched back and forth between Ollama and LM Studio; I didn’t need to modify the port. 11434 is the default Ollama port.

Screenshot from LMStudio showing the setting for the port to be 11434

When Might I Use This

When might I use this? If I am programming, probably rarely. I could instead host the model in Ollama. I’d use LM Studio if I want to host a model and chat with it. In that scenario, I’d probably be more likely to use Ollama with AnythingLLM which would also provide me with some Retrieval-Augmented Generation (RAG) capabilities. Nonetheless, it was an experiment and I proved, for myself, it could easily be done.

--

--

Michael Ruminer

My most recent posts are on AI, especially from the perspective of a, currently, non-AI tech worker. did:web:manicprogrammer.github.io