Started Llama 3.1 8B Locally

Michael Ruminer
2 min readJul 31, 2024

--

AI generated black and white photo style image of a llama staring at you standing close to the lens and in a field. Shows only the head and part of the neck.

I started a load of Llama 3.1 8B onto my local machine. It’s not the beefiest model but is the model that I could support on my aging machine and GPU. I have not yet had time to play with it but this is step one of a multi step experiment.

It took me less than 30 minutes to get up and running and that is an estimate on the far end. The bulk of that time was deciding if a specific tool was supported on my machine. I needed to make sure my processor had AVX2 support. Once I got over that hurdle it was easy peasy. I decided not to start with Ollama simply because LM Studio intrigued me, and I liked the interface for adjusting parameters right in the interface plus the more pleasing interaction experience than just the command line. LM Studio allowed me to install in minutes, select a chat interface, or server, and then provided the interface which allowed me to select which of the models I have installed and I was off and running for a chat session to begin with. The response time was good and returned output at about as fast as I could read it.

Why the 8B model? Simply because I don’t have the machine specs to run the 70B. You really need multiple terabytes of SSD storage, a much better GPU, and twice as much RAM at a minimum. The 8B is more than sufficient for me to start testing agent creation and execution.

My Machine Specs For the 8B

As an example of the machine specs I was able to get the 8B running under with very reasonable performance.

  • i7–107000k processor at 3.8 GHz
  • 16 GB RAM
  • Nvidia GTX 1060 with 6GB of RAM

I have a MacBook as well but it is a 2019 MacBook with an Intel processor. Llama 3.1 only supports M1+ processors.

70B Machine Specs

According to the 8B model, when asked, the 70B model needs:

  • Processor: i7 or higher with 8 cores and 16 threads.
    - So not much required in the CPU department.
  • GPU: A high-end NVIDIA GeForce or Quadro GPU with at least 24 GB of VRAM (e.g., GeForce RTX 3080 Ti or Quadro RTX 8000).
  • Memory: At least 64 GB of RAM, preferably 128 GB or more.
    – This will ensure that you have sufficient memory to handle the model’s massive parameter count (70 billion parameters) and other system resources.
    - Not sure I’ll ever have machine with 128 GB of RAM
  • A high-speed storage drive with a large capacity (at least 1 TB)

--

--

Michael Ruminer

My most recent posts are on AI especially from the perspective of a non-AI tech worker. did:web:manicprogrammer.github.io