The journey to getting Llama running on ancient, if local, hardware took some twists and turns; after securing the second-hand machine, Exo Labs had to contend with finding compatible PS/2 peripherals ...
Eventually, they managed to sustain a throughput of 39.31 tokens per second running a Llama-based LLM with 260,000 parameters. Cranking up the model size significantly reduced performance ...
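The throughput figure quoted above is simply the number of tokens the model emits divided by wall-clock time. A minimal sketch of how such a rate is computed (the function name is illustrative, not taken from Exo Labs' code):

```python
import time

def tokens_per_second(num_tokens: int, elapsed_seconds: float) -> float:
    """Generation throughput: tokens produced divided by wall-clock time."""
    return num_tokens / elapsed_seconds

# Timing a generation loop would look roughly like this:
start = time.perf_counter()
generated = ["tok"] * 393          # stand-in for tokens produced by a model
elapsed = time.perf_counter() - start

# With the figures reported above, 3,931 tokens over 100 seconds
# works out to the quoted 39.31 tokens per second:
print(round(tokens_per_second(3931, 100.0), 2))  # → 39.31
```

At this rate the tiny 260K-parameter model streams text faster than a person can read; the article notes that larger models drop well below that.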
A hot potato: The open-source project llama2.c is designed to run a lightweight version of the Llama 2 model entirely in C code. This "baby" Llama 2 model is inspired by llama.cpp, a project ...
Exo Labs, an outfit whose mission statement is democratising access to AI technologies such as large language models, has lifted the lid on its latest project: a modified version of Meta's Llama 2 running ...