
Feed the Birds

Ollama on Mac Pro (2012) with Bazzite

Once the AI label has been peeled off, I've found LLMs to be a remarkable tool to assist with programming. I'm a fly-by-night "coder" at the best of times, hopping between languages and platforms, forgetting the syntax here, repeating myself there. I do what I can, but if you asked me how to properly structure an if statement in PHP off the top of my head right now, I'd probably get it wrong.

Recently I've been developing more drivers for Hubitat and found Google's Gemini, just the freebie '2.5 Flash' model, to be a great help. It's particularly useful when I know there's a better way to deal with something but can't remember how, or if I hit a novel problem and want some suggestions. It's been especially good at decoding Zigbee messages and answering my "but why?" questions when I don't get the reason behind some hex value being the way it is. It also shows its references, so I can jump out to the website it digested and read the information in context, which is great.

Local Models

This is all fun and games, but I wanted to know if I could run a model locally. I've done it with Ollama on my little Mac mini before, and the performance is... okay. The big hitch is my mini-experiment (I might mention that one day) has only 16 GB of shared memory, so I can't throw a decently-sized model on there and keep working normally. It needs to go on something else.

Next to me is my Mac Pro. At the time of writing it's about thirteen years old. As you may be able to tell from older posts on here, I kept it running a long time after Apple dropped support for it, with Monterey being the last macOS version I used regularly. It's now (don't laugh) my gaming PC running Bazzite, because it got a lovely Radeon VII graphics card back in 2020, and until very recently my games backlog went all the way back to Half Life (the first one). It ran that just fine. ;)

Trick is, the Intel Xeon X5690 CPUs in my Mac Pro don't support any form of AVX extension. AVX isn't strictly required, and you can build Ollama yourself without it, but that turned out to be too much for me when I first visited this issue with Ubuntu back in early 2023. I got it compiled and it would run on the CPU, but it would never pass the checks and launch with GPU support. Seeing as hosting the GPU would be the system's main job, that was a bit of a failure.

Running On Bazzite

What with Bazzite being an immutable operating system, and therefore not really a playground for modding GPU drivers, I thought I might be out of luck. However, its strength is that everything should work out-of-the-box, making that low-level tinkering unnecessary. Fingers crossed.

When I discovered that Ollama had switched from compile-time to run-time AVX checks (quite some time ago, as it happens), there seemed to be some hope. After a little digging I discovered how to make it work. Buckle in.

First, allow containers to access devices, otherwise that GPU isn't going to do much.

sudo setsebool -P container_use_devices=true

Then use podman to pull in the ROCm version of Ollama.

podman run -it --name ollama --replace -p 0.0.0.0:11434:11434 -v ollama:/root/.ollama --device /dev/kfd --device /dev/dri docker.io/ollama/ollama:rocm

You're all done. It's that easy now.

Test it with the command line interface.

podman exec -it ollama ollama run gemma3:4b --verbose
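You can also poke the server's HTTP API directly from any machine on the network. A quick sanity check, assuming the default port and with the IP address and model name as placeholders for your own:

```shell
# Assumes the default port (11434); swap in your server's address.
OLLAMA_HOST=http://192.168.1.1:11434

# List the models the server has pulled
curl -s "$OLLAMA_HOST/api/tags"

# One-shot generation; "stream": false returns a single JSON object
# instead of a token-by-token stream
REQ='{"model": "gemma3:4b", "prompt": "Why is the sky blue?", "stream": false}'
curl -s "$OLLAMA_HOST/api/generate" -d "$REQ"
```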

I'm very impressed.

Tidy Up

The Ollama server itself really doesn't use anything in the way of resources when it's just sat idle in the background, and I wanted a literal push-button LLM experience. The way to do this on Bazzite is with Quadlet, which integrates containers into systemd.

My ollama.container file lives in ~/.config/containers/systemd/ and looks like this:

[Container]
ContainerName=ollama
Image=docker.io/ollama/ollama:rocm
AutoUpdate=registry
PublishPort=11434:11434
Volume=ollama:/root/.ollama
AddDevice=/dev/kfd
AddDevice=/dev/dri

[Unit]
Description=Ollama with ROCm

[Install]
WantedBy=default.target

Get things set up with:

systemctl --user daemon-reload
systemctl --user start ollama

Check the status with the usual suspects:

systemctl --user status ollama
journalctl --user -fu ollama

Then get the service to start before you log in to the desktop with:

loginctl enable-linger $USER

Any service with that [Install] section will now fire up in the background behind the login window as your user.

When I want to use an LLM, I push the power button on the Mac Pro and the Ollama server will fire up as soon as the system is ready. The action of the power button is captured by Bazzite, so I configured the system settings to perform a clean shutdown when it's tapped.
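(Under the hood, that power-button behaviour maps to systemd-logind's power-key handling. If you'd rather set it in configuration than through the settings panel, a drop-in along these lines should do the same job at the login window, though a running desktop session may still grab the key itself. The file path here is my suggestion, not something Bazzite ships.)

```
# /etc/systemd/logind.conf.d/power-button.conf
[Login]
HandlePowerKey=poweroff
```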

This all behaves itself while running in a completely headless configuration, so when I'm working on my Mac mini, with all peripherals disconnected from the Mac Pro, I just press the power button. When I'm done for the day and fancy acting like a goose, I flip my screen and peripherals over to the Mac Pro and log straight in.

For that real "native feel" a quick entry in your .bash_aliases file or equivalent...

alias ollama="podman exec -it ollama ollama"

...makes everything seem right at home.

Clients

I'm still experimenting with clients, and haven't quite settled on anything. I'm mostly using Open WebUI because that's what I bumped into first. I run it on my Mac with the assistance of OrbStack, set up with the following command:

docker run -d --name open-webui -p 3000:8080 -v open-webui:/app/backend/data -e OLLAMA_BASE_URL=http://192.168.1.1:11434 ghcr.io/open-webui/open-webui:main

The IP address there is for your Ollama server, of course. After a few moments you'll be able to fire up Open WebUI on http://127.0.0.1:3000 and do your thing.

I've also noticed Chatbox AI, Enchanted, Ollamac, and the curiously similarly-titled Ollamac Pro. There are bound to be more, and if you have a favourite I'd be interested to hear about it.

Results

Maybe you think a thirteen-year-old machine is too old to be useful for this task, and before access to the GPU through AMD's ROCm was working I'd have completely agreed. It was awful. But the same goes for any modern machine relying on its CPU for LLMs.

The saviour of the system is the Radeon VII, which is a bit of a crazy card with its 16 GB of HBM2 memory running at 1.02 TB/s. That's a touch faster than an NVIDIA RTX 4090, though that card does have an extra 8 GB of VRAM. It's essentially a memory-halved and clocked-down Radeon Pro Vega II, which is why it's called the "seven": a pun in Roman numerals that I rather appreciate. In fact, for a very particular Mac Pro reason, mine is flashed with the vBIOS of its sister card, the Radeon Pro VII. That was a tense chip-flashing, I can tell you.

gemma3

At the moment I'm using Google's gemma3 model most often, in the 12B or 4B parameter size. I get around 32 tokens/s on the 12B model, and 60 tokens/s with the 4B model, as displayed using the --verbose flag on the command line interface. Frankly, that's amazing performance, and I'm very glad I persevered with this. Most programming tasks are easily handled with the 4B model, and even the image recognition works a treat.

The 12B model uses about 60% of the VRAM, so the 24B model doesn't quite fit, though I did give it a try. It overspilled into the system's DDR3 and crawled, and it probably wouldn't have run at a usefully "interactive" speed even if it had fitted.
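The sizes line up with a bit of back-of-envelope arithmetic: a 4-bit quantisation needs roughly half a byte per parameter for the weights alone, before you add the KV cache and other overhead.

```shell
# Very rough Q4 weight footprint: params * 0.5 bytes.
# KV cache, vision components, and runtime overhead come on top of this.
for params_b in 4 12 24; do
  echo "${params_b}B ≈ $((params_b / 2)) GB of weights"
done
```

With that overhead added, 12B sits comfortably inside 16 GB of VRAM and 24B simply doesn't.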

Great Success

I've been so impressed with Bazzite in all this.

By solving the problems of gaming (probably the most finicky software there is) they've seemingly solved many Linux desktop issues in the process. I recently built a new system for my Dad, who was not enjoying the modern Windows experience, and it was while researching alternatives for that computer that I discovered it. He's had the machine for a week now, used it every day, and the only issue has been needing to add the -vulkan flag to Portal's launch options.

After playing with that machine for a couple of weeks while building it (which coincided with an update breaking Ubuntu on my Mac Pro) I also made the switch and haven't looked back. It's wonderful to have this great piece of hardware back in action properly.

Hats off to the developers behind Ollama too, of course. Without that AVX requirement being relaxed I'd have been completely scuppered.

Okay, I'd better get back to the climate and soil sensors now. Let's see if I can remember how to write in Groovy. Though if I forget, I now have an aluminium-clad brain-a-like sat next to me, and I'm only a button-press away from help.

Addendum

I've been informed that on Bluefin, another Universal Blue immutable operating system, your install could be as simple as:

brew install ramalama

I have given it a quick spin on Bazzite and everything gets to that "it's very nearly working" point, but it exits and doesn't serve the model. Full instructions are on Bluefin's "AI and Machine Learning" documentation page.

I have to say that I've also been impressed with how nicely documented the Universal Blue projects are. It really boosts my confidence in using something when I just keep hitting the answers to my questions right there on the official docs. Nice work.

August 29, 2025