ollama and n8n on a proxmox homelab

i run a small homelab. three lenovo thinkcentre m920q boxes, one with 32gb of ram and two with 16gb, all sitting behind a unifi network in my office. one of them now runs ollama and n8n, which means i can build local ai workflows without sending a single token to a vendor.

this is the build log for getting there: the install, the ram cap that ate the first model run, the model shortlist that fits cpu inference, the n8n ↔ ollama wiring detail the docs gloss over, what the first prompts actually returned, and an honest what now? at the end. i’m writing as i go, so the parts where i don’t yet know what to do with the stack are left in.

the setup, briefly

a 32gb lenovo m920q is the one box that runs ollama and n8n. it sits on a dedicated proxmox vlan, separate from the household network, with caddy in front terminating tls (let’s encrypt via the dns-01 challenge, since none of this is publicly reachable). two other m920qs handle home assistant os and proxmox backup server, so the 32gb isn’t shared.

each of those layers is getting its own post — proxmox as a homelab host, lxc vs vm, the unifi topology, and the caddy + dns-01 + wildcard-dns trick that puts tls on internal services. this one stays on ollama and n8n.

why n8n and ollama

what i actually want is to play with agent orchestration and build workflows with a visual surface so i can see what’s happening. local-first, because i don’t want to pay a vendor every time i want to test an idea, and because the privacy of running it on hardware i own is a feature, not a side effect.

there are options in both columns. on the workflow side: flowise, langflow, dify, activepieces. on the model side: lm studio, localai, llama.cpp directly. i didn’t run a comparison matrix. i picked n8n and ollama because both had ready-made proxmox helper scripts and zero subscription cost — the fastest path to a working setup i could poke at and, if it didn’t stick, throw away.

that’s the methodology, not laziness. the way i learn a new domain is to get something running, find the wall, fix it, find the next wall. survey articles are useful after you’ve felt the shape of the problem, not before. if n8n turns out to be wrong for what i want, the proxmox/lxc/dns/caddy plumbing carries over, and the lessons about model size on cpu inference stay valid no matter what front-end i swap in.

one constraint shapes everything that follows: the m920q has no usable gpu for inference, so this is a cpu-only stack. that pushes the design toward async workflows — scheduled jobs, things that run in the background — and away from real-time chat. if you want snappy interactive llm chat, this is not the build. if you want something that sits in the background and chews on stuff for you, it is.

the install: lean on community-scripts

the proxmox community helper scripts are genuinely good. they automate spinning up an lxc with a ready-to-run service: ollama, n8n, home assistant, most things you’d want in a homelab. one curl command on the proxmox host, answer a few prompts, working container.

i ran two of them: the ollama script (drops ollama into its own lxc, port 11434) and the n8n script (drops n8n into its own lxc, port 5678). separate containers on purpose. one service per lxc means failures stay isolated, and pct destroy <ctid> followed by re-running the script gets me back to a clean state in about ninety seconds. that “throw it away and start over” cycle is the actual value of the helper-script pattern. you don’t need to baby a container; if you broke it experimenting, just rebuild.

i’m not retracing the install steps here. the helper-script docs are good and they update faster than any article would. the part the docs don’t help with is what comes next: making the services reachable on my lan, dealing with the lxc ram cap, and getting models that actually run.

the ram cap that ate my first model run

i pulled llama3.1:8b, ran it, and ollama crashed:

Model requires more system memory (4.8 GiB) than is available (544.7 MiB).

the host has 32gb. how is there 544mb free?

because the lxc isn’t the host. the helper script provisions the ollama container with a small default memory cap — sane for an idle container, nowhere near enough to load an 8b model. the host sees 32gb. the container sees a slice. ollama, running inside the container, sees the slice.

fix is a one-liner from the proxmox host:

pct set <ctid> --memory 16384 --swap 4096
pct reboot <ctid>

16gb gives an 8b model (~5gb at q4) generous headroom for context and kv cache. swap is a safety net, not a workload tier — if you’re hitting it in normal use, the model is too big for the box.

the lesson generalizes. lxc resource caps are silent until they’re not. when something inside a container fails with a “not enough memory” or “too many open files” error and the host clearly has plenty, the answer is almost always pct config <ctid>.

models: a starter shortlist

cpu-only inference on the m920q gets you about 5 tokens/sec on 7–8b models and 10 tokens/sec on a 3b — measured with ollama run --verbose on a one-paragraph prompt. fast enough that async workflows don’t stall, slow enough that real-time chat is out. that’s not chatgpt; that’s kick off a workflow, do something else, come back. once you accept that frame, the model picks fall out:

qwen2.5:7b (~4.4 gb at q4, ~5.2 tok/s) — the recommended starter. better at structured json output than llama, which matters because n8n’s ai nodes lean on tool-calling. faster too, on this hardware.
llama3.1:8b (~4.7 gb at q4, ~4.8 tok/s) — the recognized baseline. keep it on hand to sanity-check whether a problem is the model or the workflow.
llama3.2:3b (~2 gb at q4, ~10.2 tok/s) — fast and small. classification, routing, summarization steps where latency matters.
nomic-embed-text (~270 mb) — embedding model. the moment you try a vector-store node, you’ll need it. cheap to keep around.

pull them on the ollama lxc:

ollama pull qwen2.5:7b
ollama pull llama3.1:8b
ollama pull llama3.2:3b
ollama pull nomic-embed-text

what’s not on the list:

13–14b models. they fit in 32gb but crawl on cpu. revisit if a gpu joins the rack.
70b. won’t fit usefully even at q4.
coding-specific models like qwen2.5-coder. useful, but second-wave. start with generalists.

the deeper move is to stop thinking about which model is “best” and start matching model size to step. a workflow that summarizes an rss feed every morning doesn’t need an 8b model — a 3b will do it in a third of the time, and n8n lets you pick a different model per node. use that.

wiring n8n to ollama

the gotcha that costs everyone an hour the first time: in n8n’s ollama credential, don’t use the public hostname. https://ollama.homelab.domain.tld round-trips through caddy and tls for no reason — both lxcs are on the same vlan.

use the ollama lxc’s lan ip directly:

http://<ollama-lxc-ip>:11434

http, not https. inside the vlan. caddy is for browsers, not for service-to-service.

browser path — caddy is in the way on purpose:

  browser
     │   https
     ▼
  unifi dns
     │   matches *.homelab.domain.tld → caddy ip
     ▼
  caddy lxc
     │   tls + routes by hostname
     ▼
  n8n lxc :5678   ─or─   ollama lxc :11434


service-to-service — caddy must NOT be in the way:

  n8n lxc
     │   http, ollama's lan ip
     ▼
  ollama lxc :11434

if you want stable container ips — and you do; the alternative is updating n8n credentials every time a container reboots — set static ip on each lxc at create time, or reserve dhcp leases on the unifi side. helper scripts default to dhcp, which is fine until an address shifts and your workflows quietly break.

once the credential validates, n8n’s ai nodes can target ollama models by name — llama3.1:8b, qwen2.5:7b, etc. the strings are the same ones you ollama pulled; nothing to translate.

the n8n license, briefly

n8n offered a free license key when i created my account. i grabbed it because it was offered and free. it’s a community-edition license — fully self-hosted, no data leaves the box, no subscription. it unlocks a handful of quality-of-life features the unlicensed community build doesn’t ship with; i’ll find out which ones actually matter as i use it.

if you skipped registering on signup, you can grab a key from n8n’s site later and paste it into the settings.

first contact

i built the smallest possible workflow to validate the wiring: a manual trigger, a Message a model node, qwen2.5:7b selected, click execute.

three things happened in sequence, all worth recording.

empty prompt → chinese. the first run, i hadn’t filled in the user content yet. qwen 2.5 is alibaba-trained, and its default behavior with a null user prompt is to greet in chinese: “您好！有什么可以帮忙的吗？…” useful confirmation that the wiring works end-to-end, and a reminder that “qwen” and “english-default” are not the same thing.

system message → english. added a system message under the node’s options: Always respond in English. re-ran. the response came back as a generic english greeting (the user content was still empty). system messages override training defaults.

real prompt → plausibly wrong. finally filled in user content: “in two sentences, explain what proxmox is to someone who knows docker.” qwen replied:

“Proxmox is an open-source virtualization platform that integrates LXC and KVM with Docker on a single server, allowing you to run both containerized and full virtual machines alongside each other. It provides a web interface and command-line tools similar to Docker for managing containers and VMs.”

confident. fluent. wrong. proxmox does not integrate docker. it ships with lxc (a different container runtime) and kvm; docker is something you install inside a proxmox container yourself. the model anchored on “docker” from the prompt and reached for it as if it were a native feature of the system it was describing.

timings: 13.7s end-to-end, 59 generated tokens, ~5.2 tok/s — consistent with the model section’s baseline. the content was the problem, not the speed.

comparison run. swapped the model dropdown to llama3.1:8b, kept everything else identical, executed again:

“Proxmox is a server virtualization platform that allows you to run multiple virtual machines (VMs) on a single physical host, similar to how Docker containers are used to isolate applications from each other but at the hypervisor level. Think of Proxmox as a higher-level abstraction than Docker, allowing you to run full-fledged VMs with their own operating systems and resources, whereas Docker is more focused on containerization within an existing OS.”

llama’s answer is not fabricated the way qwen’s was — no invented docker integration. but it’s incomplete: it frames proxmox as a vm-only platform and never mentions lxc, which is half of what proxmox actually does. qwen invented a feature that doesn’t exist. llama omitted a feature that does. both fluent, both confident, both wrong in different ways.

(timings: 25s, 91 tokens, ~5.0 tok/s. llama generated 91 tokens to qwen’s 59 on the same prompt — almost twice the wall-clock time. verbosity is a hidden cost at this tok/s tier. worth knowing when you’re picking a model per workflow node.)

that’s the lesson the article has been pointing at. at this size, on this hardware, a local model is a fluent generator with no native ground truth. the wiring worked. the verification is on you.

what’s next

i don’t have a killer workflow yet. the honest answer is i set this up to learn, and i’m now in the part of learning where you have to actually use the thing.

candidates rolling around:

rss triage — scrape a feed, summarize each item with a 3b model, surface the interesting ones.
journal summarizer — on a schedule, ingest the week’s notes and surface themes and unfinished threads.
home assistant tie-in — pv1 already runs home assistant. there’s an obvious bridge between events on that side and an ai workflow that decides what to do with them.

if any of those land, they’ll show up here as their own write-up. if they all turn out to be busywork, that’ll be a write-up too.

a build log doesn’t have to end with a hero shot. sometimes the result is just the rig is ready. that’s where this one stops.