
Local LLM Agents (ITAR / CMMC)

A working recipe for driving O.D.I.N. with AI agents without a single outbound packet — no Anthropic API, no OpenAI, no cloud. Every prompt, tool call, and response stays inside your compliance boundary.

This guide pairs ITAR / CMMC mode on the O.D.I.N. backend with a locally hosted LLM and the MCP server on the agent host.


Topology

┌─ Operator workstation (air-gapped LAN) ──────────────────────┐
│                                                              │
│  Ollama / llama.cpp              Agent client                │
│  (Qwen2.5-32B-Instruct)  ◄──►    (Claude Desktop / Cline /   │
│        │                          OpenClaw / Continue)       │
│        │                                   │                 │
│        │                                   │ stdio           │
│        │                                   ▼                 │
│        │                         odin-print-farm-mcp@2       │
│        │                                   │                 │
│        └───────────────────────────────────┤ HTTP            │
│                                            ▼                 │
│                                    O.D.I.N. backend          │
│                                   (ODIN_ITAR_MODE=1)         │
│                                            │                 │
│                                            ▼ LAN             │
│                                        Printers              │
│                                                              │
└──────────────────────────────────────────────────────────────┘

Zero packets leave this box.


Hardware baseline

| Component | Minimum | Comfortable |
|---|---|---|
| GPU VRAM | 24 GB (single 3090/4090) | 48 GB (2× 3090) |
| System RAM | 32 GB | 64 GB |
| Disk | 50 GB for model weights | 200 GB (multiple models) |
| CPU | 8 cores | 16 cores |

A 32B-parameter model at Q4 quantization fits on one 24 GB card and runs ~20–30 tokens/sec. That's enough for one operator driving the farm interactively.
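As a rough cross-check on these figures and on the VRAM column in the model table, weight memory is about parameters × bits per weight ÷ 8. The sketch below assumes roughly 4.85 bits/weight for Q4_K_M and 5.7 for Q5_K_M (approximate llama.cpp k-quant averages, not exact file sizes), a flat 2 GB allowance for KV cache and runtime buffers, and a runtime that can split layers across GPUs. All of these are assumptions; long context windows need more headroom.

```python
# Rough VRAM estimate for quantized model weights.
# Bits-per-weight values are approximate averages (assumption), not exact.
BITS_PER_WEIGHT = {"Q4_K_M": 4.85, "Q5_K_M": 5.7}

# Flat allowance for KV cache and runtime buffers (assumption; grows with context length).
OVERHEAD_GB = 2.0


def weight_gb(params_billion: float, quant: str) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * BITS_PER_WEIGHT[quant] / 8


def gpus_needed(params_billion: float, quant: str, vram_per_gpu_gb: float = 24.0) -> int:
    """Smallest number of equal GPUs whose combined VRAM holds weights + overhead."""
    total = weight_gb(params_billion, quant) + OVERHEAD_GB
    count = 1
    while count * vram_per_gpu_gb < total:
        count += 1
    return count


if __name__ == "__main__":
    for name, params, quant in [
        ("Qwen2.5-32B-Instruct", 32, "Q4_K_M"),
        ("Qwen2.5-72B-Instruct", 72, "Q4_K_M"),
        ("Mistral-Small-Instruct-2409", 22, "Q5_K_M"),
    ]:
        print(f"{name}: ~{weight_gb(params, quant):.1f} GB weights, "
              f"{gpus_needed(params, quant)}x 24 GB GPU(s)")
```

Under these assumptions Qwen2.5-32B at Q4_K_M comes to about 19.4 GB of weights and fits on one 24 GB card, while the 70B and 72B models need two.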


Model selection

| Model | Params | Quant | VRAM | Notes |
|---|---|---|---|---|
| Qwen2.5-32B-Instruct | 32B | Q4_K_M | ~20 GB | Recommended. Strong tool use, solid planning, Apache 2.0. |
| Qwen2.5-72B-Instruct | 72B | Q4_K_M | ~44 GB | Better reasoning; needs 2 GPUs. |
| Llama 3.3 70B-Instruct | 70B | Q4_K_M | ~42 GB | Meta license; same footprint class as Qwen 72B. |
| Mistral-Small-Instruct-2409 | 22B | Q5_K_M | ~16 GB | Fits on a 3090 with headroom for other tasks. |
| Qwen2.5-Coder-32B-Instruct | 32B | Q4_K_M | ~20 GB | Best if you also want coding help on the same box. |

Avoid models under 14B parameters for write operations: they tend to hallucinate tool arguments. Use them only for reads (farm_summary, list_jobs), or keep them in dry-run mode.
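One way to enforce this outside the model is a small gate in whatever code dispatches tool calls. The sketch below is hypothetical: the tool names come from this guide, but the `allow_call` helper, the exact tool sets, and the `job_id` argument used in the usage are illustrative and not part of odin-print-farm-mcp.

```python
# Hypothetical client-side gate: small models may read freely but may only
# issue write tools as dry runs. Tool names are the ones used in this guide.
READ_TOOLS = {"farm_summary", "list_jobs"}
WRITE_TOOLS = {"queue_job", "cancel_job", "pause_printer"}
MIN_PARAMS_FOR_WRITES_B = 14  # billions of parameters


def allow_call(tool: str, args: dict, model_params_b: float) -> bool:
    """Return True if a model of this size may make this tool call."""
    if tool in READ_TOOLS:
        return True
    if tool in WRITE_TOOLS:
        if model_params_b >= MIN_PARAMS_FOR_WRITES_B:
            return True
        # Below the threshold, only previews are allowed.
        return args.get("dry_run") is True
    # Unknown tools are refused rather than guessed at.
    return False
```

Failing closed on unknown tools means a newly added write tool stays blocked until someone classifies it; for example, `allow_call("cancel_job", {"job_id": 1}, 7)` is refused, while the same call with `"dry_run": True` is allowed.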


Install Ollama

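curl -fsSL https://ollama.com/install.sh | sh   # on systemd hosts this also installs and starts an ollama service

ollama serve &   # only if no ollama service is already running; listens on 127.0.0.1:11434
ollama pull qwen2.5:32b-instruct-q4_K_M   # needs the server running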

Verify:

curl http://127.0.0.1:11434/api/tags
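To script the same check, the sketch below calls /api/tags with the Python standard library and confirms the pulled model is listed. It relies only on the `models[].name` field of the response; the `has_model` and `fetch_tags` helper names are ours.

```python
import json
import sys
import urllib.request

OLLAMA_TAGS_URL = "http://127.0.0.1:11434/api/tags"


def has_model(tags: dict, name: str) -> bool:
    """True if the /api/tags response lists a model with exactly this name."""
    return any(m.get("name") == name for m in tags.get("models", []))


def fetch_tags(url: str = OLLAMA_TAGS_URL, timeout: float = 5.0) -> dict:
    # Loopback only: this request never leaves the workstation.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    wanted = sys.argv[1] if len(sys.argv) > 1 else "qwen2.5:32b-instruct-q4_K_M"
    ok = has_model(fetch_tags(), wanted)
    print(f"{wanted}: {'present' if ok else 'MISSING'}")
    sys.exit(0 if ok else 1)
```

The non-zero exit code makes it usable as a pre-flight step before starting the agent client.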

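The install script and ollama pull both need internet access. For an air-gapped install, run them on a connected staging box, copy the Ollama installation and the model directory to the target (~/.ollama/models/ by default; /usr/share/ollama/.ollama/models/ when Ollama runs as the Linux systemd service, or wherever OLLAMA_MODELS points), then start ollama serve with no network.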


Wire the MCP server

Mint an agent:write token in O.D.I.N. (Settings → API Tokens → New Token). Then configure your MCP client.

Claude Desktop

~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "odin": {
      "command": "npx",
      "args": ["-y", "odin-print-farm-mcp@2"],
      "env": {
        "ODIN_BASE_URL": "http://127.0.0.1:8000",
        "ODIN_API_KEY": "odin_xxxxxxxxxxxxxxxxxxxx"
      }
    }
  }
}

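Claude Desktop's agent loop talks to Anthropic's API by default, so the config above keeps only the MCP server and the O.D.I.N. traffic on-box. Keeping inference local as well means pointing the client at a local model backend (via proxies like LiteLLM or LM Studio), which only works if your client build accepts a custom model endpoint; confirm it with the packet capture below. If your threat model forbids Anthropic even for the agent loop, use a local-first client instead: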

Continue (VS Code)

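.continue/config.json (Continue's MCP configuration format has changed between releases, and recent versions use config.yaml, so check this block against the Continue docs for your version):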

{
  "models": [
    {
      "title": "Qwen2.5 32B",
      "provider": "ollama",
      "model": "qwen2.5:32b-instruct-q4_K_M"
    }
  ],
  "mcpServers": {
    "odin": {
      "command": "npx",
      "args": ["-y", "odin-print-farm-mcp@2"],
      "env": {
        "ODIN_BASE_URL": "http://127.0.0.1:8000",
        "ODIN_API_KEY": "odin_xxxxxxxxxxxxxxxxxxxx"
      }
    }
  }
}

Cline / OpenClaw

Paste the same mcpServers block into their MCP settings. Both can drive local Ollama models directly; with Ollama selected as the model provider, there is no external API traffic.


Verify the air gap

Before putting this into production, prove no packets leave the box.

1. Packet capture

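# capture every packet whose destination is outside the private and loopback ranges
sudo tcpdump -i any -n -w /tmp/odin-egress.pcap 'not (dst net 10.0.0.0/8 or dst net 172.16.0.0/12 or dst net 192.168.0.0/16 or dst net 127.0.0.0/8)'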

Drive a full agent loop: read the dashboard, queue a job, cancel it, approve another. Stop the capture and read it back with tcpdump -n -r /tmp/odin-egress.pcap. It should list no packets, or only broadcast/multicast noise such as mDNS; nothing should be addressed to a public IP.
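Skimming the capture by eye works, but the check can be made mechanical. The sketch below reads the text output of `tcpdump -n -r /tmp/odin-egress.pcap` on stdin, pulls the destination from each `IP src > dst:` (or `IP6`) line, and reports any destination that is not private, loopback, link-local, or multicast. It assumes tcpdump's default one-line output format and silently skips lines it cannot parse (ARP and the like), so treat it as a first-pass filter, not a replacement for reading the capture.

```python
import ipaddress
import re
import sys

# Matches tcpdump's one-line format: "... IP src.port > dst.port: ..." (also IP6).
LINE_RE = re.compile(r"\bIP6? \S+ > (\S+):")


def parse_dst(token: str):
    """Turn 'a.b.c.d.port', 'a.b.c.d', 'v6addr.port', or 'v6addr' into an address."""
    try:
        return ipaddress.ip_address(token)
    except ValueError:
        host, _, _port = token.rpartition(".")  # drop the trailing port
        return ipaddress.ip_address(host)


def is_internal(addr) -> bool:
    """Destinations that stay on the box or the LAN (limited broadcast counts as private)."""
    return addr.is_private or addr.is_loopback or addr.is_link_local or addr.is_multicast


def public_destinations(lines):
    """Yield (line, address) for every packet addressed outside the LAN."""
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # ARP and other non-IP lines
        try:
            addr = parse_dst(match.group(1))
        except ValueError:
            continue
        if not is_internal(addr):
            yield line.rstrip("\n"), addr


if __name__ == "__main__":
    hits = list(public_destinations(sys.stdin))
    for line, addr in hits:
        print(f"EGRESS to {addr}: {line}")
    print(f"{len(hits)} packet(s) to public addresses")
    sys.exit(1 if hits else 0)
```

Usage, assuming the script is saved under the hypothetical name check_egress.py: `tcpdump -n -r /tmp/odin-egress.pcap | python3 check_egress.py`.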

2. ITAR boot audit

docker compose logs odin | grep ITAR

Expected line:

ODIN_ITAR_MODE=1 boot audit passed — all configured URLs are private.

If the container refused to boot, the log lists every violating URL. Fix each one (strip public SMTP, webhooks, OIDC providers, etc.) and restart.
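A similar check can run in your own pre-flight scripts before the container starts. The sketch below is not O.D.I.N.'s implementation: it takes a mapping of setting names to URLs (the names are up to you), resolves each hostname, and flags any URL whose host is unresolvable or maps to an address outside the private and loopback ranges. The resolver is injectable so the logic can be tested without DNS.

```python
import ipaddress
import socket
import sys
from typing import Callable, Dict, List
from urllib.parse import urlsplit


def _resolve(host: str) -> List[str]:
    # Every address the host resolves to, IPv4 and IPv6.
    return sorted({info[4][0] for info in socket.getaddrinfo(host, None)})


def _is_internal(addr: str) -> bool:
    # Strip an IPv6 zone suffix such as "%eth0" before parsing.
    ip = ipaddress.ip_address(addr.split("%", 1)[0])
    return ip.is_private or ip.is_loopback


def audit_urls(urls: Dict[str, str],
               resolve: Callable[[str], List[str]] = _resolve) -> List[str]:
    """Return one violation message per URL that is not provably private."""
    violations = []
    for name, url in urls.items():
        host = urlsplit(url).hostname
        if not host:
            violations.append(f"{name}: no host in {url!r}")
            continue
        try:
            addrs = [str(ipaddress.ip_address(host))]  # literal IP, no DNS needed
        except ValueError:
            try:
                addrs = resolve(host)
            except OSError:
                # Fail closed: an unresolvable host cannot be shown to be private.
                violations.append(f"{name}: cannot resolve {host}")
                continue
        public = [a for a in addrs if not _is_internal(a)]
        if public or not addrs:
            violations.append(f"{name}: {url} -> {', '.join(public) or 'no addresses'}")
    return violations


if __name__ == "__main__":
    found = audit_urls({u: u for u in sys.argv[1:]})
    for line in found:
        print("VIOLATION", line)
    sys.exit(1 if found else 0)
```

Failing closed on DNS errors matters in an air gap: a public hostname that simply does not resolve today is still a misconfiguration.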

3. Blocked-egress smoke test

Try to add a public webhook in Settings → Webhooks → New Webhook (URL: https://example.com/hook). With ITAR mode on, O.D.I.N. refuses to save it. The same save succeeds on an ODIN_ITAR_MODE=0 deployment, so a refusal here confirms that enforcement is actually on.

4. Firewall backstop

Defense-in-depth: block outbound at the network layer too. Example nftables:

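table inet odin_filter {
    # Traffic generated on the host itself (Ollama, the MCP server, the agent client)
    chain output {
        type filter hook output priority filter; policy drop;
        oif lo accept
        ct state established,related accept
        ip daddr { 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 } accept
    }
    # Bridged containers (e.g. the O.D.I.N. backend under Docker) are routed, so they hit forward, not output
    chain forward {
        type filter hook forward priority filter; policy drop;
        ct state established,related accept
        ip daddr { 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16 } accept
    }
}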

If the app-layer guard misses anything, the network layer catches it.


Prompting agents for safe operation

Local 32B models are good but not great at planning. Two tricks go a long way:

Force dry-run first. In the system prompt: "For any write operation (queue_job, cancel_job, pause_printer, etc.), always call with dry_run: true first. Show me the preview. Only call without dry_run after I explicitly confirm."
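A system prompt is advisory, and a local model can still skip the preview. If you control the agent loop, the rule can also be enforced in code. The sketch below is hypothetical (the `DryRunGuard` class and its methods are not part of odin-print-farm-mcp, and `job_id` is an illustrative argument): it records each dry-run preview keyed by tool name and arguments, and refuses a real call unless an identical preview was made and the operator confirmed it.

```python
import json

WRITE_TOOLS = {"queue_job", "cancel_job", "pause_printer"}


def _key(tool: str, args: dict) -> str:
    # Canonical form of the call, ignoring the dry_run flag itself.
    body = {k: v for k, v in args.items() if k != "dry_run"}
    return tool + ":" + json.dumps(body, sort_keys=True)


class DryRunGuard:
    def __init__(self) -> None:
        self._previewed = set()
        self._confirmed = set()

    def check(self, tool: str, args: dict) -> bool:
        """Return True if this call may be sent to the MCP server."""
        if tool not in WRITE_TOOLS:
            return True
        key = _key(tool, args)
        if args.get("dry_run") is True:
            self._previewed.add(key)
            return True
        # A real write needs a matching preview plus an explicit confirmation.
        if key in self._confirmed:
            self._confirmed.discard(key)  # one confirmation, one write
            self._previewed.discard(key)
            return True
        return False

    def confirm(self, tool: str, args: dict) -> bool:
        """Operator approves a previewed call; False if it was never previewed."""
        key = _key(tool, args)
        if key not in self._previewed:
            return False
        self._confirmed.add(key)
        return True
```

Keying on the exact arguments means a model that previews one job and then sends a different one is refused.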

Use structured error codes. When a tool errors, don't retry blindly. The system prompt should say: "If a tool returns error.code = 'scope_denied', stop and ask for a broader token. If error.code = 'quota_exceeded', stop and report the quota. Never retry a 'validation_failed' error; fix the args."

These two rules eliminate ~90% of "helpful agent destroys the print queue" failure modes.
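The error-code rule can likewise be applied in the agent loop, so a model that ignores the prompt still cannot retry in a tight loop. This is a sketch: it assumes a tool error arrives as a dict shaped like `{"error": {"code": ..., "message": ...}}`, matching the `error.code` field named above; the `next_step` function and the `Action` values are hypothetical.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    ASK_FOR_BROADER_TOKEN = "ask_for_broader_token"
    REPORT_QUOTA = "report_quota"
    FIX_ARGS = "fix_args"                # never resend the same arguments
    STOP_AND_REPORT = "stop_and_report"  # unknown errors go to a human


POLICY = {
    "scope_denied": Action.ASK_FOR_BROADER_TOKEN,
    "quota_exceeded": Action.REPORT_QUOTA,
    "validation_failed": Action.FIX_ARGS,
}


def next_step(result: dict) -> Optional[Action]:
    """Map a tool result to what the agent loop does next; None means success."""
    error = result.get("error")
    if not error:
        return None
    return POLICY.get(error.get("code"), Action.STOP_AND_REPORT)
```

Defaulting unknown codes to stop-and-report, rather than retry, follows the "don't retry blindly" rule.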


CMMC Level 2 considerations

  • Audit logging. O.D.I.N. writes every API call (including agent tool calls) to the audit_log table with actor, IP, action, and timestamp. Export to a SIEM via the /admin/audit endpoint or the Postgres replica.
  • MFA for operators. Enable TOTP MFA on every human account. Agent tokens are separate: mint them with minimal scopes and rotate them regularly.
  • Scope separation. Use agent:read for reporting agents and agent:write for operational agents. Don't share tokens between agents.
  • Session timeout. Set a short session_timeout_minutes in settings. Agent tokens have their own expires_at; set it to the minimum usable window (7–30 days).
  • Local license verification. O.D.I.N. verifies licenses locally with Ed25519 signatures: no phone-home, so it works air-gapped.
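The expiry rule is easy to check mechanically. The sketch below assumes you can obtain agent tokens as records with a `name` and an ISO 8601 `expires_at` (how you export them depends on your deployment; only `expires_at` comes from this guide) and flags tokens that never expire, have already expired, or expire more than 30 days from now.

```python
from datetime import datetime, timedelta, timezone
from typing import List, Optional

MAX_LIFETIME = timedelta(days=30)  # upper end of the 7-30 day window


def token_findings(tokens: List[dict], now: Optional[datetime] = None) -> List[str]:
    """Return human-readable findings for tokens that break the expiry rule."""
    now = now or datetime.now(timezone.utc)
    findings = []
    for tok in tokens:
        raw = tok.get("expires_at")
        if not raw:
            findings.append(f"{tok['name']}: no expires_at (never expires)")
            continue
        # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalise it.
        expires = datetime.fromisoformat(raw.replace("Z", "+00:00"))
        if expires.tzinfo is None:
            expires = expires.replace(tzinfo=timezone.utc)  # assume UTC if naive
        if expires <= now:
            findings.append(f"{tok['name']}: expired {raw}; revoke it")
        elif expires - now > MAX_LIFETIME:
            findings.append(f"{tok['name']}: expires {raw}, more than 30 days out")
    return findings
```

Without a creation timestamp the sketch measures remaining lifetime rather than total lifetime, which is enough to catch long-lived tokens on each scheduled run.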

See Also