Local LLM Agents (ITAR / CMMC)
A working recipe for driving O.D.I.N. with AI agents without a single outbound packet — no Anthropic API, no OpenAI, no cloud. Every prompt, tool call, and response stays inside your compliance boundary.
This guide pairs ITAR / CMMC mode on the backend with a locally-hosted LLM and the MCP server on the agent host.
Topology
┌─ Operator workstation (air-gapped LAN) ──────────────────────┐
│                                                              │
│  Ollama / llama.cpp          Agent client                    │
│  (Qwen2.5-32B-Instruct) ◄──► (Claude Desktop / Cline /       │
│         │                     OpenClaw / Continue)           │
│         │                              │                     │
│         │                              │ stdio               │
│         │                              ▼                     │
│         │                    odin-print-farm-mcp@2           │
│         │                              │                     │
│         └──────────────────────────────┤ HTTP                │
│                                        ▼                     │
│                                O.D.I.N. backend              │
│                               (ODIN_ITAR_MODE=1)             │
│                                        │                     │
│                                        ▼ LAN                 │
│                                    Printers                  │
│                                                              │
└──────────────────────────────────────────────────────────────┘
Zero packets leave this box.
Hardware baseline
| Component | Minimum | Comfortable |
|---|---|---|
| GPU VRAM | 24 GB (single 3090/4090) | 48 GB (2× 3090) |
| System RAM | 32 GB | 64 GB |
| Disk | 50 GB for model weights | 200 GB (multiple models) |
| CPU | 8 cores | 16 cores |
A 32B-parameter model at Q4 quantization fits on one 24 GB card and runs ~20–30 tokens/sec. That's enough for one operator driving the farm interactively.
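The 24 GB claim can be sanity-checked with back-of-envelope arithmetic: weight size is roughly parameter count times bits per weight. The sketch below assumes an average of about 4.85 bits per weight for Q4_K_M and about 5.7 for Q5_K_M. These are approximations and real GGUF files vary by architecture. It also ignores KV cache and runtime overhead, which is why some headroom under 24 GB matters.

```python
def weight_footprint_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# Assumed average bits per weight; treat these as rough figures, not exact values.
ASSUMED_BITS = {"Q4_K_M": 4.85, "Q5_K_M": 5.7}

if __name__ == "__main__":
    gb = weight_footprint_gb(32, ASSUMED_BITS["Q4_K_M"])
    print(f"32B at Q4_K_M: ~{gb:.1f} GB of weights, before KV cache and overhead")
```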
Model selection
| Model | Params | Quant | VRAM | Notes |
|---|---|---|---|---|
| Qwen2.5-32B-Instruct | 32 B | Q4_K_M | ~20 GB | Recommended. Strong tool use, solid planning, Apache 2.0. |
| Qwen2.5-72B-Instruct | 72 B | Q4_K_M | ~44 GB | Better reasoning; needs 2 GPUs. |
| Llama 3.3 70B-Instruct | 70 B | Q4_K_M | ~42 GB | Meta license, same footprint class as Qwen 72B. |
| Mistral-Small-Instruct-2409 | 22 B | Q5_K_M | ~16 GB | Fits on a 3090 with overhead for other tasks. |
| Qwen2.5-Coder-32B-Instruct | 32 B | Q4_K_M | ~20 GB | Best if you also want coding help on the same box. |
Avoid sub-14B models for driving writes — they hallucinate tool arguments. Use them only for reads (farm_summary, list_jobs) or keep them in dry-run mode.
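One way to enforce that rule outside the model itself is a small allow-list check in whatever wrapper sits between the agent and the MCP server. This is a minimal sketch, not part of O.D.I.N.: the function name and the idea of passing the model size in are assumptions, and any tool not listed as read-only is conservatively treated as a write.

```python
READ_ONLY_TOOLS = {"farm_summary", "list_jobs"}  # read tools named in this guide
MIN_WRITE_PARAMS_B = 14  # size threshold from the guidance above


def tool_allowed(tool: str, model_params_b: float, dry_run: bool = False) -> bool:
    """Decide whether a model of the given size may call the given tool."""
    if tool in READ_ONLY_TOOLS:
        return True  # reads are fine for any model size
    if model_params_b >= MIN_WRITE_PARAMS_B:
        return True  # larger models may drive writes
    return dry_run  # small models may only preview writes
```

With this in place a 7B model can still call list_jobs or preview queue_job with a dry run, but a live write is refused.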
Install Ollama
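curl -fsSL https://ollama.com/install.sh | sh
# Linux installs normally register a systemd service that is already listening
# on 127.0.0.1:11434. If nothing is running, start "ollama serve" in a second
# terminal first; "ollama pull" needs a live server.
ollama pull qwen2.5:32b-instruct-q4_K_M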
Verify:
curl http://127.0.0.1:11434/api/tags
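The same check can be scripted so it also confirms that the expected model is actually present. The sketch below assumes the /api/tags response is a JSON object with a "models" list whose entries carry a "name" field. The helper names are illustrative.

```python
import json
import urllib.request


def installed_models(tags_payload: dict) -> set[str]:
    """Extract model names from a parsed /api/tags response."""
    return {m["name"] for m in tags_payload.get("models", [])}


def fetch_tags(base_url: str = "http://127.0.0.1:11434") -> dict:
    """Fetch and parse /api/tags from a local Ollama server."""
    with urllib.request.urlopen(f"{base_url}/api/tags", timeout=5) as resp:
        return json.load(resp)


if __name__ == "__main__":
    wanted = "qwen2.5:32b-instruct-q4_K_M"
    try:
        names = installed_models(fetch_tags())
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        print(f"Ollama not reachable: {exc}")
    else:
        print("found" if wanted in names else f"missing; have {sorted(names)}")
```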
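For an air-gapped install, pull the model on a connected staging box, copy the model store to the same path on the target (~/.ollama/models/ for a user-run server; Linux service installs typically use /usr/share/ollama/.ollama/models), then run ollama serve with no network.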
Wire the MCP server
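Mint an agent:write token in O.D.I.N. (Settings → API Tokens → New Token). Then configure your MCP client. The configs below launch the server with npx -y, which fetches odin-print-farm-mcp from the npm registry unless it is already installed or cached locally; on an air-gapped host, pre-install the package or point npm at an internal mirror.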
Claude Desktop
~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):
{
  "mcpServers": {
    "odin": {
      "command": "npx",
      "args": ["-y", "odin-print-farm-mcp@2"],
      "env": {
        "ODIN_BASE_URL": "http://127.0.0.1:8000",
        "ODIN_API_KEY": "odin_xxxxxxxxxxxxxxxxxxxx"
      }
    }
  }
}
Out of the box, Claude Desktop sends inference to Anthropic's hosted models. It only keeps inference on-box if you route it to a local model backend (via proxies like LiteLLM or LM Studio). If your threat model forbids Anthropic even for the agent loop, use a local-first client instead:
Continue (VS Code)
.continue/config.json:
{
  "models": [
    {
      "title": "Qwen2.5 32B",
      "provider": "ollama",
      "model": "qwen2.5:32b-instruct-q4_K_M"
    }
  ],
  "mcpServers": {
    "odin": {
      "command": "npx",
      "args": ["-y", "odin-print-farm-mcp@2"],
      "env": {
        "ODIN_BASE_URL": "http://127.0.0.1:8000",
        "ODIN_API_KEY": "odin_xxxxxxxxxxxxxxxxxxxx"
      }
    }
  }
}
Cline / OpenClaw
Same mcpServers block pasted into their settings UI. Both drive local Ollama models directly — no external API traffic.
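Since the same block gets pasted into several clients, it can help to lint it before use. The sketch below is a hypothetical helper, not something shipped with O.D.I.N. It flags a base URL that is not a private or loopback IP literal, and an API key that is missing or still the placeholder from the examples. DNS names, including localhost, are reported for manual review rather than resolved.

```python
import ipaddress
from urllib.parse import urlparse

PLACEHOLDER_KEY = "odin_xxxxxxxxxxxxxxxxxxxx"  # placeholder used in the examples above


def audit_mcp_config(config: dict) -> list[str]:
    """Return human-readable problems found in an mcpServers config."""
    problems = []
    for name, server in config.get("mcpServers", {}).items():
        env = server.get("env", {})
        host = urlparse(env.get("ODIN_BASE_URL", "")).hostname
        if host is None:
            problems.append(f"{name}: ODIN_BASE_URL is missing or malformed")
        else:
            try:
                if not ipaddress.ip_address(host).is_private:
                    problems.append(f"{name}: {host} is a public address")
            except ValueError:
                # Not an IP literal; this sketch does not resolve DNS names.
                problems.append(f"{name}: {host} is a hostname, verify it resolves on the LAN")
        if env.get("ODIN_API_KEY", PLACEHOLDER_KEY) == PLACEHOLDER_KEY:
            problems.append(f"{name}: ODIN_API_KEY is unset or still the placeholder")
    return problems
```

An empty list means the block passed these two checks. It says nothing about what the client itself does with inference traffic.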
Verify the air gap
Before putting this into production, prove no packets leave the box.
1. Packet capture
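# Match on destination: outbound packets carry the host's own private source address.
sudo tcpdump -i any -n 'not (dst net 192.168.0.0/16 or dst net 10.0.0.0/8 or dst net 172.16.0.0/12 or dst net 127.0.0.0/8)' -w /tmp/odin-egress.pcap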
Drive a full agent loop: read the dashboard, queue a job, cancel it, approve another. Stop the capture and read it back with tcpdump -n -r /tmp/odin-egress.pcap. It should show no packets, or only broadcast/mDNS noise and nothing addressed to public IPs.
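Reading a capture by eye gets tedious, so the check can be scripted. The sketch below assumes the text format that tcpdump -n -r prints for IPv4 ("IP src.port > dst.port: ..."). It ignores IPv6 lines, and the function name is illustrative. Multicast is excluded explicitly because Python's ipaddress module reports addresses such as 224.0.0.251 as global.

```python
import ipaddress
import re

# Matches the "IP src > dst:" part of a tcpdump -n text line (IPv4 only).
_IP_LINE = re.compile(r"\bIP (\S+) > (\S+):")


def _strip_port(endpoint: str) -> str:
    # TCP/UDP endpoints print as a.b.c.d.port; ICMP endpoints have no port.
    return ".".join(endpoint.split(".")[:4])


def public_destinations(tcpdump_lines) -> set[str]:
    """Return destination IPs that are globally routable unicast addresses."""
    found = set()
    for line in tcpdump_lines:
        match = _IP_LINE.search(line)
        if not match:
            continue
        try:
            dst = ipaddress.ip_address(_strip_port(match.group(2)))
        except ValueError:
            continue  # not an IPv4 literal; skip rather than guess
        if dst.is_global and not dst.is_multicast:
            found.add(str(dst))
    return found
```

Feeding it the lines from tcpdump -n -r /tmp/odin-egress.pcap should return an empty set on a clean run.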
2. ITAR boot audit
docker compose logs odin | grep ITAR
Expected line:
ODIN_ITAR_MODE=1 boot audit passed — all configured URLs are private.
If the container refused to boot, the log lists every violating URL. Fix each one (strip public SMTP, webhooks, OIDC providers, etc.) and restart.
3. Blocked-egress smoke test
Add a public webhook in Settings → Webhooks → New Webhook (URL: https://example.com/hook). O.D.I.N. refuses to save it. In ODIN_ITAR_MODE=0 deployments the save succeeds — that's your sanity check that enforcement is actually on.
4. Firewall backstop
Defense-in-depth: block outbound at the network layer too. Example nftables:
table inet odin_filter {
    chain output {
        type filter hook output priority filter; policy drop;
        oif lo accept
        ip daddr 192.168.0.0/16 accept
        ip daddr 10.0.0.0/8 accept
        ip daddr 172.16.0.0/12 accept
        ct state established,related accept
    }
}
If the app-layer guard misses anything, the network layer catches it.
Prompting agents for safe operation
Local 32B models are good but not great at planning. Two tricks go a long way:
Force dry-run first. In the system prompt: "For any write operation (queue_job, cancel_job, pause_printer, etc.), always call with dry_run: true first. Show me the preview. Only call without dry_run after I explicitly confirm."
Use structured error codes. When a tool errors, don't retry blindly. The system prompt should say: "If a tool returns error.code = 'scope_denied', stop and ask for a broader token. If quota_exceeded, stop and report the quota. Never retry a validation_failed error — fix the args."
These two rules eliminate ~90% of "helpful agent destroys the print queue" failure modes.
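Prompt rules are advisory, so it can be worth enforcing the same two rules in code on the client side. The sketch below is one possible shape under stated assumptions: the class, the action labels, and the job_id argument in the usage note are hypothetical, while the error codes and the dry_run flag come from the rules above.

```python
import json

# Error codes from the system-prompt rule above; the action labels are illustrative.
ERROR_POLICY = {
    "scope_denied": "stop_and_request_broader_token",
    "quota_exceeded": "stop_and_report_quota",
    "validation_failed": "fix_arguments_before_retry",
}


def action_for_error(code: str) -> str:
    # Unknown codes stop the loop instead of retrying blindly.
    return ERROR_POLICY.get(code, "stop_and_ask_operator")


class DryRunGate:
    """Allow a live write only after the same call was previewed and confirmed."""

    def __init__(self) -> None:
        self._previewed: set[str] = set()
        self._confirmed: set[str] = set()

    @staticmethod
    def _key(tool: str, args: dict) -> str:
        # Identify a call by tool plus arguments, ignoring the dry_run flag.
        clean = {k: v for k, v in args.items() if k != "dry_run"}
        return tool + ":" + json.dumps(clean, sort_keys=True)

    def record_preview(self, tool: str, args: dict) -> None:
        self._previewed.add(self._key(tool, args))

    def confirm(self, tool: str, args: dict) -> None:
        key = self._key(tool, args)
        if key not in self._previewed:
            raise ValueError("cannot confirm a call that was never previewed")
        self._confirmed.add(key)

    def may_execute(self, tool: str, args: dict) -> bool:
        if args.get("dry_run"):
            return True  # previews are always allowed
        return self._key(tool, args) in self._confirmed
```

In use, the wrapper records each dry-run preview, calls confirm only when the operator approves, and refuses any live call (for example cancel_job with a hypothetical job_id) that has not gone through both steps with identical arguments.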
CMMC Level 2 considerations
- Audit logging. O.D.I.N. writes every API call (including agent tool calls) to the audit_log table with actor, IP, action, and timestamp. Export to a SIEM via the /admin/audit endpoint or the Postgres replica.
- MFA for operators. Enable TOTP MFA on every human account. Agent tokens are separate: mint them with minimal scopes and rotate them.
- Scope separation. Use agent:read for reporting agents and agent:write for operational agents. Don't share tokens between the two.
- Session timeout. Configure a short session_timeout_minutes in settings. Agent tokens have their own expires_at; set it to the minimum usable window (7–30 days).
- Local license verification. O.D.I.N. licenses verify locally via Ed25519. No phone-home; works in an air gap.
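The token hygiene items above lend themselves to a periodic check. The sketch below assumes a hypothetical token record with purpose, scopes, and expires_at fields. This is not O.D.I.N.'s actual schema, so adapt the field names to whatever your token export provides.

```python
from datetime import datetime, timedelta, timezone

MAX_TOKEN_LIFETIME = timedelta(days=30)  # upper end of the 7–30 day window above


def token_findings(token: dict, now: datetime) -> list[str]:
    """Return policy findings for one agent token record (hypothetical shape)."""
    findings = []
    expires_at = token.get("expires_at")
    if expires_at is None:
        findings.append("no expires_at set")
    elif expires_at <= now:
        findings.append("already expired")
    elif expires_at - now > MAX_TOKEN_LIFETIME:
        findings.append("expires more than 30 days out")
    # Reporting agents should hold agent:read only.
    if token.get("purpose") == "reporting" and "agent:write" in token.get("scopes", []):
        findings.append("reporting agent holds agent:write")
    return findings


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    sample = {"purpose": "reporting", "scopes": ["agent:read"],
              "expires_at": now + timedelta(days=14)}
    print(token_findings(sample, now) or "no findings")
```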
See Also
- ITAR / CMMC Mode — backend fail-closed enforcement.
- MCP Server — tool catalog and client reference.
- API Tokens — scopes, expiry, rotation.
- MFA — operator auth.