
Sovereign Local AI Atlas

A practical, readable map of the modern local-AI stack — from inference engines to UIs, RAG + memory, agents + MCP tooling, and security + evaluation. Every tool includes inline links where it appears, plus hardening notes for high-risk surfaces.

  • Sovereign Core: FOSS, self-hostable, offline-capable
  • Local-Closed: local/offline, but closed source
  • Hybrid: local-capable, with cloud gravity
  • Synthetic-Edge: inherently a cloud/SaaS boundary
Last updated: 2026-03-04 • Format: single-file HTML + CSS • Links: inline at point-of-use

1) How to read this atlas

The atlas is intentionally structured as layers. The correct way to use it is to build the stack layer by layer, in the order the sections appear: choose your inference engine → expose a local API → attach UIs → attach data stores → add RAG/memory → add agents/tools (MCP) → instrument → attack/test.

Principle: Local ≠ Safe

“Self-hosted” tools can still be exploited via misconfiguration, exposed ports, malicious tool servers, or supply chain. This atlas therefore includes security notes and scanner tooling inline, not as an appendix.

Trust tiers are operational constraints

  • Sovereign Core: safe defaults possible; strong foundation.
  • Hybrid: safe only with explicit configuration + network restrictions.
  • Synthetic-Edge: treated as a boundary — isolate credentials, memory, logs, and network.

Recommended reading approach

  • Read each layer’s “Why it exists” paragraph.
  • Pick 1–2 tools per layer first (avoid tool sprawl).
  • Only then consider optional expansions.

2) Upstream “source feeds” (audited lists)

These are discovery surfaces we mined and then filtered. They are useful for breadth, but none are treated as authoritative without verification.

Awesome Local AI (ethicals7s)

Large “local AI” index; high recall; requires re-verification of locality claims.

Source Feed

Awesome Local LLM (rafska)

Strong taxonomy across local LLM engines, UIs, RAG, agents, and more.

Source Feed

Jan’s awesome-local-ai

Local tooling list with excellent coverage of inference engines and local-first apps.

Source Feed

Awesome Production Agentic Systems (EthicalML)

Ops + scaling + monitoring + memory + security tools for production agentic systems.

Source Feed

Awesome MCP Servers (punkpeye)

Huge catalog of MCP tool servers; must be filtered hard by locality & blast radius.

Source Feed

Awesome Code AI (Sourcegraph) — archived

Historic snapshot of coding tools; archived Feb 23, 2026 (read-only).

Source Feed

3) Layer: Inference & Runtime

This layer answers: Where do tokens come from? You need at least one inference engine plus a stable interface that higher layers can consume. In practice that means:

  • Engine (llama.cpp / vLLM / SGLang / etc.)
  • API surface (LocalAI / vLLM OpenAI-compatible server / etc.)
  • Runtime governance (ports, auth, patching cadence, model provenance)

Inference Engines (run the model)

llama.cpp

Minimal C/C++ inference for GGUF models; runs across CPU/GPU; ideal baseline engine.

Sovereign Core Airgap-ready
Why it’s here + how to use it
  • Role: baseline local inference engine; the project also ships an OpenAI-compatible HTTP server (llama-server).
  • Best for: laptops, workstations, minimal deployments, GGUF workflows, tight control.
  • Notes: pin a known-good version and treat model files as artifacts with hashes.
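The "model files as artifacts with hashes" note can be made concrete with a small stdlib helper. A sketch only — the path and expected digest are placeholders you record when a model first enters your perimeter:

```python
import hashlib


def sha256_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB GGUF weights never need to fit in RAM."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_model(path: str, expected_sha256: str) -> bool:
    """Compare against the digest recorded when the model was first vetted."""
    return sha256_file(path) == expected_sha256.lower()
```

Record the digest out-of-band (e.g., in your deployment manifest) and re-verify on every host that loads the file.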

vLLM

High-throughput, memory-efficient inference & serving engine; production GPU workhorse.

Sovereign Core OpenAI-compatible server
Why it’s here + how to use it
  • Use as your internal “model server” behind a reverse proxy/VPN.
  • Expose OpenAI-style endpoints where possible to maximize compatibility.
  • Pair with LocalAI or LiteLLM if you need a unified gateway across multiple backends.

SGLang

High-performance serving framework for LLMs + multimodal; strong for structured/tool outputs.

Sovereign Core Structured outputs
Why it’s here + how to use it
  • Pick SGLang when agent tool-use and JSON contracts matter.
  • Works from single GPU to distributed clusters; treat it like vLLM (model server layer).

MLC-LLM

Compiler + deployment engine for running models natively across devices (including browser/edge).

Sovereign Core Edge
Where it fits

Use when you want sovereignty to extend to phones/devices/browsers without centralized servers.

MLX (Apple Silicon)

Apple’s array framework for ML on Apple Silicon; pairs with MLX-LM for local LLM work.

Sovereign Core Apple

OpenAI-Compatible Local APIs (make everything plug in)

LocalAI

Open-source OpenAI-compatible REST API for local inference (text, images, audio).

Sovereign Core Airgap-ready
Why it matters
  • Acts as your “drop-in OpenAI replacement” inside the sovereign perimeter.
  • Lets UIs, IDE plugins, RAG frameworks, and agents reuse the same API shape without cloud keys.
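To illustrate the "same API shape" point, here is a minimal stdlib sketch of an OpenAI-style chat call against a local endpoint. The base URL, port 8080, and the model name "local-model" are assumptions — substitute whatever your LocalAI (or vLLM/llama.cpp server) instance actually serves:

```python
import json
import urllib.request


def chat_request(base_url: str, model: str, prompt: str) -> urllib.request.Request:
    """Build an OpenAI-style /v1/chat/completions request for a local endpoint."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Example (assumes LocalAI listening on localhost:8080):
# req = chat_request("http://localhost:8080", "local-model", "Hello")
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(json.loads(resp.read())["choices"][0]["message"]["content"])
```

Because every layer above speaks this one shape, swapping the engine later means changing a URL, not rewriting clients.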

Ollama

Local model runner + API; convenient for desktops and small servers — but must be network-hardened.

Sovereign Core Sandbox-required
Hardening checklist (read before deploying)
  • Do not expose to the public internet (large-scale misconfiguration incidents have been documented).
  • Bind to localhost or place behind VPN/reverse proxy with auth.
  • Keep tool-calling / file access out of the model server unless sandboxed.
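A quick self-audit sketch for the first two checklist items, using only the stdlib. Port 11434 is Ollama's default; the probe is a local heuristic, not a substitute for an external port scan:

```python
import ipaddress
import socket


def is_loopback(addr: str) -> bool:
    """True for 127.0.0.0/8 and ::1 — the only bindings a model server should have."""
    return ipaddress.ip_address(addr).is_loopback


def exposed_interfaces(port: int = 11434, timeout: float = 0.5) -> list[str]:
    """Return non-loopback local addresses where the port accepts connections.
    An empty list is what you want for a model server."""
    try:
        addrs = {info[4][0] for info in socket.getaddrinfo(socket.gethostname(), None)}
    except OSError:
        return []
    exposed = []
    for addr in addrs:
        try:
            if is_loopback(addr):
                continue
            with socket.create_connection((addr, port), timeout=timeout):
                exposed.append(addr)  # something answered on a routable interface
        except (OSError, ValueError):
            continue
    return sorted(exposed)
```

If the list is non-empty, fix the binding (or firewall the port) before doing anything else.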

LiteLLM

Proxy/gateway: routes many model providers through a unified OpenAI-style interface (local + cloud).

Hybrid Cloud gravity
Sovereign usage pattern
  • Use LiteLLM as an internal router where local endpoints are default.
  • Disable cloud backends unless you explicitly create a “Synthetic-Edge” profile.
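The "local by default" policy can be sketched as a toy resolver. This is not LiteLLM's actual configuration format — the route table, aliases, URLs, and the internal-prefix heuristic are all illustrative placeholders:

```python
from urllib.parse import urlparse

# Hypothetical route table: model alias -> backend base URL.
ROUTES = {
    "chat-default": "http://localhost:8080/v1",   # LocalAI
    "chat-fast": "http://10.0.0.5:8000/v1",       # vLLM on an internal host
    "chat-cloud": "https://api.example.com/v1",   # Synthetic-Edge only
}


def resolve(model: str, profile: str = "sovereign") -> str:
    """Return a backend URL, refusing non-internal hosts outside a Synthetic-Edge profile."""
    url = ROUTES[model]
    host = urlparse(url).hostname or ""
    # Crude internal check for illustration; a real gateway uses proper CIDR rules.
    internal = host == "localhost" or host.startswith(("127.", "10.", "192.168."))
    if not internal and profile != "synthetic-edge":
        raise PermissionError(f"{model} routes to {host}: blocked outside synthetic-edge profile")
    return url
```

The design point: cloud reachability should require an explicit profile switch, never be the silent default.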

Optional performance stacks (use only if you need them)

TensorRT-LLM (NVIDIA)

High-performance inference optimizations for LLMs on NVIDIA GPUs.

Hybrid Vendor gravity

Triton Inference Server (NVIDIA)

General-purpose inference server for many frameworks (TensorRT, PyTorch, ONNX, etc.).

Hybrid Patch discipline
Security note

Treat Triton as high-value infra. Keep it internal-only and patched; enterprise-grade servers attract enterprise-grade attacks.

4) Layer: Human Interfaces (UIs)

UIs are deceptively dangerous: they hold tokens/keys, render untrusted content, and often provide “direct connections” to external model servers. Treat UIs like you would treat your admin panels.

General chat/workbench UIs

Open WebUI

Self-hosted AI platform designed to operate offline; supports Ollama + OpenAI-compatible APIs.

Sovereign Core Patch-required
Security-critical note (read)

CVE-2025-64496 describes a Direct Connections code-injection vulnerability via SSE events that can lead to token theft, account takeover, and—when chained—backend RCE. Patch + reduce attack surface.

Text Generation WebUI (oobabooga)

Feature-rich local AI web UI; supports GGUF via llama.cpp and many backends.

Sovereign Core Airgap-ready
Usage note
  • Prefer local-only bindings; avoid exposing the UI publicly.
  • Use as a “lab bench” UI when you need deep model controls (LoRAs, extensions, etc.).

AnythingLLM

Full-stack “private ChatGPT” for chatting with docs; supports many local LLMs + vector DBs.

Sovereign Core RAG suite
Where it fits

Use AnythingLLM when you want an integrated RAG workspace with minimal wiring. Treat it as a UI layer over your own local endpoints.

Jan (desktop)

Open-source ChatGPT alternative aimed at fully local/offline usage with strong privacy posture.

Sovereign Core Airgap-ready

LM Studio

Local model UI + local server; can operate entirely offline (closed-source app).

Local-Closed Offline
Sovereign stance

Useful as a convenience interface on individual machines. Do not make it a single point of failure or the only ingress to your models.

Developer/coding UIs

Continue (IDE extension)

Open-source IDE assistant; point it at LocalAI/vLLM/Ollama for local-first coding workflows.

Sovereign Core
How to keep it local
  • Configure model provider endpoints to your internal OpenAI-compatible URL (LocalAI/vLLM/etc.).
  • Disable “cloud models” in profiles unless explicitly using a Synthetic-Edge environment.

Aider (CLI pair programmer)

Terminal-based AI pair programming; edits your repo through git-friendly patches.

Sovereign Core Repo-scoped

UI security notes (non-optional reading)

Open WebUI — Direct Connections CVE (code injection → takeover)

Direct Connections lets Open WebUI connect to external model servers. CVE-2025-64496 documents a code injection route via SSE events that can steal auth tokens and lead to takeover and backend RCE when chained.

Ollama — massive exposure incidents (misconfiguration)

Security researchers documented ~175,000 publicly exposed Ollama hosts due to unsafe network binding. Treat model servers as internal services; never expose raw endpoints.

5) Layer: Data, Vectors, RAG

This layer answers: Where does context come from? You need (a) storage (structured/unstructured), (b) embeddings/vector search, (c) retrieval pipelines, and (d) document ingestion.
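To make steps (b) and (c) concrete, here is a dependency-free toy that ranks chunks by bag-of-words cosine similarity. A real stack replaces the tokenizer with an embedding model and the linear scan with a vector store query (pgvector/Qdrant/LanceDB):

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> Counter:
    """Lowercased word counts — a stand-in for a real embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = tokenize(query)
    return sorted(chunks, key=lambda c: cosine(q, tokenize(c)), reverse=True)[:k]
```

The shape is identical at scale: embed the query, score it against stored chunk vectors, take the top k, and feed those chunks into the prompt.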

Vector stores (local-first)

Qdrant

Open-source vector DB + similarity search (Rust); self-hostable, scalable.

Sovereign Core

pgvector (PostgreSQL extension)

Vector similarity search inside Postgres; keep vectors with the rest of your data.

Sovereign Core Single DB

DuckDB

In-process SQL OLAP DB (MIT); ideal for local analytics and embedded pipelines.

Sovereign Core Embedded

LanceDB

OSS embedded retrieval + vector search for multimodal AI data; SQL + vectors.

Sovereign Core Embedded

Pinecone

Fully managed, serverless vector DB (cloud). Included for boundary awareness only.

Synthetic-Edge
Why it’s here

It’s widely used in RAG stacks; we include it only to explicitly mark “cloud vector DB” as a Synthetic boundary option.

RAG frameworks (local-capable, but must be constrained)

LangChain

Popular orchestration + RAG framework; huge integrations ecosystem (local + cloud).

Hybrid Disable hosted services
Sovereign usage pattern
  • Route model calls to LocalAI / vLLM / SGLang / Ollama endpoints only.
  • Use local vector stores (pgvector/Qdrant/Milvus/LanceDB).
  • Avoid hosted tracing or cloud default configs unless explicitly segmented.

LangGraph

Stateful agent/workflow graphs (LangChain ecosystem); ideal for long-running, controllable flows.

Hybrid Graph workflows

LlamaIndex

RAG & data framework; strong indexing abstractions; supports LocalAI via OpenAI-compatible interface.

Hybrid

Haystack

Open-source orchestration framework for RAG and agents; modular and transparent pipelines.

Sovereign Core

PrivateGPT (Zylon)

Production-ready private document Q&A that can run without internet; “no data leaves environment”.

Sovereign Core Offline-first

6) Layer: Memory (agent persistence)

Memory is not “chat history.” It’s a system: what gets stored, how it’s retrieved, how it’s updated, and how it’s prevented from becoming an injection vector.

Canonical memory index

The most comprehensive resource index we audited: IAAR-Shanghai / Awesome-AI-Memory. Use it as the research+pattern map; keep implementations inside your own perimeter.

Mem0

“Universal memory layer” for agents; stores and recalls personalized context across sessions.

Hybrid Self-host review
Sovereign stance
  • Use only when fully self-hosted with local DBs and controlled egress.
  • Keep memory writes gated; run red-team tests against memory recall paths.

MemOS

Memory Operating System: unified store/retrieve/manage for long-term agent memory.

Sovereign Core System-level memory

Zep

Context engineering & memory platform; assembles relevant context from histories and data sources.

Hybrid Self-host required

Graphiti (knowledge-graph memory)

Temporally-aware knowledge graphs for agents in dynamic environments (memory beyond vectors).

Sovereign Core Graph memory

Memory threat model (always apply)

  • Prompt injection via stored memory: malicious content can persist and re-trigger later.
  • Cross-tenant contamination: if multi-user, enforce hard boundaries (DB row-level policies, separate indices, separate keys).
  • Unbounded growth: require summarization/compression/expiry policies.
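A toy store illustrating all three policies above — the injection markers are a crude placeholder heuristic, not a real defense (real write-gating needs classifier- or policy-based review):

```python
import time


class GatedMemory:
    """Toy memory store with a write gate, per-entry expiry, and a hard size cap."""

    SUSPECT = ("ignore previous", "system prompt", "you are now")  # crude heuristic only

    def __init__(self, max_entries: int = 100, ttl_seconds: float = 86_400):
        self.max_entries = max_entries
        self.ttl = ttl_seconds
        self._entries: list[tuple[float, str]] = []

    def write(self, text: str) -> bool:
        if any(marker in text.lower() for marker in self.SUSPECT):
            return False  # gate: refuse injection-looking content
        self._entries.append((time.time(), text))
        self._entries = self._entries[-self.max_entries:]  # cap: drop the oldest
        return True

    def recall(self) -> list[str]:
        cutoff = time.time() - self.ttl
        self._entries = [(t, s) for t, s in self._entries if t >= cutoff]  # expiry
        return [s for _, s in self._entries]
```

Whatever your real implementation, every memory write and recall should pass through choke points like these, so red-team tests have something definite to attack.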

7) Layer: Agents & Orchestration

Agents are where systems become dangerous: they decide, call tools, write files, run code, and mutate state. The job of this layer is to make that power bounded, observable, and attack-tested.

LangGraph

Stateful workflow graphs for long-running agents; good for explicit control and failure boundaries.

Hybrid Local-capable

CrewAI

Lean multi-agent framework; role-based orchestration with high-level + low-level control.

Hybrid Cloud examples exist
Sovereign use
  • Point all LLM calls to your local API (LocalAI/vLLM/SGLang).
  • Tooling via MCP should be sandboxed; do not grant host filesystem by default.

AutoGen (Microsoft)

Multi-agent framework for autonomous or human-in-the-loop workflows.

Hybrid Local-capable

Agent rule: no direct host access

Agents should operate in containers/microVMs with scoped filesystem mounts and default-deny egress. If you must allow tool access, do it via MCP servers that enforce least privilege (see next layer).

8) Layer: MCP (Tools) + Sandbox

MCP (Model Context Protocol) is the “tool port” that connects models to resources and actions. This layer exists because the moment an agent can use tools, it can exfiltrate, corrupt, and escalate. Treat MCP servers like plugins with code execution risk.

Official MCP specification

Start here for protocol-level grounding: modelcontextprotocol.io and the specification.

MCP Inspector

Visual testing + proxy tool to run and debug MCP servers safely during development.

Sovereign Core Dev tooling

FastMCP (Python)

Pythonic way to build MCP servers; widely used in MCP ecosystems.

Sovereign Core

Microsandbox

Self-hosted microVM sandbox for running untrusted workloads fast with strong isolation.

Sovereign Core Sandbox-required
Why it matters

Sandboxing is your “blast radius limiter” for agent code execution, browser tools, file conversion, and data analysis.

mcp-scan

Security scanner for MCP servers (prompt injection, tool poisoning, escalation patterns).

Sovereign Core Security

MCP policy you can actually enforce

  • Local-first MCP: filesystem + DB + internal services only.
  • Synthetic-edge MCP: cloud SaaS servers live in separate profiles/VMs; no shared memory/logging.
  • Always scan new MCP servers (mcp-scan) before enabling.
  • Always sandbox servers that execute code, browse web, or parse untrusted documents.

Discovery feed (MCP servers)

Massive catalog: punkpeye/awesome-mcp-servers. Use it only after applying the policy above.

9) Layer: Observability

Observability is not optional: agents are multi-step systems. If you cannot trace calls, tool usage, retrieval, and state transitions, you cannot prove what happened — and you cannot harden.

Langfuse

Open-source LLM observability + prompt management + eval; self-hostable.

Sovereign Core Telemetry default
Important note: disable telemetry

Langfuse states that by default it reports basic usage statistics of self-hosted instances to PostHog. For sovereign deployments, explicitly disable analytics and enforce with network egress controls.

Phoenix (Arize)

Open-source AI observability: tracing, evaluation, debugging for LLM apps and agents.

Sovereign Core Tracing

PostHog (self-hosted)

Open-source analytics platform; can be self-hosted and used as an internal metrics sink.

Sovereign Core Self-hosted
Why it’s here

If you use analytics at all, run it inside your own perimeter. Do not leak LLM traces or prompts to third-party SaaS by accident.

Production deployment helpers (optional)

When you need reproducible serving and scale, revisit the production-grade servers already covered in the inference layer (vLLM, SGLang, and, if you accept the vendor gravity, Triton) rather than introducing new moving parts here.

10) Layer: Evaluation & Security

This layer exists to answer: Can the system be tricked? You need continuous red-teaming, scanner tooling, and repeatable evaluation harnesses.

promptfoo

Evaluate prompts/agents/RAG; includes red teaming and vulnerability scanning workflows.

Sovereign Core Red team
How to use it in a sovereign stack
  • Attack your own prompts, tools, and RAG pipelines before attackers do.
  • Run in CI (PR checks) and scheduled jobs (regression tests).

Agentic Radar

Security scanner for agent workflows; generates reports on vulnerabilities and operational risks.

Sovereign Core

AI-Infra-Guard

Full-stack AI red teaming: infra scan, MCP scan, agent skills scan, jailbreak evaluation.

Sovereign Core

mcp-scan

Scanner designed to audit installed MCP servers for prompt injection, poisoning, escalation patterns.

Sovereign Core

Security is a stack (not a checkbox)

  • UI security: patch CVEs; render untrusted content safely.
  • Model server security: no public exposure; auth; internal networks.
  • MCP security: sandbox + scan tool servers; least privilege; isolate cloud MCP.
  • Memory security: guard writes and retrieval; prevent injection persistence; partition tenants.
  • Continuous evaluation: promptfoo + workflow scanners; regression tests before releases.

11) Layer: Code AI (local copilot)

Coding AI is a special case: the data (your codebase) is extremely sensitive, and the tool has direct write access to production systems. This atlas therefore treats local, self-hostable code AI as default.

Tabby

Self-hosted AI coding assistant; open-source alternative to GitHub Copilot.

Sovereign Core
Typical wiring
  • Tabby runs as your completion backend.
  • Continue is the IDE front-end.
  • LocalAI/vLLM can be used for chat/refactor agents when desired.

Continue

IDE front-end; pair it with Tabby + LocalAI for a fully local “copilot stack”.

Sovereign Core

Aider

CLI refactor agent; keeps work repo-scoped and git-aware (diffs/commits).

Sovereign Core

Rule for code AI

If you use any cloud code assistant at all, do it in a separate, quarantined environment (no secrets, no prod keys, no canonical repos). Keep the sovereign codebase local-first.

Historic catalog (archived): sourcegraph/awesome-code-ai.

12) Layer: Multimodal & Speech

Sovereignty requires voice and vision to remain local as well. These tools provide offline speech-to-text, text-to-speech, and vision-language capability.

whisper.cpp

C/C++ port of Whisper for local speech-to-text; runs offline.

Sovereign Core Offline STT

Coqui TTS

Open-source deep learning TTS toolkit; inference + training/fine-tuning.

Sovereign Core TTS

LLaVA

Vision-language model project (NeurIPS 2023); local VLM building block.

Sovereign Core VLM

Moondream

Small vision-language model designed for edge efficiency and broad device support.

Sovereign Core Edge VLM

13) Reference Profiles (Minimal / Dev / Lab)

These profiles help you pick an appropriate complexity level. Each is a small set of parts drawn from the layers above:

  • Minimal: llama.cpp (engine + OpenAI-compatible server) plus Jan or Open WebUI, bound to localhost.
  • Dev: LocalAI or vLLM behind a reverse proxy/VPN, Open WebUI, Continue + Tabby for coding, pgvector or Qdrant for retrieval.
  • Lab: the Dev profile plus LangGraph/CrewAI agents, sandboxed MCP servers (Microsandbox + mcp-scan), Langfuse or Phoenix tracing, and promptfoo in CI.

Save & publish

Save this file as sovereign-local-ai-atlas.html and open it in any browser. If you want to host it, you can drop it into any static site host (GitHub Pages, Netlify, your own server).

Optional enhancements (still single-file)

  • Add a small JS search box to filter tool cards.
  • Add “copy link” buttons for section anchors.
  • Add printable styles with @media print.

End of document. Back to top ↑