LLM_Monitor
An LLM observability and orchestration platform implemented as a set of containerized microservices. A C#/.NET server acts as the edge/API gateway, forwarding user messages to a Python service built on LangChain/LangGraph that orchestrates the full AI workflow: policy and prompt-injection guardrails, retrieval-augmented generation, tool invocation, conversation memory, and a final grounded response.
The architecture deliberately mirrors commercial LLM application systems (edge gateway + orchestration service + data/observability plane) at reduced scale. This is a learning project under active development: every line of code is written by hand, with AI used only for code review and for explaining concepts.
Architecture
user ─▶ [C#/.NET API gateway] ─HTTP─▶ [Python LangChain/LangGraph orchestrator]
         (validation,                   │ policy/injection check (guardrails)
          telemetry,                    │ RAG retrieval (pgvector)
          response shaping)             │ tool invocation (bounded agent loop)
                                        │ conversation memory (checkpointer)
                                        ▼
[Ollama: local LLM + embeddings]   [pgvector/Postgres: vectors, telemetry, history]

All services run as Docker containers on a private network, orchestrated via docker-compose.
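The orchestrator's flow in the diagram can be sketched as a small state machine. This is a minimal, stdlib-only illustration and not the project's LangGraph code: the names `PipelineState`, `run_pipeline`, `MAX_TOOL_STEPS`, the toy injection check, and the stubbed retrieval are all assumptions made for the example. Conversation memory (the checkpointer) is left out to keep it short.

```python
from dataclasses import dataclass, field

MAX_TOOL_STEPS = 3  # assumed upper bound on the tool loop


@dataclass
class PipelineState:
    """Shared state handed from node to node."""
    user_message: str
    blocked: bool = False
    context: list = field(default_factory=list)
    tool_calls: int = 0
    response: str = ""
    trace: list = field(default_factory=list)


def guardrails(state: PipelineState) -> str:
    state.trace.append("guardrails")
    # Toy injection check; a real guardrail would be far more thorough.
    if "ignore previous instructions" in state.user_message.lower():
        state.blocked = True
        return "respond"  # conditional edge: skip retrieval and tools
    return "retrieve"


def retrieve(state: PipelineState) -> str:
    state.trace.append("retrieve")
    # Stand-in for a pgvector similarity search.
    state.context.append(f"stub document about {state.user_message}")
    return "tools"


def tools(state: PipelineState) -> str:
    state.trace.append("tools")
    state.tool_calls += 1
    # Stand-in for the model deciding it wants another tool call.
    wants_another_call = "loop" in state.user_message
    if wants_another_call and state.tool_calls < MAX_TOOL_STEPS:
        return "tools"  # loop back, but never past the bound
    return "respond"


def respond(state: PipelineState) -> str:
    state.trace.append("respond")
    if state.blocked:
        state.response = "Request blocked by policy."
    else:
        state.response = f"Answer grounded in {len(state.context)} document(s)."
    return "END"


NODES = {"guardrails": guardrails, "retrieve": retrieve,
         "tools": tools, "respond": respond}


def run_pipeline(message: str) -> PipelineState:
    """Run nodes until one returns the END marker."""
    state = PipelineState(user_message=message)
    node = "guardrails"
    while node != "END":
        node = NODES[node](state)
    return state
```

Each node returns the name of the next node, which is roughly the role conditional edges play in a graph-based orchestrator: a policy violation routes straight to the response, and the tool node can revisit itself only until the bound is reached.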
Services
- dotnet_server (C# / ASP.NET Core) — Edge gateway: request validation, middleware-based telemetry, forwards to the orchestrator via IHttpClientFactory
- langchain_service (Python / Flask / LangChain / LangGraph) — Orchestrates the AI pipeline: guardrails → RAG → tools → memory → response
- ollama — Serves local LLMs and embedding models (qwen2.5, nomic-embed-text)
- pgvector-service (PostgreSQL + pgvector) — Vector store for RAG; relational store for telemetry and history
- model-provisioning — Container job ensuring required models are pulled before serving
Key Engineering Decisions
- Mock/live seam via a model factory + env config — enables full pipeline development on low-compute hardware with a one-flag switch to real inference; demonstrates test-double and dependency-inversion thinking
- LangGraph state-machine orchestration — the request pipeline is modeled as nodes, conditional edges, and shared state rather than a monolithic function, supporting branching (block on policy violation), a bounded tool loop, and checkpointer-based memory
- pgvector over a standalone vector DB — keeps vectors and relational telemetry in one engine, enabling joins and simpler operations
- Idempotent startup ingestion + healthcheck-gated dependencies — services wait for readiness, and re-running setup never duplicates data
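The mock/live seam from the first decision above might look roughly like the following. This is a hedged sketch, not the project's actual factory: the environment variable names `LLM_MODE` and `LLM_MODEL`, the `ChatModel` protocol, and both classes are hypothetical, and the live path is deliberately left unimplemented rather than guessing at the real client code.

```python
import os
from typing import Mapping, Optional, Protocol


class ChatModel(Protocol):
    """The only interface the rest of the pipeline depends on."""
    def generate(self, prompt: str) -> str: ...


class MockChatModel:
    """Deterministic test double: no network, no GPU."""
    def generate(self, prompt: str) -> str:
        return f"[mock] echo: {prompt}"


class OllamaChatModel:
    """Placeholder for the live path; the real call is intentionally omitted."""
    def __init__(self, model_name: str) -> None:
        self.model_name = model_name

    def generate(self, prompt: str) -> str:
        raise NotImplementedError("wire this to the Ollama-backed client")


def make_chat_model(env: Optional[Mapping[str, str]] = None) -> ChatModel:
    """Pick an implementation from configuration (hypothetical variable names)."""
    env = os.environ if env is None else env
    mode = env.get("LLM_MODE", "mock").lower()
    if mode == "mock":
        return MockChatModel()
    if mode == "live":
        return OllamaChatModel(env.get("LLM_MODEL", "qwen2.5"))
    raise ValueError(f"unknown LLM_MODE: {mode!r}")
```

Because callers only see `ChatModel`, switching between the mock and the real model is a configuration change rather than a code change, which is the dependency-inversion point the bullet makes.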
Observability (the project's thesis)
Middleware- and node-level telemetry captures latency, token counts, model name, and pipeline decisions, with correlation-ID propagation across services. The design is intended to grow toward OpenTelemetry export and dashboards.
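On the Python side, correlation-ID propagation and node-level timing could be sketched as below. The header name `X-Correlation-ID`, the in-memory `TELEMETRY` list (standing in for a Postgres table), and the `traced` decorator are assumptions for illustration and do not describe the project's actual schema or middleware.

```python
import time
import uuid
from contextvars import ContextVar
from functools import wraps

CORRELATION_HEADER = "X-Correlation-ID"  # assumed header name
_correlation_id = ContextVar("correlation_id", default="")
TELEMETRY = []  # stand-in for the relational telemetry store


def bind_correlation_id(headers: dict) -> str:
    """Reuse the ID sent by the gateway, or mint one if it is missing."""
    cid = headers.get(CORRELATION_HEADER) or uuid.uuid4().hex
    _correlation_id.set(cid)
    return cid


def traced(node_name: str):
    """Record latency and the current correlation ID for one pipeline node."""
    def decorator(fn):
        @wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                return fn(*args, **kwargs)
            finally:
                # Runs on success and on error, so failures are timed too.
                TELEMETRY.append({
                    "correlation_id": _correlation_id.get(),
                    "node": node_name,
                    "latency_ms": (time.perf_counter() - start) * 1000.0,
                })
        return wrapper
    return decorator


@traced("retrieve")
def retrieve(query: str) -> list:
    # Stub node; a real one would query the vector store.
    return [f"stub document for {query}"]
```

A request handler would call `bind_correlation_id(request_headers)` once at the start of a request; every traced node that runs afterward in the same context then records the same ID, so rows from the gateway and the orchestrator can be joined on it.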
Tech Stack
Python, C#, ASP.NET Core, Flask, LangChain, LangGraph, Ollama, PostgreSQL, pgvector, Docker, docker-compose
Status
In active development. The multi-container environment, end-to-end local inference, and the mock/live model factory are in place; the LangGraph pipeline, RAG integration, conversation memory, and telemetry persistence are being wired now. An evaluation harness (golden dataset + LLM-as-judge) is planned.
