Staff AI Systems Engineer

I design and build
production AI systems.

End-to-end AI platforms with control loops across every layer. Built for reliability, observability, and real-world execution.

intent → orchestration → execution → verification → supervision
Read the full control systems thesis →

Sr Software Architect @ VectorVest

Built AI-assisted engineering systems → ~3x throughput to production

Most AI systems fail in production.

Not because the model is wrong — but because the system has no control loop.

I build AI platforms that:

  • orchestrate end-to-end workflows (retrieval → inference → tools)
  • define a source of truth (evals, golden datasets)
  • trace execution across every layer (OpenTelemetry, replay)
  • enforce verification at execution time (tests, outputs, tool validation)
  • expose supervision interfaces for human control

This is LLMOps as a system, not just model integration.
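The loop above can be sketched in a few lines of TypeScript. This is a minimal illustration, not a real API: the stage names, types, and eval gate are all hypothetical stand-ins.

```typescript
// Minimal control-loop sketch: each request flows through orchestration,
// execution, and verification; the result and verdict are surfaced to a
// supervision layer instead of passing through unchecked.

type Verdict = { ok: boolean; reason?: string };

interface Stage<I, O> {
  run(input: I): O;
}

// Hypothetical stages standing in for retrieval -> inference -> tools.
const orchestrate: Stage<string, string> = {
  run: (intent) => `plan(${intent})`,
};
const execute: Stage<string, string> = {
  run: (plan) => `result-of-${plan}`,
};

// Execution-time verification: an eval gate the output must pass.
function verify(output: string): Verdict {
  return output.startsWith("result-of-")
    ? { ok: true }
    : { ok: false, reason: "output failed eval gate" };
}

// Supervision entry point: every run yields an inspectable record.
function supervise(intent: string): { output: string; verdict: Verdict } {
  const plan = orchestrate.run(intent);
  const output = execute.run(plan);
  return { output, verdict: verify(output) };
}
```

The point of the shape, not the placeholder logic: verification runs at execution time, and the supervision layer sees both the output and the verdict.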

Selected Systems

Production AI systems I've designed and built.

AI PR Generation System

Problem
Manual PR authoring was the primary bottleneck across the engineering org.
System
User story → code → tests/lint → review gate.
Control
Execution-time evals + feedback loop into prompts and system tuning.
Outcome
~3x increase in throughput to production.
LLMOps · RAG · MCP · Evals

Distributed AI Ingestion Pipeline

Problem
Large-document ingestion was unreliable — silent failures corrupted retrieval quality.
System
Queue-based chunking + distributed workers + DLQ.
Control
Data validation + retrieval evals.
Outcome
Stable ingestion for large-scale knowledge systems.
AI Platform · RAG · Distributed Systems · Evals
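The dead-letter path in this pipeline can be sketched as follows. This is illustrative only: the failure predicate, retry limit, and in-memory queues are placeholders for real chunking workers and a real DLQ.

```typescript
// Sketch of queue-based ingestion with a dead-letter queue (DLQ):
// chunks that still fail after MAX_ATTEMPTS are parked in the DLQ for
// inspection instead of failing silently and corrupting retrieval quality.

interface Chunk { id: number; text: string; attempts: number }

const MAX_ATTEMPTS = 3;
const queue: Chunk[] = [];
const dlq: Chunk[] = [];
const indexed: Chunk[] = [];

// Hypothetical worker step; a real one would embed + upsert the chunk.
function process(chunk: Chunk): boolean {
  return !chunk.text.includes("\u0000"); // placeholder failure predicate
}

function drain(): void {
  while (queue.length > 0) {
    const chunk = queue.shift()!;
    if (process(chunk)) {
      indexed.push(chunk);
    } else if (++chunk.attempts < MAX_ATTEMPTS) {
      queue.push(chunk); // retry (with backoff, in a real system)
    } else {
      dlq.push(chunk); // parked for inspection, never dropped
    }
  }
}
```

The DLQ is what turns silent failures into visible ones: bad chunks become a queue you can inspect, replay, and run retrieval evals against.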

Agent Execution + Supervision System

Problem
Agent workflows were opaque — no tracing, no intervention, no post-hoc debugging.
System
Real-time supervision interface for AI agent execution (OpenCode).
Control
Execution tracing + human-in-the-loop checkpoints.
Outcome
Debuggable, controllable agent workflows.
AI Agent Supervision · Observability · TypeScript
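A human-in-the-loop checkpoint of this kind can be sketched like so. All names here are illustrative, not the OpenCode API: the idea is that an agent pauses at a gate before side-effecting steps, and every decision stays in an inspectable trace.

```typescript
// Sketch of a human-in-the-loop checkpoint: an agent step requests
// permission, proceeds only on approval, and the full trace remains
// available for post-hoc debugging.

type Decision = "approve" | "reject";

interface Checkpoint {
  step: string;
  decision?: Decision;
}

class Supervisor {
  private trace: Checkpoint[] = [];

  // Agent requests permission before a side-effecting step.
  request(step: string): Checkpoint {
    const cp: Checkpoint = { step };
    this.trace.push(cp);
    return cp;
  }

  // Human (or policy) records a decision; true means the agent may proceed.
  decide(cp: Checkpoint, decision: Decision): boolean {
    cp.decision = decision;
    return decision === "approve";
  }

  // Post-hoc debugging: the execution trace stays inspectable.
  history(): ReadonlyArray<Checkpoint> {
    return this.trace;
  }
}
```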
View all systems →

AI systems in production behave like control systems.

intent → orchestration → execution → verification → supervision

Failures come from:

  • misaligned intent (did we deliver value?)
  • weak retrieval / context quality
  • lack of observability across system layers
  • missing or delayed feedback loops

My focus is making these systems reliable, measurable, and controllable.

Focus Areas

  • AI Platform Engineering / LLMOps
  • Retrieval systems + evaluation (RAG, grounding)
  • Agent orchestration + workflows
  • Observability + tracing (OpenTelemetry)
  • Execution-time verification + eval systems

Writing

I write about building reliable AI systems in production.

Read more →

If you're building AI platforms, agent systems, or production LLM features where correctness, observability, and control matter —

let's talk.