Self-Healing Infrastructure with AI Agents
This talk demonstrates an autonomous agent that monitors environments, uses AI for real-time root-cause analysis of issues, and automatically applies fixes, learning from outcomes to improve accuracy.
The env-healing-agents project is an autonomous self-healing agent that monitors any running environment, detects problems in real time, diagnoses their root cause using AI, and automatically applies fixes.
Here’s what it does end-to-end:
- Monitor:
It tails log streams from any source (Kubernetes pods, files, AWS CloudWatch, systemd journald, stdin pipes, or process stdout) and matches every line against known issue patterns.
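As a rough illustration of the matching step, the sketch below scans any iterable of log lines against a small table of regular expressions. The pattern names and regexes are hypothetical placeholders, not the project's actual catalogue, and the "first matching pattern wins" rule is an assumption.

```python
import re
from typing import Iterable, Iterator, NamedTuple

# Hypothetical issue patterns, for illustration only.
ISSUE_PATTERNS = {
    "oom_killed": re.compile(r"OOMKilled|Out of memory", re.IGNORECASE),
    "connection_refused": re.compile(r"connection refused", re.IGNORECASE),
    "disk_full": re.compile(r"No space left on device"),
}


class Match(NamedTuple):
    line_no: int   # zero-based position of the line in the stream
    issue: str     # name of the pattern that matched
    text: str      # the log line, without its trailing newline


def scan(lines: Iterable[str]) -> Iterator[Match]:
    """Yield a Match for every line that hits a known issue pattern."""
    for line_no, line in enumerate(lines):
        for issue, pattern in ISSUE_PATTERNS.items():
            if pattern.search(line):
                yield Match(line_no, issue, line.rstrip("\n"))
                break  # assumption: the first matching pattern wins
```

Because `scan` accepts any iterable of strings, the same function could be fed from `sys.stdin`, an open file, or a subprocess's stdout.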
- Diagnose:
When a pattern matches, it extracts the relevant error windows (±10 lines of context around each error line) and sends them to an AI model for root-cause analysis. Models in use:
- Claude (Anthropic) via Vertex AI
- Gemini
The AI returns a structured diagnosis: root cause, severity, confidence score, and the recommended fix to apply.
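A minimal sketch of the two data-shaping pieces in this step: cutting ±10-line windows around error lines, and holding the structured diagnosis. Merging overlapping windows, the JSON response format, and the field names are assumptions made for illustration; the model call itself is omitted.

```python
import json
from dataclasses import dataclass
from typing import List, Sequence, Tuple

CONTEXT = 10  # lines of context on each side of an error line


def error_windows(
    lines: Sequence[str], error_indices: Sequence[int], context: int = CONTEXT
) -> List[Tuple[int, int]]:
    """Return (start, end) index ranges, end exclusive, around each error line.

    Overlapping or touching windows are merged (an assumption) so the
    same log lines are not sent to the model twice.
    """
    windows: List[Tuple[int, int]] = []
    for idx in sorted(error_indices):
        start = max(0, idx - context)
        end = min(len(lines), idx + context + 1)
        if windows and start <= windows[-1][1]:
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


@dataclass
class Diagnosis:
    root_cause: str
    severity: str
    confidence: float      # assumed to lie in the range 0.0 to 1.0
    recommended_fix: str   # name of the fix to apply


def parse_diagnosis(raw: str) -> Diagnosis:
    """Parse a JSON reply from the model (hypothetical response format)."""
    data = json.loads(raw)
    return Diagnosis(
        root_cause=data["root_cause"],
        severity=data["severity"],
        confidence=float(data["confidence"]),
        recommended_fix=data["recommended_fix"],
    )
```

Each window can then be turned into prompt text with `"".join(lines[start:end])`.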
- Remediate:
If the confidence score meets the threshold (default 0.7) and remediation is enabled, it executes the recommended fix from its set of fix strategies.
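The gating logic could look roughly like the sketch below. Only the 0.7 default comes from the description; the registry, the `restart_service` strategy, and the returned status strings are hypothetical.

```python
from typing import Callable, Dict

CONFIDENCE_THRESHOLD = 0.7  # default stated in the description

# Hypothetical registry mapping fix names to callables that return True on success.
FIX_STRATEGIES: Dict[str, Callable[[], bool]] = {}


def strategy(name: str):
    """Decorator that registers a fix strategy under the given name."""
    def register(fn: Callable[[], bool]) -> Callable[[], bool]:
        FIX_STRATEGIES[name] = fn
        return fn
    return register


@strategy("restart_service")
def restart_service() -> bool:
    # Placeholder: a real strategy would call the orchestrator or init system.
    return True


def remediate(
    fix_name: str,
    confidence: float,
    enabled: bool = True,
    threshold: float = CONFIDENCE_THRESHOLD,
) -> str:
    """Run the named fix only if remediation is on and confidence meets the threshold."""
    if not enabled:
        return "skipped: remediation disabled"
    if confidence < threshold:
        return "skipped: low confidence"
    fix = FIX_STRATEGIES.get(fix_name)
    if fix is None:
        return "skipped: unknown fix"
    return "succeeded" if fix() else "failed"
```

Returning a status for every path, including skips, gives a later learning step a record of each outcome.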
- Learn:
Every outcome is recorded. After each run, the learning agent adjusts the confidence score for each issue pattern based on whether its fixes succeeded or failed, so the agent becomes more accurate over time.
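The description does not say which update rule the learning agent uses. One plausible sketch, assuming an exponential moving average toward 1.0 on success and 0.0 on failure, with a hypothetical learning rate and starting score:

```python
from typing import Dict, Iterable, Tuple


def update_confidence(current: float, succeeded: bool, rate: float = 0.1) -> float:
    """Move the score a fraction `rate` of the way toward 1.0 or 0.0."""
    target = 1.0 if succeeded else 0.0
    return current + rate * (target - current)


def learn(
    scores: Dict[str, float],
    outcomes: Iterable[Tuple[str, bool]],
    rate: float = 0.1,
    default: float = 0.5,  # assumed starting score for unseen patterns
) -> Dict[str, float]:
    """Apply one run's recorded (issue pattern, succeeded) outcomes to the scores."""
    updated = dict(scores)
    for issue, succeeded in outcomes:
        updated[issue] = update_confidence(updated.get(issue, default), succeeded, rate)
    return updated
```

With this rule a score stays within 0.0 to 1.0 as long as it starts there, and repeated failures pull a pattern below a 0.7 threshold gradually rather than after a single bad run.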
Kubernetes-deployed AI agents diagnose and remediate environments using LLMs.