Show HN: Incorporating AI in engineering on-call workflows

6 points by lumax15 9 months ago

Hello HN! My name is Max, and I’m a co-founder at Lynx (https://uselynx.ai). We’re building an AI-powered incident resolution platform to help engineers debug and resolve on-call issues faster. If you’ve ever been paged in the middle of the night and had to spend hours piecing together logs, metrics, and code, we’d love your feedback.

* The Problem *

On-call hasn’t kept pace with modern engineering. Even with great observability tools, diagnosing incidents is slow because:

- Systems are increasingly complex.

- Logs, dashboards, and documentation are scattered.

- Context often depends on tribal knowledge.

Lynx is designed to reduce the time and manual effort involved in incident resolution.

* How It Works *

Lynx runs on your servers and integrates seamlessly with your existing stack—including your codebase, infrastructure, logs, metrics, tracing, CI/CD, and cloud services.

- Direct Integration: Lynx connects via a local agent installed on your servers. This agent interfaces directly with your systems, enabling real-time command execution, log retrieval, status checks, and overall infrastructure interaction.

- Automatic Context Aggregation: Using our proprietary chain-of-thought execution process, Lynx automatically gathers and synthesizes context from across your stack—from logs and metrics to code insights and infrastructure status.

- Command Generation and Execution: With the aggregated context, Lynx generates targeted commands and executes them to resolve issues, streamlining debugging and remediation.

Our method delivers direct, actionable insights and interventions that accelerate incident resolution and overcome many limitations of traditional diagnostic tools.

Check out our demo: https://youtu.be/atzdMyd7PG0?si=Ntvh6uhE3bwE5z7F

* Safety and Security *

We built Lynx with security in mind to ensure it only takes safe, controlled actions:

- Manual Command Approvals: You can require manual approval before any command is executed.

- Role-Based Access Control (RBAC): Lynx supports RBAC for many integrations. You can grant specific permissions for investigative tasks while restricting access to sensitive operations or data.

- Optional On-Prem Hosting: Lynx can be deployed on-prem or in your private cloud, keeping all data and operations within your network.

* Looking for Feedback *

If your team deals with heavy on-call loads, we’d like to hear how you’re managing debugging today, along with any feedback on our approach.

- Try it out: https://www.uselynx.ai/getstarted

- Discuss: Reach out to founders@uselynx.ai

Your insights will help shape Lynx into a tool that truly addresses on-call pain points.

gianthinter909 9 months ago

My tech stack is pretty fragmented, and a lot of tools don't talk to each other. What integrations do you support?

lumax15 9 months ago

Lynx can automatically integrate with many dev tools!
You can connect Lynx directly to your environment by installing a lightweight agent on your servers (a devbox or a dedicated server). This agent uses Unix commands to interface with your systems, including your codebase, logs, metrics, tracing, CI/CD, and cloud services.
The agent uses “chain-of-thought-execution” to automatically explore resources, tooling, configurations, and other context within your environment, so you don’t need to explicitly set up any integrations. We do find that setting specific instructions (e.g. /path/to/git_repo, where credentials are stored, etc.) helps Lynx be more efficient, and we set up an easy UI to do this.
Some integrations, such as PagerDuty, Slack, and Jira, still need to be set up manually. Full deployment instructions are available at https://docs.uselynx.ai.

781830242 9 months ago

Powerful but also scary. What if it accidentally nukes my entire cluster?

lumax15 9 months ago

We built out a few mechanisms to make sure Lynx only automatically performs safe actions, and you have complete control over its access and behavior:
1. Users can set whether Lynx commands run automatically or manually. In manual mode, users can approve, edit, or reject commands before they execute.
2. When manual mode is off, the model is trained to automatically execute only read-only commands. Although it’s still theoretically possible for it to execute an unsafe command, we audit this internally and our audits show a 0% error rate.
3. Our system supports fine-grained RBAC for various integrations (such as Datadog, Kubernetes, and cloud providers), allowing you to grant specific permissions for investigation tasks while restricting sensitive operations.
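
To make points 1 and 2 more concrete, here is a minimal sketch of how an approval gate of this kind could look. It is not Lynx's actual implementation: the post says the model itself is trained to auto-run only read-only commands, whereas this sketch swaps in a static allowlist as a simpler illustration. The names `READ_ONLY_COMMANDS`, `READ_ONLY_SUBCOMMANDS`, `is_read_only`, and `decide` are all hypothetical.

```python
import shlex

# Hypothetical allowlist of programs treated as read-only (an assumption,
# not Lynx's real policy).
READ_ONLY_COMMANDS = {"cat", "ls", "grep", "head", "tail", "ps", "df", "uptime"}

# Hypothetical read-only subcommands for tools that mix reads and writes.
READ_ONLY_SUBCOMMANDS = {
    "kubectl": {"get", "describe", "logs", "top"},
}

# Characters that allow chaining, redirection, or substitution in a shell.
# Any command containing one of these is sent for manual review, even if
# the character is quoted (deliberately conservative).
UNSAFE_CHARS = ";|&><`$\n"


def is_read_only(command: str) -> bool:
    """Return True only if the command matches the read-only allowlist."""
    if any(ch in command for ch in UNSAFE_CHARS):
        return False
    try:
        tokens = shlex.split(command)
    except ValueError:
        # Unbalanced quotes and similar parse errors are treated as unsafe.
        return False
    if not tokens:
        return False
    program = tokens[0]
    if program in READ_ONLY_COMMANDS:
        return True
    allowed = READ_ONLY_SUBCOMMANDS.get(program)
    return allowed is not None and len(tokens) > 1 and tokens[1] in allowed


def decide(command: str, manual_mode: bool) -> str:
    """Return 'auto_execute' or 'needs_approval' for a proposed command."""
    if manual_mode:
        # In manual mode every command waits for a human to approve,
        # edit, or reject it.
        return "needs_approval"
    return "auto_execute" if is_read_only(command) else "needs_approval"


if __name__ == "__main__":
    for cmd in ["kubectl get pods", "kubectl delete pod web-1", "ls; rm -rf /tmp/x"]:
        print(cmd, "->", decide(cmd, manual_mode=False))
```

An allowlist like this fails closed: anything it does not recognize falls back to manual approval, which is the property you would want from whatever mechanism sits in front of automatic execution.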