Show HN: Cordon – Reduce large log files to anomalous sections (github.com/calebevans)
22 points by calebevans 13 days ago | 2 comments
Cordon uses transformer embeddings and density scoring to identify what's semantically unique in log files, filtering out repetitive noise.

The core insight: a critical error repeated 1000x is "normal" (semantically dense). A strange one-off event is anomalous (semantically isolated).
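
To make the density idea concrete, here is a minimal sketch that scores each item by its mean cosine distance to its k nearest neighbors. The hand-written 2-D vectors stand in for real transformer embeddings, and `knn_scores` is a hypothetical helper, not Cordon's API; the scoring Cordon actually uses may differ (see the architecture doc).

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; 0 means same direction, 1 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def knn_scores(vectors, k=2):
    """Mean distance from each vector to its k nearest neighbors.

    A low score means the vector sits in a dense region (repetitive);
    a high score means it is isolated (anomalous).
    """
    scores = []
    for i, v in enumerate(vectors):
        dists = sorted(
            cosine_distance(v, w) for j, w in enumerate(vectors) if j != i
        )
        scores.append(sum(dists[:k]) / k)
    return scores

# Toy stand-ins for embeddings (assumption: real ones come from a model).
vectors = [
    [1.0, 0.0], [0.99, 0.05], [0.98, 0.02], [1.0, 0.01],  # same error, repeated
    [0.0, 1.0],                                            # strange one-off event
]
scores = knn_scores(vectors)
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
```

The four near-identical vectors score close to zero no matter how severe the message they represent, while the lone vector scores high because nothing else is near it.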

Outputs XML-tagged blocks with anomaly scores. Designed to reduce large logs as a form of pre-processing for LLM analysis.
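
As an illustration of what score-tagged output could look like, the sketch below wraps each selected block in an XML element with the score as an attribute. The `block` tag and the `lines` and `score` attributes are made-up names for this example, not Cordon's documented output format.

```python
import xml.etree.ElementTree as ET

def render_blocks(blocks):
    """Render selected log blocks as XML elements carrying their score.

    Tag and attribute names here are hypothetical, chosen for illustration.
    """
    parts = []
    for b in blocks:
        el = ET.Element(
            "block",
            {"lines": f"{b['start']}-{b['end']}", "score": f"{b['score']:.3f}"},
        )
        el.text = "\n" + b["text"] + "\n"
        # ET.tostring escapes characters such as < and & in the log text.
        parts.append(ET.tostring(el, encoding="unicode"))
    return "\n".join(parts)

blocks = [
    {"start": 4021, "end": 4023, "score": 0.91,
     "text": "kernel: BUG <unknown> & retry"},
]
output = render_blocks(blocks)
```

Keeping the line range and score as attributes lets a downstream LLM prompt or script refer back to the original log without re-reading it.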

Architecture: https://github.com/calebevans/cordon/blob/main/docs/architec...

Benchmark: https://github.com/calebevans/cordon/blob/main/benchmark/res...

Trade-offs: intentionally ignores repetitive patterns, uses percentile-based thresholds (relative, not absolute).
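
A small sketch of what a relative (percentile) threshold means in practice, using a nearest-rank percentile over made-up scores. The function names and the 80th-percentile cutoff are assumptions for illustration, not Cordon's defaults.

```python
import math

def percentile_threshold(scores, pct):
    """Nearest-rank percentile: the value at or below which pct% of scores fall."""
    ordered = sorted(scores)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def select_anomalous(scores, pct=80):
    """Indices of scores strictly above the pct-th percentile of this file."""
    cutoff = percentile_threshold(scores, pct)
    return [i for i, s in enumerate(scores) if s > cutoff]

# Hypothetical anomaly scores for ten windows of one log file.
quiet_log = [0.01, 0.02, 0.01, 0.03, 0.02, 0.01, 0.02, 0.09, 0.02, 0.12]
# The same shape of scores, but every value is 50x larger.
noisy_log = [s * 50 for s in quiet_log]

picked_quiet = select_anomalous(quiet_log)
picked_noisy = select_anomalous(noisy_log)
```

Both lists select the same windows, because the cutoff is derived from each file's own score distribution. The flip side of that trade-off is that a file with nothing unusual in it still yields a top slice.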





Also, please feel free to try the online demo: https://huggingface.co/spaces/calebdevans/cordon

Just a quick update: I’ve added support for remote embedding models in the most recent release (v0.3.0).


