One of the main benefits of Semgrep is its unified DSL that works across all sup...

codelion · 2025-03-01T01:56:20 1740794180

That's a really interesting breakdown of the DSL vs. S-expression approach. I can see your point about the potential fragility of relying directly on tree-sitter outputs, especially with grammar drift. It took me a while to wrap my head around the S-expression syntax when I first started using tree-sitter, so I appreciate the comparison to a more human-readable DSL like Semgrep's.

The other benefit of a DSL like Semgrep's is that LLMs have become very good at generating it. See https://github.com/lambdasec/autogrep for how to automatically generate Semgrep rules from existing CVEs.

sanketsaurav · 2025-03-01T01:29:27 1740792567

> One of the main benefits of Semgrep is its unified DSL that works across all supported languages.

> People can disagree, but I'm not sure that tree-sitter S-expressions are an upgrade over a DSL.

100% agree: a DSL is a better user experience for sure. But not inventing a new DSL and using tree-sitter natively was a deliberate choice on our part. We've addressed this directly and agree that the S-expressions are gnarly, but we're optimizing for a scenario where you wouldn't need to write them by hand anyway.

It's a trade-off. We don't want to spend time inventing a DSL and porting every language's idiosyncrasies to it. We'd rather improve our runtime and add support for things that other tools either don't support or support only on a paid tier (like cross-file analysis, which you can do on Globstar today).

micksmix · 2025-03-01T04:32:31 1740803551

That makes a lot of sense. I wish you the best of luck and will be happy to try it out as you continue to develop it!