When I played around with tree-sitter a bit, I noticed there were situations where AST elements didn't contain exactly what I'd expect them to. For example: comments are represented in the AST, but unfortunately the contents of the comment aren't parsed out following the language's conventions.
I was wondering if this is something I could open an issue about? Is this for the main tree-sitter repo, or should I open one per language?
I was looking into automating some stuff across all languages with tree-sitter, but handling every language's comment syntax made it very hard.
Most tree-sitter grammars just parse comments as a single token. Can you give an example of what you mean when you say "contents of the comment parsed out"?
Are you talking about conventions like JSDoc, for putting structured data inside of comments? On GitHub, we handle that by parsing JSDoc comments in a separate pass, using a separate parser. We do it this way because JSDoc isn't really part of the JavaScript language, not all projects use JSDoc, and not all applications are interested in parsing the text inside of comments.
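To make the "separate pass" idea concrete, here is a minimal sketch in Python. It is not the parser GitHub uses; it only assumes that the first pass hands you the raw text of a comment token (for example, the source text spanned by a tree-sitter `comment` node). The names `strip_block_comment` and `parse_jsdoc` are hypothetical, and the handling covers only simple `/** ... */` comments with `@tag` lines.

```python
import re

# Matches a line that starts with a tag, e.g. "@param {number} a first operand".
TAG_RE = re.compile(r"^@(\w+)\s*(.*)$")


def strip_block_comment(comment_text):
    """Remove the /** and */ delimiters and the leading '*' on each line."""
    body = comment_text.strip()
    if body.startswith("/**"):
        body = body[3:]
    if body.endswith("*/"):
        body = body[:-2]
    lines = []
    for line in body.splitlines():
        line = line.strip()
        if line.startswith("*"):
            line = line[1:].strip()
        lines.append(line)
    return lines


def parse_jsdoc(comment_text):
    """Second pass: split a doc comment into a description and (tag, text) pairs.

    `comment_text` is assumed to be the raw text of one comment token
    produced by the first (language-level) parse.
    """
    description, tags = [], []
    for line in strip_block_comment(comment_text):
        match = TAG_RE.match(line)
        if match:
            tags.append((match.group(1), match.group(2)))
        elif tags and line:
            # A non-tag line after a tag continues that tag's text.
            name, text = tags[-1]
            tags[-1] = (name, (text + " " + line).strip())
        elif line:
            description.append(line)
    return {"description": " ".join(description), "tags": tags}
```

Because the second pass only sees comment text, it can be skipped entirely for projects that don't use JSDoc, and a different convention could be handled by swapping in a different second-pass function.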