Hey, Tree-sitter author here. Thanks for posting! Let me know if you have questions about the project.


There's been some recent discussion about whether tree-sitter grammars can be used to parse Markdown, with or without hacks. Currently it's being done by working around all of the tree-sitter machinery, which causes a lot of problems, and there's no consensus among plugin authors:

https://github.com/nvim-treesitter/nvim-treesitter/issues/87...

Could you possibly chime in on that discussion and share any insights you might have? That would be really awesome! TIA <3


I’ve been using tree-sitter via FFI from Common Lisp, but what I’d really like is a way to write my own code generator so that the generated parser could be “native” Lisp code. Otherwise, it’s an amazing tool; my only other complaint is the lack of a grammar for Objective-C, which would be useful for a Lisp/Objective-C bridge I’ve been working on.


I think that it'd be pretty easy to generate parser code in languages other than C, but it would be a lot of work to port the core library itself[1] to those other languages.

[1] https://github.com/tree-sitter/tree-sitter/tree/master/lib/s...

I agree about the Objective-C grammar! Although it looks like somebody's started work on it:

https://github.com/merico-dev/tree-sitter-objc


There's an architecture for compilers that I've been wanting for years, where a keystroke change to the source code results in an incremental change to the AST, and the compiler can then consume that AST delta to generate a binary patch to the compiled executable.

Could tree-sitter be used for that? (What I want is to feed tree-sitter a stream of keystroke changes and get out a stream of minimal AST changes as a result.)


You don't get the AST _diff_ as the result (you get a new tree whose structure is shared with the old tree), but tree-sitter is specifically designed to support this kind of incremental edit use case: https://tree-sitter.github.io/tree-sitter/using-parsers#edit...
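
To make "a new tree whose structure is shared with the old tree" concrete, here is a toy sketch in Python. It is not tree-sitter's implementation or API: `Node`, `replace_leaf`, and `changed_nodes` are hypothetical names, and the sketch assumes an edit only replaces the text of one leaf, so both trees have the same shape. The idea it illustrates is that unchanged subtrees are the very same objects in both trees, so a consumer can recover a delta by walking the two trees together and skipping anything shared.

```python
from dataclasses import dataclass
from typing import Iterator, Tuple


@dataclass(frozen=True)
class Node:
    """Immutable toy syntax node (hypothetical; not tree-sitter's node type)."""
    kind: str
    children: Tuple["Node", ...] = ()
    text: str = ""


def replace_leaf(node: Node, path: Tuple[int, ...], new_text: str) -> Node:
    """Return a new tree with the leaf at `path` given `new_text`.

    Only the nodes on the path from the root to the leaf are rebuilt;
    every other subtree is reused as-is (structural sharing).
    """
    if not path:
        return Node(node.kind, node.children, new_text)
    i = path[0]
    new_child = replace_leaf(node.children[i], path[1:], new_text)
    children = node.children[:i] + (new_child,) + node.children[i + 1:]
    return Node(node.kind, children, node.text)


def changed_nodes(old: Node, new: Node, path: Tuple[int, ...] = ()) -> Iterator[Tuple[int, ...]]:
    """Yield the paths of nodes in `new` that are not shared with `old`.

    Assumes both trees have the same shape (true for replace_leaf edits).
    """
    if old is new:  # shared subtree: nothing below it can have changed
        return
    yield path
    for i, (o, n) in enumerate(zip(old.children, new.children)):
        yield from changed_nodes(o, n, path + (i,))


old_tree = Node("program", (
    Node("call", (Node("identifier", text="foo"), Node("number", text="1"))),
    Node("call", (Node("identifier", text="bar"), Node("number", text="2"))),
))
# Edit the "2" in the second call; the first call is reused untouched.
new_tree = replace_leaf(old_tree, (1, 1), "3")
delta = list(changed_nodes(old_tree, new_tree))
```

In this toy, `delta` contains only the root, the second call, and its number leaf; the first call is never visited because it is the same object in both trees. For the real thing, tree-sitter's C API has `ts_tree_get_changed_ranges`, which compares an old and a new tree and reports the ranges whose structure differs.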


I've done two grammars for my own use in the last few months (well, one isn't quite complete yet) and it's been quite an enjoyable (learning) experience. Thanks for sharing this tool!


That's great to hear. Thanks!


When I played around with tree-sitter a bit, I noticed there were situations where AST elements didn't contain exactly what I'd expect. For example, comments are represented in the AST, but unfortunately their contents aren't parsed out following the language's conventions.

I was wondering if this is something I could open an issue about? Should it go to the main tree-sitter repo, or should I open one language-by-language?

I was looking into automating some stuff across all languages with tree-sitter, but handling every language's comment syntax made it very hard.


Most tree-sitter grammars just parse comments as a single token. Can you give an example of what you mean when you say "contents of the comment parsed out"?

Are you talking about conventions like JSDoc, for putting structured data inside of comments? On GitHub, we handle that by parsing JSDoc comments in a separate pass, using a separate parser. We do it this way because JSDoc isn't really part of the JavaScript language, not all projects use JSDoc, and not all applications are interested in parsing the text inside of comments.


My guess is that they meant parsing code that has been "commented out".


I interpreted it to mean, "Remove the *s from code like this:"

    /* This comment
     * Should just be alphanumeric.
     */


Yep, this is exactly what I meant. Turning

    /* Something */ 
or

    { Something }
into:

    " Something "
Or, even better, into:

    "Something"

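As a rough sketch of that transformation done as a separate pass over the raw text of a comment node: the function below strips the delimiters and the leading `*` on continuation lines. The delimiter tables and the name `comment_text` are assumptions for illustration only; they are not part of tree-sitter, and a real tool would need a table per language (which is exactly the hard part mentioned above).

```python
import re

# Hypothetical delimiter tables; real languages need per-grammar entries.
BLOCK_DELIMITERS = [("/*", "*/"), ("(*", "*)"), ("{", "}")]
LINE_PREFIXES = ["//", "#", "--"]


def comment_text(raw: str) -> str:
    """Return the human-readable text of a single comment token."""
    raw = raw.strip()
    for opener, closer in BLOCK_DELIMITERS:
        long_enough = len(raw) >= len(opener) + len(closer)
        if long_enough and raw.startswith(opener) and raw.endswith(closer):
            body = raw[len(opener):len(raw) - len(closer)]
            lines = body.splitlines()
            # Drop the decorative "*" that often starts continuation lines.
            lines = lines[:1] + [re.sub(r"^\s*\*\s?", "", line) for line in lines[1:]]
            return "\n".join(line.strip() for line in lines).strip()
    for prefix in LINE_PREFIXES:
        if raw.startswith(prefix):
            return raw[len(prefix):].strip()
    return raw  # unknown style: leave the text untouched


print(repr(comment_text("/* Something */")))
print(repr(comment_text("{ Something }")))
```

Both calls print `'Something'`. Keeping this as a post-processing step matches the separate-pass approach described for JSDoc earlier in the thread: the grammar still emits the comment as one token, and only the tools that care about the text pay for cleaning it up.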

Are there any plans to support modifying the grammar on the fly or without recompiling?


I don't think you can do this without recompiling, since the grammars get translated into C code before use. But the built-in command line tools (‘tree-sitter parse’, etc) all support a mode where they will detect local changes to a checked-out grammar definition, and recompile on the fly if needed. (This happens each time the CLI program is started up; it doesn't happen during a long-running process.)


The obvious answer is to embed TCC or another C compiler and either generate a dynamic library or generate wasm and load it directly into the process.

    exec_wasm(generate_wasm(generate_c(grammar)))

Now if you can make that whole function chain incremental (delta_grammar -> delta_c -> delta_wasm -> delta_recomputed_wasm_call), deltas will propagate down to exec_wasm and you could dynamically execute the generated code as the grammar changes.
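
One very simplified way to picture that chain being incremental is to memoize each stage by the content of its input, so that only the pieces that changed are regenerated. The sketch below assumes, unrealistically, that each grammar rule can be compiled on its own; real parser-table generation depends on the whole grammar, so a rule change can ripple much further. `generate_c` and `generate_wasm` here are placeholder functions that just build strings, not tree-sitter's generator or a real compiler, and `memoize_by_content` and `build` are hypothetical names.

```python
import hashlib
from typing import Callable, Dict


def memoize_by_content(stage: Callable[[str], str]) -> Callable[[str], str]:
    """Cache a stage's output, keyed by a hash of its input text."""
    cache: Dict[str, str] = {}

    def wrapped(source: str) -> str:
        key = hashlib.sha256(source.encode("utf-8")).hexdigest()
        if key not in cache:
            cache[key] = stage(source)
        return cache[key]

    return wrapped


# Counts how often each placeholder stage really runs.
calls = {"generate_c": 0, "generate_wasm": 0}


@memoize_by_content
def generate_c(rule_source: str) -> str:
    # Placeholder for "grammar rule -> C source".
    calls["generate_c"] += 1
    return f"/* C for: {rule_source} */"


@memoize_by_content
def generate_wasm(c_source: str) -> str:
    # Placeholder for "C source -> wasm module".
    calls["generate_wasm"] += 1
    return f"(module ;; compiled from {len(c_source)} bytes of C\n)"


def build(grammar: Dict[str, str]) -> Dict[str, str]:
    """Run every rule through both stages, reusing cached results."""
    return {name: generate_wasm(generate_c(src)) for name, src in grammar.items()}


grammar = {"number": r"/\d+/", "sum": "seq(number, '+', number)"}
build(grammar)                      # first build: every rule goes through both stages
grammar["sum"] = "seq(number, '-', number)"
build(grammar)                      # second build: only the edited rule is regenerated
```

After the second build, each stage has run three times in total rather than four, because the untouched `number` rule is served from the cache. Memoization like this only skips work for inputs that are byte-for-byte unchanged; producing true deltas between stages would need each stage to understand the structure of its input.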


One day, I would love to generalize the web-based playground so that you could edit the grammars. But it's complicated, because we use C as our output language, so you would always need to recompile the C after changing the grammar.

So, I would say that it's not on our near-term roadmap.


I'm curious whether tree-sitter can handle C/C++. I think it's super difficult with metaprogramming. Without the preprocessor, I don't think it's possible to parse C++ correctly.


We do have C and C++ grammars [1,2] but they need some love. You're right that these two languages are among the hardest to support. You could get a tree-sitter external scanner to mimic the preprocessor without too much difficulty, but you'd still run into the problem that your macro definitions might appear in another file. Parsing in general is much easier to implement and reason about if the parse result depends only on the content of the single file that you're looking at.

[1] https://github.com/tree-sitter/tree-sitter-c

[2] https://github.com/tree-sitter/tree-sitter-cpp


Is it possible to use tree-sitter to generate parsers in languages other than C? How hard would it be to modify it to create parsers in e.g. Java?

Edit: sorry, I just saw that you had answered that below.


Thanks for building this. I had not heard of it before, but it looks great. Are there more tutorials elsewhere on the Internet you would recommend, besides what is in the documentation?


Not that I know of, right now :(.

In the near future, we'll create some more GitHub-specific documentation that walks you through how to add advanced language support for any programming language on GitHub, by writing a Tree-sitter grammar, and then by writing the tree queries that are used for syntax highlighting, simple code navigation, and someday soon... precise code navigation.



