Hey, Tree-sitter author here. Thanks for posting! Let me know if you have questi...

akavel · on Feb 22, 2021

There's been some recent discussion as to whether tree-sitter grammars can be used to parse markdown with some hacks or not (currently it's being done by working around all the tree-sitter machinery, resulting in a lot of problems), with no consensus among plugin authors:

https://github.com/nvim-treesitter/nvim-treesitter/issues/87...

Could you possibly chime in on that discussion and share any insights you might have? That would be really awesome! TIA <3

fiddlerwoaroof · on Feb 22, 2021

I’ve been using tree-sitter via FFI from Common Lisp, but what I’d really like would be a way to write my own code generator so that the generated parser could be “native” Lisp code. Otherwise, it’s an amazing tool: my only other complaint would be the lack of a grammar for Objective-C, which would be useful for a Lisp/Objective-C bridge I’ve been working on.

maxbrunsfeld · on Feb 22, 2021

I think that it'd be pretty easy to generate parser code in languages other than C, but it would be a lot of work to port the core library itself[1] to those other languages.

[1] https://github.com/tree-sitter/tree-sitter/tree/master/lib/s...

I agree about the Objective-C grammar! Although it looks like somebody's started work on it:

https://github.com/merico-dev/tree-sitter-objc

josephg · on Feb 22, 2021

There's an architecture for compilers that I've been wanting for years, where a keystroke change to the source code results in an incremental change to the AST, and the compiler can then consume that AST delta to generate a binary patch to the compiled executable.

Would tree-sitter be able to be used for that? (What I want is to feed tree-sitter a stream of keystroke changes and get out a stream of minimal AST changes as a result).

dcreager · on Feb 23, 2021

You don't get the AST _diff_ as the result (you get a new tree whose structure is shared with the old tree), but tree-sitter is specifically designed to support this kind of incremental edit use case: https://tree-sitter.github.io/tree-sitter/using-parsers#edit...
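To make the "new tree whose structure is shared with the old tree" part concrete, here is a toy sketch in Python. It does not use tree-sitter at all: `Node`, `replace_leaf`, and `changed_nodes` are hypothetical names made up for illustration. It only shows the general idea that, when untouched subtrees are reused as the same objects, a consumer can recover a delta by walking both trees and skipping anything that is shared.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical immutable syntax node (not a tree-sitter type)."""
    kind: str
    children: Tuple["Node", ...] = ()
    text: str = ""


def replace_leaf(root, path, new_text):
    """Return a new tree with the node at `path` given new text.

    Only the nodes along `path` are rebuilt; every other subtree is
    reused as-is, so the old and new trees share structure.
    """
    if not path:
        return Node(root.kind, root.children, new_text)
    i = path[0]
    new_child = replace_leaf(root.children[i], path[1:], new_text)
    children = root.children[:i] + (new_child,) + root.children[i + 1:]
    return Node(root.kind, children, root.text)


def changed_nodes(old, new, path=()):
    """Yield the paths of nodes in `new` that are not shared with `old`."""
    if old is new:
        return  # shared subtree: nothing below here changed
    yield path
    if len(old.children) != len(new.children):
        return  # shape changed: treat the whole subtree as new
    for i, (o, n) in enumerate(zip(old.children, new.children)):
        yield from changed_nodes(o, n, path + (i,))


a = Node("number", text="1")
b = Node("number", text="2")
old = Node("program", (Node("sum", (a, b)),
                       Node("call", (Node("identifier", text="f"),))))
new = replace_leaf(old, (0, 1), "3")
delta = list(changed_nodes(old, new))
```

Here `delta` only lists the root, the `sum` node, and the edited leaf; the `call` subtree is the same object in both trees and is never visited. A real compiler would need much more than this (node positions, error recovery, and so on), so treat it as an illustration of the sharing idea only.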

anaerobicover · on Feb 22, 2021

I've done two grammars for my own use in the last few months (well, one isn't quite complete yet) and it's been quite an enjoyable (learning) experience. Thanks for sharing this tool!

maxbrunsfeld · on Feb 22, 2021

That's great to hear. Thanks!

gravypod · on Feb 22, 2021

When I played around with tree-sitter a bit, I noticed there were situations where AST elements didn't contain exactly what I'd expect. For example: comments are represented in the AST, but unfortunately their contents aren't parsed out following the language's conventions.

I was wondering if this is something I could open an issue about? Is this for the main tree-sitter repo, or should I open one language-by-language?

I was looking into automating some stuff across all languages with tree-sitter, but handling all of the languages' comment syntaxes made it very hard.

maxbrunsfeld · on Feb 22, 2021

Most tree-sitter grammars just parse comments as a single token. Can you give an example of what you mean when you say "contents of the comment parsed out"?

Are you talking about conventions like JSDoc, for putting structured data inside of comments? On GitHub, we handle that by parsing JSDoc comments in a separate pass, using a separate parser. We do it this way because JSDoc isn't really part of the JavaScript language, not all projects use JSDoc, and not all applications are interested in parsing the text inside of comments.
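As a rough illustration of the "separate pass, separate parser" idea, here is a small Python sketch. It is not GitHub's implementation and it does not use tree-sitter: the regexes and the function names `extract_doc_comments` and `parse_doc_comment` are assumptions made up for this example, and they only cover simple `@tag {type} rest` lines.

```python
import re

# Pass 1 treats each /** ... */ comment as one opaque token.
COMMENT_RE = re.compile(r"/\*\*(.*?)\*/", re.DOTALL)

# Pass 2 looks inside a comment for lines such as "@param {number} a text".
TAG_RE = re.compile(
    r"@(?P<tag>\w+)(?:\s+\{(?P<type>[^}]*)\})?(?:\s+(?P<rest>.*))?"
)


def extract_doc_comments(source):
    """Return the raw text of every /** ... */ comment in `source`."""
    return [m.group(0) for m in COMMENT_RE.finditer(source)]


def parse_doc_comment(comment):
    """Return (tag, type, rest) tuples for the tag lines in one comment."""
    body = comment[3:-2]  # drop the leading "/**" and trailing "*/"
    tags = []
    for line in body.splitlines():
        # Remove indentation and the decorative leading "*" on each line.
        line = line.strip().lstrip("*").strip()
        m = TAG_RE.match(line)
        if m:
            tags.append((m.group("tag"), m.group("type"), m.group("rest") or ""))
    return tags


source = """
/**
 * Add two numbers.
 * @param {number} a first operand
 * @returns {number} the sum
 */
function add(a, b) { return a + b; }
"""
for comment in extract_doc_comments(source):
    print(parse_doc_comment(comment))
```

The point of the split is that the first pass never needs to know anything about JSDoc, and applications that don't care about comment contents can simply skip the second pass.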

gugagore · on Feb 22, 2021

My guess is that they meant parsing code that has been "commented out".

rattray · on Feb 22, 2021

I interpreted it to mean, "Remove the *s from code like this:"

    /* This comment
     * Should just be alphanumeric.
     */

gravypod · on Feb 23, 2021

Yep, this is exactly what I meant. Turning

    /* Something */

or

    { Something }

into:

    " Something "

Or, even better, into:

    "Something"
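For what it's worth, that kind of stripping can be done as a small post-processing step on the comment token's text. Below is a minimal Python sketch, assuming you already have the raw comment string from the tree. The `DELIMITERS` table and the `comment_text` helper are hypothetical, and a real version would need a per-language table rather than this short hard-coded list.

```python
# (opener, closer) pairs; an empty closer means a line comment.
# This list is illustrative only and is not tied to any grammar.
DELIMITERS = [("/*", "*/"), ("{", "}"), ("//", ""), ("#", "")]


def comment_text(comment):
    """Strip comment delimiters and decorative leading '*' characters."""
    for opener, closer in DELIMITERS:
        if (comment.startswith(opener) and comment.endswith(closer)
                and len(comment) >= len(opener) + len(closer)):
            inner = comment[len(opener):len(comment) - len(closer)]
            # Clean each line, then drop lines that end up empty.
            lines = [line.strip().lstrip("*").strip()
                     for line in inner.splitlines()]
            return "\n".join(line for line in lines if line)
    return comment  # unknown syntax: leave the text untouched


print(comment_text("/* Something */"))
print(comment_text("{ Something }"))
print(comment_text("/* This comment\n * Should just be alphanumeric.\n */"))
```

Note that this drops blank lines inside block comments, which may or may not be what you want if paragraph breaks matter.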

yig · on Feb 22, 2021

Are there any plans to support modifying the grammar on the fly or without recompiling?

dcreager · on Feb 22, 2021

I don't think you can do this without recompiling, since the grammars get translated into C code before use. But the built-in command line tools (‘tree-sitter parse’, etc) all support a mode where they will detect local changes to a checked-out grammar definition, and recompile on the fly if needed. (This happens each time the CLI program is started up; it doesn't happen during a long-running process.)
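The "detect local changes at startup" step can be pictured as a plain staleness check. This Python sketch is only an assumption about how such a check might look, not the CLI's actual logic; `needs_rebuild` and the file names in the usage lines are hypothetical.

```python
import os


def needs_rebuild(grammar_paths, compiled_path):
    """Return True if the compiled parser is missing or out of date.

    The parser counts as out of date when any grammar source file has a
    newer modification time than the compiled output.
    """
    if not os.path.exists(compiled_path):
        return True
    compiled_mtime = os.path.getmtime(compiled_path)
    return any(os.path.getmtime(p) > compiled_mtime for p in grammar_paths)


# Hypothetical usage at program startup:
# if needs_rebuild(["grammar.js", "src/scanner.c"], "build/parser.so"):
#     ...regenerate the C code and recompile before parsing...
```

Because the check runs once when the program starts, a long-running process would not pick up later grammar edits, which matches the caveat above.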

sitkack · on Feb 22, 2021

The obvious answer is to embed TCC or another C compiler and either generate a dynamic library or generate wasm and load it directly into the process.

    exec_wasm(generate_wasm(generate_c(grammar)))

Now, if you can make that whole function chain incremental (delta_grammar -> delta_c -> delta_wasm -> delta_recomputed_wasm_call), the deltas will propagate down to exec_wasm, and you could dynamically execute the generated code as the grammar changes.

maxbrunsfeld · on Feb 22, 2021

One day, I would love to generalize the web-based playground so that you could edit the grammars. But it's complicated, because we use C as our output language, so you would always need to recompile the C after changing the grammar.

So, I would say that it's not on our near-term roadmap.

billconan · on March 8, 2021

I'm curious if tree-sitter can handle C++/C. I think it's super difficult with metaprogramming. Without the preprocessor, I think it is not possible to parse C++ correctly.

dcreager · on March 9, 2021

We do have C and C++ grammars [1,2] but they need some love. You're right that these two languages are among the hardest to support. You could get a tree-sitter external scanner to mimic the preprocessor without too much difficulty, but you'd still run into the problem that your macro definitions might appear in another file. Parsing in general is much easier to implement and reason about if the parse result depends only on the content of the single file that you're looking at.

[1] https://github.com/tree-sitter/tree-sitter-c

[2] https://github.com/tree-sitter/tree-sitter-cpp

lemming · on Feb 22, 2021

Is it possible to use tree-sitter to generate parsers in languages other than C? How hard would it be to modify it to create parsers in e.g. Java?

Edit: sorry, I just saw that you had answered that below.

autoditype · on Feb 22, 2021

Thanks for building this. I had not heard of it before, but it looks great. Are there more tutorials elsewhere on the Internet you would recommend, besides what is in the documentation?

maxbrunsfeld · on Feb 22, 2021

Not that I know of, right now :(.

In the near future, we'll create some more GitHub-specific documentation that walks you through how to add advanced language support for any programming language on GitHub, by writing a Tree-sitter grammar, and then by writing the tree queries that are used for syntax highlighting, simple code navigation, and someday soon... precise code navigation.