HN Theater @HNTheaterMonth

The best talks and videos of Hacker News.

Hacker News Comments on
"Tree-sitter - a new parsing system for programming tools" by Max Brunsfeld

Strange Loop Conference · Youtube · 131 HN points · 4 HN comments
HN Theater has aggregated all Hacker News stories and comments that mention Strange Loop Conference's video "Tree-sitter - a new parsing system for programming tools" by Max Brunsfeld.
Youtube Summary
Developer tools that support multiple programming languages generally have very limited, regex-based code-analysis capabilities. Tree-sitter is a new parsing system that aims to change this paradigm. It provides a uniform C API for parsing an ever-growing set of languages. It features high-performance incremental parsing and robust error recovery, which allow it to be used to parse code in real-time in a text editor. There are bindings for using Tree-sitter from Node.js, Haskell, Ruby and Rust.

We're in the process of integrating Tree-sitter into both GitHub.com and the Atom text editor, which will allow us to analyze code accurately and efficiently, paving the way for better syntax highlighting, code navigation, and refactoring. We'll demo some new features that Tree-sitter has enabled in GitHub.com and Atom, discuss some of the interesting algorithms that it uses, and share thoughts on some potential future applications.

Speaker: Max Brunsfeld

Hacker News Stories and Comments

All the comments and stories posted to Hacker News that reference this video.
You can watch a good Strangeloop presentation on Tree Sitter. https://www.youtube.com/watch?v=Jes3bD6P0To
Parsing (use of rather than theory) matters as it affects my work. So I followed up.

See https://youtu.be/Jes3bD6P0To

Tree sitter is based on LR parsing (see 23:30 in above video) extended to GLR parsing (see 38:30).

I've had enough of fools on HN posting unverified crap to make themselves feel cool and knowledgeable (and don't kid yourself that you helped me find the right answer by posting the wrong one). Time I withdrew. Goodbye HN.

IshKebab
I'm not sure what you think you're refuting but Tree Sitter definitely does some different stuff to allow recoverable parsing.
Oct 29, 2020 · rtsao on Introducing Semgrep and r2c
It's great to see more tools adopting tree-sitter [1].

Having a (fast) single tool that can accurately parse most commonly used programming languages is incredibly useful, but it requires the maintenance of dozens of grammars, which is difficult without a large community effort. Hopefully increased adoption means more accurate parsers and support for even more languages.

Tree-sitter powers syntax highlighting on GitHub.com and (soon) neovim and OniVim 2. Hopefully regex-based syntax highlighting is a thing of the past soon. If you haven't seen the Strange Loop conference talk on tree-sitter [2] yet, it's worth a watch.

I think a Prettier-like code formatter using tree-sitter would be cool, both in terms of potentially broader language support and native performance.

[1]: https://tree-sitter.github.io/tree-sitter/

[2]: https://www.youtube.com/watch?v=Jes3bD6P0To

Dec 01, 2019 · 131 points, 28 comments · submitted by ggurgone
minxomat
An important recent development in tree-sitter is the new query language. Like TextMate or Sublime grammars, tree-sitter in Atom used CSS selectors, but it now has a much more powerful s-expression query language that is useful for more than just syntax highlighting, e.g. static analysis. One application of that is GitHub's semantic, a Haskell tool for code navigation and call graph analysis.

Demo and explanation: https://github.com/tree-sitter/tree-sitter/pull/444

adadgar
Neovim is aiming to integrate this in the next major release, v0.5: https://github.com/neovim/neovim/pull/11113
lewisl9029
I've been following tree sitter for a while, as I find the tech super cool and can't wait to see more practical applications.

One thing (among many others) that I've found really promising about Dark is its editor. See the hands-on video on their homepage for a demo: https://darklang.com/

It mostly feels like you're just typing text like in any regular text editor, but your inputs are actually manipulating the AST directly, and the editor itself ensures that your inputs can never result in an invalid program (i.e. there's no such thing as making a syntax error in Dark). It's inspired by tooling in the Lisp world like Paredit and Parinfer, but Dark itself doesn't have to _look_ like a Lisp because the structure of the AST is maintained by the editor itself instead of by users manually inserting and removing parens. It's an ingenious way to get most of the productivity benefits of a Lisp-style syntax and all the structural editing tooling that comes with it, without intimidating newcomers with the very foreign-looking, paren-infested syntax Lisps are infamous for.

The other day I was actually briefly looking into whether or not it could be possible to replicate something like this in Atom using tree-sitter for some mainstream language like JS, but ended up getting blocked by the fact that Atom doesn't seem to offer an API for plugins to block/replace user input. This is probably for the best, given all the horrible ways this could be abused, but it does mean if I wanted to explore the idea further I'd probably have to either fork Atom to experiment with the idea or build something up from scratch, which is a pretty daunting undertaking given how deceptively complex modern editors can get these days.

But maybe I'm missing a different way to accomplish this in Atom with its existing APIs? Or does anyone know if VSCode's extension APIs can support this use case? I realize I've probably barely scratched the surface given how little time I've spent on it so far.
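To make the "inputs manipulate the AST directly" idea above a bit more concrete, here is a toy sketch in Python. It is not how Dark, Atom, or tree-sitter work internally; the `Hole`, `Num`, `BinOp`, `fill_first_hole`, and `press` names are all made up for illustration. Each keystroke is treated as a tree edit, and keystrokes that would not fit the tree are simply ignored, so the rendered text is parseable after every step.

```python
import ast
from dataclasses import dataclass
from typing import Union


@dataclass
class Hole:
    """Placeholder for an operand that has not been typed yet."""


@dataclass
class Num:
    value: int


@dataclass
class BinOp:
    op: str
    left: "Expr"
    right: "Expr"


Expr = Union[Hole, Num, BinOp]


def render(e: Expr) -> str:
    """Turn the tree back into text; a hole shows up as a placeholder name."""
    if isinstance(e, Hole):
        return "___"
    if isinstance(e, Num):
        return str(e.value)
    return f"({render(e.left)} {e.op} {render(e.right)})"


def fill_first_hole(e: Expr, new: Expr):
    """Return (tree, True) with the leftmost hole replaced, else (e, False)."""
    if isinstance(e, Hole):
        return new, True
    if isinstance(e, BinOp):
        left, done = fill_first_hole(e.left, new)
        if done:
            return BinOp(e.op, left, e.right), True
        right, done = fill_first_hole(e.right, new)
        if done:
            return BinOp(e.op, e.left, right), True
    return e, False


def press(e: Expr, key: str) -> Expr:
    """Apply one keystroke as a tree edit; keys that do not fit are ignored."""
    if key.isdecimal():
        filled, _ = fill_first_hole(e, Num(int(key)))
        return filled  # unchanged when there is no hole left to fill
    if key in ("+", "-", "*", "/"):
        _, has_hole = fill_first_hole(e, Hole())
        if has_hole:
            return e  # an operand is still missing, so reject the operator
        return BinOp(key, e, Hole())
    return e  # anything else (e.g. a stray paren) never reaches the tree


tree: Expr = Hole()
for key in ["1", "+", "*", "2", ")"]:
    tree = press(tree, key)
    ast.parse(render(tree), mode="eval")  # every intermediate state parses
print(render(tree))  # (1 + 2)
```

The stray `*` and `)` are dropped rather than producing a syntax error, which is the property described above. A real editor would also need cursor movement, deletion, and a far richer grammar.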

leeoniya
> The other day I was actually briefly looking into whether or not it could be possible to replicate something like this in Atom using tree-sitter for some mainstream language like JS,

already being done as part of CodeMirror v6:

https://marijnhaverbeke.nl/blog/lezer.html

https://github.com/lezer-parser/

minxomat
I really don't think it's inspired by Parinfer. It's likely based on the theory of structural editing and AST projections first popularized by JetBrains' CEO and available for experimentation in the open source project MPS. An end to end application of this theory is commonly referred to as a language workbench.

Papers: https://confluence.jetbrains.com/display/MPS/MPS+publication...

Language workbenches: https://www.martinfowler.com/articles/languageWorkbench.html

Nice intro to structural editing: https://medium.com/@mikhail.barash.mikbar/looking-at-code-th... (also mentions scratch)

carapace
> It mostly feels like you're just typing text like in any regular text editor, but your inputs are actually manipulating the AST directly, and the editor itself ensures that your inputs can never result in an invalid program (i.e. there's no such thing as making a syntax error in Dark).

The basic idea has been around for a while.

Here's something from the 80's: Alice Pascal https://www.templetons.com/brad/alice.html

> One of the first projects I did after forming Looking Glass Software Limited was a syntax-directed programming environment called Alice: The Personal Pascal.

> Syntax-directed editors are somewhat controversial, however I think they are quite good for people learning programming, and Alice was written first to be used in education in the school systems of Ontario. Our first sale was a contract to develop it for the Ministry of Education there.

dmortin
Will tree sitter also stimulate creation of free tools which work on the AST?

E.g. it's a mystery to me why we don't have free refactoring tools like the ones in IntelliJ. Like some free library which could extract methods, rename variables, etc. by modifying the AST. It does not seem too hard.

Is it because the current AST parsers are not fast enough or is there some other reason?

adamsmith
You need semantic understanding to do several of those operations. Parsing often isn’t sufficient.
dmortin
Yes, but semantic understanding is not really complicated for rename variable, for example, so it's strange there is no library which can do that.
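As a rough illustration of both points in this exchange, here is a minimal sketch of a scope-limited rename. It uses Python's standard `ast` module as a stand-in parser, not tree-sitter. The `RenameInFunction` class and `rename_local` helper are hypothetical names for this example, and `ast.unparse` needs Python 3.9 or later.

```python
import ast

SOURCE = """
def area(w, h):
    total = w * h
    return total

def perimeter(w, h):
    total = 2 * (w + h)
    return total
"""


class RenameInFunction(ast.NodeTransformer):
    """Rename variable `old` to `new`, but only inside function `func_name`."""

    def __init__(self, func_name, old, new):
        self.func_name, self.old, self.new = func_name, old, new
        self.inside = False

    def visit_FunctionDef(self, node):
        if node.name == self.func_name:
            self.inside = True
            self.generic_visit(node)  # descend only into the target function
            self.inside = False
        return node  # other functions are returned untouched

    def visit_Name(self, node):
        if self.inside and node.id == self.old:
            node.id = self.new
        return node


def rename_local(source, func_name, old, new):
    """Parse, rename within one function, and print the tree back as code."""
    tree = RenameInFunction(func_name, old, new).visit(ast.parse(source))
    return ast.unparse(tree)


print(rename_local(SOURCE, "area", "total", "result"))
```

Only the `total` inside `area` is renamed; the one in `perimeter` is left alone, which a plain text search-and-replace would not manage. The sketch deliberately ignores function parameters, nested functions, closures, `global`/`nonlocal`, and attribute access, and it discards comments and formatting when printing. Those gaps are where the extra semantic understanding comes in.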
lioeters
From my limited knowledge/experience, the use of language server protocol (like in VS Code editor) enables refactoring operations like you describe, for example, in TypeScript it can create a struct out of function parameters, or create a class from old function-prototype based definitions. Compared to IDEs like IntelliJ, though, I imagine the feature set is much, much smaller in scope.

I did see some discussion about integrating tree-sitter with VS Code, but the focus seems limited to syntax highlighting, not operating on ASTs.

lioeters
I found that the last time this talk was posted on HN [0], the author of tree-sitter mentioned that a couple of language servers are indeed using tree-sitter.

* Bash - https://github.com/mads-hartmann/bash-language-server

* Ruby - https://github.com/rubyide/vscode-ruby/tree/master/server

[0] https://news.ycombinator.com/item?id=18213022

dmitriid
So... You write your grammars in Javascript. Which is then serialized to JSON by a parser defined in Rust, so that it can be compiled to C?..

That’s... a very roundabout way of doing things.

http://tree-sitter.github.io/tree-sitter/implementation

xvilka
I asked [1] recently if it's possible to remove the need for Node.js entirely. The conclusion is that it might be possible to use duktape instead.

[1] https://github.com/tree-sitter/tree-sitter/issues/465

maxbrunsfeld
Many parser generation tools use their own custom grammar language, and then generate a C parser based on that. With Tree-sitter, it’s a similar setup, except the grammars are written in JavaScript instead of some custom language.

The parser generator itself is all written in Rust, but the end user doesn’t need to use rust in any way.

rrampage
The project page is at https://tree-sitter.github.io/tree-sitter/
dang
Discussed at the time: https://news.ycombinator.com/item?id=18213022
based2
(2018)
ggurgone
(it is the title of the talk)
saagarjha
Dates are usually added to posts that aren't recent.
georgewfraser
The most obvious application of tree-sitter is editors. I wrote a VSCode extension to replace the built-in syntax coloring with tree-sitter-based coloring: https://marketplace.visualstudio.com/items?itemName=georgewf...

I actually think it would make more sense for the various VSCode language extensions to just bake in tree-sitter for their language. I have had a PR open to do this with golang for a while: https://github.com/microsoft/vscode-go/pull/2555

ahelwer
Can you use tree-sitter for things that are more complicated than syntax highlighting, such as reference finding? I've been wanting to write a language server for a while but have been put off by the complexity of gracefully handling sections with incorrect syntax (while the user is typing, for example).
dmortin
What is the point of replacing the builtin syntax coloring? Is it faster or does it color more things?
Mathnerd314
Depending on the grammar I think it's a little slower than the regex-based TextMate coloring. But the overhead is mostly due to the VSCode plugin architecture.
georgewfraser
It colors more accurately.
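One way to see why a parser can be more accurate than regex matching is the small sketch below. It uses Python's built-in `ast` module as a stand-in for a real parser (it is not tree-sitter, and VSCode's TextMate grammars are more elaborate than a single regex). The sample source and the helper names `names_by_regex` and `names_by_parser` are made up for the example.

```python
import ast
import re

SOURCE = '''
def real_function():
    return "def fake_function(): pass"

# def commented_out(): pass
'''


def names_by_regex(source):
    """Naive approach: anything that looks like 'def name(' counts."""
    return re.findall(r"def\s+(\w+)\s*\(", source)


def names_by_parser(source):
    """Parser-based approach: only FunctionDef nodes in the syntax tree count."""
    return [
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.FunctionDef)
    ]


print(names_by_regex(SOURCE))   # ['real_function', 'fake_function', 'commented_out']
print(names_by_parser(SOURCE))  # ['real_function']
```

The regex is fooled by the string literal and the comment, while the parser knows neither is a function definition. The same distinction applies when deciding what to color as a function name.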
dunkelheit
Builtin syntax highlighting for e.g. rust is laughably bad - the treesitter highlighting is much better. Side note: I've recently switched to vscode as my main editor and so far the experience has been full of contrasts - many advanced features such as remote editing are the real gamechangers and work flawlessly, but some basic features (the aforementioned highlighting, folding, basic git integration) are notably lacking in polish. You kind of expect that if they've gotten advanced stuff right then basic stuff is surely in order, but that is not the case.
AnthonBerg
Have you tried Jetbrains IntelliJ? In my experience the IntelliJ platform is, well, if you look in the direction VS Code is pointing, there you'll find IntelliJ?

Tangentially related, there's some tree-sitter activity in the Jetbrains org on Github: https://github.com/JetBrains?utf8=&q=tree-sitter&type=&langu...

which is cool

dunkelheit
I've used IntelliJ a little bit and it is awesome (albeit a bit slow for my taste). The reason I stick to vscode is remote editing - compiling rust code locally on my laptop is torture compared to compiling it on a beefy remote box! Remote editing in vscode is very well done, even most extensions work flawlessly without any changes. As I understand, there is nothing comparable for IntelliJ.
AnthonBerg
Interesting!, thanks!
For a really nice solution to the error message problem, see this recent strangeloop talk: https://www.youtube.com/watch?v=Jes3bD6P0To

Basically it uses the parse tree disambiguation from the GLR parser to look for the most likely mistake the user made - it's very clever.

HN Theater is an independent project and is not operated by Y Combinator or any of the video hosting platforms linked to on this site.