Tree-sitter: parser generator tool and incremental parsing library.

lysdexic@programming.dev · 7 months ago

Uli · 7 months ago

This is super cool. I watched the talks from Max Brunsfeld, and I’m surprised this has been around since 2018 and I hadn’t heard of it.

I actually tried some complex parsing myself lately. I had a bunch of YAML I needed to maintain for various deployments in a CI/CD system. I really wanted to have one YAML template to generate the files, plus a file for each project with unique elements to be injected into that project’s generated YAML.

Probably was more of an indication that we needed to clean up the overrides we were putting on top of our Helm charts, but I wanted a way to generate our lengthy override files without having to manually keep track of where the differences were between projects. And maybe even stage changes to deployment files for when new product versions are released.

This is exciting. I’m going to look into Tree-sitter more and maybe try to contact the dev. It seems like it does everything I’m looking for, just for an entirely different use case.

Deebster@programming.dev · 7 months ago

I know of it because Helix uses it, and it works really well.

Lupec@lemm.ee · 7 months ago

Yup, I first heard of it in Neovim, but the way Helix integrates it as a first-class citizen is so damn cool.

refalo@programming.dev · 7 months ago

You might also be interested in https://github.com/alexpovel/srgn, you can use it to easily do things like context-sensitive search/replace and a lot more.

Uli · 7 months ago

Read through the README and it’s definitely a good tool to know about. It doesn’t fit the needs of my current problem, but I’m certain I’ll use it in the future for context-sensitive searching, since grep/awk/sed/tr have definitely fallen flat for me in the past. I might also be able to study how they used the tree-sitter CLI when I explore my own implementation.

For my purposes, I want to take a group of similar-yet-different YAML file sets (though file type should be arbitrary), and feed them through a tool that will spit out a YAML template containing everything that is shared between multiple sets.

Then, I want it to create a file for each YAML file that defines which parts to pull from the template file and a list of variables to be inserted into holes in the template. Basically creating a madlib that can recreate any file in the original group given the right list of variables to insert.

For example, if I have a hundred YAML files that are mostly similar but contain different project names, have different server types provisioned, and are pulling different product versions, I would want this script to parse all hundred files and spit out a template that could be used as the basis to build any of the hundred files. The template would be combined with a hundred variable trees that would insert each unique part of each file into the right place.
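A minimal sketch of that extraction step, assuming the files have already been parsed into nested dicts that all share the same key layout. The `{"$hole": ...}` marker and the `extract_template` name are made up for illustration and do not come from any existing tool.

```python
def extract_template(files):
    """Split {name: nested dict} into a shared template and per-file variables.

    Assumption: every file has the same nesting of keys. Values that are
    identical everywhere stay in the template; values that differ become
    a hole named by their dotted path.
    """
    variables = {name: {} for name in files}

    def walk(nodes, path):
        # nodes maps each file name to its subtree at the same path
        values = list(nodes.values())
        same_shape = all(isinstance(v, dict) for v in values) and all(
            v.keys() == values[0].keys() for v in values
        )
        if same_shape:
            return {
                key: walk({n: t[key] for n, t in nodes.items()}, path + [key])
                for key in values[0]
            }
        if all(v == values[0] for v in values):
            return values[0]  # shared by every file, keep it in the template
        hole = ".".join(path)
        for name, value in nodes.items():
            variables[name][hole] = value  # unique part goes to the small file
        return {"$hole": hole}

    return walk(files, []), variables


# Hypothetical parsed inputs standing in for two of the hundred YAML files
files = {
    "alpha": {"project": "alpha", "server": {"type": "small", "region": "eu"}, "version": "1.2"},
    "beta": {"project": "beta", "server": {"type": "large", "region": "eu"}, "version": "1.3"},
}
template, variables = extract_template(files)
```

Here `template` keeps `region: eu` as shared content, while `variables["alpha"]` holds only the project name, server type, and version.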

In effect, I could have a small variables file that gives only the unique portions of the equivalent YAML - in this case, it would contain only the project name, the server type, the product version. Then, these small files could be combined with the universal template to recreate the original hundred YAML files. But unlike using a simple override mechanism, I would be able to change elements of the template YAML including broad structural changes, and after some processing, the change would affect all one hundred output YAMLs.
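The recombination step could look roughly like this. It is only a sketch: the template is a plain nested dict, a hole is written as a hypothetical `{"$hole": "dotted.path"}` marker, and the `render` function is an invented name, not part of any existing tool.

```python
def render(template, variables):
    """Fill every {"$hole": name} marker in the template from `variables`."""
    if isinstance(template, dict):
        if set(template) == {"$hole"}:
            return variables[template["$hole"]]
        return {key: render(value, variables) for key, value in template.items()}
    if isinstance(template, list):
        return [render(item, variables) for item in template]
    return template  # plain scalar shared by all outputs


# Hypothetical universal template and two small variables files
template = {
    "project": {"$hole": "project"},
    "server": {"type": {"$hole": "server.type"}, "region": "eu"},
}
small_files = {
    "alpha": {"project": "alpha", "server.type": "small"},
    "beta": {"project": "beta", "server.type": "large"},
}

before = {name: render(template, v) for name, v in small_files.items()}

# A structural change made once in the template reaches every output
template["server"]["zone"] = "a"
after = {name: render(template, v) for name, v in small_files.items()}
```

The second render is the part a simple override mechanism does not give you: one edit to the template shows up in every generated file.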

One could track things like environment variables that are specific to a certain project version, and require that a particular environment variable be inserted into the output YAML whenever the project version has a particular value. Or a centralized file could be made specifying which product versions correspond to which projects, allowing the engineer to change all product versions for a given set of projects in one go. Or one could create a universal template of IaC code that’s applicable to a broad swath of use cases and quickly build out a full set of YAML manifests and Terraform files using a small file that specifies what components will be needed and where to authenticate to the server.
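The first two ideas can be sketched as a lookup plus a rule table. Everything here is hypothetical: the project names, version numbers, the `ENABLE_NEW_API` variable, and the `apply_rules` helper are placeholders for illustration only.

```python
# Hypothetical centralized map: which version each project should get
PRODUCT_VERSIONS = {"alpha": "2.0", "beta": "1.9"}

# Hypothetical rules: environment variables required by a given version
ENV_RULES = {"2.0": {"ENABLE_NEW_API": "true"}}


def apply_rules(project, manifest):
    """Return a copy of the manifest with the version and required env vars set."""
    version = PRODUCT_VERSIONS[project]
    env = dict(manifest.get("env", {}))
    env.update(ENV_RULES.get(version, {}))  # inject only if a rule matches
    return {**manifest, "version": version, "env": env}


alpha = apply_rules("alpha", {"name": "alpha"})
beta = apply_rules("beta", {"name": "beta", "env": {"LOG_LEVEL": "info"}})
```

Bumping a set of projects to a new version then means editing `PRODUCT_VERSIONS` in one place and regenerating the outputs.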

I’m not aware of any tool that does this, but I think tree-sitter gets me much of the way there. If I can use it to parse any given file into a context-aware tree, I would then need to make a script that combines the shared features of many context trees and splits the unique features out into small variable files. Then a script to merge them back together as needed. And something to manage file system structure, such as whether to parse every file individually or to strategically merge some sets so you have one variable file that produces multiple output YAML files.

Sorry, I’m brainstorming at you, just trying to figure out if the tool I’m envisioning is even feasible. Seems like it is, but I’ll have to figure out how to use the tree-sitter CLI before I begin.
