The tokenization performed by tree-sitter can be slow for large datasets: it can account for 50% or more of total processing time.
Multiple improvements are possible:
- Try out similar parsers, ideally ones that can also run in browser contexts
- Parallelize tokenization using workers
- Benchmark/profile tree-sitter; we might just be using it wrong