The tokenization performed by tree-sitter can be slow for large datasets: it can account for 50% or more of total processing time.
Multiple improvements are possible:
- Try out similar parsers, ideally ones that can also run in browser contexts
- Parallelize tokenization using workers
- Benchmark/profile tree-sitter; we might just be using it wrong