feat: introduce experimental JavaScript RegExp Engine #761

Merged: 7 commits, Aug 30, 2024

antfu (Member) commented Aug 30, 2024

This PR introduces an experimental engine that uses JavaScript's native RegExp instead of the Oniguruma Wasm binary, so Shiki can run entirely in pure JavaScript.

Approach

Currently, the Oniguruma wasm binary we rely on is a RegExp engine written in C. It is very powerful and supports many syntaxes and features that JavaScript does not. As JavaScript has evolved, it has gained many of the features it used to lack, such as regex lookbehind. So the idea was to see whether we could leverage the RegExp engine shipped with the language instead of porting another one.

I ended up with https://github.com/antfu/oniguruma-to-js, a library that converts Oniguruma features down to syntax that JavaScript RegExp can understand. Think of how Babel transpiles ESNext to ES5.
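To make the transpiling idea concrete, here is a minimal sketch of that kind of rewrite. It is a toy, not the oniguruma-to-js API: the function name `toyOnigToJs` and the escape table are hypothetical, and it handles only four Oniguruma escapes (`\h`, `\H`, `\A`, `\z`) outside character classes, assuming the resulting RegExp is used without the `m` flag.

```typescript
// Hypothetical toy converter for illustration; NOT the oniguruma-to-js API.
// Maps a few Oniguruma-only escapes to JavaScript RegExp equivalents.
const ESCAPE_MAP: Record<string, string> = {
  h: '[0-9A-Fa-f]', // Oniguruma \h: hexadecimal digit
  H: '[^0-9A-Fa-f]', // Oniguruma \H: anything but a hexadecimal digit
  A: '^', // \A: start of input (equivalent only without the `m` flag)
  z: '$', // \z: end of input (equivalent only without the `m` flag)
}

function toyOnigToJs(pattern: string): RegExp {
  let out = ''
  let inClass = false // true while scanning inside [...]
  for (let i = 0; i < pattern.length; i++) {
    const ch = pattern[i]
    if (ch === '\\' && i + 1 < pattern.length) {
      // Consume the escape as a pair so that `\\h` (an escaped backslash
      // followed by a literal h) is never mistaken for `\h`.
      const next = pattern[i + 1]
      const mapped = ESCAPE_MAP[next]
      // Escapes inside a character class are left untouched by this toy.
      out += !inClass && mapped !== undefined ? mapped : ch + next
      i++
      continue
    }
    if (ch === '[') inClass = true
    else if (ch === ']') inClass = false
    out += ch
  }
  return new RegExp(out)
}

const hexLiteral = toyOnigToJs('\\A0x\\h+\\z')
console.log(hexLiteral.source, hexLiteral.test('0xFF'))
```

A real converter has to cover far more ground (flags, possessive quantifiers, escapes inside character classes, and so on), which is presumably where the partially supported languages come from.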

It turns out the feature gap isn't that large, and most of it is syntax differences. With that, ~40% of Shiki languages work perfectly with the JS engine, most of the others are partially supported, and only 2 languages fail at the moment.

Compatibility

Current results:

| Status | Number |
| --- | --- |
| Total Languages | 213 |
| OK | 84 |
| Mismatch | 127 |
| Error | 2 |

Full report: https://github.com/shikijs/shiki/blob/feat/engine-lite/scripts/report-engine-js-compat.md
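
The three statuses can be read as the outcome of comparing the two engines' output for each language. The sketch below shows one way such a classification could work. It is an assumption, not the actual report script: the `ToyToken` shape, `compareEngines`, and `tally` are hypothetical names, and "Error" is assumed to mean that an engine threw while tokenizing.

```typescript
// Hypothetical classification sketch; not the script behind the report.
interface ToyToken {
  content: string
  color?: string
}

type Status = 'OK' | 'Mismatch' | 'Error'

// Run both engines on the same sample and classify the outcome.
// Each callback returns lines of tokens for one language sample.
function compareEngines(
  runReference: () => ToyToken[][],
  runCandidate: () => ToyToken[][],
): Status {
  try {
    const expected = runReference()
    const actual = runCandidate()
    // Structural comparison; assumes both sides build tokens with the
    // same key order, which holds when one tokenizer drives both engines.
    return JSON.stringify(expected) === JSON.stringify(actual) ? 'OK' : 'Mismatch'
  } catch {
    // Assumption: a thrown error (e.g. an unsupported pattern) counts as "Error".
    return 'Error'
  }
}

// Count how many languages fall into each status.
function tally(statuses: Status[]): Record<Status, number> {
  const counts: Record<Status, number> = { OK: 0, Mismatch: 0, Error: 0 }
  for (const s of statuses) counts[s]++
  return counts
}

const sample: ToyToken[][] = [[{ content: 'const', color: '#F97583' }]]
console.log(compareEngines(() => sample, () => sample))
```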

Benchmark

Early benchmarking indicates that the JavaScript engine is 1.7x faster than WASM with shiki.codeToTokensBase().

The benchmark runs against the 84 languages fully supported by both engines.
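
For readers who want to reproduce a figure like "1.7x", a rough timing sketch follows. The helpers `timeMs` and `speedup` are hypothetical and are not the benchmark setup used in this PR. In practice `fn` would wrap a call such as shiki.codeToTokensBase() for each engine, and a proper benchmark harness would add warm-up runs and statistics.

```typescript
// Hypothetical helpers for illustration; not the benchmark used in this PR.

// Run `fn` `iterations` times and return the elapsed wall-clock time in ms.
function timeMs(fn: () => void, iterations: number): number {
  const start = Date.now()
  for (let i = 0; i < iterations; i++) fn()
  return Date.now() - start
}

// A ratio above 1 means the candidate is faster than the baseline.
function speedup(baselineMs: number, candidateMs: number): number {
  return baselineMs / candidateMs
}

// Made-up timings: a 340 ms baseline against a 200 ms candidate.
const ratio = speedup(340, 200)
console.log(`${ratio.toFixed(1)}x`)
```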

[Image: Screenshot 2024-08-30 at 15 18 56]

Usage

Currently, usage looks like this:

import { createHighlighter, createJavaScriptRegexEngine } from 'shiki'

const jsEngine = createJavaScriptRegexEngine()

const shiki = await createHighlighter({
  langs: [...],
  themes: [...],
  engine: jsEngine, // <---
})

shiki.codeToHtml(...)

In the future, we might need a complete redesign (with breaking changes) to decouple the WASM Oniguruma engine so it can be bundled in a more composable way.

antfu merged commit 2be5b2d into main on Aug 30, 2024
11 of 13 checks passed
antfu deleted the feat/engine-lite branch on August 30, 2024 at 14:13

antfu (Member, Author) commented Aug 30, 2024

Update: as of v1.15.1, 179 out of 213 languages (84%) highlight correctly with the JS engine!

| Status | Number |
| --- | --- |
| Total Languages | 213 |
| OK | 179 |
| Mismatch | 32 |
| Error | 2 |

https://github.com/shikijs/shiki/blob/main/scripts/report-engine-js-compat.md

orta (Contributor) commented Sep 1, 2024

Wow
