Readme, test files
slevithan committed Oct 30, 2024
1 parent 9a2bedd commit 28c14af
Showing 14 changed files with 262 additions and 13 deletions.
119 changes: 113 additions & 6 deletions README.md
@@ -13,8 +13,115 @@ Oniguruma-To-ES deeply understands all of the hundreds of large and small differ

## Contents

- [Install and use](#install-and-use)
- [API](#api)
- [Options](#options)
- [Unicode, mixed case-sensitivity](#unicode-mixed-case-sensitivity)
- [Unicode / mixed case-sensitivity](#unicode--mixed-case-sensitivity)

## Install and use

```sh
npm install oniguruma-to-es
```

```js
import {compile, toRegExp} from 'oniguruma-to-es';
```

In browsers:

```html
<script type="module">
  import {compile, toRegExp} from 'https://esm.run/oniguruma-to-es';
  //
</script>
```

<details>
<summary>Using a global name (no import)</summary>

```html
<script src="https://cdn.jsdelivr.net/npm/oniguruma-to-es/dist/index.min.js"></script>
<script>
  const {compile, toRegExp} = OnigurumaToES;
</script>
```
</details>

## API

### `compile`

Transpiles an Oniguruma regex pattern and flags to native JavaScript.

```ts
function compile(
  pattern: string,
  flags?: OnigurumaFlags,
  options?: CompileOptions
): {
  pattern: string;
  flags: string;
};
```

The returned `pattern` and `flags` can be provided directly to the `RegExp` constructor.

`OnigurumaFlags` are `i`, `m`, and `x` in any order (all optional). Oniguruma's flag `m` is equivalent to JavaScript's flag `s`.
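
As a minimal sketch of how the result is consumed, the object below is a hand-written stand-in for what `compile` might return (an assumption for illustration; the actual output for a given input may differ). It shows the two fields passed straight to `RegExp`, and that JavaScript's flag `s` gives `.` the newline-matching behavior of Oniguruma's flag `m`:

```js
// Hand-written stand-in for a value returned by compile(); this is an
// assumption for illustration, not real library output.
const compiled = {pattern: 'a.b', flags: 'su'};

// The two fields map directly onto the RegExp constructor's arguments
const re = new RegExp(compiled.pattern, compiled.flags);

// With JS flag `s` (dotAll), `.` also matches newlines
console.log(re.test('a\nb')); // true

// Without flag `s`, `.` does not match a newline
console.log(/a.b/u.test('a\nb')); // false
```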

#### Type `CompileOptions`

```ts
type CompileOptions = {
  allowBestEffort?: boolean;
  maxRecursionDepth?: number | null;
  optimize?: boolean;
  target?: 'ES2018' | 'ES2024' | 'ESNext';
};
```

See [Options](#options) for more details.

### `toRegExp`

Transpiles an Oniguruma regex pattern and flags and returns a native JavaScript `RegExp`.

```ts
function toRegExp(
  pattern: string,
  flags?: string,
  options?: CompileOptions
): RegExp;
```

Flags are any combination of Oniguruma flags `i`, `m`, and `x`, and JavaScript flags `d` and `g`. Oniguruma's flag `m` is equivalent to JavaScript's flag `s`.
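
The JavaScript-only flags `d` and `g` behave as they do on any native `RegExp`. The sketch below uses a plain `RegExp` as a stand-in for a value returned by `toRegExp` (an assumption; it is not real library output) to show what each flag adds:

```js
// An ordinary RegExp with JS flags `d` and `g`, standing in for a value
// returned by toRegExp (assumption for illustration).
const re = new RegExp('(?<word>\\w+)', 'dg');

// Flag g: iterate over every match rather than only the first
const matches = [...'foo bar'.matchAll(re)];
console.log(matches.map(m => m[0])); // ['foo', 'bar']

// Flag d: each match carries start/end indices, including for named groups
console.log(matches.map(m => m.indices.groups.word)); // [[0, 3], [4, 7]]
```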

> [!TIP]
> Try it in the [demo REPL](https://slevithan.github.io/oniguruma-to-es/demo/).

### `toOnigurumaAst`

Generates an Oniguruma AST from an Oniguruma pattern and flags.

```ts
function toOnigurumaAst(
  pattern: string,
  flags?: OnigurumaFlags
): OnigurumaAst;
```

### `toRegexAst`

Generates a [`regex`](https://github.com/slevithan/regex) AST from an Oniguruma pattern and flags.

```ts
function toRegexAst(
  pattern: string,
  flags?: OnigurumaFlags
): RegexAst;
```

`regex` syntax and behavior are a strict superset of native JavaScript `RegExp`, so the AST is very close to representing native ESNext JavaScript, but with some added features (atomic groups, possessive quantifiers, recursion). The `regex` AST doesn't use some `regex` features like flag `x` or subroutines because they follow PCRE behavior and work somewhat differently than in Oniguruma. The AST represents what's needed to precisely reproduce Oniguruma's behavior.
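
To illustrate why atomic groups count as an added feature, here is a sketch of a well-known way to emulate `(?>a+)` in native syntax using a lookahead plus a backreference. Whether any particular tool emits exactly this form is an assumption; the point is only the difference in backtracking behavior:

```js
// `(?=(a+))\1` emulates the atomic group `(?>a+)`: the lookahead captures the
// greedy run once, and the backreference consumes it with no way to
// backtrack into it. Note that this adds a capturing group.
const atomic = /(?=(a+))\1ab/;

// The ordinary group can give back one `a` so that the trailing `ab` matches
const nonAtomic = /(?:a+)ab/;

console.log(nonAtomic.test('aaab')); // true
console.log(atomic.test('aaab')); // false
```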

## Options

@@ -49,7 +156,7 @@ If `null`, any use of recursion throws. If an integer between `2` and `100` (and
<details>
<summary>More details</summary>

Using a higher limit is not a problem if needed. Although there can be a performance cost (generally small unless exacerbating an existing problem with superlinear backtracking), there is no effect on regexes that don't use recursion.
Using a high limit is not a problem if needed. Although there can be a performance cost (minor unless it's exacerbating an existing issue with runaway backtracking), there is no effect on regexes that don't use recursion.
</details>
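
One general way to emulate depth-limited recursion in a regex flavor without native recursion is to expand the pattern textually a fixed number of times. The sketch below shows that idea only; `expandRecursion`, the `<R>` placeholder, and the way `levels` is counted are all hypothetical and may not correspond to how this library implements or counts `maxRecursionDepth`:

```js
// Hypothetical helper (not part of the library): expands a pattern that
// contains the placeholder `<R>` to a fixed number of nesting levels.
function expandRecursion(template, levels) {
  // Innermost level: `(?!)` never matches, so recursion stops there
  let expanded = '(?!)';
  for (let i = 0; i < levels; i++) {
    expanded = template.split('<R>').join(`(?:${expanded})`);
  }
  return expanded;
}

// Balanced parentheses, where `<R>` marks the recursive call
const template = String.raw`\((?:[^()]|<R>)*\)`;
const re = new RegExp(`^${expandRecursion(template, 2)}$`);

console.log(re.test('(a(b))')); // true
console.log(re.test('(a(b(c)))')); // false
```

The expanded source grows with each level, which is one reason a depth limit can carry a cost in this kind of approach.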

### `optimize`
@@ -68,7 +175,7 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
<summary>More details</summary>

- `ES2018`: Uses JS flag `u`.
  - Emulation restrictions: Character class intersection, nested negated classes, and Unicode properties added after ES2018 are not allowed.
  - Emulation restrictions: Character class intersection, nested negated character classes, and Unicode properties added after ES2018 are not allowed.
  - Generated regexes might use ES2018 features that require Node.js 10 or a browser version released during 2018 to 2023 (in Safari's case). Minimum requirement for any regex is Node.js 6 or a 2016-era browser.
- `ES2024`: Uses JS flag `v`.
  - No emulation restrictions.
@@ -78,20 +185,20 @@ Sets the JavaScript language version for generated patterns and flags. Later tar
  - Generated regexes might use features that require Node.js 23 or a 2024-era browser (except Safari, which lacks support).
</details>
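
If the runtime isn't known in advance, one option is to feature-detect flag `v` and choose a target accordingly. This is a minimal sketch; `pickTarget` is a hypothetical helper that is not part of the library, and the resulting object is meant to be passed as the `options` argument of `compile` or `toRegExp`:

```js
// Hypothetical helper (not part of the library): choose a `target` value
// based on whether this runtime accepts JS flag `v`.
function pickTarget() {
  try {
    new RegExp('', 'v'); // throws a SyntaxError where flag v is unsupported
    return 'ES2024';
  } catch {
    return 'ES2018';
  }
}

const options = {target: pickTarget()};
console.log(options.target);
```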

## Unicode, mixed case-sensitivity
## Unicode / mixed case-sensitivity

Oniguruma-To-ES fully supports mixed case-sensitivity (and handles the Unicode edge cases) regardless of JavaScript [target](#target). It also restricts Unicode properties to those supported by Oniguruma and the target JavaScript version.

Oniguruma-To-ES focuses on being lightweight to make it better for use in browsers. This is partly achieved by not including heavyweight Unicode character data, which imposes a couple of minor/rare restrictions:

- Character class intersection and nested negated classes are unsupported with target `ES2018`. Use target `ES2024` or later if you need support for these Oniguruma features.
- Character class intersection and nested negated character classes are unsupported with target `ES2018`. Use target `ES2024` or later if you need support for these Oniguruma features.
- A handful of Unicode properties that target a specific character case (ex: `\p{Lower}`) can't be used case-insensitively in patterns that contain other characters with a specific case that are used case-sensitively.
  - In other words, almost every usage is fine, including `A\p{Lower}`, `(?i:A\p{Lower})`, `(?i:A)\p{Lower}`, `(?i:A(?-i:\p{Lower}))`, and `\w(?i:\p{Lower})`, but not `A(?i:\p{Lower})`.
  - Using these properties case-insensitively is basically never done intentionally, so you're unlikely to encounter this error unless it's catching a mistake.
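
To give a feel for what mixed case-sensitivity means without native inline modifiers, here is a deliberately simplistic, ASCII-only sketch that rewrites `(?i:a)b` by hand as `[Aa]b`. The helper `asciiCaseInsensitive` is hypothetical and not part of the library, and it ignores the Unicode edge cases mentioned above:

```js
// Hypothetical helper (not part of the library): makes the ASCII letters of
// a literal case-insensitive by turning each into a two-character class.
// Assumes the input contains only letters and digits (nothing is escaped).
function asciiCaseInsensitive(literal) {
  return literal.replace(
    /[A-Za-z]/g,
    ch => `[${ch.toUpperCase()}${ch.toLowerCase()}]`
  );
}

// Hand-written equivalent of `(?i:a)b`: only the `a` ignores case
const source = asciiCaseInsensitive('a') + 'b';
const re = new RegExp(`^${source}$`, 'u');

console.log(source); // '[Aa]b'
console.log(re.test('Ab')); // true
console.log(re.test('aB')); // false
```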

## Similar projects

[js_regex](https://github.com/jaynetics/js_regex) transpiles [Onigmo](https://github.com/k-takata/Onigmo) regexes to JavaScript (Onigmo is a fork of Oniguruma that has slightly different syntax/behavior). js_regex is written in Ruby and relies on Ruby's Onigmo parser, which means regexes must be pre-transpiled to use them in JavaScript. In contrast, Oniguruma-To-ES is written in JavaScript, so it can be used at runtime. js_regex also produces regexes with more edge cases that don't perfectly follow Oniguruma's behavior, in addition to the Oniguruma/Onigmo differences.
[js_regex](https://github.com/jaynetics/js_regex) transpiles [Onigmo](https://github.com/k-takata/Onigmo) regexes to JavaScript (Onigmo is a fork of Oniguruma that has slightly different syntax/behavior). js_regex is written in Ruby and relies on Ruby's built-in Onigmo parser, which means regexes must be transpiled ahead of time to use them in JavaScript. In contrast, Oniguruma-To-ES is written in JavaScript, so it can be used at runtime. js_regex also produces regexes with more edge cases that don't perfectly follow Oniguruma's behavior, in addition to the Oniguruma/Onigmo differences.

## About

2 changes: 1 addition & 1 deletion dist/index.min.js

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

2 changes: 1 addition & 1 deletion spec/match-assertion.spec.js
@@ -7,7 +7,7 @@ beforeEach(() => {
});

describe('Assertion', () => {
// For kinds `lookahead` and `lookbehind`, see `match-lookaround.spec.js`
// [Note] For kinds `lookahead` and `lookbehind`, see `match-lookaround.spec.js`

describe('line_end', () => {
it('should match at the end of the string', () => {