If you are interested in contributing to RESS, know that your help would be appreciated!

Feel free to open issues and/or pull requests for anything that you see that might be an improvement. Please note that a related issue may already be open in ressa or resast.

I do not work on this full time, so please be patient if I am not able to respond quickly.
The primary development branch is the `next` branch. Please create any pull requests against that branch rather than `master` or one of the other feature branches that might have been missed when cleaning up.
For any PR, the code must pass the Travis and AppVeyor tests before it will be reviewed/merged. These tests include the following commands, which you can use to check your changes locally.
```sh
$ npm i
$ cargo test
$ cargo run --example major_libs --release
```
The `--release` flag in the last command is there because this example is a naive benchmark used to validate that changes haven't completely ruined the performance. Feel free to leave this flag off when you are testing for a PR.
This will run all of the project's unit tests as well as a test against some major JS libraries, namely AngularJS, jQuery, React/React-DOM, Vue, Moment.js, and Dexie.
If you are interested in becoming a maintainer, send me an email and we can talk more about what that looks like.
There are a few things you might need to know to get started. First, the tests and benchmarks require `npm` to pull down the JavaScript they evaluate, so you'll need Node.js installed.
Because the benchmarks use Criterion, it can be difficult to use them with profiling tools, so each of the single-token benchmarks is extracted out as an example (you can find these in the `examples/instruments` folder). For the `major_libs` benchmark, you can use the example with the same name. These are helpful for working with tools like `cargo instruments`.
The overall code layout works like this:
- `lib.rs`
  - `Scanner`: The primary interface for this crate (see the usage sketch after this list)
    - Mostly this is a wrapper around `Tokenizer` that handles detecting regexes and calculating line/column numbers
  - `ScannerState`: This is used for caching the state and resetting it; see the `Scanner::get_state` and `Scanner::set_state` methods
- `errors.rs`
  - This is where the error structs live. If you add a new error type to the `Tokenizer`, you will need to add an `Into`/`From` implementation here
- `look_behind.rs`
  - `LookBehind`: This is a ring-like structure that is used to keep the look-behind tokens
    - For regex detection we only care about the last token we have seen and the three tokens before an open parenthesis, so the `Scanner` keeps two of these on hand
    - The basic idea here is to just use a 3-element array and keep track of where we last put an element to be able to calculate which is `last`, `two` or `three` (see the ring sketch after this list)
  - `MetaToken`: a cheaper token variant which only holds the bare minimum of information for regex detection
- `tokenizer`
  - `mod.rs`
    - `RawItem`: a cheaper version of the `Item` struct from above; it has only as much information as the `Tokenizer` can determine: a `RawToken` and the byte indexes of the start and end
    - `Tokenizer`: This is the primary export of this module. This struct performs the actual separation and classification of tokens
      - One note about the matching logic: matching on the length of a byte array or string a bunch of times with an `if` clause is cheaper than matching on the strings directly (see the keyword sketch after this list). Until `phf` can handle byte slices, this is the fastest method available
  - `buffer.rs`
    - `JSBuffer`: Mostly a reimplementation of `std::str::Chars`
      - For most look-ahead operations there is `look_ahead_matches`, which takes a byte slice; however, if you are looking for a single-byte character, `look_ahead_byte_matches` is slightly faster (both are sketched after this list)
      - In `at_new_line`, the `cmp` operation on `u8` is faster than matching or `eq`, so checking if something is smaller than a target is faster than doing bounds checks between `||`s
  - `tokens.rs`
    - `RawToken`: This is a token more tailored to directing the `Scanner` about how to construct a `tokens::Token` (see the sketch after this list)
      - The three cases that can have new lines carry some extra information with them: the `new_line_count` and the `last_len` (the length of the last line)
    - `CommentKind`: empty version of `tokens::Comment`
    - `StringKind`: empty version of `tokens::StringLit`
    - `TemplateKind`: empty version of `tokens::Template`
  - `unicode.rs`
    - Bounds checks on `char`s are more effective than binary search (which the two unicode implementations I could find use), so these function bodies are generated using the appropriate tables (see the sketch after this list)
      - The generation code may become available in the future but right now it isn't very effective
    - `is_ident_start`: check if a `char` has the attribute ident_start
    - `is_id_continue`: check if a `char` has the attribute ident_continue
    - `is_other_whitespace`: the ECMA spec says that any Zs category character is valid whitespace; this function will test for any exotic whitespace
- `mod.rs`
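
Below are a few illustrative sketches of the pieces called out above. First, a minimal sketch of driving the `Scanner` and of the `get_state`/`set_state` round trip; the exact item type the iterator yields may vary between versions, so treat this as a sketch rather than the definitive API:

```rust
use ress::Scanner;

fn main() {
    let js = "function helloWorld() { alert('Hello world'); }";
    let mut scanner = Scanner::new(js);
    // Cache the current state so we can rewind to it later.
    let state = scanner.get_state();
    let first = scanner.next();
    println!("{:?}", first);
    // Reset the scanner back to the cached position.
    scanner.set_state(state);
}
```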
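
The `LookBehind` ring can be pictured like the following; `RingSketch` and everything except the `last`, `two` and `three` names are hypothetical, not the crate's actual implementation:

```rust
/// A sketch of a fixed 3-element ring: an array plus a write index.
struct RingSketch<T> {
    slots: [Option<T>; 3],
    pointer: usize,
}

impl<T> RingSketch<T> {
    fn new() -> Self {
        Self { slots: [None, None, None], pointer: 0 }
    }

    /// Advance the pointer and overwrite the oldest slot.
    fn push(&mut self, value: T) {
        self.pointer = (self.pointer + 1) % 3;
        self.slots[self.pointer] = Some(value);
    }

    /// The most recently pushed value.
    fn last(&self) -> Option<&T> {
        self.slots[self.pointer].as_ref()
    }

    /// Two pushes back.
    fn two(&self) -> Option<&T> {
        self.slots[(self.pointer + 2) % 3].as_ref()
    }

    /// Three pushes back.
    fn three(&self) -> Option<&T> {
        self.slots[(self.pointer + 1) % 3].as_ref()
    }
}
```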
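
For the `Tokenizer` matching note, the idea is to branch on the byte length first and only then compare slices. A hedged, self-contained example (not code from the crate):

```rust
/// Illustrative only: check the length with a match first, then
/// compare byte slices, instead of matching on strings directly.
fn is_keyword(ident: &[u8]) -> bool {
    match ident.len() {
        2 => ident == b"do" || ident == b"if" || ident == b"in",
        3 => ident == b"for" || ident == b"new" || ident == b"var",
        _ => false,
    }
}
```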
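
The two look-ahead helpers on `JSBuffer` have roughly this shape; `BufferSketch` is a hypothetical stand-in for the real buffer:

```rust
/// Illustrative stand-in: a byte slice plus a cursor index.
struct BufferSketch<'a> {
    bytes: &'a [u8],
    idx: usize,
}

impl<'a> BufferSketch<'a> {
    /// Multi-byte look-ahead: compare a whole slice.
    fn look_ahead_matches(&self, s: &[u8]) -> bool {
        self.bytes
            .get(self.idx..)
            .map_or(false, |rest| rest.starts_with(s))
    }

    /// Single-byte look-ahead: one bounds check and one comparison,
    /// which is why it is slightly faster for single-byte characters.
    fn look_ahead_byte_matches(&self, b: u8) -> bool {
        self.bytes.get(self.idx).copied() == Some(b)
    }
}
```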
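
The `RawToken` note about new lines can be pictured like this; the variant shapes are assumptions for illustration, only the field names come from the notes above:

```rust
/// Illustrative only: the multi-line cases carry enough information
/// for the Scanner to update its line/column position without
/// re-scanning the token's text.
enum MultiLineSketch {
    Comment { new_line_count: usize, last_len: usize },
    StringLit { new_line_count: usize, last_len: usize },
    Template { new_line_count: usize, last_len: usize },
}
```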
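
Finally, a sketch of the bounds-check approach in `unicode.rs`: plain range and equality comparisons over a `char` instead of a binary search over a table. The character list is Unicode's Zs category, with the ASCII space assumed to be handled elsewhere:

```rust
/// Illustrative only: Zs ("space separator") characters checked
/// with comparisons rather than a table lookup.
fn is_other_whitespace(c: char) -> bool {
    matches!(
        c,
        '\u{00A0}' // no-break space
            | '\u{1680}' // ogham space mark
            | '\u{2000}'..='\u{200A}' // en quad through hair space
            | '\u{202F}' // narrow no-break space
            | '\u{205F}' // medium mathematical space
            | '\u{3000}' // ideographic space
    )
}
```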