
Optimize pattern matching on constants #1185

Open
ospencer opened this issue Apr 15, 2022 · 1 comment · May be fixed by #2173

Comments

@ospencer (Member)

When matching on constants, we essentially compile something like match (...) { 'a' => ..., 'b' => ..., ... } into an if chain of if (val == 'a') { goto branch 1 } else if (val == 'b') { goto branch 2 } else .... We can instead compile them as though they were variants, and just do a br_table.
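To make the contrast concrete, here is an illustrative sketch (not the Grain compiler's actual code) of the two dispatch strategies: the current approach tests each constant in turn, while a br_table-style approach does one bounds check and then indexes directly into a table of branch targets.

```python
# Illustrative sketch of the two dispatch strategies. "Branches" are modeled
# as plain values; in the real compiler they would be jump targets.

def dispatch_if_chain(val, cases, default):
    # Models: if (val == 'a') { goto branch 1 } else if (val == 'b') { ... } else ...
    # Cost grows linearly with the number of cases.
    for const, branch in cases:
        if val == const:
            return branch
    return default

def dispatch_br_table(val, table, base, default):
    # Models a br_table: rebase the value to a 0-based index, then do a
    # single bounds check and a constant-time indexed jump.
    idx = val - base
    if 0 <= idx < len(table):
        return table[idx]
    return default
```

For character constants the `base` would be the smallest matched code point, e.g. `dispatch_br_table(ord('b'), [1, 2, 3], ord('a'), 0)` selects branch 2.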

@spotandjake (Member) commented Oct 8, 2024

Investigating this a bit, it does not seem as though using a br_table is always the most efficient option. For sparse datasets the table is not the best choice, and we additionally need to map the data to a 0-based range, so it probably makes sense to use a few heuristics here, notably:

  • The dataset is primarily contiguous (we can tolerate a few gaps and just direct them to the default branch manually)
  • There are more than a few elements, as a br_table is only really more efficient than branching when checking all the options would take longer
  • For void constants it would make sense to eliminate the match altogether and do a direct jump
  • For bool constants it would make sense to build a select instruction instead of a conditional
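A possible density heuristic for the first two points might look like the following sketch (the thresholds are hypothetical and would need benchmarking; they are not from the issue):

```python
# Hypothetical heuristic: use a jump table only when there are enough
# branches, and the constants are dense enough that gap slots (which route
# to the default branch) don't bloat the table. Thresholds are illustrative.

MIN_BRANCHES = 4   # assumed: below this, an if chain is likely faster
MIN_DENSITY = 0.5  # assumed: at least half the table slots are real branches

def should_use_jump_table(constants):
    if len(constants) < MIN_BRANCHES:
        return False
    span = max(constants) - min(constants) + 1
    return len(constants) / span >= MIN_DENSITY
```

For example, `[1, 2, 3, 4]` is fully dense and qualifies, while `[0, 10, 1000, 5000]` would produce a mostly-empty table and falls back to conditionals.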

We can, however, do an additional optimization when implementing this that is mentioned in the todo comment: even if the dataset is sparse, we can load the number data from the heap, untag it, and do a direct match for some of our number types to speed things up.
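As a rough sketch of what untagging buys us (the exact tag layout here is an assumption for illustration; Grain's real value representation may differ): suppose small numbers are stored shifted left one bit with a tag bit. Untagging recovers the raw number, and subtracting the minimum matched constant yields a 0-based table index.

```python
# Assumed tagged representation for illustration: tagged = (n << 1) | 1.
# Untagging recovers n; subtracting the smallest matched constant rebases
# the value into the jump table's 0-based index space.

def untag(tagged):
    return tagged >> 1

def table_index(tagged_val, min_const):
    return untag(tagged_val) - min_const
```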

My current plan of attack is something like the following (some further optimizations could be made, e.g. on large match statements, if we notice there are subranges of data, we could break it into multiple chained jump tables or a mix of jumps and conditionals):

// Plan
// Get rid of Conditional (always compile to a table)
// Two table types: Jump_Table, Conditional_Table
// Ensure all constants are the same type
// Jump_Table: the range of the data is some heuristic relative to the branch count (we want the data to be contiguous)
// Steps:
// 1) Validate all constants are the same type
// 2) Check if our constant is eligible for our table-based approach (i.e. != string, bytes, rational)
// 3) Determine the min and max constant values
// 4) Compare the range of the data to the branch paths; if our data looks contiguous, make a jump table, otherwise conditional
// 5) Generate our mapping code, which should just be value - min
// 6) Generate our table for all branches
// 7) If we failed at any step, generate a conditional approach like before
// Notes:
// The number type is interesting; I think it is possible to handle it but we need to look into it
// Conditional_Table: our generic fallback
// table_type = Jump_Table | Conditional_Table
// Table(
//   table_type: table_type,
//   alias: Ident.t,
//   branches: list((const, decision_tree)),
//   min: int,
//   max: int,
//   default_branch: decision_tree
// ) -- We should be able to untag and subtract min (untagging heap values is different)
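Steps 3 through 6 of the plan above can be sketched as follows (a hypothetical helper, not compiler code): lay the branches out in a dense table from min to max, filling any gaps with the default branch, so that dispatch becomes `table[value - min]`.

```python
# Sketch of jump-table construction: given (constant, branch) pairs, build
# a dense table covering [min, max], with gap slots routed to the default
# branch. Dispatch is then a single indexed lookup: table[value - min].

def build_jump_table(branches, default):
    consts = [c for c, _ in branches]
    lo, hi = min(consts), max(consts)
    table = [default] * (hi - lo + 1)  # pre-fill every slot with default
    for const, branch in branches:
        table[const - lo] = branch     # overwrite real branch slots
    return lo, table
```

For instance, branches on constants 3 and 5 produce a three-slot table whose middle slot (the gap at 4) falls through to the default branch.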
