
Improve casbin-rs bench #109

Closed
GopherJ opened this issue Apr 12, 2020 · 63 comments
Labels: enhancement (New feature or request)

@GopherJ
Member

GopherJ commented Apr 12, 2020

While running the same bench test suite as Casbin Golang (https://github.com/casbin/casbin/blob/master/model_b_test.go),

I found that we get similar results with CachedEnforcer, but the normal enforcer is 6x–7x slower than the Golang version.

Can someone help investigate?

@xcaptain @DevinR528 @PsiACE @hackerchai

e.g. running the small RBAC model test:

Casbin Golang

BenchmarkRBACModelSmall-8                     	   10000	    113285 ns/op	   28528 B/op	     790 allocs/op

Casbin Rust

test b_benchmark_rbac_model_small                ... bench:     729,440 ns/iter (+/- 266,774)

Ref

@GopherJ
Member Author

GopherJ commented Apr 12, 2020

Note that the bench here is all about enforcing.

@GopherJ
Member Author

GopherJ commented Apr 12, 2020

See also : rhaiscript/rhai#101

@GopherJ GopherJ changed the title Fix bench Fix casbin-rs bench Apr 12, 2020
@GopherJ
Member Author

GopherJ commented Apr 12, 2020

https://casbin.org/casbin-rs/dev/bench/

You can visualize it here. It seems Actions took 1h30m to finish all the benches; there must be a problem somewhere.

UPDATE

It took 1h30m because of a mistake in the bench code; it's OK now.

@xcaptain
Contributor

We can first try https://github.com/flamegraph-rs/flamegraph to see where the bottleneck is.

@hackerchai
Member

@GopherJ
Where can we see the visualized graph for casbin (Go)? Or should we run the benchmark manually?

@GopherJ
Member Author

GopherJ commented Apr 13, 2020

@hackerchai You can run it through:

go test -bench=. -benchmem

There is no Actions benchmark graph for Casbin Golang.

@hackerchai
Member

@GopherJ Got it

@GopherJ
Member Author

GopherJ commented Apr 13, 2020

@xcaptain are you able to generate a graph using https://github.com/casbin-rs/casbin-flamegraph ?

I failed to generate one.

@GopherJ
Member Author

GopherJ commented Apr 13, 2020

Basically I have no idea why add_grouping_policy is so slow in casbin-rs... I changed it to batch operations and it seems better, but the enforce speed doesn't change.

Also, while checking the source of rhai::Scope: it uses a Vec behind the scenes, and once the evaluation of one policy finishes we should truncate it. A simple rewind improves speed by 10%.

I think most of the lost time is spent creating the async_std runtime; we shouldn't use block_on inside the bench's b.iter.

UPDATE

add_grouping_policy is not slow, but creating the async_std runtime is...

@xcaptain
Contributor

I generated a simple graph from the b_benchmark_rbac_model_small

[flamegraph image]

If we use join_all at

b.iter(|| await_future(e.enforce(&["user501", "data9", "read"])).unwrap());

it may improve performance.

If we await in a loop, we are effectively writing sync code and can't take advantage of async.

@GopherJ
Member Author

GopherJ commented Apr 13, 2020

Using await in a loop is OK, but block_on will create a runtime, which basically spawns threads.

Sadly, the libtest bencher and criterion don't support executors out of the box.
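To make that overhead concrete, here is what even a minimal, single-threaded block_on has to do (a std-only sketch for illustration — async-std's real block_on additionally sets up its worker-thread machinery, which is the per-iteration cost being discussed):

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::{Arc, Condvar, Mutex};
use std::task::{Context, Poll, Wake, Waker};

// Parked-thread signal used as the waker.
struct Signal {
    ready: Mutex<bool>,
    cond: Condvar,
}

impl Wake for Signal {
    fn wake(self: Arc<Self>) {
        *self.ready.lock().unwrap() = true;
        self.cond.notify_one();
    }
}

/// Drive a future to completion on the current thread:
/// poll, and if it is pending, park until the waker fires.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let signal = Arc::new(Signal {
        ready: Mutex::new(false),
        cond: Condvar::new(),
    });
    let waker = Waker::from(Arc::clone(&signal));
    let mut cx = Context::from_waker(&waker);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        let mut ready = signal.ready.lock().unwrap();
        while !*ready {
            ready = signal.cond.wait(ready).unwrap();
        }
        *ready = false;
    }
}

fn main() {
    // Even this trivial future pays for waker setup and a poll.
    assert_eq!(block_on(async { 40 + 2 }), 42);
}
```

Calling something like this inside every b.iter closure adds that setup cost to each measured iteration, which is why the discussion above suggests keeping block_on out of the hot loop.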

@DevinR528
Contributor

I'm sure the await stuff isn't helping, but only when the layer above gets smaller is there meaningful information; it basically just says the async stuff runs only as long as the enforcer code, which is almost entirely rhai runtime execution.
These two calls are killing us according to the flamegraph:

let mut engine = Engine::new();
let mut scope: Scope = Scope::new();

But because of lifetime issues, and the enforcer taking &mut self, you can't include them in the Enforcer struct or as lazy_static! { Arc<Mutex<Scope/Engine>> }...

I do agree with @GopherJ that comparing rhai to whatever the Go version uses is exactly the information that would help this along.

@GopherJ
Member Author

GopherJ commented Apr 13, 2020

Yes, rhai's benchmark is worse than govaluate's.

With the latest commit, casbin golang is 6x–7x faster than casbin-rs, counting the fact that rhai is slower and that we are repeatedly creating the async-std runtime in every b.iter. That's OK for me.

Maybe we can try to share the engine, because creating a new Engine while enforcing looks a little heavy in the flamegraph. Eventually we can see how to improve rhai's benchmark.

@DevinR528
Contributor

Maybe we can try to share the engine, because creating a new Engine while enforcing looks a little heavy in the flamegraph.

I completely agree, although I wasn't able to get that to work. The only other idea I had was passing a Scope and Engine to Enforcer::enforce, but that creates more overhead for the user, so I'm not sure. You still have to run the interpreter, which is the slow part. The flamegraph does show that the async runtime isn't adding time to that particular run of the program.
I'm pretty sure that if it were, it would look like this:

         | enforcer code      |
      | another call in rt         |
   |       call inside runtime       |
|              async start                  |

@GopherJ
Member Author

GopherJ commented Apr 13, 2020

@DevinR528 Why not directly add another field to the enforcer? I think it's OK.

@GopherJ
Member Author

GopherJ commented Apr 13, 2020

I suggest you all check the code and see what you can improve!

Use https://github.com/BurntSushi/cargo-benchcmp to see the changes.

@DevinR528
Contributor

I did try adding them as fields, but both Scope and Engine have lifetimes, which makes the EventEmitter::on method impossible to write a type signature for.

Creating a lazy_static Engine actually helps out a lot.
await-fut is current master:

name                                         await-fut ns/iter  lazy-engine ns/iter  diff ns/iter   diff %  speedup 
 b_benchmark_abac_model                       114,472            5,367                    -109,105  -95.31%  x 21.33 
 b_benchmark_basic_model                      121,416            8,004                    -113,412  -93.41%  x 15.17 
 b_benchmark_cached_abac_model                229                227                            -2   -0.87%   x 1.01 
 b_benchmark_cached_key_match                 229                228                            -1   -0.44%   x 1.00 
 b_benchmark_cached_priority_model            226                221                            -5   -2.21%   x 1.02 
 b_benchmark_cached_rbac_model                225                217                            -8   -3.56%   x 1.04 
 b_benchmark_cached_rbac_model_large          228                224                            -4   -1.75%   x 1.02 
 b_benchmark_cached_rbac_model_medium         222                224                             2    0.90%   x 0.99 
 b_benchmark_cached_rbac_model_small          230                225                            -5   -2.17%   x 1.02 
 b_benchmark_cached_rbac_model_with_domains   252                246                            -6   -2.38%   x 1.02 
 b_benchmark_cached_rbac_with_deny            225                219                            -6   -2.67%   x 1.03 
 b_benchmark_cached_rbac_with_resource_roles  228                219                            -9   -3.95%   x 1.04 
 b_benchmark_key_match                        163,193            45,660                   -117,533  -72.02%   x 3.57 
 b_benchmark_priority_model                   116,859            7,528                    -109,331  -93.56%  x 15.52 
 b_benchmark_raw                              5                  5                               0    0.00%   x 1.00 
 b_benchmark_rbac_model                       129,534            19,356                   -110,178  -85.06%   x 6.69 
 b_benchmark_rbac_model_large                 37,198,848         37,509,842                310,994    0.84%   x 0.99 
 b_benchmark_rbac_model_medium                3,811,947          3,703,521                -108,426   -2.84%   x 1.03 
 b_benchmark_rbac_model_small                 479,806            380,284                   -99,522  -20.74%   x 1.26 
 b_benchmark_rbac_model_with_domains          138,648            27,470                   -111,178  -80.19%   x 5.05 
 b_benchmark_rbac_with_deny                   134,611            23,782                   -110,829  -82.33%   x 5.66 
 b_benchmark_rbac_with_resource_roles         127,703            16,762                   -110,941  -86.87%   x 7.62 
 b_benmark_cached_basic_model                 222                219                            -3   -1.35%   x 1.01
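The lazy-engine numbers above come from constructing the engine once and reusing it. The pattern can be sketched with the standard library alone — OnceLock standing in for lazy_static, and a toy Engine standing in for rhai::Engine (all names here are illustrative):

```rust
use std::collections::HashMap;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// Stand-in for rhai::Engine: expensive to construct because it
// registers many functions up front.
struct Engine {
    ops: HashMap<&'static str, fn(i64, i64) -> i64>,
}

static BUILD_COUNT: AtomicUsize = AtomicUsize::new(0);

impl Engine {
    fn new() -> Self {
        BUILD_COUNT.fetch_add(1, Ordering::SeqCst);
        let mut ops: HashMap<&'static str, fn(i64, i64) -> i64> = HashMap::new();
        ops.insert("+", |a, b| a + b);
        ops.insert("==", |a, b| (a == b) as i64);
        Engine { ops }
    }

    fn eval(&self, op: &str, a: i64, b: i64) -> i64 {
        let f = self.ops[op];
        f(a, b)
    }
}

// Shared instance: built once on first use, then reused by every call.
fn shared_engine() -> &'static Engine {
    static ENGINE: OnceLock<Engine> = OnceLock::new();
    ENGINE.get_or_init(Engine::new)
}

fn main() {
    for _ in 0..1000 {
        // No rebuild per call: the engine is constructed exactly once.
        assert_eq!(shared_engine().eval("+", 2, 3), 5);
    }
    assert_eq!(BUILD_COUNT.load(Ordering::SeqCst), 1);
}
```

With this shape, the first call pays the construction cost and every later call is a lookup, which matches the large speedups in the non-cached benches above.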

@xcaptain
Contributor

The flamegraph does show that the async runtime isn't adding time to that particular run of the program.

Is async_std really so lightweight? Creating thousands of futures and running them doesn't slow down the whole program? That's hard to believe.

I agree with taking some action to improve rhai's performance.

@DevinR528
Contributor

No, you are right, it does add some overhead. This is the same function, just with the async keyword and awaited via async_std::task::block_on(future):

test b_benchmark_async_raw                       ... bench:          39 ns/iter (+/- 0)
test b_benchmark_raw                             ... bench:           5 ns/iter (+/- 0)

What I mean about the flamegraph is that, since it measures everything relative to the time each function took to run, the rhai stuff is expensive enough that it hides the cost of the async stuff. In other words, the bottleneck is not async.

@GopherJ
Member Author

GopherJ commented Apr 15, 2020

The latest bench from my i7 10th-gen, if anyone wants to compare.

Rust:

test b_benchmark_abac_model                      ... bench:      28,818 ns/iter (+/- 23,491)
test b_benchmark_basic_model                     ... bench:      31,537 ns/iter (+/- 1,870)
test b_benchmark_cached_abac_model               ... bench:         195 ns/iter (+/- 1)
test b_benchmark_cached_key_match                ... bench:         200 ns/iter (+/- 17)
test b_benchmark_cached_priority_model           ... bench:         195 ns/iter (+/- 1)
test b_benchmark_cached_rbac_model               ... bench:         196 ns/iter (+/- 1)
test b_benchmark_cached_rbac_model_large         ... bench:         197 ns/iter (+/- 2)
test b_benchmark_cached_rbac_model_medium        ... bench:         194 ns/iter (+/- 0)
test b_benchmark_cached_rbac_model_small         ... bench:         196 ns/iter (+/- 1)
test b_benchmark_cached_rbac_model_with_domains  ... bench:         218 ns/iter (+/- 13)
test b_benchmark_cached_rbac_with_deny           ... bench:         197 ns/iter (+/- 0)
test b_benchmark_cached_rbac_with_resource_roles ... bench:         195 ns/iter (+/- 1)
test b_benchmark_key_match                       ... bench:      82,468 ns/iter (+/- 24,648)
test b_benchmark_priority_model                  ... bench:      36,178 ns/iter (+/- 8,949)
test b_benchmark_raw                             ... bench:           5 ns/iter (+/- 0)
test b_benchmark_rbac_model                      ... bench:      43,781 ns/iter (+/- 13,203)
test b_benchmark_rbac_model_large                ... bench:  42,604,607 ns/iter (+/- 6,687,334)
test b_benchmark_rbac_model_medium               ... bench:   3,524,376 ns/iter (+/- 589,976)
test b_benchmark_rbac_model_small                ... bench:     378,847 ns/iter (+/- 22,767)
test b_benchmark_rbac_model_with_domains         ... bench:      52,639 ns/iter (+/- 3,737)
test b_benchmark_rbac_with_deny                  ... bench:      47,611 ns/iter (+/- 506)
test b_benchmark_rbac_with_resource_roles        ... bench:      41,405 ns/iter (+/- 12,552)
test b_benmark_cached_basic_model                ... bench:         198 ns/iter (+/- 5)
Go:

BenchmarkCachedRaw-8                          	59989927	        17.7 ns/op	       0 B/op	       0 allocs/op
BenchmarkCachedBasicModel-8                   	 6940340	       158 ns/op	     104 B/op	       4 allocs/op
BenchmarkCachedRBACModel-8                    	 8026996	       166 ns/op	     104 B/op	       4 allocs/op
BenchmarkCachedRBACModelSmall-8               	 7813680	       157 ns/op	     104 B/op	       4 allocs/op
BenchmarkCachedRBACModelMedium-8              	 7791793	       156 ns/op	     104 B/op	       4 allocs/op
BenchmarkCachedRBACModelLarge-8               	 7417227	       161 ns/op	      96 B/op	       3 allocs/op
BenchmarkCachedRBACModelWithResourceRoles-8   	 6675254	       163 ns/op	     104 B/op	       4 allocs/op
BenchmarkCachedRBACModelWithDomains-8         	 6219783	       169 ns/op	     120 B/op	       4 allocs/op
BenchmarkCachedABACModel-8                    	  302802	      3854 ns/op	    2040 B/op	      44 allocs/op
BenchmarkCachedKeyMatchModel-8                	 6123136	       176 ns/op	     152 B/op	       4 allocs/op
BenchmarkCachedRBACModelWithDeny-8            	 7408162	       159 ns/op	     104 B/op	       4 allocs/op
BenchmarkCachedPriorityModel-8                	 6789294	       161 ns/op	     104 B/op	       4 allocs/op
BenchmarkCachedRBACModelMediumParallel-8      	15783098	       103 ns/op	     104 B/op	       4 allocs/op
BenchmarkRaw-8                                	58364529	        20.6 ns/op	       0 B/op	       0 allocs/op
BenchmarkBasicModel-8                         	  118158	      9730 ns/op	    4344 B/op	      90 allocs/op
BenchmarkRBACModel-8                          	   85212	     14070 ns/op	    5232 B/op	     118 allocs/op
BenchmarkRBACModelSmall-8                     	   14871	     76201 ns/op	   28528 B/op	     790 allocs/op
BenchmarkRBACModelMedium-8                    	    1846	    701283 ns/op	  244720 B/op	    7090 allocs/op
BenchmarkRBACModelLarge-8                     	     127	   9456547 ns/op	 2408176 B/op   70090 allocs/op
BenchmarkRBACModelWithResourceRoles-8         	   82545	     12925 ns/op	    6608 B/op     126 allocs/op
BenchmarkRBACModelWithDomains-8               	   72973	     16691 ns/op	    8360 B/op     177 allocs/op
BenchmarkABACModel-8                          	  318970	      4013 ns/op	    2032 B/op      43 allocs/op
BenchmarkKeyMatchModel-8                      	   75254	     15622 ns/op	    9477 B/op     157 allocs/op
BenchmarkRBACModelWithDeny-8                  	   79492	     12594 ns/op	    5424 B/op     123 allocs/op
BenchmarkPriorityModel-8                      	  111088	     10113 ns/op	    4640 B/op      96 allocs/op

The Go version is around 3–5x faster, counting the async runtime influence and rhai's bench difference compared to govaluate.

@GopherJ
Member Author

GopherJ commented Apr 16, 2020

It seems rhai::Engine creation is really... slow:

test b_benchmark_creating_engine_instance        ... bench:      43,615 ns/iter (+/- 1,539)
test b_benchmark_creating_raw_engine_instance    ... bench:      19,203 ns/iter (+/- 125)

Even though I already removed many built-in functions, so the engine doesn't need to register them during runtime, there are still many functions registered during runtime.

@GopherJ
Member Author

GopherJ commented Apr 16, 2020

Engine sharing is done in #117; the remaining possibilities for benchmark improvements are on the rhai side instead of casbin-rs.

@GopherJ GopherJ changed the title Fix casbin-rs bench Improve casbin-rs bench Apr 16, 2020
@hsluoyz
Member

hsluoyz commented May 4, 2020

@GopherJ @hackerchai we should find out what part of rhai is slow; I saw you made some progress in the discussion here: rhaiscript/rhai#101 . Rust is supposed to be at least 10x faster than Golang; being even slower than Golang is not understandable.

Or are there any other Rust expression evaluators that run fast?

@GopherJ
Member Author

GopherJ commented May 4, 2020

@hsluoyz It'll be a long story to improve the bench. Yes, Rust tends to be faster, and the raw enforce is 3x faster than Golang's, but the general benchmark of a project is harder to call.

As I mentioned in this issue, the code we used to benchmark wasn't quite correct: we didn't remove the time spent creating the async executor.

Also, rhai is slower than govaluate. In my opinion rhai has pretty large prospects: we can implement nearly everything a language has in rhai (threads, promises, etc.). But from that angle it's not efficient enough. I read some of their code, but I can't help much because I don't have a compiler background.

Still, to improve the bench, I'll continue to read their code and see if I can help.

@schungx
Contributor

schungx commented May 8, 2020

https://github.com/casbin/casbin-rs/blob/master/src/enforcer.rs#L108-L111

May I suggest that you create a custom package initialized with the four sub-packages and share it. Then you only have to register one package each time a scripting engine is created. Having one large package improves speed because a function need not be searched in four different places.

use rhai::packages::{ArithmeticPackage, BasicArrayPackage, BasicMapPackage, LogicPackage};

use rhai::def_package;

def_package!(rhai:CasbinPackage:"Package for Casbin use.", lib, {
    ArithmeticPackage::init(lib);
    LogicPackage::init(lib);
    BasicArrayPackage::init(lib);
    BasicMapPackage::init(lib);
});

The above creates CasbinPackage, which you can then load with load_package in one single call.

@schungx
Contributor

schungx commented May 8, 2020

But to improve the bench, I'll continue to read their code and see if I can help

There is an on-going effort to compile Rhai scripts into bytecode. Hopefully that'll make it faster...

@schungx
Contributor

schungx commented May 23, 2020

Well, my opinion is that it is never too early to make APIs async, because async goes all the way up. You'll regret it later when you need to cater to some async feature and have to introduce a breaking change in your public API.

My suggestion is to keep enforce async, but introduce an enforce_sync to do it synchronously. This type of dual API is quite common because, frankly speaking, not all systems require everything to be done async, and some need the raw throughput.
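The dual API suggested above can be sketched with std only. Enforcer, its hard-coded rule, and poll_now are hypothetical stand-ins, not casbin's actual code; if enforce ever genuinely suspends (e.g. waiting on a DB adapter), the sync twin must use a full block_on instead of a single poll:

```rust
use std::future::Future;
use std::pin::pin;
use std::task::{Context, Poll, RawWaker, RawWakerVTable, Waker};

// Hypothetical enforcer with the dual sync/async API shape.
struct Enforcer;

impl Enforcer {
    // The async entry point stays the primary API.
    async fn enforce(&self, sub: &str, obj: &str, act: &str) -> bool {
        // Toy rule; real casbin evaluates the model/policy here.
        sub == "alice" && obj == "data1" && act == "read"
    }

    // Sync twin for callers that don't run an executor.
    fn enforce_sync(&self, sub: &str, obj: &str, act: &str) -> bool {
        poll_now(self.enforce(sub, obj, act))
    }
}

// Resolve a future that is ready on its first poll (no I/O involved).
fn poll_now<F: Future>(fut: F) -> F::Output {
    fn noop_waker() -> Waker {
        const VTABLE: RawWakerVTable = RawWakerVTable::new(
            |_| RawWaker::new(std::ptr::null(), &VTABLE),
            |_| {},
            |_| {},
            |_| {},
        );
        unsafe { Waker::from_raw(RawWaker::new(std::ptr::null(), &VTABLE)) }
    }
    let mut fut = pin!(fut);
    let waker = noop_waker();
    let mut cx = Context::from_waker(&waker);
    match fut.as_mut().poll(&mut cx) {
        Poll::Ready(v) => v,
        Poll::Pending => panic!("enforce_sync used on a future that actually suspends"),
    }
}

fn main() {
    let e = Enforcer;
    assert!(e.enforce_sync("alice", "data1", "read"));
    assert!(!e.enforce_sync("bob", "data1", "write"));
}
```

This keeps one async source of truth while giving sync callers the raw-throughput path the comment above argues for.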

@GopherJ
Member Author

GopherJ commented May 23, 2020

@schungx The only part that needs this consideration is enforce; basically it doesn't need to be async, but because it shares CoreApi with CachedEnforcer, and CachedEnforcer may need to communicate with a DB (e.g. Redis), keeping the API async may be better.

I changed them back to sync because CachedApi will be called by EventEmitter's emit method, and I didn't succeed in making an async emitter due to lifetime issues. So I changed it back to sync, because otherwise I'd need to call async code from sync functions.

Currently we already have enforce and enforce_mut, so I don't think we should add another enforce_sync; it would be hard to understand why we need three APIs for nearly the same thing. So I think we can change enforce to sync and see if an async emitter can be implemented; if yes, then in v2.0.0 we can change it back to async as a breaking change.

@schungx
Contributor

schungx commented May 23, 2020

Yeah, that sounds sound. :-)

I'm still trying out a few optimizations on the Rhai side. Will open up a PR in a bit.

@GopherJ
Member Author

GopherJ commented May 23, 2020

Ok thanks @schungx.

Looking forward to seeing your optimizations:)

@GopherJ
Member Author

GopherJ commented May 23, 2020

Also, if rust-lang/rust#72078 gets solved, then enforce_mut will not be necessary, and enforce_sync can be added if needed.

@schungx
Contributor

schungx commented May 23, 2020

What you want should not be possible. That's why the Rust standard library is sprinkled with tons of get, get_ref, get_mut variations... and there are Deref, DerefMut, Index, IndexMut variations for traits.

@schungx
Contributor

schungx commented May 23, 2020

Will open up a PR in a bit.

PR #165 opened, but somehow it is including all the commits from my previous PR... no idea what's going on. Anyhow, the last commit is the right one.

@GopherJ
Member Author

GopherJ commented May 23, 2020

async vs sync

before: async
after: sync

 name                                  before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 b_benchmark_abac_model                4,185           3,957                  -228   -5.45%   x 1.06 
 b_benchmark_basic_model               4,622           4,418                  -204   -4.41%   x 1.05 
 b_benchmark_key_match                 16,593          15,910                 -683   -4.12%   x 1.04 
 b_benchmark_priority_model            6,167           4,826                -1,341  -21.74%   x 1.28 
 b_benchmark_raw                       5               5                         0    0.00%   x 1.00 
 b_benchmark_rbac_model                7,049           7,374                   325    4.61%   x 0.96 
 b_benchmark_rbac_model_large          10,254,152      10,175,106          -79,046   -0.77%   x 1.01 
 b_benchmark_rbac_model_medium         991,206         1,004,929            13,723    1.38%   x 0.99 
 b_benchmark_rbac_model_small          102,597         101,684                -913   -0.89%   x 1.01 
 b_benchmark_rbac_model_with_domains   6,991           6,775                  -216   -3.09%   x 1.03 
 b_benchmark_rbac_with_deny            10,469          10,314                 -155   -1.48%   x 1.02 
 b_benchmark_rbac_with_resource_roles  4,981           4,799                  -182   -3.65%   x 1.04

It seems async-std's block_on doesn't involve much thread machinery, so the results are similar.

@GopherJ
Member Author

GopherJ commented May 23, 2020

tokio would be a totally different story, because tokio's block_on is quite heavy

before: async
after: sync

 name                                  before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 b_benchmark_abac_model                181,699         4,056              -177,643  -97.77%  x 44.80 
 b_benchmark_basic_model               202,599         4,484              -198,115  -97.79%  x 45.18 
 b_benchmark_key_match                 192,963         16,631             -176,332  -91.38%  x 11.60 
 b_benchmark_priority_model            168,218         4,906              -163,312  -97.08%  x 34.29 
 b_benchmark_raw                       5               5                         0    0.00%   x 1.00 
 b_benchmark_rbac_model                170,598         7,080              -163,518  -95.85%  x 24.10 
 b_benchmark_rbac_model_large          13,733,887      9,921,280        -3,812,607  -27.76%   x 1.38 
 b_benchmark_rbac_model_medium         1,443,785       971,702            -472,083  -32.70%   x 1.49 
 b_benchmark_rbac_model_small          338,414         100,512            -237,902  -70.30%   x 3.37 
 b_benchmark_rbac_model_with_domains   204,542         6,875              -197,667  -96.64%  x 29.75 
 b_benchmark_rbac_with_deny            209,244         10,461             -198,783  -95.00%  x 20.00 
 b_benchmark_rbac_with_resource_roles  202,211         4,923              -197,288  -97.57%  x 41.07 

@schungx
Contributor

schungx commented May 24, 2020

Does this mean we're finally close to Go speed? That's a good start... the next goal is to be faster...

BTW, can you pull from my latest in https://github.com/schungx/rhai ?

This version implements most of the common operators for standard types as a fast-track. For casbin, which relies heavily on operators for standard types, this may be another 10% or so speed improvement.

@GopherJ
Member Author

GopherJ commented May 24, 2020

Hi @schungx, it's already close, but the Go version is still 1.5x–2x faster. I'll soon add a bench test on the role manager to see whether the problem is in the role manager or in evaluation.

@GopherJ
Member Author

GopherJ commented May 24, 2020

@schungx I did a bench using your master; the results are quite similar:

test b_benchmark_abac_model               ... bench:       4,028 ns/iter (+/- 73)
test b_benchmark_basic_model              ... bench:       4,308 ns/iter (+/- 256)
test b_benchmark_key_match                ... bench:      16,482 ns/iter (+/- 267)
test b_benchmark_priority_model           ... bench:       5,721 ns/iter (+/- 1,164)
test b_benchmark_raw                      ... bench:           5 ns/iter (+/- 0)
test b_benchmark_rbac_model               ... bench:       6,929 ns/iter (+/- 83)
test b_benchmark_rbac_model_large         ... bench:  10,875,148 ns/iter (+/- 4,662,432)
test b_benchmark_rbac_model_medium        ... bench:   1,025,110 ns/iter (+/- 313,170)
test b_benchmark_rbac_model_small         ... bench:     104,318 ns/iter (+/- 7,969)
test b_benchmark_rbac_model_with_domains  ... bench:       6,861 ns/iter (+/- 289)
test b_benchmark_rbac_with_deny           ... bench:      10,403 ns/iter (+/- 280)
test b_benchmark_rbac_with_resource_roles ... bench:       5,006 ns/iter (+/- 43)

@schungx
Contributor

schungx commented May 24, 2020

Well, it improves speed by 10% for scripts heavy in numerical and logical operators...

The one major thing on the table that I haven't been able to resolve is the need to copy values. Normally it is fine for numbers, booleans, etc. For casbin, however, almost all of those values are strings. In Rhai, that means always copying entire strings around whenever you do something.

For example, a simple a == b copies two strings from the variables a and b, compares them, and then immediately throws them away. That's just the way Rhai is designed... short of interning all strings, which has its own set of issues.

govaluate is actually much better at this because all those strings are simply interfaces pointing to immutable, garbage-collected objects. So they are simply passed around. Comparing them is extremely fast, as long as you don't modify them (which casbin doesn't). Modifying them is a cloning operation.

My next experiment is to intern all Rhai strings into Rc-wrapped shared values, essentially re-creating the immutable strings.
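That interning experiment can be sketched with Rc<str> (a toy interner for illustration, not Rhai's actual implementation):

```rust
use std::collections::HashSet;
use std::rc::Rc;

// A tiny interner: each distinct string is stored once, and every
// use of it is a cheap pointer-sized Rc clone.
#[derive(Default)]
struct Interner {
    set: HashSet<Rc<str>>,
}

impl Interner {
    fn intern(&mut self, s: &str) -> Rc<str> {
        if let Some(existing) = self.set.get(s) {
            return Rc::clone(existing);
        }
        let rc: Rc<str> = Rc::from(s);
        self.set.insert(Rc::clone(&rc));
        rc
    }
}

fn main() {
    let mut interner = Interner::default();
    let a = interner.intern("alice");
    let b = interner.intern("alice");

    // Same allocation: "copying" the value is just a refcount bump.
    assert!(Rc::ptr_eq(&a, &b));

    // Equality can short-circuit on pointer identity before falling
    // back to a byte comparison.
    assert!(Rc::ptr_eq(&a, &b) || *a == *b);
}
```

Each distinct string is allocated once, cloning is a reference-count bump, and equality can short-circuit on pointer identity — approximating what govaluate gets for free from Go's immutable strings.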

@schungx
Contributor

schungx commented May 25, 2020

Turns out it is extremely easy to make all strings immutable in Rhai, and get rid of all cloning/copying.

Can you pull from my latest to test it out?

Make sure you change this in enforcer.rs:

    lib.set_fn_1("escape_assertion", |mut s: rhai::ImmutableString| {
        Arc::make_mut(&mut s);
        Ok(Arc::try_unwrap(s).unwrap())
    });

because Rhai now works with immutable strings.

Unfortunately I can no longer compile casbin-rs on Windows because of the wepoll issue...

@GopherJ
Member Author

GopherJ commented May 25, 2020

@schungx To compile, you can downgrade all async-std versions to 1.5.0, but maybe it's better if you create an issue on smol or async-std saying that you get a compile error, so that they can fix it.

@schungx
Contributor

schungx commented May 26, 2020

you can downgrade all async-std's version to 1.5.0

Doesn't work. It still tries to pull in wepoll-sys. I think something else probably depends on it...

@schungx
Contributor

schungx commented May 26, 2020

OK, I need to switch the default toolchain to stable-x86_64-pc-windows-msvc to compile wepoll.

Using stable-x86_64-pc-windows-gnu would require installing MinGW-w64.

It's a bummer, as the GNU toolchain seems to generate faster code for Windows...

@schungx
Contributor

schungx commented May 26, 2020

@GopherJ please check out #168

It uses the latest version of Rhai which uses immutable strings internally to avoid copying. I get these results running on my computer:

Before: 0.14.1
After: 0.15.0 (?)

 name                                  before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 b_benchmark_abac_model                18,644          15,452               -3,192  -17.12%   x 1.21
 b_benchmark_basic_model               16,760          13,946               -2,814  -16.79%   x 1.20
 b_benchmark_key_match                 52,274          49,863               -2,411   -4.61%   x 1.05
 b_benchmark_priority_model            25,034          15,765               -9,269  -37.03%   x 1.59
 b_benchmark_raw                       8               8                         0    0.00%   x 1.00
 b_benchmark_rbac_model                26,607          26,141                 -466   -1.75%   x 1.02
 b_benchmark_rbac_model_large          41,000,650      30,705,600      -10,295,050  -25.11%   x 1.34
 b_benchmark_rbac_model_medium         3,874,005       2,935,262          -938,743  -24.23%   x 1.32
 b_benchmark_rbac_model_small          415,345         312,054            -103,291  -24.87%   x 1.33
 b_benchmark_rbac_model_with_domains   30,380          22,582               -7,798  -25.67%   x 1.35
 b_benchmark_rbac_with_deny            52,111          29,750              -22,361  -42.91%   x 1.75
 b_benchmark_rbac_with_resource_roles  19,808          16,575               -3,233  -16.32%   x 1.20

@GopherJ
Member Author

GopherJ commented May 26, 2020

@schungx Yes, the CI passed on Windows, so normally it should work. The numbers look awesome!

@GopherJ
Member Author

GopherJ commented Jun 1, 2020

@schungx I see a significant bench improvement on [email protected]; I will upgrade on the casbin-rs side soon

 name                                  before ns/iter  after ns/iter  diff ns/iter   diff %  speedup 
 b_benchmark_abac_model                4,779           3,833                  -946  -19.79%   x 1.25 
 b_benchmark_basic_model               4,988           4,100                  -888  -17.80%   x 1.22 
 b_benchmark_key_match                 19,024          16,206               -2,818  -14.81%   x 1.17 
 b_benchmark_priority_model            5,381           6,206                   825   15.33%   x 0.87 
 b_benchmark_raw                       5               5                         0    0.00%   x 1.00 
 b_benchmark_rbac_model                7,699           8,095                   396    5.14%   x 0.95 
 b_benchmark_rbac_model_large          11,736,524      12,590,095          853,571    7.27%   x 0.93 
 b_benchmark_rbac_model_medium         1,124,576       834,781            -289,795  -25.77%   x 1.35 
 b_benchmark_rbac_model_small          111,415         88,178              -23,237  -20.86%   x 1.26 
 b_benchmark_rbac_model_with_domains   7,593           6,224                -1,369  -18.03%   x 1.22 
 b_benchmark_rbac_with_deny            11,542          8,505                -3,037  -26.31%   x 1.36 
 b_benchmark_rbac_with_resource_roles  5,514           4,340                -1,174  -21.29%   x 1.27 
 b_benchmark_role_manager_large        2,896,793       2,375,036          -521,757  -18.01%   x 1.22 
 b_benchmark_role_manager_medium       230,392         217,309             -13,083   -5.68%   x 1.06 
 b_benchmark_role_manager_small        22,566          20,731               -1,835   -8.13%   x 1.09 

Bravo! This bench is already similar to the Golang version's.

@schungx
Contributor

schungx commented Jun 2, 2020

@GopherJ since you've turned the casbin-rs API from sync to async, I have a question for you.

Do you find the async API to be slower when called in a sync manner (perhaps via block_on)?

Do you feel justified in moving the API from sync to async?

@GopherJ
Member Author

GopherJ commented Jun 2, 2020

@schungx async is only useful for I/O operations, like network connections.

casbin-rs has DB adapters, so I think async is necessary; it helps in situations where many policy changes are needed.

If we call async in a sync manner, it will certainly be slower; even though Rust has some zero-cost abstractions, at least a Poll is still necessary.

@schungx
Contributor

schungx commented Jun 2, 2020

Thanks... that's what I thought. Do you have any benchmark data on sync vs async and how much of a slowdown it is?

I can see from your previous data that tokio is terrible with block_on, but async_std seems to be OK here...

@GopherJ
Member Author

GopherJ commented Jun 2, 2020

@schungx Hi, block_on is OK for [email protected]; there isn't a big overhead (maybe xx ms).

However, block_on is terrible for [email protected], which uses smol as the runtime, as you can see from here:

async-rs/async-std#770

Currently Rust doesn't have an interface to define how an executor should work; otherwise some correct async benchers could be implemented:

b.bench_async_using_executor(tokio_executor, async move {
    ...
})

see also: rust-lang/rust#71186

That means that, starting from [email protected], we cannot bench async code anymore because of the block_on overhead. Tokio's block_on has always been heavy, so it cannot be used for benchmarks either. But the microbenchmarks here don't mean it runs slowly in real-world applications, because in a real-world application we only need one block_on for the whole application:

fn main() {
    async_std::task::block_on(async move {
        // a multi-threaded executor
        // we can spawn tasks on this executor
        // it's always the same executor, unlike the bench code, which creates a new executor in every iteration
    });
}

@schungx
Contributor

schungx commented Jun 3, 2020

As per the async-std thread, can you use smol::run() instead of block_on?

@GopherJ
Member Author

GopherJ commented Jun 3, 2020

@schungx yes sure

@GopherJ
Member Author

GopherJ commented Jul 18, 2020

casbin-rs now performs better than casbin-golang. Thanks @schungx for all the support you've given; it's time to close this issue.

@GopherJ GopherJ closed this as completed Jul 18, 2020
@hsluoyz
Member

hsluoyz commented Jul 26, 2020

@GopherJ @schungx congrats and thanks for all your work! This is very impressive.

BTW, where can we see the benchmark numbers comparing Casbin-RS and Casbin Golang?
