add profiling option
bertiqwerty committed May 21, 2024
1 parent b1253e1 commit d694fc4
Showing 3 changed files with 35 additions and 13 deletions.
45 changes: 32 additions & 13 deletions README.md
@@ -138,19 +138,38 @@ from the project's root.

## Rough Time Measurements
We compare Rormula to the well-established and much more mature package [Formulaic](https://github.com/matthewwardrop/formulaic).
The [tests](rormula/test/test_wilkinson.py) create a formula in Wilkinson notation and sample 100 random data points. The output on my machine is
```
- test just numerical
Rormula took 0.0020s
Rormula asdf took 0.0247s
Formulaic took 0.2037s
- test numerical and categorical
Rormula took 0.0045s
Rormula asdf took 0.0300s
Formulaic took 0.3403s
```
In the lines starting with `Rormula took`, categorical and numerical data were separated beforehand.
In the lines starting with `Rormula asdf took`, we pass and receive Pandas dataframes.
Each timing covers 100 applications of the formula to a small data set with 100 rows. For larger data sets, e.g., 10k rows or more, Formulaic becomes competitive and eventually faster.
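As a rough illustration of what "100 applications" means, the Rust sketch below times a closure over repeated calls and prints the total in the same `took 0.0020s` style as the output above. It assumes the reported number is the total wall-clock time over all applications rather than a per-call average; the helper `time_repeated` and the dummy workload are hypothetical and not part of Rormula or its tests.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical helper: runs `f` `n` times and returns the total elapsed time.
fn time_repeated<T, F: FnMut() -> T>(n: usize, mut f: F) -> Duration {
    let start = Instant::now();
    for _ in 0..n {
        // black_box keeps the optimizer from discarding the unused result.
        black_box(f());
    }
    start.elapsed()
}

fn main() {
    // Dummy workload standing in for "apply the formula to 100 rows".
    let rows: Vec<f64> = (0..100).map(|i| i as f64).collect();
    let total = time_repeated(100, || rows.iter().map(|x| x * x).sum::<f64>());
    // Total over all 100 applications, formatted like the README output.
    println!("Sketch took {:.4}s", total.as_secs_f64());
}
```

Passing each result through `std::hint::black_box` prevents the compiler from optimizing away work whose result is otherwise unused, which would make the timing meaningless.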

## Profiling
We use [Counts](https://github.com/nnethercote/counts/) for profiling Rust code.

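To run profiling, build with the `print_timings` feature, run the test script with stderr redirected to `counts.txt`, and let Counts summarize that file. From the `rormula` directory, run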
```
maturin develop --release --features print_timings
python test/test_wilkinson.py 2> counts.txt
counts -i -e counts.txt
```
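Counts performs the aggregation itself. To show roughly what summarizing `counts.txt` amounts to, here is a minimal Rust stand-in; it is not Counts' implementation and does not reproduce its output format. It assumes each captured stderr line has the shape `label <nanoseconds>`, as printed by the `eprintln!` snippet in this section, and sums the trailing integer per label. The function name `summarize` and the sample labels are made up for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a weighted summary: sums the trailing integer
/// (nanoseconds) of each line per label and sorts by total, largest first.
fn summarize(input: &str) -> Vec<(String, u64)> {
    let mut totals: HashMap<String, u64> = HashMap::new();
    for line in input.lines() {
        // Split off the last whitespace-separated token as the weight.
        let Some((label, weight)) = line.trim_end().rsplit_once(char::is_whitespace) else {
            continue; // no whitespace: not a timing line
        };
        let Ok(nanos) = weight.parse::<u64>() else {
            continue; // last token is not an integer: skip the line
        };
        *totals.entry(label.trim_end().to_string()).or_insert(0) += nanos;
    }
    let mut sorted: Vec<(String, u64)> = totals.into_iter().collect();
    // Largest total first; ties broken by label for a deterministic order.
    sorted.sort_by(|a, b| b.1.cmp(&a.1).then_with(|| a.0.cmp(&b.0)));
    sorted
}

fn main() {
    // Made-up captured stderr with two hypothetical labels.
    let captured = "parse formula 1200\neval 5000\nparse formula 800\neval 4000\n";
    let summary = summarize(captured);
    let total: u64 = summary.iter().map(|(_, n)| n).sum();
    for (label, nanos) in summary {
        let share = 100.0 * nanos as f64 / total as f64;
        println!("{:>10} ns ({:5.1}%)  {}", nanos, share, label);
    }
}
```

Lines whose last token is not an integer are skipped, so stray stderr output does not break the summary.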
To profile other parts of the Rust code, wrap them like this:
```rust
#[cfg(feature = "print_timings")]
let now = std::time::Instant::now();

// code snippet to be profiled

#[cfg(feature = "print_timings")]
eprintln!("name of code snippet {}", now.elapsed().as_nanos());
```
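If the pattern above is needed in many places, it could be wrapped in a small macro. The sketch below is an illustration, not code from Rormula, and the macro name `time_it!` is hypothetical. It evaluates the wrapped expression and returns its value; only when the `print_timings` feature is enabled does it also print the label and the elapsed nanoseconds to stderr, in the same `label <nanoseconds>` shape as the snippet above.

```rust
/// Hypothetical helper macro: evaluates `$body`, returns its value, and,
/// only when built with `--features print_timings`, prints
/// "<label> <elapsed nanoseconds>" to stderr.
macro_rules! time_it {
    ($label:expr, $body:expr) => {{
        #[cfg(feature = "print_timings")]
        let now = std::time::Instant::now();
        let result = $body;
        #[cfg(feature = "print_timings")]
        eprintln!("{} {}", $label, now.elapsed().as_nanos());
        result
    }};
}

fn main() {
    let data: Vec<f64> = (1..=100).map(f64::from).collect();
    // Without the feature, this compiles down to the plain computation.
    let sum = time_it!("sum of data", data.iter().sum::<f64>());
    println!("sum = {}", sum);
}
```

Because `cfg` attributes in a `macro_rules!` body are evaluated in the crate where the macro is expanded, such a macro should be used inside the crate that defines the `print_timings` feature.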
Note that building with the `print_timings` feature slows down the whole program, so the timings from the section above do not apply in profiling mode.
1 change: 1 addition & 0 deletions rormula/Cargo.toml
@@ -18,6 +18,7 @@ rormula-rs = { path = "../rormula-rs" }
[features]
extension-module = ["pyo3/extension-module"]
default = ["extension-module"]
print_timings = ["rormula-rs/print_timings"]

[dev-dependencies]
criterion = { version = "0.5.1", features = ["html_reports"] }
2 changes: 2 additions & 0 deletions rormula/test/test_wilkinson.py
@@ -189,5 +189,7 @@ def test_separated():


if __name__ == "__main__":
    print("- test just numerical")
    test_numerical()
    print("- test numerical and categorical")
    test_num_cat()
