# Collect success stories! #58
Inspired by this issue on a different project, I'd love to hear stories from people who have successfully used Scalene. Did you use it to fix a performance problem, excessive memory consumption, or a leak? (Or something else?) What kind of performance problem? How did Scalene help? Your stories will help guide the development of new features, and also brighten my day!
---

Hi @emeryberger. I've just used Scalene in my private 3-morning Higher Performance Python tutorial, which I ran with a hedge fund; I run this course both privately and as a "public" course for open enrolment several times a year. I added Scalene to the most recent iteration and I plan to use it more in forthcoming sessions. I'm also likely to make use of it in forthcoming public talks. The students last week were very interested in using Scalene further. They use Pandas and NumPy extensively. For the most recent session we used it to identify how Pandas uses more memory than expected in two simple situations.

From the first point I've built a bug report for Pandas, as I'm sure the RAM used is excessive (pandas-dev/pandas#37139). As a suggestion: if you collected some examples of how Scalene can track poor memory usage in Pandas and Scikit-learn, I suspect you'd have some popular blog posts to share! Scikit-learn, for example, will in a few places copy data from Python to an external tool if the input isn't already in the form it expects.

Many thanks for the tool!
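As a hedged illustration (not the actual course example, which isn't included above), this is the sort of small pandas script one might run under `scalene demo.py` to see where memory goes; the DataFrame shape and operations are made up:

```python
import numpy as np
import pandas as pd

# A made-up DataFrame, large enough for allocations to show up clearly
df = pd.DataFrame(np.random.rand(1_000_000, 5), columns=list("abcde"))

# Scalene's memory columns attribute allocations to each line, so
# operations that copy the underlying data stand out immediately.
subset = df[["a", "b"]]             # column selection returns a copy here
as_float32 = df.astype("float32")   # dtype conversion allocates a new block
```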
---

I used Scalene a couple of weeks ago in the context of a Machine Learning homework (classification using gradient descent). I had to implement the following function:

$$\mathrm{subgrad}_i = \sum_{n=1}^{n_{\text{samples}}} \big[\, y_n (X_n \cdot w + b) < 1 \,\big]\,(-\,y_n X_{n,i}) \;+\; \lambda_1 \operatorname{sign}(w_i) \;+\; 2\lambda_2 w_i$$

where $[\,\cdot\,]$ is the Iverson bracket, $X_n$ is the $n$-th sample, $y_n \in \{-1, 1\}$ is its label, and $w$ and $b$ are the model's weights and bias.

Being unfamiliar with Python and NumPy, I initially wrote the following code (NSFW for NumPy enthusiasts):

```python
for i in range(n_features):
    for n in range(n_samples):
        subgrad[i] += (- y[n] * X[n][i]) if y[n] * (np.dot(X[n], w) + b) < 1 else 0
    subgrad[i] += self.lambda1 * (-1 if w[i] < 0 else 1) + 2 * self.lambda2 * w[i]
```

Obviously, this was really slow: about 80 iterations per minute, when the goal was 10,000. I ran the program through Scalene, which output the following. The column headers are missing, but you can see that 98% of the time is spent in Python, and not in native code, which is the problem.

*[Scalene output]*

With my (clearly lacking) NumPy skills, I improved it to the following, removing one of the loops:

```python
for n in range(n_samples):
    if y[n] * (np.dot(X[n], w) + b) < 1:
        subgrad += (- y[n] * X[n])
subgrad += self.lambda1 * (w / np.abs(w)) + 2 * self.lambda2 * w
```

This was much better, and allowed me to reach the desired goal of 10,000 iterations per minute. It also reduced the time spent in Python, as shown in the Scalene output below.

*[Scalene output]*

I later learned this can be improved further, removing all loops!

```python
index = y * (np.dot(X, w) + b) < 1
yi = y[index]
subgrad = np.sum(- yi.reshape(yi.size, 1) * X[index, :], axis=0)
```

As you can see, in this final version only very little time is spent in Python. Thank you Scalene!
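For readers who want to try this, here is a self-contained sketch of that final vectorized version with made-up data; the shapes, labels, and regularization constants (`lambda1`, `lambda2`) are assumptions, and the regularization term from the earlier versions is restored at the end:

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 1_000, 50
X = rng.normal(size=(n_samples, n_features))        # samples
y = rng.choice([-1.0, 1.0], size=n_samples)         # labels in {-1, +1}
w = rng.normal(size=n_features)                     # weights
b = 0.0                                             # bias
lambda1, lambda2 = 0.01, 0.01                       # regularization strengths

# The Iverson bracket becomes a boolean mask over the samples
mask = y * (X @ w + b) < 1
yi = y[mask]
subgrad = np.sum(-yi.reshape(-1, 1) * X[mask, :], axis=0)
subgrad += lambda1 * np.sign(w) + 2 * lambda2 * w   # regularization term
```

Everything stays inside NumPy's native code, which is exactly what the final Scalene profile shows.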
---

I used Scalene to optimize a long-running script in https://github.com/ConsenSys/code_merklization_of_traces. It was a great help for someone who only sporadically works with Python, much less with unfamiliar native libraries!
---

I used Scalene to profile my library, Rich, which can print tables in the terminal. A user reported that very large tables (10,000 rows) were slow. There were no algorithmic improvements I could see from viewing the code, which led me to consider profiling. Running Scalene on a script that prints a large table highlighted two lines that were taking way more time than I would have expected. Optimizing those was fairly trivial and resulted in a 45% improvement in speed. I'd say that was a success.
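As context, here is a minimal sketch of the kind of benchmark script one might profile with `scalene print_table.py`; the script name, row count, and contents are made up, not the reporter's actual reproduction:

```python
from rich.console import Console
from rich.table import Table

# Build a deliberately large table to stress rendering
table = Table(title="10,000 rows")
for header in ("id", "double", "triple"):
    table.add_column(header)
for i in range(10_000):
    table.add_row(str(i), str(i * 2), str(i * 3))

Console().print(table)
```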
---

Hey @emeryberger, hope you're doing well. Just used Scalene today for profiling some pandas code. Chained indexing into multi-level column names apparently caused a bunch of copying to occur, which Scalene easily found for me!
---

@donald-pinckney nice! Can you share the code / the fix? Thanks!
---

@donald-pinckney agreed, a code example (or even just more detail) about what you were doing would be helpful to recreate useful examples.
---

```python
import pandas as pd
import numpy as np
import timeit

# Code to set up an example df, to recreate the approximate df structure of my code
column_names_example = [i for i in range(10000)]
index = pd.MultiIndex.from_tuples(
    [("left", c) for c in column_names_example]
    + [("right", c) for c in column_names_example]
)
df = pd.DataFrame(np.random.rand(1000, 20000), columns=index)

# We also define a function which takes as input two columns from the dataframe.
# It does whatever logic and returns a bool.
def keep_column(left_col, right_col):
    # Shouldn't really matter what this is, but I had some indexing operations.
    # So here are some random indexing operations.
    # Note that this only reads from the columns; no writing is done.
    return left_col[left_col.first_valid_index()] > right_col[right_col.last_valid_index()]

# Ok, finally we have the performance bug:
timeit.timeit(lambda: [c for c in column_names_example if keep_column(df["left"][c], df["right"][c])], number=10) / 10
# > gives average 14.68 seconds on my machine
```

After using Scalene I quickly found the bad line of code, and my simple fix is to lift the indexing out of the loop:

```python
df_l = df["left"]
df_r = df["right"]
timeit.timeit(lambda: [c for c in column_names_example if keep_column(df_l[c], df_r[c])], number=10) / 10
# > gives average 0.81 seconds on my machine
```

Obviously lifting a constant out of the loop could help, but I doubt that alone would make such a large difference. Probably there is some underlying copying or something going on that only occurs with the double indexing? So I googled a bit and found this in the documentation:

> *[quoted pandas documentation on indexing with hierarchical columns]*

So there is some documentation that doing this double indexing on hierarchical columns is bad, but it doesn't really help explain why lifting it out of the loop matters so much. If we take the documentation's suggestion, we get another potential fix:

```python
timeit.timeit(lambda: [c for c in column_names_example if keep_column(df.loc[:, ("left", c)], df.loc[:, ("right", c)])], number=10) / 10
# > gives average 3.59 seconds on my machine
```

This is an improvement compared to the original code, but it's still ~4x slower than my first fix. Probably there is an easy and efficient way to do this without involving Python loops, just using NumPy / pandas, but for a pandas beginner such as myself the performance issues that popped up here were pretty subtle. Scalene was great at helping me find the slow line of code!
---

Thanks for that lib! I figured out with Scalene that, maybe, loading N (= 10,000+) times the same …
---

We've started using Scalene over at Semantic Scholar (www.semanticscholar.org) as part of our tool suite for operationalizing machine learning models. Recently we found a model of ours was cost-prohibitive and put an entire product direction in jeopardy. We generated a set of test data and ran our models with Scalene mounted; the HTML output was able to pinpoint our squeakiest wheels and help us validate that our changes were having an impact. The process was iterative, precise, and repeatable. In the end, we were able to reduce costs by a staggering 92%.

With these models, there is also always the question of whether things would be more cost-effective running inference services on GPUs rather than CPUs. Scalene allowed us to quickly ascertain what fraction of our runtime would benefit from hardware acceleration, and what CPU-bound code we'd need to pare down to achieve our goals.
---

Thanks for this tool :) The new GUI reporting interface is slick.
---

Used Scalene for a hobby/learning project: it showed I was reading a CSV file into a dict every time I called a function. Fixing that took my app from 9 minutes to 2 minutes.
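A hedged sketch of that kind of fix, with made-up file and column names: read the CSV once and cache the resulting dict instead of rebuilding it on every call:

```python
import csv
from functools import lru_cache

@lru_cache(maxsize=1)
def load_lookup(path="data.csv"):
    # Parse the CSV into a dict once; later calls return the cached dict
    with open(path, newline="") as f:
        return {row["key"]: row["value"] for row in csv.DictReader(f)}

def process(item):
    lookup = load_lookup()   # cheap after the first call
    return lookup.get(item)
```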
---

Used Scalene to drop the runtime of a scientific tool we were using from 16 hours to 8 minutes with certain inputs 😄
---

Used Scalene to find the hot path of my Python application at work; refactored that part of the code, and processing time went down from 15 minutes to 1 minute :)