Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is it feasible to remove scipy and numpy as dependencies? #130

Open
ormsbee opened this issue Dec 4, 2024 · 3 comments
Open

Is it feasible to remove scipy and numpy as dependencies? #130

ormsbee opened this issue Dec 4, 2024 · 3 comments
Assignees

Comments

@ormsbee
Copy link

ormsbee commented Dec 4, 2024

These dependencies are heavyweight, complicating installation and increasing memory usage. But it doesn't look like we actually use scipy anywhere, and my brief reading of this repo's numpy usage is that we could replace it with calls to Python's stdlib math and cmath modules (which were not available back when this repo was created using Python 2.x).

@Colin-Fredericks
Copy link
Contributor

If we don't use scipy, then yes definitely let's get rid of it.

With numpy, since it's actually in use I'd want to do some very, very thorough testing to make sure that the math (or cmath) libraries produce exactly the same results even in weird cases. I know numpy also has its own variable types, but I don't know the precise differences or potential casting issues.

As an example: python's built-in exponentiation says that float(10**20) = 1e+20 (which is correct). Numpy's power() function disagrees: it says float(numpy.power(10, 20)) = 7.766279631452242e+18 (which is extremely not correct). I want to see if this causes problems with things like statistical mechanics, where numbers like 6.02e+23 are pretty common. I realize that numpy is actually the one that's incorrect here (I am as surprised as you are), but sometimes it might be the other way around.

Since numpy is also one of the importable libraries in custom python-based problems, it might also be worth checking with MIT's folks to see if they import numpy for any of theirs.

@ormsbee
Copy link
Author

ormsbee commented Dec 5, 2024

I realize that numpy is actually the one that's incorrect here (I am as surprised as you are), but sometimes it might be the other way around.

FWIW, I don't think it's surprising. Numpy is optimized for speed and memory over correctness. The reason it gets 10^20 wrong is because it uses a 64 bit int to store the data, and that overflows. Doing numpy.power(10.0, 20) forces it to use floating point, and does what you'd expect:

>>> numpy.power(10, 20)
7766279631452241920
>>> numpy.power(10.0, 20)
1e+20

I had a similar concern with score percentage calculation in edx-platform. Under Python 2.7, Python's built-in round() function would break ties by rounding away from zero. The team working on grading opted to round to the nearest even value to break ties, so they used numpy's rounding. In Python 3.x, Python changed its built-in round() to also round to the nearest even value for ties, so I thought that might also be a good candidate for replacement. Then I read the following from numpy's documentation:

np.round uses a fast but sometimes inexact algorithm to round floating-point datatypes... Alternatively, Python’s builtin round function uses a more accurate but slower algorithm for 64-bit floating point values.

I realize that any change in scoring behavior is a potential headache, even if it is technically more correct.

Since numpy is also one of the importable libraries in custom python-based problems, it might also be worth checking with MIT's folks to see if they import numpy for any of theirs.

I hadn't thought of removing numpy from the platform outright, because of the content backwards compatibility issues you point out. But I'm only just now realizing that we pass calc itself as one of the things that customresponse problems have access to, so this would affect the behavior of that content as well. (I was primarily thinking of openedx-calc as part of the calculator course tool.)

@Colin-Fredericks
Copy link
Contributor

Opened a very small pull request to remove scipy.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants