Working Full Time on NumPy
I have been hired by BIDS to work on NumPy full-time. Here I will document my work: what went well and also what didn't.
I am at EuroScipy in Trento, Italy. What a beautiful location for a conference: a valley surrounded by mountains, and the city itself has a wonderful square and castle. I presented a tutorial on wrapping C libraries with Python, as well as a 15-minute talk on NumPy and where we would like to take it over the next two years. I am meeting lots of people from all across Europe; in total there are about 250 people here.
Whoops, over a month has gone by. The past month has been busy. July began with SciPy 2018. We presented a 5-minute NumPy overview, which I am turning into a 20-minute talk for EuroScipy at the end of August. We also held a BOF session, mainly discussing the NumPy roadmap; later, back at BIDS, it was integrated with the NEP documents. And we sprinted for two days. The most productive part of the conference for me was the opportunity to meet face-to-face with the core devs, past and present. We were able to put together an impromptu dtypes brainstorming session, which it seems will become the focus of my work.
I took some time off at the end of the month as well. It seems the August doldrums have hit; some of the pull requests are languishing in the queue. On the other hand, 1.15 went out without too many complaints for so many deep changes. Now I am trying to understand what must be done to create a dtype, which is a lot of reading.
I have been staring at the code needed to reproduce the inner functions of `np.dot` as `np.matmul`. Internally, NumPy today uses `einsum` to implement matmul. When implementing `matmul` as a gufunc, it would seem more appropriate to reuse the `dot` functionality, since `einsum` is more general: `dot` has a fixed signature that does not have to be reparsed for each call.

`dot` actually calls `PyArray_MatrixProduct2` to do the work. That function quickly hands off the work to `cblas_matrixproduct` if possible (the dtype is float, double, cfloat or cdouble, and BLAS is available), and if not calculates the result itself. `PyArray_MatrixProduct2` and `cblas_matrixproduct` share little code between them; here is a comparison:
step | `PyArray_MatrixProduct2` | `cblas_matrixproduct`
---|---|---
detect non-aligned or non-contiguous data | doesn't care | uses `PyArray_NewCopy` if non-aligned or non-contiguous
create output array | uses `new_array_for_sum` | refactored lines of code to `new_array_for_sum` in PR #11432
logic to handle vector-matrix, matrix-matrix, scalar-matrix and vice versa | uses `PyArray_IterAllButAxis` | `if` blocks for each case
choose function for sum-of-loops | uses `out->dtype->vdot` | `if` blocks for each of the level 2 BLAS cases; uses `out->dtype->vdot` for vector/scalar (which will use level 1 BLAS dot functions)
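From the Python side, the two entry points also behave differently for stacked inputs, which is worth keeping in mind when reusing the `dot` machinery. A quick check of my own (not part of the table above):

```python
import numpy as np

a = np.random.rand(2, 3, 4)
b = np.random.rand(2, 4, 5)

# For 2-D inputs the two agree...
x, y = a[0], b[0]
assert np.allclose(np.dot(x, y), np.matmul(x, y))

# ...but for stacked inputs they differ: matmul broadcasts over the leading
# dimensions, while dot sums over the last axis of a and the second-to-last
# axis of b, producing a larger result.
print(np.matmul(a, b).shape)  # (2, 3, 5)
print(np.dot(a, b).shape)     # (2, 3, 2, 5)
```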
I then did some more work with the loops in the matmul PR.
Still plugging away at matmul as a ufunc, I discovered a few spots where the `out` kwarg was not handled properly in `einsum`. It seems `out` is rarely used in practice, as we apparently have no reported issues with it. Too bad, it is possibly a nice way to conserve memory. I also refactored one of my pre-BIDS pull requests, simplifying `NpyIter` deallocation, based on a last-minute review of the upcoming 1.15 release.
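For the record, this is the kind of memory-conserving call I mean (a minimal sketch of my own, not code from the PR):

```python
import numpy as np

a = np.random.rand(100, 50)
b = np.random.rand(50, 80)
buf = np.empty((100, 80))  # preallocated once, reused across calls

# Writing the result into buf avoids allocating a fresh output array on
# every call, which can matter inside a tight loop.
np.einsum('ij,jk->ik', a, b, out=buf)
assert np.allclose(buf, a @ b)
```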
We now have four draft NEPs in the pipeline; it seems NumPy still has room to grow and develop. These NEPs touch on the major pain points in NumPy: indexing, random numbers, allowing downstream packages to replace the local-memory-backed storage mechanism, and extending generalized ufuncs. Still in the works: extending the dtype mechanism, something I hope to work on in the coming months.
The matmul-as-a-ufunc pull request is starting to come together. I created loops for all the supported types, and am now trying to optimise. So before linking the implementation of `matmul` to that of `dot`, I set up a little comparison of matmul (as a ufunc), dot, and einsum.
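The comparison was roughly of this form (a sketch with made-up sizes, not the actual benchmark or its numbers):

```python
import numpy as np
from timeit import timeit

a = np.random.rand(200, 200)
b = np.random.rand(200, 200)

# Time the three code paths on the same inputs.
for name, stmt in [('matmul', 'np.matmul(a, b)'),
                   ('dot',    'np.dot(a, b)'),
                   ('einsum', "np.einsum('ij,jk->ik', a, b)")]:
    t = timeit(stmt, number=100, globals={'np': np, 'a': a, 'b': b})
    print(f'{name:7s} {t:.4f} s for 100 calls')
```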
This week there was a large group of people at BIDS sprinting on a number of packages central to the data science core. It was nice to meet them and hear how they view NumPy within their projects. Spent most of the week triaging, formalizing a few NEPs, and pushing the ufunc signature expansion.
We had a very nice two-day NumPy sprint at BIDS, with about 8 participants. Among the highlights:
- We drafted a very rough proposal for a NumPy "roadmap", which tries to sketch out the scope of the NumPy project, the enhancements we would like to see (including soft targets like "a larger dev community"), and the tasks the maintainers should be doing. We will post the draft for comment soon, and hope to discuss it at the NumPy BOF at SciPy 2018 in Austin, TX.
- We made progress toward understanding what a duck-array is, and how it might be implemented. Stay tuned for the NEP proposal and a prototype implementation.
- We hammered out some issues better solved with many eyes; for instance, the CI is now pulling in pytest 3.6, which seems to have exposed an issue in one of the refcount tests.
- We seem to have reached a consensus, based on the mailing list and GitHub responses, preferring to make `matmul` a full ufunc rather than creating a ufunc_wrapper decorator function.
- No one opposed merging the umath and multiarray modules into one, so I will continue to work on that in the coming weeks.
- Social interaction between people who rarely meet.
- Progress toward releasing v1.15.0
Thanks to all who made the effort!
I am now located at BIDS for the summer; it took a little while to transition and settle in. In the meantime, I have completed a prototype `matmul` implementation as a `ufunc`. So now there are two competing ideas: either

- wrap `matmul` with a `wrapped_ufunc` decorator that allows the `__array_ufunc__` mechanism to delegate to `ndarray` subclasses, and/or
- expand the core signature of `gufunc`s to allow `?`, break `PyUFuncObject` binary compatibility, and convert `matmul` to a true `ufunc`.

Which do you prefer?
Started digging into the history of `matmul` and friends. The `@` operator is routed through `array_matrix_multiply` to `np.matmul`, which is implemented as `array_matmul`, which uses `PyArray_EinsteinSum` for true matrix multiplication or `cblas_matrixproduct` for inner product.

However `np.dot` is routed to `array_dot`, which uses `PyArray_MatrixProduct2`; that also tries to use `cblas_matrixproduct`, but rather than using `PyArray_EinsteinSum` it loops over the dtype's dot function. It seems we should try to unify these different interfaces under the `ufunc` umbrella somehow.

Also, `PyArray_EinsteinSum` itself (exposed to Python as `np.c_einsum`, partially reimplemented in Python as `np.einsum`) seems to reimplement something resembling a ufunc. Einsum has its own signature that uses a different syntax from ufuncs. Maybe the whole einsum mechanism should be overhauled to use ufuncs?
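To make the syntax difference concrete (my own juxtaposition, using `np.vectorize` only to fake a gufunc-style signature from Python):

```python
import numpy as np

a = np.random.rand(3, 4)
b = np.random.rand(4, 5)

# einsum spells the operation with per-call subscript letters, parsed from
# a string on every call...
c1 = np.einsum('ij,jk->ik', a, b)

# ...while a gufunc carries a fixed core-dimension signature. np.vectorize
# can build a (slow, pure-Python) gufunc-like callable from such a signature.
matmat = np.vectorize(np.dot, signature='(n,k),(k,m)->(n,m)')
c2 = matmat(a, b)

assert np.allclose(c1, c2)
```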
Searching for `einsum` and `ufunc` in the issues led me to a few discussions:

- using ufuncs everywhere, which is very similar to this summary
- deprecating dot, which turned out to be controversial.

Going even further down this rabbit hole, it seems

- `einsum` is found in 28 open issues/pull requests
- `dot` is found in 71 open issues/pull requests
Some of these are about performance, some are bugs, and some concern documentation.
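One way to get counts like these programmatically is the GitHub search API, via the same PyGithub package used in the word-cloud snippet below (a rough sketch; "access_token" is a placeholder and the counts drift over time):

```python
import github  # PyGithub

gh = github.Github("access_token")  # a personal access token
for term in ('einsum', 'dot'):
    hits = gh.search_issues(f'{term} repo:numpy/numpy state:open')
    print(term, hits.totalCount)  # open issues and PRs mentioning the term
```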
An idea - maybe we can triage issues by using a word cloud to find out what are the major numpy pain points?
    $ pip install wordcloud pygithub
    $ python
    >>> import github, matplotlib.pyplot as plt
    >>> from wordcloud import WordCloud, STOPWORDS
    >>> gh = github.Github("access_token")
    >>> repo = gh.get_organization('numpy').get_repo('numpy')
    >>> issues = repo.get_issues(state='open')
    >>> txts = [xx.body for xx in issues]  # slow, actually queries GH for each issue
    >>> stopwords = set(STOPWORDS)
    >>> stopwords.update(('Oj', '-Oj', 'return', 'python', 'numpy', 'py', 'line', 'package', 'use', 'using', 'array', 'np'))
    >>> wc = WordCloud(background_color="white", stopwords=stopwords).generate(' '.join([xx for xx in txts if xx]))
    >>> plt.imshow(wc, interpolation='bilinear'); plt.show()
Hmm, not very helpful. Maybe someone can improve on my word cloud skills?
Went back to making `matmul` a full-fledged ufunc. It turns out the ufuncs used in ndarray are generated by a Python script, `generate_umath.py`, which cannot accept a signature. The ones in `linalg` are generated in C in `umath_linalg.c.src`; it is a toss-up which is clearer. There is also a test implementation of matrix_multiply in `_umath_tests.c.src`. So far I have extended `generate_umath.py` to accept a signature, and created a matrix multiply ufunc that should replace `matmul`, but I haven't worked out how to hook it in to be used. I think I need to extend `PyArray_SetNumericOps`.
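A quick way to poke at existing signatures from Python, using the test gufunc mentioned above (its module name has moved around between NumPy versions, so treat this as a sketch):

```python
import numpy as np
from numpy.core import umath_tests  # later renamed to numpy.core._umath_tests

print(np.add.signature)                       # None: a plain elementwise ufunc
print(umath_tests.matrix_multiply.signature)  # a gufunc core signature

a = np.ones((10, 2, 3))
b = np.ones((10, 3, 4))
print(umath_tests.matrix_multiply(a, b).shape)  # (10, 2, 4): loop dims broadcast, core dims consumed
```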
Still working through the two issues from May 4. I have learned how hard it is to create a decorator class in C, compared to the 4 lines it takes in Python. The main problem is how to move between a C function pointer and a PyMethodObject. I ended up writing the actual decorator in Python and calling that from C. It seems to work, at least enough to get some feedback on whether this is the direction the solution will take.
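For contrast, here is roughly what those few lines look like in Python (a generic sketch of mine, not the actual decorator from the PR):

```python
def call_counter(func):
    """Minimal decorator: wrap func in a closure that adds behaviour."""
    def wrapper(*args, **kwargs):
        wrapper.calls += 1          # the extra behaviour: count invocations
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@call_counter
def add(a, b):
    return a + b

add(1, 2); add(3, 4)
print(add.calls)  # 2
```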
The workaround for the SSE optimization issue is now green on the buildbots, after much back and forth over the different compilers' quirks.
In addition to triaging and cleaning up other pull requests, I have been spending the last few days around two issues:
- creating a decorator class in C that can wrap a "ufunc-like" class method (looking at you, `matmul`) and apply the `__array_ufunc__` resolution machinery, which is what I described below
- working through a solution to the optimizing compiler's reordering of operations involving SSE calls, issue 10370. It took quite a lot of digging to come up with a "solution" that hopefully works on all compilers; at least it works on gcc and clang. It turns out using floating point error handling is non-trivial.
As an alternative to modifying ufunc signatures, there is a suggestion to try to wrap the `__array_ufunc__` override mechanism in a function decorator. So today I worked through what that means. In each of the methods of `numpy.ufunc` in `umath/ufunc_object.c` (`__call__`, `reduce`, `accumulate`, `reduceat`, `outer`, and `at`) there is a call to `PyUFunc_CheckOverride`. If it returns a value, that value is returned instead of running the ufunc. OK, so maybe if I could create a ufunc object that wraps a Python function, my work could be done. So I tried to make a wrapper out of `numpy.frompyfunc`.

Along the way I noticed its implementation is a copy-paste from somewhere else, so I refactored out the duplicated code in favor of calling `PyUFunc_FromFuncAndData`.
Unfortunately, this path is not applicable since `frompyfunc` does not accept a signature, and `matmul` definitely needs a signature; the default is `(n),(n)->(n)`. Even if we extended `frompyfunc` to accept a signature, we would be back where we started, that is, needing to extend gufuncs for a number of signatures.
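For reference, this is all `frompyfunc` gives you (a quick illustration of mine): an elementwise ufunc with object output and no way to declare core dimensions.

```python
import numpy as np

# frompyfunc(func, nin, nout) builds an elementwise ufunc: the Python
# function is applied to scalars, and the result dtype is object.
add3 = np.frompyfunc(lambda x, y, z: x + y + z, 3, 1)
print(add3(np.arange(4), 10, 100))  # [110 111 112 113], dtype=object

# There is no argument for a core signature such as '(n,k),(k,m)->(n,m)',
# so a matrix product cannot be expressed this way.
```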
Hmm, I could create another class, much like `numpy.ufunc`:

    class ufunc_wrapper:
        def __init__(self, func):
            self.func = func

        def __call__(self, *args, **kwargs):
            # the C call is a bit different
            result = PyUFunc_CheckOverride(self.func, "__call__", args, kwargs)
            if result:
                return result
            return self.func(*args, **kwargs)

Then perhaps this class would be suitable as a wrapper ...
Diving deep into parsing generalized ufunc signatures. I think I have to add a new field to the `PyUFuncObject`; I spent some time considering adding options to existing fields (like making them negative to represent flexible dimensions). The general idea is to allow signatures like `(n?, k), (k, m?) -> (n?, m?)`, where the question mark means "it is OK if that dimension is missing in the input/output". This would enable `matmul` to become a generic ufunc, handling `(k),(k,m)->(m)`, `(n,k),(k,m)->(n,m)`, `(k),(k)->()` and `(n,k),(k)->(n)` operations (`vecmat`, `matmat`, `vecvec`, `matvec` respectively). I am also completing an old pull request for fixed dimensions, like `(3),(3)->()`, which would allow `func(shape(4,3), shape(3,)) -> shape(4)` ufuncs. What would `(3?), (3?) -> ()` mean? In the end I rejected this; let's keep it simple. Progress can be viewed here: https://github.com/mattip/numpy/tree/expand-gufunc-signature
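These four cases are exactly what `np.matmul` already supports today by temporarily inserting and removing dimensions; a quick shape check (my own illustration):

```python
import numpy as np

m = np.ones((2, 3))   # (n, k) with n=2, k=3
n = np.ones((3, 4))   # (k, m) with m=4
v = np.ones(3)        # (k,)

print(np.matmul(m, n).shape)  # (2, 4)  matmat: (n,k),(k,m)->(n,m)
print(np.matmul(v, n).shape)  # (4,)    vecmat: (k),(k,m)->(m)
print(np.matmul(m, v).shape)  # (2,)    matvec: (n,k),(k)->(n)
print(np.matmul(v, v).shape)  # ()      vecvec: (k),(k)->()
```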
I also took a look at the `numpydoc` repo, which has a number of issues and open pull requests, and also spent some time with the wonderful tool for showing compiled assembler mentioned in issue #10370.
Jupyter has a nice repo for long-term planning; perhaps we should do something like that. Continued working on the NEP 0015 refactor, triaging, and tweaking documentation. The NEP refactor now achieves the major aims; now I need to think about further refactoring `_import_array` and `_import_umath`.
Looking at the matmul and `__array_ufunc__` issue, it seems there was some work done in PR #5015 that might be applicable. I rebased that to master, and am thinking about how to add the extra data without changing the size of `PyUFuncObject`.
We tried an "Office Hours" meeting, about 7 people showed up for a one hour conversation that ranged from details about how to close duplicate issues on the issue tracker to how to manage the project and how to create a roadmap for NumPy (the one on this wiki is just for my work). I am working on a summary that will be sent to the NumPy dev mailing list.
I looked more at the implement-nep-0015 branch, trying to understand why it segfaults on CPython 3. I found the best way to debug was to run `CFLAGS='-O0 -g' pip install -e . --upgrade` in a virtualenv, then `gdb --args python -c "import numpy as np"`. When things are a bit further along, `CFLAGS='-O0 -g' python runtests.py -t file_that_segfaults` is also effective.
Along the way, I tried diving into the ufunc mechanism, in preparation for tackling matmul and ufuncs.
It is annoying that sometimes when running `runtests.py` the tests all pass, but then CircleCI fails because there was a warning during the `build_ext` compilation. I took a look at using `CFLAGS=-Werror`, but there are many warnings during the configuration stage. I fixed some of those by making the distutils.command.config tool even more complicated, but it seems that path would turn into a new `autoconfig` utility, so I will try a different approach.
The doc changes were merged and are now visible at devdocs; the link on the scipy site will be updated to point there soon. I am now following the https://github.com/numpy/numpy firehose of changes to the repo (pull requests, issues). It is awe-inspiring to see the level of dedication from the volunteers. I have started sifting through old pull requests and issues, trying to find low-hanging fruit.
In a conversation about the new quantile function, which is close to ready for merging, I peeked at the rendering of the percentile function and noticed a few formatting problems, which sent me down the documentation-site rabbit hole again. A new pull request to fix some quirks in rendering and reduce the number of sphinx warnings was born. There are many warnings about "document isn't included in any toctree"; it seems something is not quite right with the autoclass template, even though it is the same one used by scipy.
While the discussion about `matmul` is progressing on the issue tracker, I made some other progress with issue/PR triaging, finished the documentation PR, and started merging multiarray and umath (NEP 0015). While the current pull request passes tests on Python 2, it crashes on Python 3. I suspect there are subtle issues with module initialization and the new import mechanism, but it is hard to pin down.
Next up - issue #9028 about overriding `matmul` if `__array_ufunc__` is set. In trying to understand the issue, I got sidetracked reading the relevant NEPs, but then I went down the rabbit hole of looking at the developer docs. It turns out the docs themselves had a few problems: outdated links, relevant info that was buried or not even linked into the docs, and way too many warnings when building the docs using `sphinx`. This led me to spend some time updating the docs themselves.
Along the way I started learning about the larger NumPy github workflow - how to label and curate issues and pull requests, as an ongoing task I can do to reduce the core devs' workload.
Now I can get back to reading NEPs and trying to figure out what makes `matmul` special.
Trying to take a look at the 1.15 release milestone. Touched a few things, then came across a long-standing issue with complex numbers, negative exponents, and `savetxt`/`loadtxt`. Should be an easy place to start sinking my teeth in.
I began onboarding, thinking about how I can best help the community and the project. Basically a day of getting organized. Had a productive meeting with Stefan van der Walt from BIDS, where we laid out some guidelines for my first steps.
Thanks to the Moore and Sloan Foundations and BIDS for allowing me this opportunity.