Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

COMPAT/TEST test, fix for unsafe Vector.resize(), which allows refche… #16258

Merged
merged 3 commits into from
May 11, 2017

Conversation

mattip
Copy link
Contributor

@mattip mattip commented May 5, 2017

closes issue #15854, supersedes pull request #16224, #16193

Adds a test showing how the uniques attribute leaks to user space, and calling get_labels() again with different data could change the underlying ndarray. With this pull request an exception will be raised after calling append() after calling to_array(), which makes the test pass. It also allows addition of the refcheck=False kwarg to ndarray.resize(), which fixes the issue above.

@codecov
Copy link

codecov bot commented May 6, 2017

Codecov Report

Merging #16258 into master will decrease coverage by 0.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16258      +/-   ##
==========================================
- Coverage   90.32%    90.3%   -0.02%     
==========================================
  Files         167      167              
  Lines       50907    50907              
==========================================
- Hits        45982    45973       -9     
- Misses       4925     4934       +9
Flag Coverage Δ
#multiple 88.09% <ø> (ø) ⬆️
#single 40.29% <ø> (-0.11%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/core/frame.py 97.58% <0%> (-0.1%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ba60321...d8be715. Read the comment docs.

@codecov
Copy link

codecov bot commented May 6, 2017

Codecov Report

Merging #16258 into master will increase coverage by <.01%.
The diff coverage is n/a.

Impacted file tree graph

@@            Coverage Diff             @@
##           master   #16258      +/-   ##
==========================================
+ Coverage   90.34%   90.35%   +<.01%     
==========================================
  Files         167      161       -6     
  Lines       50908    50863      -45     
==========================================
- Hits        45994    45957      -37     
+ Misses       4914     4906       -8
Flag Coverage Δ
#multiple 88.13% <ø> (+0.02%) ⬆️
#single 40.33% <ø> (-0.07%) ⬇️
Impacted Files Coverage Δ
pandas/io/gbq.py 25% <0%> (-58.34%) ⬇️
pandas/io/msgpack/_version.py 44.86% <0%> (-55.14%) ⬇️
pandas/core/util/hashing.py 92.03% <0%> (-7.02%) ⬇️
pandas/core/computation/expressions.py 90.47% <0%> (-2.21%) ⬇️
pandas/plotting/_converter.py 63.23% <0%> (-1.82%) ⬇️
pandas/core/frame.py 97.59% <0%> (-0.1%) ⬇️
pandas/util/__init__.py
pandas/formats/style.py
pandas/io/msgpack/__init__.py
pandas/io/json/__init__.py
... and 20 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea56550...247cc75. Read the comment docs.

@jreback jreback added the Compat pandas objects compatability with Numpy or Python functions label May 6, 2017
@mattip
Copy link
Contributor Author

mattip commented May 7, 2017

Sorry it took so many builds to get tests to pass, I am still learning the build system

@@ -52,6 +52,7 @@ cdef struct Int64VectorData:
cdef class Int64Vector:
cdef Int64VectorData *data
cdef ndarray ao
cdef bint external_view_exists
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are you only doing this for Int64 for some reason?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the other Vector classes are defined in pandas/_libs/hashtable_class_helper.pxi.in, for some reason the original code defines this class here specificially. So I added the cdef bint external_view_exists where the other classes are defined

@mattip
Copy link
Contributor Author

mattip commented May 9, 2017

@chris-b1 any opinions?

Copy link
Contributor

@chris-b1 chris-b1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks reasonable to me

Copy link
Contributor

@jreback jreback left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks fine. minor comments. thank you.

self.ao.resize(self.data.n)
if self.data.m != self.data.n:
if self.external_view_exists:
raise ValueError("should have raised on append(), m=%d n=%d" % (self.data.m, self.data.n))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you use .format() syntax and wrap the line

# to_array resizes the vector
uniques.to_array()
htable.get_labels(vals, uniques, 0, -1)
# to_array may resize the vector
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you put a 1-line comment here on wheat you are checking

@jreback jreback modified the milestones: 0.20.2, 0.21.0 May 9, 2017
@mattip
Copy link
Contributor Author

mattip commented May 10, 2017

@jreback rather than use format() in the error message, I simply shortened it. Thanks for the review

uniques = ObjectVector()
data = self.uniques.to_array()
for v in data:
uniques.append(v)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for consistency maybe we should add .extend to String/Object Vectors. can you add?

@jreback
Copy link
Contributor

jreback commented May 11, 2017

thanks!

pcluo pushed a commit to pcluo/pandas that referenced this pull request May 22, 2017
pandas-dev#16258)

* COMPAT/TEST test, fix for unsafe Vector.resize(), which allows refcheck=False

* COMPAT/TEST improve error msg, document test as per review

* COMPAT/TEST unify interfaces as per review
stangirala pushed a commit to stangirala/pandas that referenced this pull request Jun 11, 2017
pandas-dev#16258)

* COMPAT/TEST test, fix for unsafe Vector.resize(), which allows refcheck=False

* COMPAT/TEST improve error msg, document test as per review

* COMPAT/TEST unify interfaces as per review
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Compat pandas objects compatability with Numpy or Python functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants