Keep rval refs alive in StringHashTable._unique
Address a Heisenbug caused by `v = get_c_string(<str>repr(val))` potentially leaving `v` pointing to a temporary string that is no longer referenced by the time the next exception is raised. (Two exceptions are raised in succession in `pandas/tests/base/test_unique.py test_unique_bad_unicode`.)

Signed-off-by: Michael Tiemann <[email protected]>
MichaelTiemannOSC committed Oct 15, 2023
1 parent d98e6f0 commit dabaf6f
Showing 1 changed file with 4 additions and 1 deletion.
5 changes: 4 additions & 1 deletion pandas/_libs/hashtable_class_helper.pxi.in
@@ -1128,6 +1128,7 @@ cdef class StringHashTable(HashTable):
use_na_value = na_value is not None

# assign pointers and pre-filter out missing (if ignore_na)
+ keep_rval_refs = []
vecs = <const char **>malloc(n * sizeof(char *))
for i in range(n):
val = values[i]
@@ -1144,7 +1145,9 @@ cdef class StringHashTable(HashTable):
try:
v = get_c_string(<str>val)
except UnicodeEncodeError:
- v = get_c_string(<str>repr(val))
+ rval = <str>repr(val)
+ keep_rval_refs.append(rval)
+ v = get_c_string(rval)
vecs[i] = v

# compute
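
The lifetime problem this change addresses can be illustrated in plain Python. This is a rough sketch, not the pandas code: it assumes `get_c_string` hands back a pointer into a buffer owned by the `str` object, so the pointer is only valid while that object is alive. The `Temp` class, the two `borrow_*` helpers, and the lone-surrogate input are hypothetical stand-ins, and a weak reference plays the role of the borrowed `char *`. The immediate collection shown here relies on CPython reference counting.

```python
import weakref


class Temp:
    """Hypothetical stand-in for the temporary str built by repr(val)."""

    def __init__(self, text):
        self.text = text


def borrow_unowned(val):
    # Mirrors the old code: the temporary is created inline and nothing
    # holds a reference to it once this function returns.
    return weakref.ref(Temp(repr(val)))


def borrow_owned(val, keep_refs):
    # Mirrors the fix: store the temporary in a list that outlives the loop
    # before handing out a borrowed handle to it.
    tmp = Temp(repr(val))
    keep_refs.append(tmp)
    return weakref.ref(tmp)


bad = "\ud83d"  # lone surrogate (assumed input): cannot be encoded as UTF-8

try:
    bad.encode("utf-8")
    encodable = True
except UnicodeEncodeError:
    # This is the path on which the diff falls back to repr(val).
    encodable = False

keep_refs = []
dangling = borrow_unowned(bad)
safe = borrow_owned(bad, keep_refs)

print(encodable)           # the raw value is not encodable
print(dangling() is None)  # on CPython the unowned temporary is already gone
print(safe().text)         # the owned temporary is still reachable
```

As in the diff, the borrowed handle stays valid only for as long as the owning list does, so `keep_refs` has to remain in scope until every pointer stored in `vecs` has been consumed.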