[PERF/ENH] `Series.map` sorts a larger dataset than it needs to #14485

wence- · 2023-11-23T15:12:00Z

`Series.map`, which replaces each value in `self` that matches a key in the mapping with the corresponding value, currently does:

            lhs = cudf.DataFrame({"x": self, "orig_order": arange(len(self))})
            rhs = cudf.DataFrame(
                {
                    "x": arg.keys(),
                    "s": arg.values(),
                    "bool": full(len(arg), True, dtype=self.dtype),
                }
            )
            res = lhs.merge(rhs, on="x", how="left").sort_values(
                by="orig_order"
            )
            result = res["s"]
            result.name = self.name
            result.index = self.index

The final sort on `orig_order` restores the input order, so the result lines up with `self.index`.
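For readers without a GPU to hand, here is a minimal sketch of the same merge-then-sort formulation using pandas as a CPU stand-in for cudf. The function name `map_via_merge` and the sample data are hypothetical, and the `"bool"` column from the snippet above is left out because it does not affect the returned values. It only illustrates the shape of the algorithm, not the cudf implementation itself.

```python
import numpy as np
import pandas as pd


def map_via_merge(ser: pd.Series, mapping: dict) -> pd.Series:
    """Look up each value of `ser` in `mapping` via a left merge (illustrative only)."""
    # Tag every input row with its position so the order can be restored later.
    lhs = pd.DataFrame({"x": ser.to_numpy(), "orig_order": np.arange(len(ser))})
    rhs = pd.DataFrame({"x": list(mapping.keys()), "s": list(mapping.values())})
    # Left join keeps every input row; unmatched keys get a missing value in "s".
    res = lhs.merge(rhs, on="x", how="left").sort_values(by="orig_order")
    return pd.Series(res["s"].to_numpy(), index=ser.index, name=ser.name)


ser = pd.Series([3, 1, 2, 1, 4], index=list("abcde"), name="vals")
mapping = {1: 10.0, 2: 20.0, 3: 30.0}

out = map_via_merge(ser, mapping)
expected = ser.map(mapping)  # pandas' own map, used as the reference result
```

Note that the sort runs over the whole merged frame (`x`, `orig_order`, `s`) even though only `s` is returned.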

This has two pessimisations:

- In pandas-compat mode (since #14428, "Match pandas join ordering obligations in pandas-compatible mode"), the left merge already returns rows in the order of the left input, so the sort is unnecessary.
- Since we only return `s`, we can get away with a `sort_by_key` of `res["s"]` keyed on `orig_order`, rather than sorting a multi-column dataframe.
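A rough sketch of what both changes could look like, again using pandas and NumPy as a CPU stand-in rather than cudf. The function `map_lookup`, its `join_preserves_order` flag, and the shuffle used to imitate a join that scrambles row order are all assumptions made for illustration. The `sort_by_key` step is emulated with `np.argsort` plus a gather on the single `s` column.

```python
import numpy as np
import pandas as pd


def map_lookup(ser, mapping, join_preserves_order, seed=0):
    """Merge-based lookup that avoids sorting the full merged frame (illustrative only)."""
    lhs = pd.DataFrame({"x": ser.to_numpy(), "orig_order": np.arange(len(ser))})
    rhs = pd.DataFrame({"x": list(mapping.keys()), "s": list(mapping.values())})
    res = lhs.merge(rhs, on="x", how="left")

    if join_preserves_order:
        # First point: the join output is already in input order, so no sort at all.
        values = res["s"].to_numpy()
    else:
        # Imitate a join that returns rows in arbitrary order (assumption for the demo).
        res = res.sample(frac=1.0, random_state=seed)
        # Second point: sort only the "s" values, keyed on "orig_order",
        # instead of sorting every column of the merged frame.
        perm = np.argsort(res["orig_order"].to_numpy(), kind="stable")
        values = res["s"].to_numpy()[perm]

    return pd.Series(values, index=ser.index, name=ser.name)


ser = pd.Series([3, 1, 2, 1, 4], index=list("abcde"), name="vals")
mapping = {1: 10.0, 2: 20.0, 3: 30.0}

fast_path = map_lookup(ser, mapping, join_preserves_order=True)
keyed_sort = map_lookup(ser, mapping, join_preserves_order=False)
expected = ser.map(mapping)
```

Both paths should agree with `ser.map(mapping)`. The keyed sort moves one column of values instead of three, which is where the saving in work and memory would come from.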


wence- · 2023-11-23T16:20:52Z

cross-ref #6724

wence- mentioned this issue Nov 23, 2023

[ENH] Audit cudf APIs for use of inappropriate algorithms #14479

Open

wence- added the Python (Affects Python cuDF API), Performance (Performance related issue), and no-oom (Reducing memory footprint of cudf algorithms) labels Nov 23, 2023

vyasr added this to cuDF Python Nov 5, 2024

github-project-automation bot moved this to Todo in cuDF Python Nov 5, 2024
