Explore if using `cudf.pandas` provides any acceleration #1185

matt-graham · 2023-11-10T11:31:09Z

cudf.pandas claims to be a drop-in replacement for pandas with support for GPU acceleration (on NVIDIA GPUs) and support for "100% of the pandas API".

While I suspect we wouldn't get anywhere near the speedups shown in their benchmarks, it may be worth investigating whether cudf.pandas provides any performance advantage for TLOmodel simulations on systems with an NVIDIA GPU. The cudf.pandas module provides an install function for monkey-patching the existing pandas import, and there are also command-line options and an IPython extension for doing the same, so in principle this shouldn't require any changes on the TLOmodel side. From a very brief attempt at running this on a Google Colab instance, however, the claim of 100% API compatibility does not seem to be accurate, as we get an error at
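As a rough sketch of the install-function route mentioned above: the helper name `enable_cudf_pandas` is made up for illustration, and the fallback behaviour (silently staying on CPU pandas) is an assumption about what we would want, not something cudf.pandas does itself. The call needs to happen before pandas is first imported.

```python
import importlib.util


def enable_cudf_pandas() -> bool:
    """Try to activate the cudf.pandas accelerator (hypothetical helper).

    Returns True if cudf.pandas was installed, False if cuDF is missing or
    could not be initialised (for example, no usable NVIDIA GPU).
    """
    if importlib.util.find_spec("cudf") is None:
        # cuDF is not installed, so keep using standard CPU pandas.
        return False
    try:
        import cudf.pandas

        # Patches subsequent `import pandas` statements to use the accelerator.
        cudf.pandas.install()
    except Exception:
        # Assumption: any initialisation failure means we fall back to CPU pandas.
        return False
    return True


gpu_enabled = enable_cudf_pandas()
print("cudf.pandas active:", gpu_enabled)
# Only import pandas (or tlo) after this point so the patch takes effect.
```

The same effect should be available without touching any code by running a script as `python -m cudf.pandas script.py`, or with `%load_ext cudf.pandas` in IPython/Jupyter before importing pandas.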

/usr/local/lib/python3.10/dist-packages/tlo/methods/healthburden.py in read_parameters(self, data_folder)
     68         p['DALY_Weight_Database'] = pd.read_csv(Path(self.resourcefilepath) / 'ResourceFile_DALY_Weights.csv')
     69         p['Age_Limit_For_YLL'] = 70.0  # Assumption that only deaths younger than 70y incur years of lost life
---> 70         p['gbd_causes_of_disability'] = set(pd.read_csv(
     71             Path(self.resourcefilepath) / 'gbd' / 'ResourceFile_CausesOfDALYS_GBD2019.csv', header=None)[0].values)
     72

when trying to run a simulation with fullmodel. This appears to be due to accessing the dataframe column using the integer index 0 rather than the string "0", despite the former working in standard pandas (though I suspect the latter is probably the recommended approach, as column names are generally strings). If it's just relatively minor differences like this, it would probably not be a massive amount of work to get this working, but it's hard to tell without investigating further.

beckernick · 2023-11-20T23:06:24Z

Hi @matt-graham ! I came across this issue due to the cudf.pandas reference (I work on this and other RAPIDS projects). Glad to see you're interested in cudf.pandas.

It looks like this error is coming from this cuDF issue. It's definitely a bug. We'll explore what solving it might look like.

In the meantime, a potential workaround might be to temporarily switch this line to instead grab the first column with something like .iloc[:, 0] that doesn't rely on the column name (since the file has no header anyway). Would love to see if cudf.pandas can provide a speedup here!
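A minimal sketch of that workaround, run against standard pandas: the CSV contents are invented stand-ins for the real resource file, and this only shows that the positional and label-based lookups agree in plain pandas, not how cudf.pandas behaves.

```python
import io

import pandas as pd

# Hypothetical stand-in for the header-less GBD causes resource file.
csv_text = "Cause A\nCause B\nCause A\n"

df = pd.read_csv(io.StringIO(csv_text), header=None)

# Label-based access: with header=None, pandas labels the columns 0, 1, ...
by_label = set(df[0].values)

# Position-based access: selects the first column regardless of its label.
by_position = set(df.iloc[:, 0].values)

# In standard pandas both forms give the same set of causes.
assert by_label == by_position
```

Since the positional form never looks at the column label, it should sidestep the integer-versus-string label difference described above.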

matt-graham · 2023-11-21T12:33:04Z

Hi @beckernick! Thanks for the pointer to the issue and for the suggested workaround. I will have a look at implementing this and seeing whether we run into any other problems.

I just noticed that the cudf.pandas docs indicate that compatibility with pandas 1.5.x is currently being targeted, and there is an issue about adding pandas 2.0 support at rapidsai/cudf#12794. As we require pandas 2.0 or above here, we may also need to wait for that to be resolved.

matt-graham added the question and performance labels Nov 10, 2023

matt-graham self-assigned this Nov 13, 2023

github-project-automation bot added this to Issue management Aug 27, 2024

github-project-automation bot moved this to Issues in Issue management Aug 27, 2024
