
scanpy_funcs: Use multi-target regression #104

Conversation

ahendriksen

cuML will gain support for multi-target regression in version 22.12. This speeds up the code significantly since the regression no longer has to be performed inside a for loop. As described in rapidsai/cuml#4988, the speedup can be up to 50x.

This PR adds support for using multi-target regression. To prevent running out of memory when the input is too large (more than 100,000 rows), the existing iterative code path is used in that case.
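For reference, here is a minimal sketch of that dispatch, assuming cuML's LinearRegression and a 100,000-row threshold; the function name, threshold constant, and details differ from the actual scanpy_funcs implementation.

import cupy as cp
from cuml.linear_model import LinearRegression

LARGE_INPUT_ROWS = 100_000  # above this, fall back to the iterative per-gene path

def regress_out_sketch(normalized, n_counts, percent_mito, verbose=False):
    # Design matrix built from the two covariates to regress out.
    regressors = cp.stack([n_counts, percent_mito], axis=1)

    if normalized.shape[0] < LARGE_INPUT_ROWS:
        # cuML >= 22.12: fit all gene columns in a single multi-target call.
        lr = LinearRegression(fit_intercept=True, output_type="cupy")
        lr.fit(regressors, normalized)
        return normalized - lr.predict(regressors)

    # Large inputs: keep the existing per-gene loop to limit memory use.
    residuals = cp.empty_like(normalized)
    for i in range(normalized.shape[1]):
        if verbose and i % 500 == 0:
            print(f"Regressed {i} out of {normalized.shape[1]}")
        lr = LinearRegression(fit_intercept=True, output_type="cupy")
        lr.fit(regressors, normalized[:, i])
        residuals[:, i] = normalized[:, i] - lr.predict(regressors)
    return residuals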

@ahendriksen force-pushed the fea-multi-target-regression branch from 99eef26 to 3780943 on November 17, 2022 14:31
@ahendriksen
Author

For a representative use case, this PR speeds up the regression step by a factor of ~20.

import cupy as cp
from cuml.linear_model import LinearRegression
from time import perf_counter as timer
from contextlib import contextmanager

from scanpy_funcs import regress_out  # the repo's regress_out; adjust the module path to wherever scanpy_funcs lives

@contextmanager
def bench(name):
    # Time the enclosed block and print the elapsed seconds.
    start = timer()
    yield
    duration = timer() - start
    print(f"{name}: {duration:0.2f} seconds")

n_cells = 91_000
n_genes = 5_000

normalized = cp.random.normal(size=(n_cells, n_genes))
n_counts = cp.random.normal(size=n_cells)
percent_mito = cp.random.normal(size=n_cells)

with bench("current regress_out"):
    regress_out(normalized, n_counts, percent_mito, verbose=True) # current

with bench("new regress_out"):
    regress_out(normalized, n_counts, percent_mito, verbose=True) # PR

Results on a Volta V100 GPU:

Regressed 0 out of 5000
[... snip ...]
Regressed 4500 out of 5000

current regress_out: 16.75 seconds
new regress_out:      0.73 seconds

@Intron7
Contributor

Intron7 commented Nov 30, 2022

Dear Allard,
great work! This works amazingly well.
Do you think it would also work if we created a chunk_size parameter to run, say, 100 genes at a time even for larger-than-100k datasets?
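As a hypothetical illustration of that chunk_size idea, the multi-target solver could be fit on batches of genes so intermediate buffers stay small even for very large datasets. The function name, default, and details below are illustrative, not the code that ended up in the PR.

import cupy as cp
from cuml.linear_model import LinearRegression

def regress_out_chunked(normalized, n_counts, percent_mito, chunk_size=100):
    # Design matrix of the two covariates to regress out.
    regressors = cp.stack([n_counts, percent_mito], axis=1)
    residuals = cp.empty_like(normalized)
    n_genes = normalized.shape[1]
    # Fit the multi-target solver on chunk_size genes at a time.
    for start in range(0, n_genes, chunk_size):
        stop = min(start + chunk_size, n_genes)
        lr = LinearRegression(fit_intercept=True, output_type="cupy")
        lr.fit(regressors, normalized[:, start:stop])
        residuals[:, start:stop] = normalized[:, start:stop] - lr.predict(regressors)
    return residuals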

Member

@cjnolet cjnolet left a comment


@ahendriksen this repository hasn't been updated for RAPIDS 22.12 yet. Can you update the environment yaml files in conda/environment/*.yaml and the Dockerfile and verify the notebooks still run successfully?

Also, @Intron7 has a valid point: it would be great to still maintain some level of chunking even with the multiple-outputs support.

@Intron7
Contributor

Intron7 commented Nov 30, 2022

I already implemented this chunking for my repo rapids_singlecell and it works really well. I'm still figuring out the details of the syntax, because it's still pretty clunky.
I could write something similar for @ahendriksen's PR.

@cjnolet
Member

cjnolet commented Dec 19, 2022

I could write something similar for @ahendriksen's PR.

@Intron7 if you have time to do that, it would certainly be really useful to show off.

@Intron7
Contributor

Intron7 commented Dec 19, 2022

@cjnolet @ahendriksen done.

I created a PR for @ahendriksen branch with the batching update.

I have to test tomorrow whether we have to change the code for the multi-GPU notebook to set the batchsize argument there to None. The default is now a batchsize of 100, but I can change that to whatever you want.

Still need to test if this works with multi-GPU in the batched format.
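For illustration, a call with batching disabled might look like the lines below; the batchsize keyword follows the comment above, and the exact signature in scanpy_funcs may differ.

corrected = regress_out(normalized, n_counts, percent_mito)                  # default: 100 genes per batch
corrected = regress_out(normalized, n_counts, percent_mito, batchsize=None)  # disable batching: one multi-target fit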
@ahendriksen
Author

Thank you for adding the batching update @Intron7!

@ahendriksen this repository hasn't been updated for RAPIDS 22.12 yet. Can you update the environment yaml files in conda/environment/*.yaml and the Dockerfile and verify the notebooks still run successfully?

I have tested that the code / notebook in this PR runs successfully on RAPIDS 22.12. I have not tested if all other notebooks continue to run successfully.

@Intron7
Contributor

Intron7 commented Jan 10, 2023

@ahendriksen @cjnolet Perfect. I think this is a huge improvement, since this was one of the slowest parts of the analysis.

@cjnolet
Member

cjnolet commented Jan 10, 2023

I have to test tomorrow whether we have to change the code for the multi-GPU notebook to set the batchsize argument there to None. The default is now a batchsize of 100, but I can change that to whatever you want.

@Intron7 were you able to check this? Does it run successfully for you? If so, I'm okay merging this and I can update the conda yaml file in a follow-on PR.

@Intron7
Contributor

Intron7 commented Jan 10, 2023

I found one error in the UVM 1 million notebook. It's a memory error with CuPy and the cubin hash in the scale function. The standard scaler function works but is a lot slower.

---------------------------------------------------------------------------
CUDADriverError                           Traceback (most recent call last)
File <timed exec>:1

File cupy/_core/core.pyx:1112, in cupy._core.core._ndarray_base.mean()

File cupy/_core/core.pyx:1121, in cupy._core.core._ndarray_base.mean()

File cupy/_core/_routines_statistics.pyx:148, in cupy._core._routines_statistics._ndarray_mean()

File cupy/_core/_kernel.pyx:1310, in cupy._core._kernel.ufunc.__call__()

File cupy/_core/_kernel.pyx:1337, in cupy._core._kernel.ufunc._get_ufunc_kernel()

File cupy/_core/_kernel.pyx:1029, in cupy._core._kernel._get_ufunc_kernel()

File cupy/_core/_kernel.pyx:72, in cupy._core._kernel._get_simple_elementwise_kernel()

File cupy/_core/core.pyx:2232, in cupy._core.core.compile_with_cache()

File ~/conda/envs/rapids-22.12/lib/python3.9/site-packages/cupy/cuda/compiler.py:493, in _compile_module_with_cache(source, options, arch, cache_dir, extra_source, backend, enable_cooperative_groups, name_expressions, log_stream, jitify)
    489     return _compile_with_cache_hip(
    490         source, options, arch, cache_dir, extra_source, backend,
    491         name_expressions, log_stream, cache_in_memory)
    492 else:
--> 493     return _compile_with_cache_cuda(
    494         source, options, arch, cache_dir, extra_source, backend,
    495         enable_cooperative_groups, name_expressions, log_stream,
    496         cache_in_memory, jitify)

File ~/conda/envs/rapids-22.12/lib/python3.9/site-packages/cupy/cuda/compiler.py:562, in _compile_with_cache_cuda(source, options, arch, cache_dir, extra_source, backend, enable_cooperative_groups, name_expressions, log_stream, cache_in_memory, jitify)
    560             cubin_hash = _hash_hexdigest(cubin).encode('ascii')
    561             if hash == cubin_hash:
--> 562                 mod.load(cubin)
    563                 return mod
    564 else:
    565     # Enforce compiling -- the resulting kernel will be cached elsewhere,
    566     # so we do nothing

File cupy/cuda/function.pyx:264, in cupy.cuda.function.Module.load()

File cupy/cuda/function.pyx:266, in cupy.cuda.function.Module.load()

File cupy_backends/cuda/api/driver.pyx:210, in cupy_backends.cuda.api.driver.moduleLoadData()

File cupy_backends/cuda/api/driver.pyx:60, in cupy_backends.cuda.api.driver.check_status()

CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered

@Intron7
Contributor

Intron7 commented Jan 10, 2023

@cjnolet The Multi_GPU notebook runs perfectly fine with the default of running 100 genes at once during regress_out.

@cjnolet
Member

cjnolet commented Jan 10, 2023

@Intron7 what hardware did you use to test the UVM notebook? I've found that an illegal memory access error commonly happens when 1) an indexing type overflows (like using a 32-bit int for nnz in sparse data) or 2) the memory is oversubscribed by too much.
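A couple of hypothetical quick checks for those two failure modes; `sparse_gpu_array` is a placeholder for the notebook's count matrix, not a name from the repo.

import cupy as cp

# 1) Would int32 indexing overflow on this matrix?
int32_max = cp.iinfo(cp.int32).max
if sparse_gpu_array.nnz > int32_max:
    print("nnz exceeds the int32 range; indexing overflow is plausible")

# 2) How close is device memory to being oversubscribed right now?
free_bytes, total_bytes = cp.cuda.runtime.memGetInfo()
print(f"free GPU memory: {free_bytes / 1e9:.1f} GB of {total_bytes / 1e9:.1f} GB")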

@Intron7
Contributor

Intron7 commented Jan 10, 2023

@cjnolet I ran it on a node with an 80 GB A100 PCIe and on one of the 80 GB A100s in the DGX; both gave the same error.

@cjnolet
Member

cjnolet commented Jan 10, 2023

Thanks for verifying that, @Intron7! I'll try to take a deeper look as well.

@Intron7
Contributor

Intron7 commented Jan 10, 2023

Thanks for verifying that, @Intron7! I'll try to take a deeper look as well.

It could also be our badly set up GPUs. So if the notebooks work on your end, I think it's on us.

@cjnolet cjnolet merged commit 5ba1227 into NVIDIA-Genomics-Research:master Jan 24, 2023