Optimize MG variance calculation for dataset standardization for logistic regression #6138

lijinf2 · 2024-11-20T04:01:52Z

The MG variance calculation currently invokes the raft SG vars API. However, the abs() step in the raft SG vars API introduces errors when the data distribution is skewed across GPUs (e.g., one GPU gets the small values 1 and 2, and the other GPU gets the large values 98 and 99).

This PR avoids the effect of abs() when invoking SG vars to calculate MG vars. The key idea is to pass a vector of zeros as the mean when calling SG vars, so that abs() never sees a negative intermediate value.
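The idea can be illustrated with a small pure-Python sketch using the 1, 2 / 98, 99 example. It is not the code in this PR and does not call raft: `sg_vars` is a hypothetical stand-in that assumes the single-GPU routine returns `|sum(x^2)/n - mean^2|`, the "naive" variant is only one assumed way abs() could distort a per-GPU term, and population (not sample) variance is used for simplicity.

```python
def sg_vars(values, mean, n):
    """Hypothetical single-GPU variance: |sum(x^2)/n - mean^2| (assumed form)."""
    return abs(sum(x * x for x in values) / n - mean * mean)


def global_mean(partitions):
    """Mean over all partitions; the sums stand in for an allreduce."""
    n_total = sum(len(p) for p in partitions)
    return sum(sum(p) for p in partitions) / n_total, n_total


def mg_var_zero_mean(partitions):
    """Call sg_vars with a zero mean, then subtract the global mean^2 once."""
    mean, n_total = global_mean(partitions)
    # With mean = 0 each call returns sum(x^2)/n_total >= 0, so abs() is a no-op.
    mean_of_squares = sum(sg_vars(p, 0.0, n_total) for p in partitions)
    return mean_of_squares - mean * mean


def mg_var_naive(partitions):
    """Assumed failure mode: abs() applied to each GPU's partial term."""
    mean, n_total = global_mean(partitions)
    total = 0.0
    for p in partitions:
        weight = len(p) / n_total
        # This partial term is negative on the GPU holding small values,
        # and abs() flips its sign before the cross-GPU sum.
        total += abs(sum(x * x for x in p) / n_total - weight * mean * mean)
    return total


partitions = [[1.0, 2.0], [98.0, 99.0]]  # one list per GPU
correct = mg_var_zero_mean(partitions)
skewed = mg_var_naive(partitions)
print(correct, skewed)
```

Under these assumptions the zero-mean variant returns the true population variance of the four values (2352.5), while the naive variant returns 4850.0 because the first GPU's negative partial term has its sign flipped.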

copy-pr-bot · 2024-11-20T04:01:57Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

cpp/src/glm/qn/mg/standardization.cuh

…table

dantegd · 2024-12-05T15:21:59Z

/merge

lijinf2 requested a review from a team as a code owner November 20, 2024 04:01

lijinf2 requested review from wphicks and divyegala November 20, 2024 04:01

github-actions bot added the CUDA/C++ label Nov 20, 2024

lijinf2 added the CUDA / C++, Cython / Python, improvement, and non-breaking labels and removed the CUDA/C++ label Nov 20, 2024

lijinf2 force-pushed the raft_stat_1118 branch from b6ecfae to 083f204 Compare November 20, 2024 04:04

github-actions bot added the CUDA/C++ label and removed the Cython / Python label Nov 20, 2024

lijinf2 changed the title ~~Optimize MG variance calculation for logistic regression standardization~~ Optimize MG variance calculation for dataset standardization for logistic regression Nov 20, 2024

eordentlich reviewed Nov 21, 2024

cpp/src/glm/qn/mg/standardization.cuh (outdated)

lijinf2 force-pushed the raft_stat_1118 branch 2 times, most recently from 33e27de to b3ee71e Compare November 26, 2024 18:30

lijinf2 added the 3 - Ready for Review label Nov 26, 2024

lijinf2 added 4 commits December 3, 2024 05:23

add logging infor to print out mean vec var for debugging numeric ins…

521ec03

…table

got new var_mg working for standardization

46be2ec

clean

00acbc3

optimize per comment

8eeded4

lijinf2 force-pushed the raft_stat_1118 branch from b3ee71e to 8eeded4 Compare December 3, 2024 05:24

dantegd approved these changes Dec 5, 2024

rapids-bot bot merged commit de96f3a into rapidsai:branch-24.12 Dec 5, 2024
65 checks passed
