This repository has been archived by the owner on Nov 4, 2024. It is now read-only.

Fused Operations #53

Merged
merged 18 commits into from
Apr 24, 2024

avik-pal commented on Apr 18, 2024

  • Dense + Bias + Activation
    • Working Implementation
    • Add tests
    • Documentation
    • Use in upstream Dense
    • Mixed Precision Tests seem to be failing
    • Fused CUDA Kernels (will be handled in a later PR)
      • CUBLASLt
      • GemmKernels -- currently a bit slow with the default configuration compared to CUBLAS with an additional kernel launch. There is some tuning to be done on their end, so it might be worth revisiting this later.
  • Conv + Bias + Activation
    • Working Implementation
    • Add tests
    • Documentation
    • Use in upstream Conv
  • Activation Function Fusion for Other Operations
    • LayerNorm
    • Generic Normalization
    • GroupNorm: Fuse the activation function into the kernel (will be handled in a later PR)
    • Tests
  • Faster Maybe Inplace Activation Version
    • Initial Implementation
    • Benchmarking
  • Use FastBroadcast for faster broadcasts on CPU
  • Needs "Restore some of the parallel testing" (#55)
  • ForwardDiff over Zygote Tests (these tests will be part of "Native Nested AD support for Lux Models", Lux.jl#598)
  • Handle AMDGPU specifics
    • Activation Function Broadcasting for >= 6 dims
    • Convolution is not supported for Float64. Capture this at fused_conv_bias_activation
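
To illustrate what the "Dense + Bias + Activation" and "Maybe Inplace Activation" items in the checklist above are aiming at, here is a minimal, language-neutral sketch in plain Python. It is not LuxLib's implementation or API: `relu`, `dense_unfused`, `dense_fused`, and `activate_inplace` are hypothetical names chosen for this example, and the real kernels operate on arrays and GPU buffers rather than Python lists. The sketch only shows the general idea of fusion: compute the matrix product, bias add, and activation in one pass instead of materializing an intermediate array for each step.

```python
from typing import Callable, List

Matrix = List[List[float]]
Vector = List[float]
Activation = Callable[[float], float]


def relu(v: float) -> float:
    """Elementwise ReLU, used here as the example activation."""
    return v if v > 0.0 else 0.0


def dense_unfused(act: Activation, weight: Matrix, x: Vector, bias: Vector) -> Vector:
    """Three separate passes, each allocating its own intermediate list."""
    # Pass 1: matrix-vector product.
    z = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    # Pass 2: bias add.
    zb = [zi + bi for zi, bi in zip(z, bias)]
    # Pass 3: activation.
    return [act(v) for v in zb]


def dense_fused(act: Activation, weight: Matrix, x: Vector, bias: Vector) -> Vector:
    """One pass per output element: start the accumulator at the bias,
    add the dot product, and apply the activation before storing."""
    out = [0.0] * len(weight)
    for i, row in enumerate(weight):
        acc = bias[i]
        for w, xi in zip(row, x):
            acc += w * xi
        out[i] = act(acc)
    return out


def activate_inplace(act: Activation, buf: Vector) -> Vector:
    """Overwrite `buf` with act(buf[i]); assumes the caller owns `buf`
    and no other code still needs the pre-activation values."""
    for i, v in enumerate(buf):
        buf[i] = act(v)
    return buf
```

Both dense variants compute the same values on exactly representable inputs; the fused one simply avoids the two intermediate lists. The in-place helper is only safe when the pre-activation buffer is not needed afterwards, which is presumably why the checklist describes that path as "maybe" in-place.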

Is Anything Broken?

  • Zygote.gradient over Zygote.gradient for dense layers will be broken once this is merged. But fear not, we will have a faster nested AD package merged soon!

avik-pal force-pushed the ap/fused_dense branch 12 times, most recently from 04b13d6 to 5a31de5 (April 18, 2024 15:15)
avik-pal force-pushed the ap/fused_dense branch 6 times, most recently from 8271ed8 to c910e11 (April 19, 2024 19:43)
avik-pal force-pushed the ap/fused_dense branch 2 times, most recently from 0a77c67 to b14870e (April 19, 2024 22:06)
avik-pal force-pushed the ap/fused_dense branch 3 times, most recently from 6feb0c2 to 7d58f05 (April 23, 2024 03:35)
avik-pal force-pushed the ap/fused_dense branch 4 times, most recently from 2c743ca to 8040ad1 (April 23, 2024 13:07)
avik-pal force-pushed the ap/fused_dense branch 15 times, most recently from a649e37 to 921a393 (April 24, 2024 02:05)

codecov bot commented Apr 24, 2024

Codecov Report

Attention: Patch coverage is 92.23602%, with 25 lines in your changes missing coverage. Please review.

Project coverage is 82.81%. Comparing base (867d4b5) to head (6a08a48).

Files                            Patch %   Lines
src/utils.jl                     90.27%    7 Missing ⚠️
src/impl/fast_activation.jl      68.75%    5 Missing ⚠️
ext/LuxLibTrackerAMDGPUExt.jl    0.00%     4 Missing ⚠️
src/api/conv.jl                  83.33%    4 Missing ⚠️
src/api/groupnorm.jl             71.42%    2 Missing ⚠️
src/impl/normalization.jl        94.11%    2 Missing ⚠️
ext/LuxLibTrackercuDNNExt.jl     0.00%     1 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##             main      #53      +/-   ##
==========================================
+ Coverage   75.97%   82.81%   +6.83%     
==========================================
  Files          16       23       +7     
  Lines         437      704     +267     
==========================================
+ Hits          332      583     +251     
- Misses        105      121      +16     


avik-pal merged commit 3d41215 into main on Apr 24, 2024
27 of 28 checks passed
avik-pal deleted the ap/fused_dense branch on April 24, 2024 15:51