Add FieldVector microbenchmarks and improve fieldvector broadcast performance #2070

Merged 1 commit on Nov 6, 2024

Member

@charleskawczynski commented Nov 1, 2024

Closes #2067. This turned out to be pretty easy: any FieldVector operation is embarrassingly parallel. We can leverage NonExtrudedBroadcasted, forward everything to the backing arrays, and just linearly index everywhere. This puts us in pretty good shape for FieldVector operations, and it'll be even better at lower resolution since we parallelize across field variables.
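As a rough illustration of the idea described above (forward a pointwise operation to the backing arrays and index them linearly), here is a minimal sketch in plain Julia. `ToyFieldVector` and `pointwise!` are hypothetical names invented for this example; they are not ClimaCore's types or API, and the loop runs serially rather than on a GPU.

```julia
# Hypothetical stand-in for a field vector: a named collection of backing arrays.
# This is NOT ClimaCore's FieldVector type; it only illustrates the idea.
struct ToyFieldVector{NT<:NamedTuple}
    arrays::NT
end

# Apply `f` pointwise over matching backing arrays using linear indices only.
# Every (variable, index) pair is independent, which is what makes the
# operation embarrassingly parallel; this toy version simply loops serially.
function pointwise!(f, dest::ToyFieldVector, srcs::ToyFieldVector...)
    for name in keys(dest.arrays)
        d = getfield(dest.arrays, name)
        ss = map(s -> getfield(s.arrays, name), srcs)
        for i in eachindex(d, ss...)
            d[i] = f(map(s -> s[i], ss)...)
        end
    end
    return dest
end

x = ToyFieldVector((u = fill(1.0, 4, 3), v = fill(2.0, 5)))
y = ToyFieldVector((u = fill(10.0, 4, 3), v = fill(20.0, 5)))

# Equivalent in spirit to `y .= 2 .* x .+ y`, applied per backing array.
pointwise!((a, b) -> 2a + b, y, x, y)
```

Because no element depends on any other, the inner loop over `i` (and the outer loop over variables) could in principle be distributed across threads or GPU threads without synchronization.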

GPU

main

N reads-writes: 5,  Float_type = Float64, Device_bandwidth_GBs=2039
┌─────────────┬───────────────────────────────────┬─────────┬─────────────┬──────────────┬────────┐
│ funcs       │ time per call                     │ bw %    │ achieved bw │ problem size │ n-reps │
├─────────────┼───────────────────────────────────┼─────────┼─────────────┼──────────────┼────────┤
│ FieldVector │ 1 millisecond, 104 microseconds   │ 54.1611 │ 1104.34     │ (32745600,)  │ 4475   │
│ FieldVector │ 307 microseconds, 597 nanoseconds │ 48.6243 │ 991.45      │ (8186400,)   │ 10000  │
└─────────────┴───────────────────────────────────┴─────────┴─────────────┴──────────────┴────────┘

This PR

N reads-writes: 5,  Float_type = Float64, Device_bandwidth_GBs=2039
┌─────────────┬───────────────────────────────────┬─────────┬─────────────┬──────────────┬────────┐
│ funcs       │ time per call                     │ bw %    │ achieved bw │ problem size │ n-reps │
├─────────────┼───────────────────────────────────┼─────────┼─────────────┼──────────────┼────────┤
│ FieldVector │ 757 microseconds, 513 nanoseconds │ 78.9779 │ 1610.36     │ (32745600,)  │ 6491   │
│ FieldVector │ 202 microseconds, 849 nanoseconds │ 73.7335 │ 1503.43     │ (8186400,)   │ 10000  │
└─────────────┴───────────────────────────────────┴─────────┴─────────────┴──────────────┴────────┘

CPU

main

N reads-writes: 5,  Float_type = Float64,
┌─────────────┬──────────────────────────────────┬─────────────┬──────────────┬────────┐
│ funcs       │ time per call                    │ achieved bw │ problem size │ n-reps │
├─────────────┼──────────────────────────────────┼─────────────┼──────────────┼────────┤
│ FieldVector │ 32 milliseconds, 45 microseconds │ 38.0674     │ (32745600,)  │ 33     │
│ FieldVector │ 7 milliseconds, 792 microseconds │ 39.1372     │ (8186400,)   │ 529    │
└─────────────┴──────────────────────────────────┴─────────────┴──────────────┴────────┘

This PR

N reads-writes: 5,  Float_type = Float64,
┌─────────────┬───────────────────────────────────┬─────────────┬──────────────┬────────┐
│ funcs       │ time per call                     │ achieved bw │ problem size │ n-reps │
├─────────────┼───────────────────────────────────┼─────────────┼──────────────┼────────┤
│ FieldVector │ 23 milliseconds, 218 microseconds │ 52.5398     │ (32745600,)  │ 43     │
│ FieldVector │ 5 milliseconds, 752 microseconds  │ 53.0143     │ (8186400,)   │ 828    │
└─────────────┴───────────────────────────────────┴─────────────┴──────────────┴────────┘
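
The reported numbers appear consistent with a simple bandwidth model: bytes moved = (N reads-writes) × (problem size) × sizeof(Float type), divided by the time per call and expressed in units of 2^30 bytes per second, with "bw %" being that value relative to the device bandwidth. This formula is an assumption inferred from the tables, not taken from the benchmark source. A minimal sketch that checks it against the large GPU case for this PR:

```julia
# Assumed formula, inferred from the reported numbers (not from the benchmark code).
# Returns bandwidth in units of 2^30 bytes per second.
achieved_bw(n_rw, n_elems, T, t_seconds) = n_rw * n_elems * sizeof(T) / t_seconds / 2^30

# GPU, this PR, problem size (32745600,): 757 microseconds, 513 nanoseconds per call.
bw = achieved_bw(5, 32_745_600, Float64, 757.513e-6)

# Percentage of the stated device bandwidth (Device_bandwidth_GBs=2039).
pct = 100 * bw / 2039

# Speedup over main for the same case; main's time is only reported to the
# microsecond (1 millisecond, 104 microseconds), so this is approximate.
speedup = 1.104e-3 / 757.513e-6
```

Under this assumption, `bw` and `pct` land on the table's 1610.36 and 78.9779 to within rounding, and the large GPU case is roughly 1.46 times faster than on main.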

@charleskawczynski force-pushed the ck/fieldvector_benchmarks branch from 7a66cf4 to f1e9282 on November 2, 2024 18:57
@charleskawczynski requested review from dennisYatunin and Sbozzolo and removed request for dennisYatunin on November 2, 2024 19:01
@charleskawczynski changed the title from "Add FieldVector microbenchmarks" to "Add FieldVector microbenchmarks and improve fieldvector broadcast performance" on Nov 2, 2024
@charleskawczynski force-pushed the ck/fieldvector_benchmarks branch from 9087954 to 9e49241 on November 2, 2024 19:25
src/Fields/fieldvector.jl (outdated review thread, resolved)
@charleskawczynski
Member Author

Interesting, this shows that ClimaLand is using fieldvectors in some untested edge case. I'm going to find out the difference and add some unit tests that exercise this.

@charleskawczynski force-pushed the ck/fieldvector_benchmarks branch 3 times, most recently from da45803 to e0dec11, on November 5, 2024 19:50
@charleskawczynski
Member Author

This looks done! 🎉

@charleskawczynski
Member Author

Just to comment on why this took a bit longer to iron out: the land model seems to rely on recursively defined fieldvectors, for which we support copyto!. Doing this correctly is a bit tricky because we need to dispatch to a custom, device-specific method to get the optimized path. All that said, I'm happy with the result.
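
To make the shape of that problem concrete, here is a minimal sketch of a recursive copy that bottoms out in a device-specific method. Everything in it is hypothetical: `ToyDevice`, `ToyCPU`, `toy_device`, `toy_copyto!`, and the `soil`/`snow` field names are invented for illustration, nested NamedTuples stand in for recursively defined fieldvectors, and none of this is ClimaCore's or ClimaLand's actual code.

```julia
# Hypothetical device tags; the real device types are not shown in this thread.
abstract type ToyDevice end
struct ToyCPU <: ToyDevice end

# Map a backing array to its (toy) device. A GPU array type would add a method here.
toy_device(::Array) = ToyCPU()

# Leaf case: pick the device-specific copy for a pair of backing arrays.
toy_copyto!(dest::AbstractArray, src::AbstractArray) =
    toy_copyto!(toy_device(dest), dest, src)

# CPU method: a plain linearly indexed loop.
function toy_copyto!(::ToyCPU, dest, src)
    for i in eachindex(dest, src)
        dest[i] = src[i]
    end
    return dest
end

# Recursive case: a NamedTuple may hold arrays or further NamedTuples.
function toy_copyto!(dest::NamedTuple, src::NamedTuple)
    for name in keys(dest)
        toy_copyto!(getfield(dest, name), getfield(src, name))
    end
    return dest
end

src = (soil = (T = fill(280.0, 3), theta = fill(0.3, 3)), snow = fill(1.0, 2))
dest = (soil = (T = zeros(3), theta = zeros(3)), snow = zeros(2))
toy_copyto!(dest, src)
```

The recursion only decides which leaves to visit; the choice of how to copy each leaf is left to dispatch on the device tag, so an optimized kernel for another device could be added as one more method without touching the recursive part.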

@charleskawczynski force-pushed the ck/fieldvector_benchmarks branch from e0dec11 to 1c512be on November 5, 2024 21:13
@charleskawczynski merged commit a093d4a into main on Nov 6, 2024
32 of 33 checks passed
@charleskawczynski deleted the ck/fieldvector_benchmarks branch on November 6, 2024 19:34
Successfully merging this pull request may close these issues.

Add FieldVector microbenchmarks and improve performance