Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

skip heartbeat thread from CPU profile #201

Merged
merged 1 commit into from
Jan 3, 2025

Conversation

d-netto
Copy link
Member

@d-netto d-netto commented Jan 2, 2025

PR Description

Intentionally skips the heartbeat thread from the CPU profiler, so that we don't include a large number of useless samples from semaphore_wait_trap.

X-ref: https://relationalai.atlassian.net/browse/RAI-31811.

Checklist

Requirements for merging:

@d-netto d-netto requested a review from kpamnany January 2, 2025 11:06
@github-actions github-actions bot added port-to-v1.10 This change should apply to Julia v1.10 builds port-to-master This change should apply to all future Julia builds labels Jan 2, 2025
@d-netto
Copy link
Member Author

d-netto commented Jan 2, 2025

Reminder for self: test on Linux before merging.

@d-netto
Copy link
Member Author

d-netto commented Jan 2, 2025

Ran using Profile, PProf; @profile peakflops(); pprof(webport=11111).

No semaphore_wait_trap on MacOS:

Screenshot 2025-01-02 at 08 26 20

Profile collected on Linux also shows no abnormalities:

Screenshot 2025-01-02 at 08 35 34

Copy link

@vustef vustef left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not an expert, but excited about this.

Copy link
Member

@NHDaly NHDaly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So simple and clear! 👏 👏

Thanks for the screenshots! It's great to see this resolved. 😎
Can you also share a screenshot of running with more threads?

@d-netto
Copy link
Member Author

d-netto commented Jan 2, 2025

Can you also share a screenshot of running with more threads?

Ran using Profile, PProf; @profile peakflops(); pprof(webport=11111). Got this with 8 threads on my M2:

Screenshot 2025-01-02 at 14 36 13

Seems expected given most threads will be blocked on a condition variable with no tasks to run.

@NHDaly
Copy link
Member

NHDaly commented Jan 2, 2025

Perfect. Thanks! Ship it

@vustef
Copy link

vustef commented Jan 2, 2025

Can you also share a screenshot of running with more threads?

Ran using Profile, PProf; @profile peakflops(); pprof(webport=11111). Got this with 8 threads on my M2:

Screenshot 2025-01-02 at 14 36 13 Seems expected given most threads will be blocked on a condition variable with no tasks to run.

What if all threads are busy?

@d-netto
Copy link
Member Author

d-netto commented Jan 3, 2025

What if all threads are busy?

Still a large number of samples on cvwait, though the fraction seems to depend on how many GC threads are used.

Workload

using Profile
using PProf
using Base.Threads
    

function main()
    nt = Threads.nthreads()
    Base.@sync begin
        for i in 1:nt
            Threads.@spawn begin
                peakflops()
            end
        end
    end
end

@profile main()

8 compute threads, 1 GC thread.

Screenshot 2025-01-03 at 09 19 48

8 compute threads, 8 GC threads

Screenshot 2025-01-03 at 09 20 32

@d-netto d-netto merged commit e36910c into v1.10.2+RAI Jan 3, 2025
4 checks passed
@d-netto d-netto deleted the dcn-no-semaphore_wait_trap branch January 3, 2025 12:29
@vustef
Copy link

vustef commented Jan 4, 2025

What if all threads are busy?

Still a large number of samples on cvwait, though the fraction seems to depend on how many GC threads are used.

Workload

using Profile
using PProf
using Base.Threads
    

function main()
    nt = Threads.nthreads()
    Base.@sync begin
        for i in 1:nt
            Threads.@spawn begin
                peakflops()
            end
        end
    end
end

@profile main()

8 compute threads, 1 GC thread.

Screenshot 2025-01-03 at 09 19 48 ## 8 compute threads, 8 GC threads Screenshot 2025-01-03 at 09 20 32

Thanks! Do you know why is the fraction not proportionate to the num default threads vs GC thread ratio? There are 8x more default threads, so I'd expect non-useful profile to be only 1/9th of the total.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
port-to-master This change should apply to all future Julia builds port-to-v1.10 This change should apply to Julia v1.10 builds
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants