MacBook Pro M1 Max doesn't give correct number of threads with --threads=auto #44067

Closed
navidcy opened this issue Feb 7, 2022 · 9 comments · Fixed by #44072

@navidcy
Contributor

navidcy commented Feb 7, 2022

I get different results for Threads.nthreads() on a MacBook Pro M1 Max using the latest version of Julia and the tagged v1.7.1.

In particular, with the latest Julia version I get 6, but System Info shows 8.

Screen Shot 2022-02-08 at 7 27 47 am

For julia#master at e0a4b77

navid:~/ $ /Users/navid/julia/julia --threads=auto
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.0-DEV.1455 (2022-02-06)
 _/ |\__'_|_|_|\__'_|  |  Commit e0a4b7727c (0 days old master)
|__/                   |

julia> versioninfo()
Julia Version 1.8.0-DEV.1455
Commit e0a4b7727c (2022-02-06 12:55 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.0 (ORCJIT, cyclone)
Environment:
  JULIA_EDITOR = vim

julia> Threads.nthreads()
6

while on Julia v1.7.1:

navid:~/ $ /Users/navid/julia-1.7/julia --threads=auto
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.7.1 (2021-12-22)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |

julia> versioninfo()
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
  OS: macOS (arm64-apple-darwin21.3.0)
  CPU: Apple M1 Max
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, cyclone)
Environment:
  JULIA_EDITOR = vim

julia> Threads.nthreads()
10

Probably the fix is related to this bit?

julia/src/sys.c

Lines 598 to 630 in 546a774

// Apple's M1 processor is a big.LITTLE style processor, with 4x "performance"
// cores, and 4x "efficiency" cores. Because Julia expects to be able to run
// things like heavy linear algebra workloads on all cores, it's best for us
// to only spawn as many threads as there are performance cores. Once macOS
// 12 is released, we'll be able to query the multiple "perf levels" of the
// cores of a CPU (see this PR [0] to pytorch/cpuinfo for an example) but
// until it's released, we will just recognize the M1 by its CPU family
// identifier, then subtract how many efficiency cores we know it has.
JL_DLLEXPORT int jl_cpu_threads(void) JL_NOTSAFEPOINT
{
#if defined(HW_AVAILCPU) && defined(HW_NCPU)
    size_t len = 4;
    int32_t count;
    int nm[2] = {CTL_HW, HW_AVAILCPU};
    sysctl(nm, 2, &count, &len, NULL, 0);
    if (count < 1) {
        nm[1] = HW_NCPU;
        sysctl(nm, 2, &count, &len, NULL, 0);
        if (count < 1) { count = 1; }
    }
#if defined(__APPLE__) && defined(_CPU_AARCH64_)
    // Manually subtract efficiency cores for Apple's big.LITTLE cores
    int32_t family = 0;
    len = 4;
    sysctlbyname("hw.cpufamily", &family, &len, NULL, 0);
    if (family >= 1 && count > 1) {
        if (family == CPUFAMILY_ARM_FIRESTORM_ICESTORM) {
            // We know the Apple M1 has 4 efficiency cores, so subtract them out.
            count -= 4;
        }
    }

@glwagner, @sandreza

@mkitti
Contributor

mkitti commented Feb 7, 2022

See #42099
Ping @staticfloat
No, Picard, there seem to be only 2.

@navidcy
Contributor Author

navidcy commented Feb 7, 2022

@mkitti, that's relevant. But on the Apple M1 Max processor one needs to subtract only 2, not 4, as is done in

julia/src/sys.c

Line 628 in 546a774

count -= 4;

Right?
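
To make the arithmetic concrete, here is a minimal sketch of the subtraction in isolation. `perf_threads` is a hypothetical helper introduced only for illustration; it is not a function in sys.c. The core counts come from the numbers reported above: the M1 Max exposes 10 CPUs, so subtracting the hard-coded 4 gives 6, while subtracting 2 gives 8.

```c
// Hypothetical helper, not part of Julia's sys.c: number of threads to
// spawn when the efficiency cores are excluded from the total.
int perf_threads(int total_cores, int efficiency_cores)
{
    int count = total_cores - efficiency_cores;
    // Never report fewer than one thread, mirroring the clamp in the excerpt.
    return count < 1 ? 1 : count;
}
```

Calling `perf_threads(10, 4)` reproduces the 6 seen on master, and `perf_threads(10, 2)` gives the 8 that System Info suggests.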

@mkitti
Contributor

mkitti commented Feb 7, 2022

Right. See this comment: #42099 (comment) .

There appears to be a way to query this information rather than hard code it.
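
As a rough sketch of what querying could look like, the function below asks macOS for the number of physical cores in performance level 0 instead of relying on the CPU family. `query_perf_cores` is a hypothetical name, and this is only an illustration under the assumption that the `hw.perflevel0.physicalcpu` sysctl key is available (macOS 12 or newer); it is not necessarily how the eventual fix is written.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__APPLE__)
#include <sys/types.h>
#include <sys/sysctl.h>
#endif

// Hypothetical helper, not part of Julia's sys.c: returns the number of
// physical performance cores, or -1 if the platform cannot report it.
int query_perf_cores(void)
{
#if defined(__APPLE__)
    int32_t ncpus = 0;
    size_t len = sizeof(ncpus);
    // perflevel0 is the highest-performance cluster. The key is assumed to
    // exist only on macOS 12 (Darwin 21) or newer, so the call is expected
    // to fail on older systems and we fall through to -1.
    if (sysctlbyname("hw.perflevel0.physicalcpu", &ncpus, &len, NULL, 0) == 0
        && ncpus > 0) {
        return (int)ncpus;
    }
#endif
    return -1;
}
```

Returning -1 rather than a guess leaves the caller free to fall back to the existing family-based logic when the key is missing.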

@gbaraldi
Member

gbaraldi commented Feb 8, 2022

@navidcy Could you kindly build the following C program and report what it prints for you?

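#include <sys/sysctl.h>
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    // kern.osrelease is the Darwin version string, e.g. "21.3.0"
    char str[32];
    size_t len = sizeof(str);
    if (sysctlbyname("kern.osrelease", str, &len, NULL, 0) != 0)
        return 1;
    printf("%s\n", str);
    // hw.perflevel0.* is only queried on Darwin 21 (macOS 12) or newer
    if (atoi(str) >= 21)
    {
        int ncpus = 0;
        len = sizeof(ncpus);
        // perflevel0 is the performance-core cluster
        if (sysctlbyname("hw.perflevel0.physicalcpu", &ncpus, &len, NULL, 0) == 0)
            printf("%d\n", ncpus);
    }
    return 0;
}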

@navidcy
Contributor Author

navidcy commented Feb 8, 2022

Sure, but I'd like some help with C.

How do I build this?

@gbaraldi
Member

gbaraldi commented Feb 8, 2022

It should be enough to put it into a file called test.c somewhere on your computer, navigate to that directory in your terminal, and run clang test.c -o test. After that, run ./test; it should print two lines.

@navidcy
Contributor Author

navidcy commented Feb 8, 2022

21.3.0
8
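
For reference, the second line is the performance-core count (8 on this machine). One possible way to turn such a value into a thread count is sketched below. `choose_auto_threads` and its fallback policy are assumptions made for illustration only, not the behavior of any Julia release.

```c
// Hypothetical helper, not part of Julia's sys.c: prefer the queried
// performance-core count, otherwise fall back to the total CPU count.
int choose_auto_threads(int total_cores, int perf_cores)
{
    // Accept the queried value only if it is plausible.
    if (perf_cores >= 1 && perf_cores <= total_cores)
        return perf_cores;
    // Query failed or returned nonsense: use all cores, but at least one.
    return total_cores < 1 ? 1 : total_cores;
}
```

With the values from this thread, `choose_auto_threads(10, 8)` yields 8, and a failed query (for example a `perf_cores` of -1) falls back to 10.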

@gbaraldi
Member

gbaraldi commented Feb 8, 2022

Thanks for the help. I will open a PR that will make it set the correct number of threads.
