Fix num cpu #1518

jdamato-fsly · 2019-10-14T21:54:19Z

runtime.NumCPU() returns the number of CPUs that the process can run
on. This number does not necessarily correlate to CPU ids if the
affinity mask of the process is set.

This change maintains the current behavior as default, but also allows
the user to specify a range of CPUids to use instead.

The CPU id is stored as the value of a map keyed on the profiler
object's address.
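
To make the first paragraph concrete, here is a minimal sketch of why a CPU count is not a list of CPU ids. The affinity set `{4, 5, 6, 7}` and the helper names `idsFromCount` and `countMismatches` are assumptions made up for illustration; none of this is node_exporter code.

```go
package main

import "fmt"

// idsFromCount returns 0..n-1, which is what a loop bounded by a CPU
// count (such as runtime.NumCPU()) implicitly assumes the ids to be.
func idsFromCount(n int) []int {
	ids := make([]int, n)
	for i := range ids {
		ids[i] = i
	}
	return ids
}

// countMismatches reports how many positions differ between the ids
// assumed from the count and the ids the process is allowed to run on.
func countMismatches(allowed []int) int {
	assumed := idsFromCount(len(allowed))
	n := 0
	for i, id := range allowed {
		if assumed[i] != id {
			n++
		}
	}
	return n
}

func main() {
	// Hypothetical affinity mask: the process is pinned to CPUs 4-7.
	allowed := []int{4, 5, 6, 7}
	fmt.Println("ids assumed from the count:", idsFromCount(len(allowed)))
	fmt.Println("ids actually allowed:      ", allowed)
	fmt.Println("mismatched positions:      ", countMismatches(allowed))
}
```

With an unrestricted affinity mask the two lists coincide, which is why the default behavior can stay as it is.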

SuperQ

Interesting, thanks, it seems mostly reasonable.

Please sign off your commits with git commit -s --amend. Also, it would be good to add an [ENHANCEMENT] entry to the changelog.

collector/perf_linux.go

`runtime.NumCPU()` returns the number of CPUs that the process can run on. This number does not necessarily correlate to CPU ids if the affinity mask of the process is set. This change maintains the current behavior as default, but also allows the user to specify a range of CPUids to use instead. The CPU id is stored as the value of a map keyed on the profiler object's address. Signed-off-by: Joe Damato <[email protected]>

jdamato-fsly · 2019-10-15T23:51:53Z

Thanks for the review @SuperQ. Made some changes as you've suggested. Let me know how this looks.

hodgesds · 2019-10-16T02:37:41Z

This is a nice config option!

collector/perf_linux.go


jdamato-fsly · 2019-10-16T18:43:15Z

I think this addresses your feedback @discordianfish and @SuperQ.

Was wondering if there might be anyone who would be interested in testing this just to double check that this is working for them? cc @hodgesds


jdamato-fsly · 2019-10-16T20:12:48Z

collector/perf_linux.go

-	for cpu, profiler := range c.perfSwProfilers {
-		cpuStr := fmt.Sprintf("%d", cpu)
+	for _, profiler := range c.perfSwProfilers {
+		cpuid := c.swProfilerCpuMap[&profiler]


So, I'm new to Go and this appears to be a bug.

&profiler here is the address of the loop variable profiler (a copy made by range), not the address of the perf.SoftwareProfiler that was inserted into c.swProfilerCpuMap.

This leads to the incorrect CPU id being retrieved. I'm not really sure how to deal with this as I know basically 0 golang (sorry). Would love any suggestions you folks have on the best way to fix this.
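
A minimal sketch of the pitfall described above, using a hypothetical `profiler` struct rather than the real perf types: the map is keyed on the addresses of the stored elements, so looking up the address of the `range` copy never finds an entry, while indexing the slice does.

```go
package main

import "fmt"

// profiler is a stand-in type for illustration only.
type profiler struct{ name string }

// buildMap keys each CPU id on the address of the stored slice element.
func buildMap(ps []profiler, firstCPU int) map[*profiler]int {
	m := make(map[*profiler]int, len(ps))
	for i := range ps {
		m[&ps[i]] = firstCPU + i
	}
	return m
}

// hitsViaCopy looks entries up through the address of the range copy.
func hitsViaCopy(ps []profiler, m map[*profiler]int) int {
	hits := 0
	for _, p := range ps {
		if _, ok := m[&p]; ok { // &p is the copy's address, not the element's
			hits++
		}
	}
	return hits
}

// hitsViaIndex looks entries up through the address of the element itself.
func hitsViaIndex(ps []profiler, m map[*profiler]int) int {
	hits := 0
	for i := range ps {
		if _, ok := m[&ps[i]]; ok {
			hits++
		}
	}
	return hits
}

func main() {
	ps := []profiler{{"a"}, {"b"}, {"c"}}
	m := buildMap(ps, 4)
	fmt.Println("hits via &p:     ", hitsViaCopy(ps, m))
	fmt.Println("hits via &ps[i]: ", hitsViaIndex(ps, m))
}
```

A missed lookup returns the map's zero value, so in the collector every profiler would be reported with CPU id 0 rather than failing loudly.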


jdamato-fsly · 2019-10-16T20:36:49Z

I don't really know anything about golang but this commit I just pushed: 573cf02 I think fixes the address of the object issue I mentioned here: #1518 (comment)

hodgesds · 2019-10-16T22:19:36Z

collector/perf_linux.go

-	perfSwProfilers    map[int]perf.SoftwareProfiler
-	perfCacheProfilers map[int]perf.CacheProfiler
-	desc               map[string]*prometheus.Desc
+	hwProfilerCpuMap    map[*perf.HardwareProfiler]int


I wouldn't expect to keep a map of pointers to interfaces. For example NewSoftwareProfiler returns an interface that can be used directly.

Yeah, but I think you need them, otherwise you can't generate a map of pointers to CPU ids.

Confirmed that this change (pointers to interfaces) works correctly, as the address pointed to can be used as a key for the CPU ID map. The code without this change does not work, as mentioned here: #1518 (comment)

I think keeping this is correct -- I am using this in a lab setting and am getting the expected results now.
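
For comparison, a sketch of the alternative raised at the start of this thread: keying the map on the interface value itself. It assumes the concrete type behind the interface is a pointer (or otherwise comparable); the `Profiler` interface, `swProfiler` type, and `newProfiler` constructor are hypothetical stand-ins, and the thread does not establish whether the perf-utils types meet that assumption.

```go
package main

import "fmt"

// Profiler is a hypothetical interface for illustration only.
type Profiler interface{ Name() string }

type swProfiler struct{ cpu int }

func (p *swProfiler) Name() string { return fmt.Sprintf("sw-%d", p.cpu) }

// newProfiler returns an interface value wrapping a pointer.
func newProfiler(cpu int) Profiler { return &swProfiler{cpu: cpu} }

// buildCPUMap keys the CPU id on the interface value, not on its address.
func buildCPUMap(cpus []int) ([]Profiler, map[Profiler]int) {
	profilers := make([]Profiler, 0, len(cpus))
	cpuByProfiler := make(map[Profiler]int, len(cpus))
	for _, cpu := range cpus {
		p := newProfiler(cpu)
		profilers = append(profilers, p)
		cpuByProfiler[p] = cpu
	}
	return profilers, cpuByProfiler
}

func main() {
	profilers, cpuByProfiler := buildCPUMap([]int{4, 5, 6})
	// The range copy compares equal to the stored key because interface
	// equality compares the dynamic type and the wrapped pointer.
	for _, p := range profilers {
		fmt.Println(p.Name(), "-> CPU", cpuByProfiler[p])
	}
}
```

The trade-off is that using an interface value as a map key panics at run time if its dynamic type is not comparable (a slice or map, for example), which pointer keys avoid.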

README.md

hodgesds · 2019-10-16T22:45:05Z

It might be nice to make the flag match the -c option of taskset.

jdamato-fsly · 2019-10-16T23:02:17Z

> It might be nice to make the flag match the -c option of taskset.

That sounds like a lot of work. Happy to turn that over to anyone else who'd like to jump in on this, though.


hodgesds · 2019-10-17T00:57:01Z

> > It might be nice to make the flag match the -c option of taskset.
>
> That sounds like a lot of work. Happy to turn that over to anyone else who'd like to jump in on this, though.

I think this roughly does what you want, let me know your thoughts.

jdamato-fsly · 2019-10-17T17:58:38Z

> I think this roughly does what you want, let me know your thoughts.

I left a few comments on the code over there. As mentioned above, I'm not a golang programmer so I'm probably not a good person to ask for a code review 😅

FWIW I am using the code I wrote in this branch in a lab setting and it is working as expected (as mentioned here: #1518 (comment)).

In my use case: I run node_exporter on a small set of CPUs and want to collect perf stats from all CPUs; so having more advanced CPU set options isn't something that would apply directly to my use case.

It's worth noting that any use of runtime.NumCPU where the affinity list is not set to all CPUs will be incorrect -- the stats returned will be mislabeled with the wrong CPU ID. It may be worth noting that in the docs.

hodgesds · 2019-10-18T00:07:00Z

I see the bug now. Here's a small example that shows it's because an interface isn't assignable: the index expression isn't valid, but taking the address of the interface works. I think this is fine then. Would you mind porting over the flag parsing code so that strides work properly as well?

jdamato-fsly · 2019-10-18T18:05:48Z

> I see the bug now. Here's a small example that shows it's because an interface isn't assignable: the index expression isn't valid, but taking the address of the interface works. I think this is fine then. Would you mind porting over the flag parsing code so that strides work properly as well?

Thanks for the detailed explanation. Sure, I can copy over your flag parsing code shortly.
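
For readers following along, a rough sketch of what parsing a taskset-style CPU list with strides could look like. This is not the flag parsing code referred to above; `parseCPUList` is a hypothetical helper that accepts comma-separated ids and ranges such as `0,5,8-11`, with an optional `:N` stride as in `0-10:2`.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseCPUList expands a taskset-style list into individual CPU ids.
// It returns an error on any malformed entry instead of guessing.
// Duplicate ids are not removed.
func parseCPUList(s string) ([]int, error) {
	var cpus []int
	for _, part := range strings.Split(s, ",") {
		part = strings.TrimSpace(part)
		if part == "" {
			return nil, fmt.Errorf("empty entry in CPU list %q", s)
		}
		// Optional ":stride" suffix.
		stride := 1
		if i := strings.Index(part, ":"); i >= 0 {
			n, err := strconv.Atoi(part[i+1:])
			if err != nil || n < 1 {
				return nil, fmt.Errorf("invalid stride in %q", part)
			}
			stride = n
			part = part[:i]
		}
		// Either a single id or a "start-end" range.
		bounds := strings.SplitN(part, "-", 2)
		start, err := strconv.Atoi(bounds[0])
		if err != nil || start < 0 {
			return nil, fmt.Errorf("invalid CPU id in %q", part)
		}
		end := start
		if len(bounds) == 2 {
			end, err = strconv.Atoi(bounds[1])
			if err != nil || end < start {
				return nil, fmt.Errorf("invalid range end in %q", part)
			}
		}
		for cpu := start; cpu <= end; cpu += stride {
			cpus = append(cpus, cpu)
		}
	}
	return cpus, nil
}

func main() {
	for _, in := range []string{"0,5,8-11", "0-10:2", "0-x"} {
		cpus, err := parseCPUList(in)
		fmt.Printf("%q -> %v (err: %v)\n", in, cpus, err)
	}
}
```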

jdamato-fsly · 2019-10-18T20:38:08Z

FWIW, worth noting that using the code I wrote in this branch the HardwareProfiler which gathers unix.PERF_COUNT_HW_CACHE_REFERENCES and unix.PERF_COUNT_HW_CACHE_MISSES appears to be returning incorrect results. I am not sure if this is a bug in node_exporter or perf-utils, though.

hodgesds · 2019-10-18T21:12:51Z

> FWIW, worth noting that using the code I wrote in this branch the HardwareProfiler which gathers unix.PERF_COUNT_HW_CACHE_REFERENCES and unix.PERF_COUNT_HW_CACHE_MISSES appears to be returning incorrect results. I am not sure if this is a bug in node_exporter or perf-utils, though.

What are you observing? PERF_COUNT_HW_CACHE_REFERENCES is very hardware dependent. From the perf_event_open man page:

PERF_COUNT_HW_CACHE_REFERENCES
Cache accesses. Usually this indicates Last Level
Cache accesses but this may vary depending on your
CPU. This may include prefetches and coherency
messages; again this depends on the design of your
CPU.

Note that the configuration for those events also includes some defaults that are pretty opinionated. Much of that functionality has moved into perf_exporter, which allows more configuration.

jdamato-fsly · 2019-10-18T21:30:50Z

> What are you observing? PERF_COUNT_HW_CACHE_REFERENCES is very hardware dependent. From the perf_event_open man page:

On kernel 4.19 with a Sandy Bridge CPU, issuing curl against node_exporter's HTTP server shows that the cache refs values for some set of CPUs never change (which is impossible on a system with this much load on all CPUs). HOWEVER, when I use perf stat to obtain the same metric via the command line, I can clearly see the stat values changing on each run of perf stat.

For example, on my system node_exporter returned unchanged values for CPU 23 on every single curl request I issued against its HTTP server over a period of several minutes.

However, running perf stat -e cache-references,cache-misses -C 23 even for very short periods of time shows that both values change very quickly (as expected).

This leads me to believe that either node_exporter or perf-utils are returning stale values for some currently unknown reason for certain CPUs, as the command line perf stat -e cache-references,cache-misses -C $CPU returns accurate values.

I'm not sure if this bug is related to my code or something else.

EDIT: I should note that some CPUs seem to have correct values, but others have values from node_exporter that never seem to change.

hodgesds · 2019-10-18T22:44:43Z

I've tested your branch on two kernels (4.14.78 and 5.2.5) on E3-1505M v5 and Ryzen 2600 processors, and both seem to be working as expected. Do you have access to any other architectures for testing?

jdamato-fsly · 2019-10-18T22:45:59Z

It appears to work on kernel 4.9 on this machine, but not on kernel 4.19. Could be some weird kernel regression. Any chance you could test on 4.19?

hodgesds · 2019-10-18T23:13:27Z

I run custom kernels for everything so my config will likely be different. Can you post the output of:

cat /boot/config-$(uname -r) | grep -i perf
# or 
zcat /proc/config.gz | grep -i perf

jdamato-fsly · 2019-10-19T00:20:24Z

cat /boot/config-$(uname -r) | grep -i perf
CONFIG_CGROUP_PERF=y
CONFIG_CC_OPTIMIZE_FOR_PERFORMANCE=y
CONFIG_HAVE_PERF_EVENTS=y
# Kernel Performance Events And Counters
CONFIG_PERF_EVENTS=y
# CONFIG_DEBUG_PERF_USE_VMALLOC is not set
# Performance monitoring
CONFIG_PERF_EVENTS_INTEL_UNCORE=m
CONFIG_PERF_EVENTS_INTEL_RAPL=m
CONFIG_PERF_EVENTS_INTEL_CSTATE=m
# CONFIG_PERF_EVENTS_AMD_POWER is not set
CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE=y
CONFIG_CPU_FREQ_GOV_PERFORMANCE=y
CONFIG_PCIEASPM_PERFORMANCE=y
CONFIG_HAVE_PERF_EVENTS_NMI=y
CONFIG_HAVE_HARDLOCKUP_DETECTOR_PERF=y
CONFIG_HAVE_PERF_REGS=y
CONFIG_HAVE_PERF_USER_STACK_DUMP=y
# Performance monitor support
CONFIG_RCU_PERF_TEST=m

hodgesds · 2019-10-22T00:37:27Z

I've tested this on a few machines with a similar config and it seems to be working as expected. I think this is good to go, however you may want to do some more digging (strace the perf_event_open calls to make sure they look sane).

pgier · 2019-10-23T19:07:26Z

collector/perf_linux.go

+	kingpin "gopkg.in/alecthomas/kingpin.v2"
+	"runtime"
+	"strconv"
+	"strings"


Normally we group all the built-in imports at the top, can you re-format these using goimports?

go get golang.org/x/tools/cmd/goimports
goimports -w ./collector/perf_linux.go

discordianfish · 2019-11-20T11:46:57Z

collector/perf_linux.go

+
+		ncpus, err = strconv.Atoi(cpuRange[1])
+		if err != nil {
+			ncpus = runtime.NumCPU() - 1


I don't think we should fall back to the default when a user provides a non-integer. This should fail instead.

discordianfish · 2019-11-20T11:47:45Z

collector/perf_linux.go

+		cacheProfilerCpuMap: map[*perf.CacheProfiler]int{},
+	}
+
+	start := 0


Use a var block here:

var (
	start = 0
	ncpus = 0
	..
)

discordianfish · 2019-11-20T11:54:30Z

collector/perf_linux.go

 		// Use -1 to profile all processes on the CPU, see:
 		// man perf_event_open
-		collector.perfHwProfilers[i] = perf.NewHardwareProfiler(-1, i)
-		if err := collector.perfHwProfilers[i].Start(); err != nil {
+		p := perf.NewHardwareProfiler(-1, i)


Let's rename the p's here:

ph = hardware profiler

ps = software profiler

pc = cache profiler

discordianfish · 2019-11-20T11:55:13Z

collector/perf_linux.go

 			return collector, err
+		} else {


No need for the else here, if err != nil it returns anyway.
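
A small sketch of the early-return shape suggested here, using a hypothetical `fakeProfiler` in place of the real perf profilers. Because the error branch returns, the success path can continue at the outer indentation without an `else`.

```go
package main

import (
	"errors"
	"fmt"
)

// fakeProfiler is a stand-in whose Start can be made to fail.
type fakeProfiler struct{ failStart bool }

func (p *fakeProfiler) Start() error {
	if p.failStart {
		return errors.New("start failed")
	}
	return nil
}

// startAll starts one profiler per CPU and stops at the first error.
// failOn is a test hook naming the CPU whose profiler should fail.
func startAll(cpus []int, failOn int) (map[int]*fakeProfiler, error) {
	started := make(map[int]*fakeProfiler, len(cpus))
	for _, cpu := range cpus {
		p := &fakeProfiler{failStart: cpu == failOn}
		if err := p.Start(); err != nil {
			return started, err
		}
		// No else needed: this line only runs when Start succeeded.
		started[cpu] = p
	}
	return started, nil
}

func main() {
	started, err := startAll([]int{0, 1, 2}, 1)
	fmt.Println("started:", len(started), "err:", err)
}
```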

(same below)

SuperQ · 2020-02-18T13:18:08Z

Should we close this in favor of #1561?

SuperQ · 2020-02-20T12:54:14Z

Superseded by #1561.

jdamato-fsly force-pushed the fix_num_cpu branch from 50e67a7 to c6a436b Compare October 15, 2019 23:28

jdamato-fsly force-pushed the fix_num_cpu branch from c6a436b to b17a872 Compare October 15, 2019 23:47

Fix scoping on CPU id maps and style cleanup

27381c4

Signed-off-by: Joe Damato <[email protected]>

jdamato-fsly force-pushed the fix_num_cpu branch from f0bb354 to 27381c4 Compare October 16, 2019 18:32

Initialize CPU maps

341322e

Signed-off-by: Joe Damato <[email protected]>

jdamato-fsly force-pushed the fix_num_cpu branch from c772e25 to 341322e Compare October 16, 2019 20:09

Use perf Profiler pointers instead

573cf02

Signed-off-by: Joe Damato <[email protected]>

jdamato-fsly force-pushed the fix_num_cpu branch from 57bc58c to 573cf02 Compare October 16, 2019 20:35

Update docs to include 0-indexed CPU ids

b813bf1

Signed-off-by: Joe Damato <[email protected]>

jdamato-fsly force-pushed the fix_num_cpu branch from cf393a4 to b813bf1 Compare October 16, 2019 23:04

hodgesds mentioned this pull request Nov 30, 2019

Fix num cpu #1561

Merged

SuperQ closed this Feb 20, 2020
