
procstat metric not populated on FreeBSD arm64 #13933

Closed
sdalu opened this issue Sep 15, 2023 · 9 comments

Labels: bug (unexpected problem or unintended behavior), upstream (bug or issues that rely on dependency fixes)

Comments


sdalu commented Sep 15, 2023

Relevant telegraf.conf

[[inputs.procstat]]
  exe                   = "influxd"
  tagexclude            = [ "pid_finder", "exe", "pidfile" ]

[[inputs.procstat]]
  exe                   = "telegraf"
  tagexclude            = [ "pid_finder", "exe", "pidfile" ]

Logs from Telegraf

2023-09-15T21:17:42Z I! Loading config: /usr/local/etc/telegraf.conf
2023-09-15T21:17:42Z I! Starting Telegraf unknown brought to you by InfluxData the makers of InfluxDB
2023-09-15T21:17:42Z I! Available plugins: 240 inputs, 9 aggregators, 29 processors, 24 parsers, 59 outputs, 4 secret-stores
2023-09-15T21:17:42Z I! Loaded inputs: procstat (10x)
2023-09-15T21:17:42Z I! Loaded aggregators: 
2023-09-15T21:17:42Z I! Loaded processors: converter override (4x) regex
2023-09-15T21:17:42Z I! Loaded secretstores: 
2023-09-15T21:17:42Z I! Loaded outputs: influxdb_v2
2023-09-15T21:17:42Z I! Tags enabled: host=brain.home.sdalu.com
2023-09-15T21:17:42Z I! [agent] Config: Interval:20s, Quiet:false, Hostname:"brain.home.sdalu.com", Flush Interval:30s
2023-09-15T21:17:42Z D! [agent] Initializing plugins
2023-09-15T21:17:42Z D! [agent] Connecting outputs
2023-09-15T21:17:42Z D! [agent] Attempting connection to [outputs.influxdb_v2]
2023-09-15T21:17:42Z D! [agent] Successfully connected to outputs.influxdb_v2
2023-09-15T21:17:42Z D! [agent] Starting service inputs
2023-09-15T21:18:13Z D! [outputs.influxdb_v2] Wrote batch of 10 metrics in 243.505683ms
2023-09-15T21:18:13Z D! [outputs.influxdb_v2] Buffer fullness: 0 / 10000 metrics

System info

Telegraf 1.28.0, FreeBSD 13.2, arm64

Docker

No response

Steps to reproduce

  1. Run the configuration above

Expected behavior

Some procstat and procstat_lookup metrics, like:

> procstat,host=rork.home.sdalu.com,org_destination=IT,process_name=telegraf,user=telegraf cpu_time_guest=0,cpu_time_guest_nice=0,cpu_time_idle=0,cpu_time_iowait=0,cpu_time_irq=0,cpu_time_nice=0,cpu_time_soft_irq=0,cpu_time_steal=0,cpu_time_system=2654.42098,cpu_time_user=816.839403,cpu_usage=0.035786768916299505,created_at=1694607213273000000i,memory_data=0i,memory_locked=0i,memory_rss=193368064i,memory_stack=0i,memory_swap=0i,memory_usage=0.37625008821487427,memory_vms=5346725888i,num_threads=32i,pid=2422i,ppid=2421i,read_bytes=0i,read_count=1115i,write_bytes=0i,write_count=0i 1694812868000000000
> procstat_lookup,host=rork.home.sdalu.com,org_destination=IT,result=success pid_count=1i,result_code=0i,running=1i 1694812868000000000

Actual behavior

Only procstat_lookup metrics are generated; no procstat metrics appear.

Additional info

I don't know whether this behaviour is specific to arm64 or to the whole ARM family.
I tested on amd64 and it works fine, so the problem is not FreeBSD itself.

@sdalu added the bug label on Sep 15, 2023
Contributor

powersj commented Sep 18, 2023

Hi,

As I mentioned in the previous issue we do not provide an arm64 build for FreeBSD, so to work on this I would need you to build and test PRs.

The procstat metric is generated in the addMetric function here.

procstat_lookup,host=rork.home.sdalu.com,org_destination=IT,result=success pid_count=1i,result_code=0i,running=1i 1694812868000000000

Is this the actual output of procstat_lookup? Before diving much further into this I want to be certain that running is actually non-zero. If it is zero, or if there were any errors updating the processes, then no procstat metric will be generated.
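To make that relationship concrete, here is a rough illustration (a sketch only, not Telegraf's actual source): procstat_lookup always reports a running count, while per-process procstat metrics are only produced for processes that were read successfully.

package main

import "fmt"

// readProcess is a hypothetical stand-in for the gopsutil calls that read a
// process; it reports whether the read succeeded.
func readProcess(pid int) bool { return true }

func main() {
	pids := []int{1234} // hypothetical PIDs found by the pid finder
	running := 0
	for _, pid := range pids {
		if !readProcess(pid) {
			continue // an error updating the process means no procstat metric for it
		}
		running++
		fmt.Printf("procstat pid=%d ...\n", pid) // one procstat metric per readable process
	}
	// procstat_lookup is written either way; running=0 means the loop above wrote nothing
	fmt.Printf("procstat_lookup pid_count=%d running=%d\n", len(pids), running)
}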

Finally, the data is all gathered via the gopsutil library, so I think we should also try reproducing this outside of Telegraf.

Create a directory with the following two files:

main.go:

package main

import (
	"fmt"
	"os"

	"github.com/shirou/gopsutil/process"
)

func main() {
	// Look up the current process through gopsutil, the same library Telegraf uses.
	currentPid := os.Getpid()
	myself, err := process.NewProcess(int32(currentPid))
	if err != nil {
		panic(err)
	}
	// Each call prints its value together with any error it returned.
	fmt.Println(myself.Name())
	fmt.Println(myself.String())
	fmt.Println(myself.NumThreads())
	fmt.Println(myself.RlimitUsage(true))
	fmt.Println(myself.Status())
}

go.mod - replace the go version with whatever you have locally:

module test-process

go 1.21

Run go mod tidy to fetch the gopsutil dependency, then either run it directly via go run . or build it with go build . and run the resulting test-process binary.

@powersj added the waiting for response label on Sep 18, 2023
Contributor

telegraf-tiger bot commented Oct 2, 2023

Hello! I am closing this issue due to inactivity. I hope you were able to resolve your problem; if not, please try posting this question in our Community Slack or Community Forums, or provide additional details in this issue and request that it be re-opened. Thank you!

@telegraf-tiger bot closed this as completed on Oct 2, 2023
Author

sdalu commented Oct 2, 2023

Sorry for the late reply.

procstat_lookup,host=rork.home.sdalu.com,org_destination=IT,result=success pid_count=1i,result_code=0i,running=1i 1694812868000000000

Is this the actual output of procstat_lookup? Before diving much further into this I want to be certain that running is actually non-zero. If it is zero, or if there were any errors updating the processes, then no procstat metric will be generated.

Yes, that's the actual output.

Finally, the data is all gathered via gopsutil's library, so I think we should try outside of telegraf as well.
[...]
And either run this directly via go run . or build it go build . and run the test-process binary.

Output is:

 <nil>
{"pid":91329}
0 <nil>
[] not implemented yet
 <nil>

@telegraf-tiger bot removed the waiting for response label on Oct 2, 2023
Contributor

powersj commented Oct 2, 2023

Gopsutil is providing a nil name and nil values for the other metrics, which means we skip the process. Here is the code in Telegraf, which checks for the nil name and comments that if the name is nil, we assume we are not getting anything else. Based on your output, the other calls do indeed seem to return default values or nil.

I would suggest an upstream issue as part of the gopsutil project to get this added or enabled there. You can use the example code I provided in my previous comment as a way to reproduce it.
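For reference, here is a minimal sketch of the kind of guard being described (not Telegraf's actual source; it builds on the test program above and uses the same gopsutil import path): if gopsutil cannot return a process name, the process is skipped entirely.

package main

import (
	"fmt"
	"os"

	"github.com/shirou/gopsutil/process"
)

func main() {
	p, err := process.NewProcess(int32(os.Getpid()))
	if err != nil {
		panic(err)
	}
	// Mirrors the described behaviour: an empty name is taken to mean that
	// gopsutil cannot provide anything useful for this process, so it is skipped.
	name, err := p.Name()
	if err != nil || name == "" {
		fmt.Println("skipping process: gopsutil returned no name")
		return
	}
	fmt.Println("name available; procstat fields would be collected for:", name)
}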

Contributor

powersj commented May 1, 2024

@sdalu,

I have put up #15272, which includes an update to the gopsutil library. Your upstream issue appears to have been fixed back in March, so it is likely that our last release already has this fix. Could you please download the artifacts from that PR, which will be attached as a comment ~30 minutes after this message, and let me know whether this resolves the issue?

Thanks!

@powersj added the waiting for response label on May 1, 2024
Author

sdalu commented May 2, 2024

I downloaded telegraf-1.31.0~553d972c_freebsd_armv7.tar.gz and ran:

./telegraf-1.31.0/usr/bin/telegraf --config /usr/local/etc/telegraf.conf --debug

I got a panic:

2024-05-02T13:09:40Z E! FATAL: [inputs.procstat] panicked: runtime error: invalid memory address or nil pointer dereference, Stack:
goroutine 147 [running]:
github.com/influxdata/telegraf/agent.panicRecover(0x4d410370)
	/go/src/github.com/influxdata/telegraf/agent/agent.go:1202 +0x74
panic({0x67aa400, 0xc587b20})
	/usr/local/go/src/runtime/panic.go:770 +0xfc
github.com/shirou/gopsutil/v3/process.(*Process).createTimeWithContext(0x4d0a0368, {0x8232a44, 0xc9983c0})
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process_freebsd.go:121 +0x4c
github.com/shirou/gopsutil/v3/process.(*Process).CreateTimeWithContext(0x4d0a0368, {0x8232a44, 0xc9983c0})
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:310 +0x74
github.com/shirou/gopsutil/v3/process.NewProcessWithContext({0x8232a44, 0xc9983c0}, 0x3744)
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:218 +0x78
github.com/shirou/gopsutil/v3/process.NewProcess(...)
	/go/pkg/mod/github.com/shirou/gopsutil/[email protected]/process/process.go:203
github.com/influxdata/telegraf/plugins/inputs/procstat.newProc(0x3744)
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/process.go:38 +0x30
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).gatherOld(0x4ccc6e48, {0x824a858, 0x4d40cae0})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:209 +0x848
github.com/influxdata/telegraf/plugins/inputs/procstat.(*Procstat).Gather(0x4ccc6e48, {0x824a858, 0x4d40cae0})
	/go/src/github.com/influxdata/telegraf/plugins/inputs/procstat/procstat.go:166 +0x38
github.com/influxdata/telegraf/models.(*RunningInput).Gather(0x4d410370, {0x824a858, 0x4d40cae0})
	/go/src/github.com/influxdata/telegraf/models/running_input.go:227 +0x2c4
github.com/influxdata/telegraf/agent.(*Agent).gatherOnce.func1()
	/go/src/github.com/influxdata/telegraf/agent/agent.go:583 +0x70
created by github.com/influxdata/telegraf/agent.(*Agent).gatherOnce in goroutine 120
	/go/src/github.com/influxdata/telegraf/agent/agent.go:581 +0xc0

goroutine 1 [semacquire]:
sync.runtime_Semacquire(0x4d310b68)
	/usr/local/go/src/runtime/sema.go:62 +0x3c
sync.(*WaitGroup).Wait(0
2024-05-02T13:09:40Z E! PLEASE REPORT THIS PANIC ON GITHUB with stack trace, configuration, and OS information: https://github.com/influxdata/telegraf/issues/new/choose

@telegraf-tiger bot removed the waiting for response label on May 2, 2024
Contributor

powersj commented May 2, 2024

Well, that's no good! Can you please file a second upstream issue with that stack trace? It does appear that gopsutil's createTimeWithContext function is the cause of the crash.
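For the upstream report, a minimal program along these lines (a sketch only, assuming the gopsutil v3 API that matches the module path in the stack trace) should exercise the same code path, since NewProcess calls CreateTimeWithContext internally:

package main

import (
	"fmt"
	"os"

	"github.com/shirou/gopsutil/v3/process"
)

func main() {
	// On FreeBSD arm64 this is expected to hit createTimeWithContext, which is
	// where the stack trace above shows the nil pointer dereference.
	p, err := process.NewProcess(int32(os.Getpid()))
	if err != nil {
		fmt.Fprintln(os.Stderr, "NewProcess:", err)
		os.Exit(1)
	}
	fmt.Println(p.CreateTime())
}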

Member

DStrand1 commented Oct 28, 2024

@sdalu I'm not able to reproduce this on the latest release of Telegraf; is this issue still affecting you?

@DStrand1 added the waiting for response label on Oct 28, 2024
Author

sdalu commented Nov 5, 2024

I don't see any more blank panels on my Grafana dashboard, so it seems fixed.

@telegraf-tiger bot removed the waiting for response label on Nov 5, 2024