Prepopulating a Process struct with a set of fields #286

idank · 2016-12-12T20:12:02Z

When calling a function on a Process instance, it will go and get the most fresh information. If the pieces of information are all in the same file, that file will be read on each call. This is really wasteful if you know what you need upfront.

It would be nice if a Process could be created such that a set of fields are cached on initialization. Obviously if we're getting fields A, B and C from file F, we should only read F once.

It would be a lot easier to pull this off if Process was an interface rather than a struct.

shirou · 2016-12-16T15:59:31Z

Sorry for late response.

You are right. current approach is wasteful if user want to read information from same file.
However, I think it is difficult to change this behavior from these reasons.

We can not expect what information actually needed. If there are only one access, cache is wasteful.
memory, cpu etc. changes very quick. If cache is introduced, the cache invalidation mechanism is also required. And invalidate time might be quite short time.

Do you have any idea?

idank · 2016-12-16T16:27:02Z

Hi, I can think of several ways.

Add a new NewProcessWithFields(pid, fields) function. The user can then pass in fields and then the Process struct will prepopulate those fields. It should be smart enough to ask for the information once per set of related fields. For example, process_linux.go:fillFromStat is capable of returning 5 fields. Getting those 5 fields in the new scheme will only go to disk once.

Regarding invalidation, I think it's fine if we let the user control it. Alternatively, we can say that this way of creating a Process makes it immutable and it will not refresh data after construction.

I'd like gopsutil to have reasonable performance when getting information for a set of pids & predefined fields. If you think of how it works now, take darwin for example, for N pids we're making N calls to ps just to get a single field. If I have M fields, that's N*M calls to ps, which is very slow. If we're calling an API to get a piece of information (whether it is a shell command like ps, or a library call), we should get all the information we can so the cost of getting those M fields is as low as possible.

shirou · 2016-12-20T05:31:37Z

How about add CacheEnable(expire time.Duration) to Process ? If it set, methods returns from cache until expire time comes. It requires fillFromStat or other function changes but no API change.
I use the word Cache to explicitly it is a cache. So user understand the invalidation problem might be occurred. (intrducte CacheInvalidate() function if needed)

Cache can be used for other platform. So ps problem you told can be solved, I think.

What do you think?

idank · 2016-12-20T09:15:49Z

Sounds like a good start. Can we also support the zero time.Duration so a user must call CacheInvalidate to get new values? This should avoid the need to check the current time on every call to get the value of something.

shirou · 2016-12-21T05:42:19Z

Passing zero time.Duration means no expire or no cache? At first, I thought to zero means disable cache, but infinity expires are better.

So, I am thinking about introduce new two fields.

type Process struct {
	cacheExpireTime *time.Duration // default is nil

	lastIOCachedTime time.Time
	cachedIOFields string
	lastStatusCachedTime time.Time
	cachedStatusFields string
	...
}

Invoke Process.CacheEnable(expires time.Duration) to cache the value. It set cacheExpireTime
- It means if cache is not enable, cacheExpireTime is nil and easy to check.
If passing time.Duration(0), no expires. no updates will happened.
If disable cache, just set cacheExpireTime to nil via CacheDisable

How do you think?

(edited: add individual cached time and value into to the Process)

idank · 2016-12-21T09:15:40Z

It might be prettier to use negative duration as "cache disabled".

So:

negative duration: no caching (default)
zero duration: cache forever until CacheDisable or CacheInvalidate is called
positive duration: checks time.Now on every call to a field

idank · 2017-01-02T21:44:48Z

Have you had a chance to make progress with this? If not I might take a crack at it this weekend.

shirou · 2017-01-03T00:59:22Z

I'm sorry for late response. I agree your "negative duration" idea. I will very appreciate if you make a PR.

neoliv · 2017-01-24T13:55:36Z

Hi there and thx for all the work on this badly needed library. Indeed, if you want to use this lib to get information about a lot of processes and/or fields the performance is quite relevant.
Concerning cache. Why not let the user set a "sample stamp" and any cache using a stamp not equal to the current stamp is then invalid.
That way there is no need to invoke Now() for every call to the lib.
The user loop could look like:

for {
  stamp = stamp +1
  gopsutil.setCurrentStamp(stamp)
  for pid in mypids{
    get cpu percent
    get rss
    get swap
    get whatever
  }
  sleep(10s)
}

That way the lib getters (public or internal) only have to compare their cache stamp with the current stamp and invalidate their cache accordingly. No time.Now(), no complex cache invalidation scheme. When you call a getter you fill all the fields you come upon (during the various /proc/pid/* or ps) while looking for the required values. Next calls to other getters will find those values cached until the stamp changes.
Hope I'm making sense.
Cheers.

PierreF · 2020-06-12T12:03:00Z

I'm also interested in this feature. As I do something like:

for _, proc := process.Processes() {
  proc.Status()  // This call fillFromStatusWithContext
  proc.NumThreads() // This call fillFromStatusWithContext
}

We means fillFromStatusWithContext is called twice and do twice the work :(

FYI, Python's psutil use a context which enable caching until context is exited: https://psutil.readthedocs.io/en/latest/#psutil.Process.oneshot

I think it a simple solution which avoid any issue with cache invalidation and it solve my use case (and author's use-case if I understand well).

Biggest issue, go don't have such kind of context :)

I'm willing to propose a PR to solve my use-case, be it either:

A cache based on time as discussed in previous comment
A clone of OneShot from Python's psutil. My idea would be a function OneShot() which enable unlimited "caching" & and ClearOneShot() (suggestion for better name are welcome) to clear that cache and be back to default behavior
The SetCurrentStamp solution (but I personally prefer the two other).

@shirou which solution do you want me to implement ?

Lomanic · 2020-06-12T16:55:19Z

I prefer the solution in #286 (comment), with immutable information, to keep the API simple. It could be a drop-in replacement for the majority of use-cases.

Something like

type ProcessWithFields struct { // better name to be suggested? We could reuse the Process type and do some check to know if it's a snapshot process or not
	Pid            int32 `json:"pid"`
	// [...] implem details
}

type Field uint

const (
    Name Field = iota + 1
    Exe
   // [...]
)
func NewProcessWithFields(pid int32, fields ...Field) (*Process, error) {
    // retrieve data cleverly
}
func ProcessesWithFields(fields ...Field) ([]*Process, error) {
    // retrieve data cleverly for all processes
}

// same functions with WithContext

func (p *ProcessWithFields) Name() (string, error) {
    return p.name nil
}

// same for the other fields

Keep in mind this is a huge task and refactoring (to limit code duplication between current implementation and the "WithFields" one), as process data is not obtained the same way for all the OSes, the implementation and the "quick paths" will be platform-dependent.

I'll be glad to assist you in all this.

PierreF · 2020-06-12T17:21:51Z

Ok seems indeed a good API.

I'll start working on this way.

shirou mentioned this issue Aug 7, 2017

Reading stats for all running processes is inefficient #413

Open

Lomanic added os:linux package:process labels Dec 20, 2017

Lomanic mentioned this issue Apr 28, 2018

add snapshot feature for collecting process info at a point in time #517

Closed

Lomanic mentioned this issue Jan 16, 2020

Combine Process .Ppid(), .Name(), NumThreads() for Windows #818

Open

Lomanic mentioned this issue Mar 3, 2020

Processes() is extremely CPU intensive #842

Open

1 task

PierreF mentioned this issue Jun 15, 2020

Add NewProcessWithFields and ProcessesWithFields #890

Open

Lomanic mentioned this issue Aug 12, 2020

func Processes() ([]*Process, error) can pass parameters #918

Closed

Lomanic added the performance label Oct 23, 2022

Lomanic mentioned this issue Sep 10, 2024

Optimization Request: Reducing the Number of windows.OpenProcess Calls When Gathering Name, CreateTime, and PPID #1608

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prepopulating a Process struct with a set of fields #286

Prepopulating a Process struct with a set of fields #286

idank commented Dec 12, 2016

shirou commented Dec 16, 2016

idank commented Dec 16, 2016

shirou commented Dec 20, 2016

idank commented Dec 20, 2016

shirou commented Dec 21, 2016 •

edited

Loading

idank commented Dec 21, 2016 •

edited

Loading

idank commented Jan 2, 2017

shirou commented Jan 3, 2017

neoliv commented Jan 24, 2017 •

edited

Loading

PierreF commented Jun 12, 2020

Lomanic commented Jun 12, 2020 •

edited

Loading

PierreF commented Jun 12, 2020

Prepopulating a Process struct with a set of fields #286

Prepopulating a Process struct with a set of fields #286

Comments

idank commented Dec 12, 2016

shirou commented Dec 16, 2016

idank commented Dec 16, 2016

shirou commented Dec 20, 2016

idank commented Dec 20, 2016

shirou commented Dec 21, 2016 • edited Loading

idank commented Dec 21, 2016 • edited Loading

idank commented Jan 2, 2017

shirou commented Jan 3, 2017

neoliv commented Jan 24, 2017 • edited Loading

PierreF commented Jun 12, 2020

Lomanic commented Jun 12, 2020 • edited Loading

PierreF commented Jun 12, 2020

shirou commented Dec 21, 2016 •

edited

Loading

idank commented Dec 21, 2016 •

edited

Loading

neoliv commented Jan 24, 2017 •

edited

Loading

Lomanic commented Jun 12, 2020 •

edited

Loading