Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prepopulating a Process struct with a set of fields #286

Open
idank opened this issue Dec 12, 2016 · 12 comments
Open

Prepopulating a Process struct with a set of fields #286

idank opened this issue Dec 12, 2016 · 12 comments

Comments

@idank
Copy link

idank commented Dec 12, 2016

When calling a function on a Process instance, it will go and get the most fresh information. If the pieces of information are all in the same file, that file will be read on each call. This is really wasteful if you know what you need upfront.

It would be nice if a Process could be created such that a set of fields are cached on initialization. Obviously if we're getting fields A, B and C from file F, we should only read F once.

It would be a lot easier to pull this off if Process was an interface rather than a struct.

@shirou
Copy link
Owner

shirou commented Dec 16, 2016

Sorry for late response.

You are right. current approach is wasteful if user want to read information from same file.
However, I think it is difficult to change this behavior from these reasons.

  1. We can not expect what information actually needed. If there are only one access, cache is wasteful.
  2. memory, cpu etc. changes very quick. If cache is introduced, the cache invalidation mechanism is also required. And invalidate time might be quite short time.

Do you have any idea?

@idank
Copy link
Author

idank commented Dec 16, 2016

Hi, I can think of several ways.

Add a new NewProcessWithFields(pid, fields) function. The user can then pass in fields and then the Process struct will prepopulate those fields. It should be smart enough to ask for the information once per set of related fields. For example, process_linux.go:fillFromStat is capable of returning 5 fields. Getting those 5 fields in the new scheme will only go to disk once.

Regarding invalidation, I think it's fine if we let the user control it. Alternatively, we can say that this way of creating a Process makes it immutable and it will not refresh data after construction.

I'd like gopsutil to have reasonable performance when getting information for a set of pids & predefined fields. If you think of how it works now, take darwin for example, for N pids we're making N calls to ps just to get a single field. If I have M fields, that's N*M calls to ps, which is very slow. If we're calling an API to get a piece of information (whether it is a shell command like ps, or a library call), we should get all the information we can so the cost of getting those M fields is as low as possible.

@shirou
Copy link
Owner

shirou commented Dec 20, 2016

How about add CacheEnable(expire time.Duration) to Process ? If it set, methods returns from cache until expire time comes. It requires fillFromStat or other function changes but no API change.
I use the word Cache to explicitly it is a cache. So user understand the invalidation problem might be occurred. (intrducte CacheInvalidate() function if needed)

Cache can be used for other platform. So ps problem you told can be solved, I think.

What do you think?

@idank
Copy link
Author

idank commented Dec 20, 2016

Sounds like a good start. Can we also support the zero time.Duration so a user must call CacheInvalidate to get new values? This should avoid the need to check the current time on every call to get the value of something.

@shirou
Copy link
Owner

shirou commented Dec 21, 2016

Passing zero time.Duration means no expire or no cache? At first, I thought to zero means disable cache, but infinity expires are better.

So, I am thinking about introduce new two fields.

type Process struct {
	cacheExpireTime *time.Duration // default is nil

	lastIOCachedTime time.Time
	cachedIOFields string
	lastStatusCachedTime time.Time
	cachedStatusFields string
	...
}
  • Invoke Process.CacheEnable(expires time.Duration) to cache the value. It set cacheExpireTime
    • It means if cache is not enable, cacheExpireTime is nil and easy to check.
  • If passing time.Duration(0), no expires. no updates will happened.
  • If disable cache, just set cacheExpireTime to nil via CacheDisable

How do you think?

(edited: add individual cached time and value into to the Process)

@idank
Copy link
Author

idank commented Dec 21, 2016

It might be prettier to use negative duration as "cache disabled".

So:

  • negative duration: no caching (default)
  • zero duration: cache forever until CacheDisable or CacheInvalidate is called
  • positive duration: checks time.Now on every call to a field

@idank
Copy link
Author

idank commented Jan 2, 2017

Have you had a chance to make progress with this? If not I might take a crack at it this weekend.

@shirou
Copy link
Owner

shirou commented Jan 3, 2017

I'm sorry for late response. I agree your "negative duration" idea. I will very appreciate if you make a PR.

@neoliv
Copy link

neoliv commented Jan 24, 2017

Hi there and thx for all the work on this badly needed library. Indeed, if you want to use this lib to get information about a lot of processes and/or fields the performance is quite relevant.
Concerning cache. Why not let the user set a "sample stamp" and any cache using a stamp not equal to the current stamp is then invalid.
That way there is no need to invoke Now() for every call to the lib.
The user loop could look like:

for {
  stamp = stamp +1
  gopsutil.setCurrentStamp(stamp)
  for pid in mypids{
    get cpu percent
    get rss
    get swap
    get whatever
  }
  sleep(10s)
}

That way the lib getters (public or internal) only have to compare their cache stamp with the current stamp and invalidate their cache accordingly. No time.Now(), no complex cache invalidation scheme. When you call a getter you fill all the fields you come upon (during the various /proc/pid/* or ps) while looking for the required values. Next calls to other getters will find those values cached until the stamp changes.
Hope I'm making sense.
Cheers.

@PierreF
Copy link
Contributor

PierreF commented Jun 12, 2020

I'm also interested in this feature. As I do something like:

for _, proc := process.Processes() {
  proc.Status()  // This call fillFromStatusWithContext
  proc.NumThreads() // This call fillFromStatusWithContext
}

We means fillFromStatusWithContext is called twice and do twice the work :(

FYI, Python's psutil use a context which enable caching until context is exited: https://psutil.readthedocs.io/en/latest/#psutil.Process.oneshot

I think it a simple solution which avoid any issue with cache invalidation and it solve my use case (and author's use-case if I understand well).

Biggest issue, go don't have such kind of context :)

I'm willing to propose a PR to solve my use-case, be it either:

  • A cache based on time as discussed in previous comment
  • A clone of OneShot from Python's psutil. My idea would be a function OneShot() which enable unlimited "caching" & and ClearOneShot() (suggestion for better name are welcome) to clear that cache and be back to default behavior
  • The SetCurrentStamp solution (but I personally prefer the two other).

@shirou which solution do you want me to implement ?

@Lomanic
Copy link
Collaborator

Lomanic commented Jun 12, 2020

I prefer the solution in #286 (comment), with immutable information, to keep the API simple. It could be a drop-in replacement for the majority of use-cases.

Something like

type ProcessWithFields struct { // better name to be suggested? We could reuse the Process type and do some check to know if it's a snapshot process or not
	Pid            int32 `json:"pid"`
	// [...] implem details
}

type Field uint

const (
    Name Field = iota + 1
    Exe
   // [...]
)
func NewProcessWithFields(pid int32, fields ...Field) (*Process, error) {
    // retrieve data cleverly
}
func ProcessesWithFields(fields ...Field) ([]*Process, error) {
    // retrieve data cleverly for all processes
}

// same functions with WithContext

func (p *ProcessWithFields) Name() (string, error) {
    return p.name nil
}

// same for the other fields

Keep in mind this is a huge task and refactoring (to limit code duplication between current implementation and the "WithFields" one), as process data is not obtained the same way for all the OSes, the implementation and the "quick paths" will be platform-dependent.

I'll be glad to assist you in all this.

@PierreF
Copy link
Contributor

PierreF commented Jun 12, 2020

Ok seems indeed a good API.

I'll start working on this way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants