Missing thread, process + thread id naming DEC+HEX #82

Closed
vbohata opened this issue Aug 14, 2018 · 6 comments

Comments


vbohata commented Aug 14, 2018

There are process fields for process information, but none for threads. I propose adding thread fields, and accounting for the fact that applications sometimes log the process ID and thread ID in HEX and sometimes in DEC. So maybe there should be fields like process.pid_hex and thread.tid_hex for this case. Even though it is possible to convert to DEC in Logstash, we usually need the pid in HEX format and, at the same time, the pid in DEC (as shown in the OS process list).
Also, shouldn't process.pid be process.id instead (since pid is itself short for process ID)? Likewise, process.ppid could be process.parent.id.

Contributor

webmat commented Aug 14, 2018

I like the idea of adding thread as well. Good catch.

Out of curiosity, where do you see process and thread IDs in HEX? If you need conversion to DEC, your pipeline can do that for you. But the question is, do you still need the HEX afterwards, or is it simply an idiosyncrasy of this event source?

My preference would be to store only one of them, not both.

On the usage side, if you still need to view the HEX value after ingestion, could you live with a scripted field instead (i.e. no aggregations on HEX, just on DEC)?
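
To make the conversion being discussed concrete, here is a minimal Python sketch of going from a HEX ID to DEC and back. It is not Logstash configuration, and the function names `hex_to_dec` and `dec_to_hex` are hypothetical, chosen only for illustration.

```python
def hex_to_dec(value: str) -> int:
    """Parse a hex ID such as '0x1A2C' or '1a2c' into an integer."""
    # int() with base 16 accepts an optional '0x' prefix and either letter case.
    return int(value.strip(), 16)


def dec_to_hex(value: int) -> str:
    """Render an integer ID as upper-case hex without a '0x' prefix."""
    return format(value, "X")


print(hex_to_dec("0x1A2C"))  # 6700
print(dec_to_hex(6700))      # 1A2C
```

A value rendered at query time this way can be displayed, but it is not stored in the index, which is the trade-off the scripted-field question is about.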

Author

vbohata commented Aug 14, 2018

For example, MS SharePoint (among others) logs the process ID and thread ID in HEX. Our users are used to this value, but to correlate with OS processes I need it in DEC, so I think I should keep both values. A scripted field is not a good fit, because I need to find a value among tens of gigabytes of events.

I did some research and found that the Linux ps command can show both of these values: pid for the process ID in DEC and xpid for the process ID in HEX. So the same naming convention could be used here.

So, in the end, the following fields seem appropriate:
process.pid (or process.id)
process.xpid (or process.xid, but xpid looks better, so the convention pid, xpid, ppid, and xppid is probably preferable)
process.ppid (or process.parent.id)
process.xppid (or process.parent.xid)
thread.tid
thread.xtid
thread.name

I know that Linux or IBM's ps uses sid for "session id" (and xsid), but thread is more general here, since I usually need information about it from the application side, where I know its tid and name. That's why I would use a separate thread field.

Contributor

ruflin commented Aug 15, 2018

+1 on adding a thread id. Let's focus here on which fields we should add, and discuss potentially renaming existing fields in a separate issue.

Should it be process.thread.tid, meaning thread should be inside the process object?

Author

vbohata commented Aug 15, 2018

Yes, it could be inside process. A thread is closely related to, dependent on, and part of a process, and I currently cannot imagine a use case for a thread outside of a process context.

Contributor

webmat commented Aug 17, 2018

OK, I was not aware of this use case. I'm in favor of adding the hex representation fields, then.

We're trying to simplify names (process.id, host.name), but some names are very deeply ingrained in the culture (pid, hostname). I think PID is one of those cases. I was not really aware of xpid, but I think I would keep the whole set of field names consistent:

process.pid
process.xpid
process.ppid
process.xppid
process.thread.tid
process.thread.xtid

However, ECS is just a schema, so it will be the responsibility of the user's pipelines to fill in either the DEC value or the HEX value as needed, at the appropriate point in the pipeline.
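
As a rough sketch of what such a pipeline step might look like, the Python below fills in whichever of the DEC/HEX pair is missing on a nested event. It uses the field names proposed in this comment (pid/xpid, ppid/xppid, tid/xtid); the helper names `fill_pair` and `enrich`, and the event shape, are assumptions for illustration and not part of ECS or Logstash.

```python
def fill_pair(obj: dict, dec_key: str, hex_key: str) -> None:
    """Fill in whichever of the DEC/HEX pair is missing, in place."""
    if dec_key in obj and hex_key not in obj:
        obj[hex_key] = format(int(obj[dec_key]), "X")
    elif hex_key in obj and dec_key not in obj:
        obj[dec_key] = int(str(obj[hex_key]), 16)


def enrich(event: dict) -> dict:
    """Add the missing representation for process and thread IDs."""
    process = event.get("process")
    if process is None:
        return event  # nothing to do for events without process info
    fill_pair(process, "pid", "xpid")
    fill_pair(process, "ppid", "xppid")
    thread = process.get("thread")
    if thread is not None:
        fill_pair(thread, "tid", "xtid")
    return event


# Hypothetical event: the source logged the pid in HEX and the tid in DEC.
event = enrich({"process": {"xpid": "1A2C", "thread": {"tid": 255}}})
print(event["process"]["pid"])             # 6700
print(event["process"]["thread"]["xtid"])  # FF
```

Pairs where neither value is present (ppid/xppid in this example) are left untouched rather than invented.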

@webmat webmat mentioned this issue Sep 18, 2018
26 tasks
@ruflin ruflin mentioned this issue Oct 31, 2018
22 tasks
Contributor

webmat commented Dec 3, 2018

We've added process.thread.id recently.

However, we have decided not to add fields for hex IDs. Users of ECS are free to add additional custom fields to their index, and you should feel free to do so. The chances of conflicts are really low, since we don't plan to add these fields.
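
For illustration only, a minimal Python sketch of a document that keeps the decimal IDs in process.pid and process.thread.id and carries the original hex strings in custom fields. The custom field names (`process.pid_hex`, `process.thread.id_hex`) and the function `to_document` are hypothetical, and the sketch assumes the hex input has no `0x` prefix.

```python
def to_document(pid_hex: str, tid_hex: str) -> dict:
    """Build a flat document from hex IDs as logged by the application."""
    return {
        # Schema fields: decimal values, usable for correlation with the OS.
        "process.pid": int(pid_hex, 16),
        "process.thread.id": int(tid_hex, 16),
        # Custom fields (hypothetical names, not in ECS): stored hex strings,
        # so they can be searched directly instead of computed at query time.
        "process.pid_hex": pid_hex.upper(),
        "process.thread.id_hex": tid_hex.upper(),
    }


doc = to_document("1a2c", "ff")
print(doc["process.pid"], doc["process.pid_hex"])  # 6700 1A2C
```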

I'm aware that we're not yet addressing the thread name mentioned above (nor thread priority, which would also make sense). But the creation of object process.thread leaves space for those eventually. This will likely happen post 1.0, however.

I will close this issue for now.

3 participants