Add support for network volumes in Filebeat #5876
Comments
I guess this is happening because ZFS sets a new device ID when mounting the partition? |
Each volume gets a device ID and they're first come, first served: device IDs are assigned dynamically by the running system without any persistence. The implementation here is linear (1,2,3,4,...), but there's really no particular requirement for that, and absolutely nothing requires persistence across boots. |
I should probably also note that you're only recording the minor id, and that alone doesn't define a device even on a running system: you need major+minor to uniquely identify a device. |
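As a rough illustration of the major+minor point, here is a minimal Python sketch for Unix-like systems. The helper names `split_dev` and `file_identity` are hypothetical and are not part of Filebeat; the sketch only shows how `st_dev` decomposes into the two numbers.

```python
import os

def split_dev(dev):
    """Split a raw st_dev value into its (major, minor) pair."""
    return os.major(dev), os.minor(dev)

def file_identity(path):
    """Hypothetical identity tuple: (major, minor, inode) for a path.

    Only meaningful while the device numbering is stable, i.e. on a
    running system where the volume has not been detached or remounted.
    """
    st = os.stat(path)
    major, minor = split_dev(st.st_dev)
    return major, minor, st.st_ino
```

Two devices can share a minor number while differing in major number, so comparing minors alone can conflate them.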
I think this is an issue specific to ZFS and not a general "doing it wrong" ;-) So far it has worked really well for most file systems, and the combination of device + inode is the unique identifier of a file on the other file systems (Windows has 3 identifiers). The issue you are describing reminds me a lot of some "interesting" behaviour on shared file systems and is the reason we recommend installing filebeat on the edge node. But for ZFS I think this can't be applied. The main question is which methods we have to identify a file over its lifetime:
All have their pros and cons. Currently we do option 2 with the most common identifiers. We also discussed options 1 and 3 in the past. Option 1 would work really well in cases where files are not moved / rotated / renamed and the file name is the unique identifier. Option 3 we discussed for cases where the unique identifiers from 2 do not stay the same, like in shared volumes. We would identify a file based on a hash of a subset of the content of the file. One additional option you brought up above is to take 2 but only enforce a subset of identifiers. It seems for your case only option 3 would work, as the path of the same file changes over time? |
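The three options discussed above (file name, device + inode, hash of a subset of the content) can be sketched side by side. This is a minimal Python illustration, not Filebeat's implementation; the function names and the 1024-byte fingerprint length are assumptions made for the example.

```python
import hashlib
import os

def identify_by_path(path):
    """Option 1: the absolute path is the identity (breaks on rename/rotation)."""
    return os.path.abspath(path)

def identify_by_inode(path):
    """Option 2: device + inode (breaks when device ids are reassigned)."""
    st = os.stat(path)
    return st.st_dev, st.st_ino

def identify_by_fingerprint(path, length=1024):
    """Option 3: hash of the first `length` bytes of the content.

    Survives renames and device id changes, but two files that start
    with identical bytes would collide, and very short files give an
    unstable fingerprint while they are still growing.
    """
    with open(path, "rb") as f:
        head = f.read(length)
    return hashlib.sha256(head).hexdigest()
```

After a rename on the same file system, option 1 changes while options 2 and 3 stay the same; after a device id change, only option 3 (and option 1, if the path is unchanged) stays the same.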
@exekias I removed the bug label and changed it to enhancement as the above behaviour is from my point of view as expected and by design. I was not aware that ZFS behaves like the above, so we should probably add a note to our docs about this. |
This isn't limited to zfs. I just tested w/ GCE, I had a /dev/sda1 (/), I attached a disk (which provided /dev/sdb1) and mounted it as /media, then I unmounted it, created a new disk, attached it, partitioned+formatted it, mounted it as /media (it became /dev/sdb1 -- and had the same major/minor as the previous /dev/sdb1), and then I reattached the previous disk (it became /dev/sdc1) which I mounted as /mnt -- this time it had a new minor number (because the newer disk was sitting in the minor slot ...). Really, anything that involves any amount of dynamism is problematic. [It would probably apply to nfs, samba, afs, fuse, but you'd wave that off...] |
Thanks a lot for putting all this effort in and testing the different systems. It's definitely a problem we need to start tackling more actively with filebeat. I think the most important step on our side is to make it pluggable how files are compared, so any of the 3 options mentioned above could be used and more could be added. First steps have been made but we are not there yet. You are definitely right about the other network systems you mentioned. We are aware of this limitation, see https://www.elastic.co/guide/en/beats/filebeat/current/faq.html#filebeat-network-volumes Let me quickly explain why I replaced the @jsoref I suggest renaming the title to something like "Add support for network volumes in Filebeat" or "Add additional file identification mechanisms to Filebeat" to be more explicit about what we need to add. |
I don't have a particularly good sense of the story for inodes. https://lists.freebsd.org/pipermail/freebsd-hackers/2010-February/030746.html seems to indicate that inodes are figments created as requested. https://github.com/zfsonlinux/zfs is the project that handles the Linux kernel implementation. Afaict, these things are probably moderately reliable until a computer reboots or a device detaches, and entirely unreliable after either of those events occurs. Again, the algorithm I suggested of only relying on these pieces of information up to the point where the system has rebooted should work for most cases. I don't have any advice for how to deal w/ the case where physical devices come/go while a system is running. And, fwiw that's probably going to happen much more often. (This morning we hot swapped a disk on a physical server because it failed. This afternoon I started talking about plans for various migrations between systems, some models could involve ejecting disks and moving them to other systems.) |
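The idea of trusting device ids only until the next reboot could be sketched like this in Python. It assumes a Linux host, where the kernel exposes a per-boot identifier at `/proc/sys/kernel/random/boot_id`; the function names are hypothetical, and storing a boot id next to the registry state is an assumption of this sketch, not something Filebeat is known to do.

```python
def read_boot_id(path="/proc/sys/kernel/random/boot_id"):
    """Return the current boot id on Linux, or None if it is unavailable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None

def device_ids_trustworthy(recorded_boot_id, current_boot_id):
    """Trust recorded device ids only if both states come from the same boot.

    A missing boot id on either side is treated as "unknown", which
    means the recorded device ids are not trusted.
    """
    if recorded_boot_id is None or current_boot_id is None:
        return False
    return recorded_boot_id == current_boot_id
```

This does not cover devices that come and go while the system stays up; that case needs a different signal.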
@ruflin Any update on this? We face the same issue with NFS volumes mounted in a Kubernetes pod. |
@bquartier It's still something we are planning to do to improve our story on shared FS, but we haven't started working on it yet. |
@bquartier: thanks for the note (we're considering playing w/ kubernetes, so you've saved me a check...) |
@ruflin Any updates on this enhancement? |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
that's very unhelpful. |
Pinging @elastic/agent (Team:Agent) |
Hi! We're labeling this issue as |
I see no evidence that anything has actually improved and would rather a human point to actual progress. |
compression=lz4) w/ lots of file systems and occasionally create new ones
/path/to/logs/logfile-*.txt
include_lines: designed to capture a tiny subset of the log file content
/path/to/logs/logfile-STUFF-YYYY-MM-DD.txt
zpool (using zfs create pool/...)
Here's some python I used to look at the registry (because I was curious):
Essentially, at some point in time my log files were on a volume which was assigned deviceid 83. At some point the system rebooted, and now the deviceid for the same volume is 91. After that point, the system ran for a bit over a day and 80 new files (~75 from the second day) appeared.
The files I care most about are the ones in offset_check -- they're all >4GB, and filebeat had them all open. I believe it was slowly making progress through them, but it was doing a really bad job of it.
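The kind of registry inspection described above can be approximated with a short Python sketch. This is not the script referred to in the report. It assumes the registry is a JSON array of entries, each with a `source` path and a `FileStateOS` object holding `inode` and `device`, which is how older Filebeat registries were laid out as far as I know; the function names are hypothetical.

```python
import json
from collections import defaultdict

def load_registry(path):
    """Load a registry file, assumed to be a JSON array of state entries."""
    with open(path) as f:
        return json.load(f)

def devices_by_source(entries):
    """Map each source path to the set of device ids recorded for it."""
    seen = defaultdict(set)
    for entry in entries:
        device = entry.get("FileStateOS", {}).get("device")
        seen[entry.get("source")].add(device)
    return seen

def sources_with_changed_device(entries):
    """Return source paths recorded under more than one device id."""
    by_source = devices_by_source(entries)
    return sorted(src for src, devs in by_source.items() if len(devs) > 1)
```

A path that shows up under two device ids (for example once under 83 and once under 91) is a candidate for a file that was tracked twice after the device id changed.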
My understanding is that device ids are not guaranteed past reboot, and any process trying to use them past that point is "doing it wrong". I believe that filebeat is in this category.
Expected result (conceptually):
I don't have any particular opinion about what one should do if a volume is unmounted and remounted while filebeat is running. Offhand, I think that if the device id changes you probably can't rely on the inode either, but personally, I'd expect the process to consider the datestamp of the file -- if the file hasn't changed since the last time it was seen, then it should normally be treated as the same file. If the file has a different inode and the same deviceid as the last time something looked, then it's reasonable to think it may have changed.
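The heuristic in the paragraph above could look roughly like this in Python. It is a sketch of the suggestion, not existing Filebeat behaviour; the `snapshot` and `probably_same_file` names and the dictionary layout are assumptions, and it additionally assumes that the path stays the same across the device id change.

```python
import os

def snapshot(path):
    """Record the facts used for comparison later (hypothetical layout)."""
    st = os.stat(path)
    return {
        "path": os.path.abspath(path),
        "device": st.st_dev,
        "inode": st.st_ino,
        "mtime": st.st_mtime,
    }

def probably_same_file(recorded, current):
    """Decide whether `current` is the file that `recorded` described."""
    if recorded["path"] != current["path"]:
        return False
    if recorded["device"] == current["device"]:
        # Same device id: a different inode suggests the file was replaced.
        return recorded["inode"] == current["inode"]
    # Device id changed (reboot or remount): the inode is not trusted
    # either, so fall back to the modification time.
    return recorded["mtime"] == current["mtime"]
```

An unchanged modification time is only evidence, not proof; a file rewritten within the timestamp resolution would slip through.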