
Filebeat only checks for closed or renamed files when reaching EOF #5556

Closed
ZhiqinYang opened this issue Nov 11, 2017 · 25 comments

@ZhiqinYang

ZhiqinYang commented Nov 11, 2017

https://github.com/elastic/beats/blob/master/filebeat/prospector/log/log.go#line49

func (f *Log) Read(buf []byte) (int, error) {
	totalN := 0

	for {
		select {
		case <-f.done:
			return 0, ErrClosed
		default:
		}

		n, err := f.fs.Read(buf)
		if n > 0 {
			f.offset += int64(n)
			f.lastTimeRead = time.Now()
		}
		totalN += n

		// Read from source completed without error
		// Either end reached or buffer full
		if err == nil {
			// reset backoff for next read
			f.backoff = f.config.Backoff
			return totalN, nil
		}

		// Move buffer forward for next read
		buf = buf[n:]

		// Checks if an error happened or buffer is full
		// If buffer is full, cannot continue reading.
		// Can happen if n == bufferSize + io.EOF error
		err = f.errorChecks(err)
		if err != nil || len(buf) == 0 {
			return totalN, err
		}

		logp.Debug("harvester", "End of file reached: %s; Backoff now.", f.fs.Name())
		f.wait()
	}
}

Why doesn't the f.errorChecks call directly follow n, err := f.fs.Read(buf)? If the file is being read and its name is changed, or it is removed, how does it find out? Because when err == nil you return immediately.

Why not do the following instead:
n, err := f.fs.Read(buf)
err = f.errorChecks(err)
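A minimal, runnable sketch of the concern (the types and names below are illustrative, not Filebeat's actual code): when every read succeeds, the loop returns before the error/status check ever runs.

```go
package main

import "fmt"

// fakeFile simulates a log file that always has more data,
// so a read never fails and never hits EOF.
type fakeFile struct{ statusChecks int }

func (f *fakeFile) read() (int, error) { return 1024, nil } // always succeeds

func (f *fakeFile) errorChecks(err error) error {
	f.statusChecks++ // rename/remove detection would happen here
	return err
}

// readLoop mirrors the structure of Log.Read: on a successful
// read it returns immediately, skipping errorChecks entirely.
func readLoop(f *fakeFile) int {
	total := 0
	for i := 0; i < 100; i++ {
		n, err := f.read()
		total += n
		if err == nil {
			return total // early return: errorChecks never runs
		}
		if err = f.errorChecks(err); err != nil {
			return total
		}
	}
	return total
}

func main() {
	f := &fakeFile{}
	total := readLoop(f)
	fmt.Println(total, f.statusChecks) // prints: 1024 0
}
```

With a file that always has data pending, the status check count stays at zero, which is the behaviour being questioned here.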

@ruflin
Contributor

ruflin commented Nov 13, 2017

If you have a look at errorChecks you will see that it creates new errors in certain cases and sets other errors to nil. So I think the above is correct as it is.

Perhaps you can share some more background on the issue you are seeing and what you are worried about?

Please also raise this on discuss so we can move it to GitHub in case we can confirm it's a bug or a feature request. Closing this; happy to follow up further on discuss.

@ruflin ruflin closed this as completed Nov 13, 2017
@ZhiqinYang
Author

ZhiqinYang commented Nov 13, 2017

@ruflin
Sorry, I mean that if the filename is changed, Filebeat cannot know it in time, because the file is still being read and has not reached EOF!

	n, err := f.fs.Read(buf)
	if n > 0 {
		f.offset += int64(n)
		f.lastTimeRead = time.Now()
	}
	totalN += n

	// Read from source completed without error
	// Either end reached or buffer full
	if err == nil {
		// reset backoff for next read
		f.backoff = f.config.Backoff
		return totalN, nil
	}

The function returns when err == nil, but if the filename has changed it cannot know!

@ruflin
Contributor

ruflin commented Nov 14, 2017

Knowing about the renaming of files is "best effort". Files are harvested by file handle, not by file name. So if a file name changes, Filebeat does not need to know about it immediately, as the file handle stays the same.

Can you share some more on your use case and where the above is a problem?

@ZhiqinYang
Author

@ruflin
I run the docker daemon with --log-opt max-size=1024m --log-opt max-file=2, and an application produces logs at about 100,000 lines per second. When a log file is removed or renamed by docker, filebeat keeps reading it and the inode is not released!

@ruflin
Contributor

ruflin commented Nov 15, 2017

If you want to stop reading the file when it is removed / renamed, you should use close_renamed and close_removed. There is a small delay, which from my perspective is acceptable.
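For reference, a minimal filebeat.yml sketch of these options (the paths and prospector layout are placeholders; check the documentation for your Filebeat version):

```yaml
filebeat.prospectors:
  - input_type: log
    paths:
      - /var/lib/docker/containers/*/*.log
    close_renamed: true   # stop harvesting once the file is renamed
    close_removed: true   # stop harvesting once the file is removed
```

As the rest of this thread shows, these options are only evaluated on a read error or at EOF, so on a large, fast-growing file they may not take effect promptly.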

@ZhiqinYang
Author

I did this, but I found that errorChecks only executes on a read error; if a file is so big that the end is never reached, the check never runs! How can I give the file up?

@ruflin
Contributor

ruflin commented Nov 20, 2017

The error check is done for each line read, so as soon as the next line is read, it should detect the change.

Perhaps I'm missing something here?

@ZhiqinYang
Author

I'm sorry. Suppose a file is very, very large and filebeat is reading it at that moment, and the file is suddenly renamed or deleted. When the code runs line 59, n, err := f.fs.Read(buf), and at line 68 err == nil, it returns right away; line 80, err = f.errorChecks(err), is never executed. How can the close_renamed and close_removed settings take effect then?

@ruflin
Contributor

ruflin commented Nov 26, 2017

@ZhiqinYang Thanks a lot for bearing with me on this one. I now finally understand what you are getting at. In the case of a successful read, it is never checked whether the file was renamed or closed; that only happens at the end of the file.

TBH I don't think this behaviour is intentional, and it is not the expected behaviour. One interesting property of the current approach is that f.fs.Stat() is not called for every line read, which could otherwise cause quite a bit of overhead. So we should be careful changing this.

I'm reopening this as an issue and will change the title to be more fitting.

@ruflin ruflin reopened this Nov 26, 2017
@ruflin ruflin changed the title Is this a bug ? I can't confirm Filebeat only checks for closed or renamed files when reaching EOF Nov 26, 2017
@ruflin ruflin added bug Filebeat Filebeat labels Nov 26, 2017
@ZhiqinYang
Author

ZhiqinYang commented Nov 27, 2017

@ruflin Thank you!
I think fixing this is very useful: if a file is very, very large, this can fill up the disk, because close_renamed and close_removed have no effect and the inode also stays held by filebeat!

@KeymanHuang

KeymanHuang commented Jan 25, 2018

@ZhiqinYang Have you resolved it? I also ran into a removed file whose handle stays open, even though close_removed and clean_removed are true.

@ruflin
Contributor

ruflin commented Jan 25, 2018

@GaryHuangBD There is no fix for this yet, as it requires quite a bit of change in the logic of how the close_* options work. I assume in your case you also have a very large file?

@KeymanHuang

KeymanHuang commented Jan 26, 2018

@ruflin Actually, I have dozens of files to collect at the same time, and each file is about ten GB. A strange thing: filebeat.harvester.open_files is correct, but lsof shows that filebeat keeps a lot of deleted files open.
filebeat log:

INFO File was removed: /data/app/taf/app_log/********/IPRiskServer/IPRiskServer_RISKIP_20180124.log. Closing because close_removed is enabled.

lsof always shows:

 filebeat **************************** 
  /data/app/taf/app_log/***/IPRiskServer/IPRiskServer_RISKIP_20180124.log (deleted)

One day later it was still like this, so I had to kill filebeat.

Do you have any suggestions to avoid this problem?

@ruflin
Contributor

ruflin commented Jan 28, 2018

@GaryHuangBD I think your best temporary fix here is to use close_timeout.

@ZhiqinYang
Author

ZhiqinYang commented Jan 31, 2018

I did it as follows; I don't think it's a good way!

func (f *Log) checkFileStatus() error {
	// Refetch file info to check whether the file was truncated or disappeared.
	// Errors if the file was removed/rotated after reading and before
	// calling the stat function.
	info, statErr := f.fs.Stat()
	if statErr != nil {
		logp.Err("Unexpected error reading from %s; error: %s", f.fs.Name(), statErr)
		return statErr
	}

	// Check if the file was truncated
	if info.Size() < f.offset {
		logp.Debug("harvester",
			"File was truncated as offset (%d) > size (%d): %s", f.offset, info.Size(), f.fs.Name())
		return ErrFileTruncate
	}

	// Check the file wasn't read for longer than CloseInactive
	age := time.Since(f.lastTimeRead)
	if age > f.config.CloseInactive {
		return ErrInactive
	}

	if f.config.CloseRenamed {
		// Check if the file can still be found under the same path
		if !file.IsSameFile(f.fs.Name(), info) {
			return ErrRenamed
		}
	}

	if f.config.CloseRemoved {
		// Check if the file name still exists. See https://github.com/elastic/filebeat/issues/93
		_, statErr := os.Stat(f.fs.Name())
		// An error means the file does not exist.
		if statErr != nil {
			return ErrRemoved
		}
	}
	return nil
}

func (f *Log) Read(buf []byte) (int, error) {
	totalN := 0

	for {
		select {
		case <-f.done:
			return 0, ErrClosed
		default:
		}

		// Prevent a very large file from exhausting inodes
		if err := f.checkFileStatus(); err != nil {
			return totalN, err
		}

		n, err := f.fs.Read(buf)
		if n > 0 {
			f.offset += int64(n)
			f.lastTimeRead = time.Now()
		}
		totalN += n

	...

@ruflin
Contributor

ruflin commented Feb 6, 2018

@ZhiqinYang The long-term fix I have in mind is to change the way the close_* options are checked. Instead of checking them only on each iteration, an external goroutine also runs, which checks from time to time whether one of the close conditions is met and closes the handler. This is very similar to what we do with close_timeout at the moment, but it would apply to all close options.

@KeymanHuang

KeymanHuang commented Feb 8, 2018

@ruflin That's the same as I thought. I added another goroutine to check the close_* options periodically, and it looks normal now.

@ruflin
Contributor

ruflin commented Feb 8, 2018

@GaryHuangBD It would be great if you could share your code. Depending on the change you made, make sure to use Go 1.9, as there was a change in Go 1.9 to closing file handlers which led to races before but should now work.

@KeymanHuang

@ruflin Yeah, I think I can do it after the Spring Festival, if it's still needed.

@ruflin
Contributor

ruflin commented Feb 12, 2018

Thanks, would be great.

@pathcl

pathcl commented Aug 7, 2018

@ruflin Hi! I'm facing the same situation now. Has this been fixed yet? So far we have to restart the filebeat process in order to release disk usage.

@kvch
Contributor

kvch commented Aug 8, 2018

Right now there is a big refactoring going on in the log input. This part of the code has not been touched yet, but any feedback and problem reports are welcome, to make sure as many problems as possible get resolved.

@ttapjinda

Hi, do we still need this feature? I have submitted a pull request here: #8627.
A check_status_interval option has been added to control the frequency of status checking, instead of checking only on a read error or at end of file.
The default behavior stays the same, since changing it can cause data loss.

@d-ulyanov

Any updates on this issue?

@jsoriano
Member

This issue should be fixed by #13907, which is already available in some released versions. Closing this.
