Need document of how to rotate logs in Windows #3393

Closed
danielnelson opened this issue Oct 26, 2017 · 17 comments
Labels
area/logging docs Issues related to Telegraf documentation and configuration descriptions platform/windows

@danielnelson
Contributor

Bug report

In Windows it is not clear how to handle log rotation. Normally, on Unix systems, this is left to another tool such as logrotate.

One idea is to use the tools mentioned here:
https://jira.mongodb.org/browse/SERVER-7312

System info:

Telegraf v1.4.3

Windows

@danielnelson danielnelson added the docs Issues related to Telegraf documentation and configuration descriptions label Oct 26, 2017
@Lars-Diehl

I don't think that an external tool will work at all.
I have telegraf for Windows installed as a service. The log file cannot be renamed as long as the service is active.
Anyway, using external log rotation tools is not state of the art on Windows. I would recommend implementing built-in log rotation.

@russorat
Contributor

russorat commented Apr 4, 2018

@Lars-Diehl @danielnelson what are some ways users work around this today?

@Lars-Diehl

Lars-Diehl commented Apr 5, 2018

So far we don't have much experience. We are currently evaluating telegraf for Windows and preparing its integration into our provisioning process.

The only option I can envision is to terminate the telegraf service before a log switch. We could do this in a dedicated script set up as a Windows scheduled task: the script would terminate the service, rotate the log, and start the service again. However, this is a lot of unnecessary work on our side.

The best way is to implement built-in log rotation.

Addendum:
BTW: We are currently using telegraf Version 1.5.2

@srclosson

+1 for a solution here. I am starting to use telegraf heavily, and the lack of log rotation means I have to turn on the quiet option and miss possible events when running in production. Some ideas would be to use the Windows Event Log when running on Windows, or perhaps to replace logging with a third-party logger that supports rotation.

I'm certainly willing to help with this effort.

@danielnelson
Contributor Author

We could use https://godoc.org/golang.org/x/sys/windows/svc/eventlog to write to the Event Log. I think we would just need to hook it up in logger/logger.go. We can use a special agent logfile name to select it, maybe:

[agent]
  logfile = "eventlog://hostname/source"

Would you be able to investigate this @srclosson?
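A minimal Go sketch of how the proposed `logfile = "eventlog://hostname/source"` value could be told apart from an ordinary file path, using only `net/url` from the standard library. The `logTarget` type and `parseLogfile` function are hypothetical names, and actually writing to the Event Log (for example through the `eventlog` package linked above) is left out so the sketch runs on any platform.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
	"strings"
)

// logTarget says where log output should go. The type, its fields and
// parseLogfile are hypothetical names for this sketch, not Telegraf code.
type logTarget struct {
	kind   string // "eventlog" or "file"
	host   string // Event Log host, eventlog only
	source string // Event Log source name, eventlog only
	path   string // file path, file only
}

// parseLogfile treats "eventlog://hostname/source" as a request for the
// Windows Event Log and any other value as a plain file path.
func parseLogfile(logfile string) (logTarget, error) {
	u, err := url.Parse(logfile)
	if err != nil || u.Scheme != "eventlog" {
		// Ordinary paths such as C:\Program Files\Telegraf\telegraf.log either
		// fail to parse or get a different scheme ("c"), so they stay files.
		return logTarget{kind: "file", path: logfile}, nil
	}
	source := strings.TrimPrefix(u.Path, "/")
	if source == "" {
		return logTarget{}, errors.New("eventlog logfile needs a source: eventlog://hostname/source")
	}
	return logTarget{kind: "eventlog", host: u.Hostname(), source: source}, nil
}

func main() {
	for _, s := range []string{
		"eventlog://localhost/telegraf",
		`C:\Program Files\Telegraf\telegraf.log`,
	} {
		t, err := parseLogfile(s)
		fmt.Printf("%q -> %+v (err: %v)\n", s, t, err)
	}
}
```

Keeping the selection inside the existing `logfile` setting means no new agent option is needed; any value that is not an `eventlog://` URL behaves exactly as before.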

@srclosson

Yes. Let me investigate.

@srclosson

srclosson commented Apr 29, 2018

So I've been thinking about this @danielnelson ...

I'm not sure the event log is the right solution either. If there is a 3rd party tool that rotates logs on linux, perhaps it would make sense to have a similar solution on windows?

I know that in my case I have the requirement of offloading logs in small portions, so a rolling file that gets written to under a temporary extension (e.g. .tmp) makes sense. When the .tmp file reaches either a maximum byte size or a maximum age (say 30 seconds), it is closed and renamed for offloading. This is a strategy I commonly use, and it's quite effective, especially on low-bandwidth networks.

I expect that many users of telegraf will have equally strict logging constraints. With this in mind, perhaps it makes sense to make logging, or some parts of it, pluggable? Or perhaps a plugin could override the functionality relevant to a particular use case, much like a parser for an input plugin or a serializer for an output plugin. There could be a "logging format", and that format could be anything from the Windows Event Log, to a rolling file with a temporary extension, to an SQLite database...

Thoughts?

@Lars-Diehl

Here are my "two cents":

  1. Windows, unlike Unix/Linux, does not have the notion of external log rotation. There is no built-in tool for this, and Windows applications typically implement their own log rotation. As I stated above, Telegraf opens the log file exclusively and does not even allow it to be renamed or moved as long as the service is active.

  2. If the Windows Event Log is meant to be monitored by systems management utilities, the use of a message catalog is highly recommended. This approach is not portable and is also not an option IMHO.

  3. Providing a plug-in for logging requires a basic logging implementation in the framework that can then be "subclassed" or replaced by a plugin that writes to some other data sink (like an SQL database, which sounds like a weird idea anyway). If anyone wants to store or redirect log contents, they can use solutions like logstash, etc.

Because there is no suitable way to limit Telegraf's log file size, we cannot roll it out on the servers in our datacenter. I would vote for a simple built-in log rotation (based on file size and age) that can be optionally enabled for Windows.

@gregvolk
Contributor

gregvolk commented Jan 9, 2019

I would vote for a simple built-in log rotation (based on file size
and age) that can be optionally enabled for Windows.

+1 simplicity
+1 again for avoiding the Windows Event Log and all of its bloat, difficult parsing, and non-portability

Does the proposed rotation even need to be smart about size and age? Since most shops tend to reboot Windows systems monthly due to patching, even something as simple as the following should take care of nearly all log rotation needs:

During Windows telegraf.exe startup, prior to opening the log file for writing:

  • if(exists $logfile.old) { delete $logfile.old }
  • if(exists $logfile) { rename $logfile $logfile.old; $rotatedlogs = 1 }
  • open $logfile for writing
  • if($rotatedlogs) { log "performed log rotation (renamed $logfile to $logfile.old)" }
  • continue executing

The only case I can think of where the above steps would be insufficient is if someone left a verbose debug on and the log file grew to consume the system's free space prior to a reboot. This is a risk anytime anyone enables debugging on any program and is why sysadmins need to be careful with enabling verbose logging.

@danielnelson
Contributor Author

If we do end up supporting the eventlog it would definitely have to be optional. It sounds like this type of log rotation could probably be added outside of Telegraf in a wrapper script?

@gregvolk
Contributor

It sounds like this type of log rotation could probably be added
outside of Telegraf in a wrapper script?

This is what I'm planning on doing in the short term. As part of my telegraf installation scripts I'll add a scheduled task that will execute once a month, stop the telegraf service, do a log rotation, and start telegraf.

@javicrespo
Contributor

I'm also interested in this feature as we're currently using Telegraf from Windows...

As a reference, Consul provides log rotation capabilities: https://www.consul.io/docs/agent/options.html#_log_rotate_bytes

Log rotation is optional, so no impact to Linux/logrotate users.

The consul log rotation implementation is currently lacking MaxArchives functionality (remove old files), and that's a gap I'm trying to fill here => https://github.com/hashicorp/consul/pull/5577/files

@danielnelson I'd be happy to work on it if you think it'd be useful!

@gregvolk
Contributor

I know I would still like to see built-in log rotation.

In the interim, since I needed to put Telegraf in production on many Windows servers I ended up adding a monthly scheduled task as part of my installation script. The task start time is randomized during install to avoid having a whole bunch of servers all rotate at the same time which could cause a resource shortage on shared infrastructure (like a VMhost).

This is my install batch file (after the files are copied into c:\program files\telegraf)...

REM install-telegraf.bat
REM
REM After the telegraf application files have been put in
REM c:\program files\telegraf, execute the following commands to
REM install the service, start the service, and add a scheduled
REM task with a randomized start time for log rotation
REM
REM install the telegraf service
c:\progra~1\telegraf\telegraf.exe --service install --config "C:\Program Files\Telegraf\telegraf.conf"

REM start the telegraf service after install (it will auto start on reboot)
c:\windows\system32\sc.exe start telegraf

REM Get a random hour and random minute for use with schtasks
set /a H=%random% * (21 - 11 + 1) / 32768 + 11
set /a M=%random% * (59 - 11 + 1) / 32768 + 11

REM add a monthly scheduled task with randomized start time for log rotation
c:\windows\system32\schtasks.exe /Create /RU SYSTEM /SC MONTHLY /MO first /D SUN /ST %H%:%M% /TN rotate-telegraf-log /TR "c:\Program Files\Telegraf\rotate-telegraf-log.bat"
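The two `set /a` lines rely on integer arithmetic: `%random%` returns 0 to 32767, so `%random% * (21 - 11 + 1) / 32768` lands in 0..10 and the hour in 11..21, while the minute lands in 11..59. The Go snippet below reproduces the expression to check those bounds; the helper name `randomSlot` is made up for this illustration, and the idea that the lower bound of 11 keeps both fields two digits for `/ST HH:mm` is only a guess.

```go
package main

import "fmt"

// randomSlot mirrors the batch expression
//
//	set /a X=%random% * (hi - lo + 1) / 32768 + lo
//
// where %random% is an integer from 0 to 32767 and set /a uses integer
// arithmetic, evaluating * and / left to right.
func randomSlot(r, lo, hi int) int {
	return r*(hi-lo+1)/32768 + lo
}

func main() {
	for _, bounds := range [][2]int{{11, 21}, {11, 59}} {
		lo, hi := bounds[0], bounds[1]
		// The expression never decreases as r grows, so its extremes are at
		// r = 0 and r = 32767.
		fmt.Printf("lo=%d hi=%d gives %d..%d\n", lo, hi, randomSlot(0, lo, hi), randomSlot(32767, lo, hi))
	}
}
```

It prints the ranges 11..21 and 11..59, so minutes 00 to 10 are never chosen.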

This is the rotation batch file that the scheduled task points to...

REM rotate-telegraf-log.bat
REM stop the telegraf service, rotate the logs, start the telegraf service
REM

c:\windows\system32\net stop telegraf
move /y "C:\Program Files\Telegraf\telegraf.1.log" "c:\Program Files\Telegraf\telegraf.2.log"
move /y "C:\Program Files\Telegraf\telegraf.0.log" "c:\Program Files\Telegraf\telegraf.1.log"
move /y "C:\Program Files\Telegraf\telegraf.log" "c:\Program Files\Telegraf\telegraf.0.log"
c:\windows\system32\net start telegraf

And this is the uninstall that cleans everything up if Telegraf is removed...

REM uninstall-telegraf.bat
REM
REM This batch file will...
REM   Stop the telegraf service
REM   Uninstall the telegraf service
REM   Delete the rotate-telegraf-log scheduled task
REM

REM stop the telegraf service
c:\windows\system32\net stop telegraf

REM uninstall the telegraf service
c:\progra~1\telegraf\telegraf.exe --service uninstall

REM delete monthly scheduled task with randomized start time for log rotation
c:\windows\system32\schtasks.exe /Delete /TN rotate-telegraf-log /f

@danielnelson
Contributor Author

@javicrespo Yes, would definitely appreciate some help here. We could add some new options to the [agent] section of the config. It would also be good if we could somehow unify this with #5547, so that both types of rotation have a similar feel with the configuration options, though I imagine separate implementations. Maybe it would make sense to start by looking at this pull request?

@gregvolk Thanks for sharing your workaround; hopefully you won't have to go through all of that forever.

@srclosson

Although I haven't tried this myself, I believe NSSM supports log rotation. If you were to log to stdout and use NSSM to capture the logs, NSSM's more advanced logging capabilities could take care of rotation. I think you would have to pass in the -console option, though, and one would have to test to be sure.

The feature is documented here

@javicrespo
Contributor

Implemented here: #5778

@danielnelson
Contributor Author

Closed in #5578.

New rotation options have been added to the [agent] section of the config.
