Skip to content

luddic/pgbadger

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

NAME
    pgBadger - a fast PostgreSQL log analysis report

SYNOPSIS
    pgbadger [options] logfile [...]

            PostgreSQL log analyzer with fully detailed reports and charts.

    Arguments:

        logfile can be a single log file, a list of files, or a shell command
        returning a list of files. If you want to pass log content from stdin
        use - as filename. Note that input from stdin will not work with csvlog.

    Options:

        -a | --average minutes : number of minutes to build the average graphs of
                                 queries and connections.
        -b | --begin datetime  : start date/time for the data to be parsed in log.
        -c | --dbclient host   : only report on entries for the given client host.
        -C | --nocomment       : remove comments like /* ... */ from queries.
        -d | --dbname database : only report on entries for the given database.
        -e | --end datetime    : end date/time for the data to be parsed in log.
        -f | --format logtype  : possible values: syslog,stderr,csv. Default: stderr
        -G | --nograph         : disable graphs on HTML output. Enable by default.
        -h | --help            : show this message and exit.
        -i | --ident name      : programname used as syslog ident. Default: postgres
        -j | --jobs number     : number of jobs to run on parallel on each log file.
                                 Default is 1, run as single process.
        -J | --Jobs number     : number of log file to parse in parallel. Default
                                 is 1, run as single process.
        -l | --last-parsed file: allow incremental log parsing by registering the
                                 last datetime and line parsed. Useful if you want
                                 to watch errors since last run or if you want one
                                 report per day with a log rotated each week.
        -m | --maxlength size  : maximum length of a query, it will be restricted to
                                 the given size. Default: no truncate
        -n | --nohighlight     : disable SQL code highlighting.
        -N | --appname name    : only report on entries for given application name
        -o | --outfile filename: define the filename for output. Default depends on
                                 the output format: out.html, out.txt or out.tsung.
                                 To dump output to stdout use - as filename.
        -p | --prefix string   : give here the value of your custom log_line_prefix
                                 defined in your postgresql.conf. Only use it if you
                                 aren't using one of the standard prefixes specified
                                 in the pgBadger documentation, such as if your prefix
                                 includes additional variables like client ip or
                                 application name. See examples below.
        -P | --no-prettify     : disable SQL queries prettify formatter.
        -q | --quiet           : don't print anything to stdout, not even a progress bar.
        -s | --sample number   : number of query samples to store/display. Default: 3
        -S | --select-only     : use it if you want to report select queries only.
        -t | --top number      : number of queries to store/display. Default: 20
        -T | --title string    : change title of the HTML page report.
        -u | --dbuser username : only report on entries for the given user.
        -U | --exclude-user username : exclude entries for the specified user from report.
        -v | --verbose         : enable verbose or debug mode. Disabled by default.
        -V | --version         : show pgBadger version and exit.
        -w | --watch-mode      : only report errors just like logwatch could do.
        -x | --extension       : output format. Values: text, html or tsung. Default: html
        -z | --zcat exec_path  : set the full path to the zcat program. Use it if
                                 zcat or bzcat or unzip is not on your path.
        --pie-limit num        : pie data lower than num% will show a sum instead.
        --exclude-query regex  : any query matching the given regex will be excluded
                                 from the report. For example: "^(VACUUM|COMMIT)"
                                 You can use this option multiple times.
        --exclude-file filename: path of the file which contains all the regex to use
                                 to exclude queries from the report. One regex per line.
        --include-query regex  : any query that does not match the given regex will be
                                 excluded from the report. For example: "(table_1|table_2)"
                                 You can use this option multiple times.
        --include-file filename: path of the file which contains all the regex of the
                                 queries to include from the report. One regex per line.
        --disable-error        : do not generate error report.
        --disable-hourly       : do not generate hourly report.
        --disable-type         : do not generate query type report.
        --disable-query        : do not generate query reports (slowest, most
                                 frequent, ...).
        --disable-session      : do not generate session report.
        --disable-connection   : do not generate connection report.
        --disable-lock         : do not generate lock report.
        --disable-temporary    : do not generate temporary report.
        --disable-checkpoint   : do not generate checkpoint report.
        --disable-autovacuum   : do not generate autovacuum report.
        --charset              : used to set the HTML charset to be used. Default: utf-8.
        --csv-separator        : used to set the CSV field separator, default: ,
        --exclude-time  regex  : any timestamp matching the given regex will be
                                 excluded from the report. Example: "2013-04-12 .*"
                                 You can use this option multiple times.
        --exclude-appname name : exclude entries for the specified application name
                                 from report. Example: "pg_dump".

    Examples:

            pgbadger /var/log/postgresql.log
            pgbadger /var/log/postgres.log.2.gz /var/log/postgres.log.1.gz /var/log/postgres.log
            pgbadger /var/log/postgresql/postgresql-2012-05-*
            pgbadger --exclude-query="^(COPY|COMMIT)" /var/log/postgresql.log
            pgbadger -b "2012-06-25 10:56:11" -e "2012-06-25 10:59:11" /var/log/postgresql.log
            cat /var/log/postgres.log | pgbadger -
            # log prefix with stderr log output
            perl pgbadger --prefix '%t [%p]: [%l-1] user=%u,db=%d,client=%h' \
                            /pglog/postgresql-2012-08-21*
            perl pgbadger --prefix '%m %u@%d %p %r %a : ' /pglog/postgresql.log
            # Log line prefix with syslog log output
            perl pgbadger --prefix 'user=%u,db=%d,client=%h,appname=%a' \
                            /pglog/postgresql-2012-08-21*

    Use my 8 CPUs to parse my 10GB file faster, really faster

            perl pgbadger -j 8 /pglog/postgresql-9.1-main.log

    Generate Tsung sessions XML file with select queries only:

        perl pgbadger -S -o sessions.tsung --prefix '%t [%p]: [%l-1] user=%u,db=%d ' /pglog/postgresql-9.1.log

    Reporting errors every week by cron job:

        30 23 * * 1 /usr/bin/pgbadger -q -w /var/log/postgresql.log -o /var/reports/pg_errors.html

    Generate report every week using incremental behavior:

        0 4 * * 1 /usr/bin/pgbadger -q `find /var/log/ -mtime -7 -name "postgresql.log*"` \
            -o /var/reports/pg_errors-`date +%F`.html -l /var/reports/pgbadger_incremental_file.dat

    This supposes that your log file and HTML report are also rotated every
    week.

    If you have a pg_dump at 23:00 and 13:00 each day during half an hour,
    you can use pgbadger as follow to exclude these periods from the report:

        pgbadger --exclude-time "2013-09-.* (23|13):.*" postgresql.log

    This will help to not have all COPY order on top of slowest queries. You
    can also use --exclude-appname "pg_dump" to solve this problem in a more
    simple way.

DESCRIPTION
    pgBadger is a PostgreSQL log analyzer build for speed with fully
    detailed reports from your PostgreSQL log file. It's a single and small
    Perl script that outperform any other PostgreSQL log analyzer.

    It is written in pure Perl language and uses a javascript library
    (flotr2) to draw graphs so that you don't need to install any additional
    Perl modules or other packages. Furthermore, this library gives us more
    features such as zooming. pgBadger also uses the Bootstrap javascript
    library and the FontAwesome webfont for better design. Everything is
    embedded.

    pgBadger is able to autodetect your log file format (syslog, stderr or
    csvlog). It is designed to parse huge log files as well as gzip
    compressed file. See a complete list of features below.

    All charts are zoomable and can be saved as PNG images.

    You can also limit pgBadger to only report errors or remove any part of
    the report using command line options.

    pgBadger supports any custom format set into log_line_prefix of your
    postgresql.conf file provide that you use the %t, %p and %l patterns.

    pgBadger allow parallel processing on a single log file and multiple
    files through the use of the -j option and the number of CPUs as value.

    If you want to save system performance you can also use log_duration
    instead of log_min_duration_statement to have reports on duration and
    number of queries only.

FEATURE
    pgBadger reports everything about your SQL queries:

            Overall statistics
            The most frequent waiting queries.
            Queries that waited the most.
            Queries generating the most temporary files.
            Queries generating the largest temporary files.
            The slowest queries.
            Queries that took up the most time.
            The most frequent queries.
            The most frequent errors.

    The following reports are also available with hourly charts divide by
    periods of five minutes:

            SQL queries statistics.
            Temporary file statistics.
            Checkpoints statistics.
            Autovacuum and autoanalyze statistics.

    There's also some pie reports of distribution about:

            Locks statistics.
            ueries by type (select/insert/update/delete).
            Distribution of queries type per database/application
            Sessions per database/user/client.
            Connections per database/user/client.
            Autovacuum and autoanalyze per table.

    All charts are zoomable and can be saved as PNG images. SQL queries
    reported are highlighted and beautified automatically.

REQUIREMENT
    pgBadger comes as a single Perl script - you do not need anything other
    than a modern Perl distribution. Charts are rendered using a Javascript
    library so you don't need anything. Your browser will do all the work.

    If you planned to parse PostgreSQL CSV log files you might need some
    Perl Modules:

            Text::CSV_XS - to parse PostgreSQL CSV log files.

    This module is optional, if you don't have PostgreSQL log in the CSV
    format you don't need to install it.

    Compressed log file format is autodetected from the file exension. If
    pgBadger find a gz extension it will use the zcat utility, with a bz2
    extension it will use bzcat and if the file extension is zip then the
    unzip utility will be used.

    If those utilities are not found in the PATH environment variable then
    use the --zcat command line option to change this path. For example:

            --zcat="/usr/local/bin/gunzip -c" or --zcat="/usr/local/bin/bzip2 -dc"
            --zcat="C:\tools\unzip -p"

    By default pgBadger will use the zcat, bzcat and unzip utilities
    following the file extension. If you use the default autodetection
    compress format you can mixed gz, bz2 or zip files. Specifying a custom
    value to --zcat option will remove this feature of mixed compressed
    format.

    Note that multiprocessing can not be used with compressed files or CSV
    files as well as under Windows platform.

POSTGRESQL CONFIGURATION
    You must enable and set some configuration directives in your
    postgresql.conf before starting.

    You must first enable SQL query logging to have something to parse:

            log_min_duration_statement = 0

    Here every statement will be logged, on busy server you may want to
    increase this value to only log queries with a higher duration time.
    Note that if you have log_statement set to 'all' nothing will be logged
    through log_min_duration_statement. See next chapter for more
    information.

    With 'stderr' log format, log_line_prefix must be at least:

            log_line_prefix = '%t [%p]: [%l-1] '

    Log line prefix could add user and database name as follows:

            log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d '

    or for syslog log file format:

            log_line_prefix = 'user=%u,db=%d '

    Log line prefix for stderr output could also be:

            log_line_prefix = '%t [%p]: [%l-1] db=%d,user=%u '

    or for syslog output:

            log_line_prefix = 'db=%d,user=%u '

    You need to enable other parameters in postgresql.conf to get more
    information from your log files:

            log_checkpoints = on
            log_connections = on
            log_disconnections = on
            log_lock_waits = on
            log_temp_files = 0

    Do not enable log_statement as their log format will not be parsed by
    pgBadger.

    Of course your log messages should be in English without locale support:

            lc_messages='C'

    but this is not only recommended by pgBadger.

log_min_duration_statement, log_duration and log_statement
    If you want full statistics reports you must set
    log_min_duration_statement to 0 or more milliseconds.

    If you just want to report duration and number of queries and don't want
    all details about queries, set log_min_duration_statement to -1 to
    disable it and enable log_duration in your postgresql.conf file. If you
    want to add the most common request report you can either choose to set
    log_min_duration_statement to a higher value or choose to enable
    log_statement.

    Enabling log_min_duration_statement will add reports about slowest
    queries and queries that took up the most time. Take care that if you
    have log_statement set to 'all' nothing will be logged with
    log_line_prefix.

Parallel processing
    To enable parallel processing you just have to use the -j N option where
    N is the number of cores you want to use.

    pgbadger will then proceed as follow:

            for each log file
                chunk size = int(file size / N)
                look at start/end offsets of these chunks
                fork N processes and seek to the start offset of each chunk
                    each process will terminate when the parser reach the end offset
                    of its chunk
                    each process write stats into a binary temporary file
               wait for all children has terminated
            All binary temporary files generated will then be read and loaded into
            memory to build the html output.

    With that method, at start/end of chunks pgbadger may truncate or omit a
    maximum of N queries perl log file which is an insignificant gap if you
    have millions of queries in your log file. The chance that the query
    that you were looking for is loose is near 0, this is why I think this
    gap is livable. Most of the time the query is counted twice but
    truncated.

    When you have lot of small log files and lot of CPUs it is speedier to
    dedicate one core to one log file at a time. To enable this behavior you
    have to use option -J N instead. With 200 log files of 10MB each the use
    of the -J option start being really interesting with 8 Cores. Using this
    method you will be sure to not loose any queries in the reports.

    He are a benchmarck done on a server with 8 CPUs and a single file of
    9.5GB.

             Option |  1 CPU  | 2 CPU | 4 CPU | 8 CPU
            --------+---------+-------+-------+------
               -j   | 1h41m18 | 50m25 | 25m39 | 15m58
               -J   | 1h41m18 | 54m28 | 41m16 | 34m45

    With 200 log files of 10MB each and a total og 2GB the results are
    slightly different:

             Option | 1 CPU | 2 CPU | 4 CPU | 8 CPU
            --------+-------+-------+-------+------
               -j   | 20m15 |  9m56 |  5m20 | 4m20
               -J   | 20m15 |  9m49 |  5m00 | 2m40

    So it is recommanded to use -j unless you have hundred of small log file
    and can use at least 8 CPUs.

    IMPORTANT: when you are using parallel parsing pgbadger will generate a
    lot of temporary files in the /tmp directory and will remove them at
    end, so do not remove those files unless pgbadger is not running. They
    are all named with the following template tmp_pgbadgerXXXX.bin so they
    can be easily identified.

INSTALLATION
    Download the tarball from github and unpack the archive as follow:

            tar xzf pgbadger-4.x.tar.gz
            cd pgbadger-4.x/
            perl Makefile.PL
            make && sudo make install

    This will copy the Perl script pgbadger to /usr/local/bin/pgbadger by
    default and the man page into /usr/local/share/man/man1/pgbadger.1.
    Those are the default installation directories for 'site' install.

    If you want to install all under /usr/ location, use INSTALLDIRS='perl'
    as an argument of Makefile.PL. The script will be installed into
    /usr/bin/pgbadger and the manpage into /usr/share/man/man1/pgbadger.1.

    For example, to install everything just like Debian does, proceed as
    follows:

            perl Makefile.PL INSTALLDIRS=vendor

    By default INSTALLDIRS is set to site.

AUTHORS
    pgBadger is an original work from Gilles Darold.

    The pgBadger logo is an original creation of Damien Clochard.

    The pgBadger v4.x design comes from the "Art is code" company.

    This web site is a work of Gilles Darold.

    pgBadger is maintained by Gilles Darold and the good folks at Dalibo
    every one who wants to contribute.

    Many people have contributed to pgBadger, they are all quoted in the
    Changelog file.

LICENSE
    pgBadger is free software distributed under the PostgreSQL Licence.

    Copyright (c) 2012-2013, Dalibo

    A modified version of the SQL::Beautify Perl Module is embedded in
    pgBadger with copyright (C) 2009 by Jonas Kramer and is published under
    the terms of the Artistic License 2.0.

About

a fast PostgreSQL Log Analyzer

Resources

License

Stars

Watchers

Forks

Packages

No packages published

Languages

  • Perl 100.0%