First, complete the :ref:`installation` process.
In this example we'll compute some basic statistics on a day's worth of Wikipedia page view data. Each record in the dataset contains hourly page view statistics for every Wikipedia page. The record format is as follows::

  hour | project | page title | view count | bytes served

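For reference, a single record might look like this (the values shown are purely illustrative, not taken from the actual dataset)::

  2018-06-01 01:00:00 | en | San_Francisco | 1105 | 26852840
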
Next, let's create our continuous view using psql::

psql -c "
CREATE FOREIGN TABLE wiki_stream (
hour timestamp,
project text,
title text,
view_count bigint,
size bigint)
SERVER pipelinedb;
CREATE VIEW wiki_stats WITH (action=materialize) AS
SELECT hour, project,
count(*) AS total_pages,
sum(view_count) AS total_views,
min(view_count) AS min_views,
max(view_count) AS max_views,
avg(view_count) AS avg_views,
percentile_cont(0.99) WITHIN GROUP (ORDER BY view_count) AS p99_views,
sum(size) AS total_bytes_served
FROM wiki_stream
GROUP BY hour, project;"
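Before streaming in the whole dataset, you can optionally sanity-check the pipeline by writing a single event to the stream with a plain INSERT and reading it back from the continuous view. The values below are made up for illustration, and note that this test event will be counted in the aggregates, so you may want to drop and recreate wiki_stats before loading the real data::

  psql -c "
  -- Write one synthetic event to the stream (illustrative values only)
  INSERT INTO wiki_stream (hour, project, title, view_count, size)
      VALUES ('2018-06-01 00:00:00', 'en', 'Test_Page', 1, 1024);
  -- The continuous view should now contain a single aggregated row
  SELECT * FROM wiki_stats;"
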
Now we'll decompress the dataset as a stream and write it to stdin, which can be used as an input to COPY::

  curl -sL http://pipelinedb.com/data/wiki-pagecounts | gunzip | \
      psql -c "
      COPY wiki_stream (hour, project, title, view_count, size) FROM STDIN"

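If you'd rather load only a sample of the dataset, one simple option is to cap the input with head before it reaches COPY (the line count here is arbitrary)::

  # Stop after 100,000 records; curl may report a broken pipe, which is harmless
  curl -sL http://pipelinedb.com/data/wiki-pagecounts | gunzip | head -n 100000 | \
      psql -c "
      COPY wiki_stream (hour, project, title, view_count, size) FROM STDIN"
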
Note that this dataset is large, so the above command will run for quite a while (cancel it whenever you'd like). While it's running, select from the continuous view to watch it ingest data from the input stream::

psql -c "
SELECT * FROM wiki_stats ORDER BY total_views DESC";
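Since wiki_stats is continuously updated, you can query it like any other view while data is still streaming in. For example, to see hourly traffic for a single project (the project code 'en' here is just an assumed example)::

  psql -c "
  SELECT hour, total_views, total_bytes_served
  FROM wiki_stats
  WHERE project = 'en'
  ORDER BY hour;"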