This project will query a PostgreSQL database (called "news") located on a virtual machine using a python program in order to answer three questions.
- What are the most popular three articles of all time?
- Who are the most popular authors of all time?
- On which days did more than 1% of requests lead to errors?
News is a fictional news website. It contains three tables - articles, authors, and log
. The schema for each can be found below:
Table "public.articles"
Column | Type | Modifiers
--------+--------------------------+-------------------------------------------------------
author | integer | not null
title | text | not null
slug | text | not null
lead | text |
body | text |
time | timestamp with time zone | default now()
id | integer | not null default nextval('articles_id_seq'::regclass)
Indexes:
"articles_pkey" PRIMARY KEY, btree (id)
"articles_slug_key" UNIQUE CONSTRAINT, btree (slug)
Foreign-key constraints:
"articles_author_fkey" FOREIGN KEY (author) REFERENCES authors(id)
Table "public.authors"
Column | Type | Modifiers
--------+---------+------------------------------------------------------
name | text | not null
bio | text |
id | integer | not null default nextval('authors_id_seq'::regclass)
Indexes:
"authors_pkey" PRIMARY KEY, btree (id)
Referenced by:
TABLE "articles" CONSTRAINT "articles_author_fkey" FOREIGN KEY (author) REFERENCES authors(id)
Table "public.log"
Column | Type | Modifiers
--------+--------------------------+--------------------------------------------------
path | text |
ip | inet |
method | text |
status | text |
time | timestamp with time zone | default now()
id | integer | not null default nextval('log_id_seq'::regclass)
Indexes:
"log_pkey" PRIMARY KEY, btree (id)
In order to get a copy of the project up and running on your local machine for development and testing please see the instructions below.
- Python 3.6.2 installed. To download - go to Python.org.
- PostgreSQL.
- Virtual machine (if using) configured. See this Vagrantfile for use.
- Data downloaded. To download - go to news data. After downloading, unzip the file and place it in the vagrant directory - or whatever file is shared with your virtual machine.
- Data loaded. To do this -
cd
into your vagrant directory and use the commandpsql -d news -f newsdata.sql
. - Views in database "news" created. Download create_views.sql Two views are needed (see below with script to create).
Run script psql -d news -f create_views.sql
after downloading script above to create the necessary views. Views are detailed below.
- Create a view of all error requests in "log" table (anything other than '200 OK').
CREATE VIEW err_reqs AS
SELECT time::date AS date, count(*) AS tot_err
FROM log
WHERE status <> '200 OK'
GROUP BY date
ORDER BY date ASC;
- Create a view of all requests in "log" table (error and otherwise).
CREATE VIEW tot_reqs AS
SELECT time::date AS date, count(*) AS totals
FROM log
GROUP BY date
ORDER BY date ASC;
Download Logs-Analysis.
To test this code, ensure you are connected to the virtual machine, then from a console window (like Git Bash) run logs_analysis.py from the vagrant directory (or shared virtual machine directory). You will see the answers to the three questions printed out within the terminal window.
This project is licensed under the GNU General Public License. See the LICENSE for details.