- RELATIONAL DATABASE AND SQL
- PostgreSQL Commands
- Terminal
- Managing Databases
- Managing Tables
- Managing Indexes
- Querying Data from Tables
- Query All Data from a Table
- Query Data from Specified Columns
- Query Data Using a Filter - WHERE Operator
- Query Data - LIKE Operator
- Query Data - IN Operator
- Query Data - Constrain the Returned Rows - LIMIT Operator
- Query Data - JOIN
- Return the Number of Rows of a Table
- Sort Rows in Ascending or Descending Order - ORDER BY
- Filter Groups - HAVING Clause
- Set Operations
- Modifying Data
- ADVANCED
- EXERCISES
- The structure of a a particular database is known as its schema.
- Schemas include the definition of such things as the database's:
- Tables, including the number and data type of each column
- Indexes for efficient access of data.
- Constrains (rules, such as whether a field can e null or not)
-
Database tables look like a spreadsheet since they consist of columns and rows.
-
Tables are also known as relations.
-
A single table in a relational database holds data for a particular data entry.
-
Since only one type of data can be held in a single table, related data, we will have different tables storing different contents and tye are linked via what is known as a foreign key (FK).
-
Foreign key fields hold the value of its parent's primary key (PK).
-
The naming convention is typically snake_cased and always plural.
-
The columns of a table have a:
- Name
- Data type
- Optional contrains
-
The typical naming convention is usually snake_cased and singular.
-
PostgreSQL has many data types for columns, but common ones include:
- Integer
- Decimal
- Varchar (variable-length strings)
- Text (unlimited length strings)
- Date (does not include time)
- Timestamps (both date and time)
- Boolean
-
Common constrains for a column include:
PRIMARY KEY
: column, or group of columns, uniquely identify a row.REFERENCES
(Foreign Key): value in column must match the primary key in another table.NOT NULL
: column must have a value, it cannot be empty (null).UNIQUE
: data in this column must be unique among all rous in the table.
-
On terminal:
CREATE DATABASE music; CREATE TABLE bands ( id serial PRIMARY KEY, # serial is auto-incrementing integer name varchar NOT NULL, genre varchar );
-
Let's say we have the following data relationship:
Band ----< Musician
- A Band has many Musicians and a Musician belongs to a Band
-
Whenever you have a
one:many
relationship, the rows in the table for the many-side must include a column that references which row in the table on the on-side it belongs to. -
This column is known as a foreign key (FK)
-
The FK must be the same data type is the primary key in the parent table (usually an integer).
CREATE TABLE musicians ( id serial PRIMARY KEY, name varchar NOT NULL, quote text, band_id integer NOT NULL REFERENCES bands (id) );
-
Run PostgreSQL on Terminal:
pqsl
-
Connect to a specific database:
\c <database_name>
-
To quit the psql
\q
-
To List all databases in the PostgreSQL database server
\l
-
To list all tables inside the databa_base that you are currentt using.
\d
CREATE DATABASE <database_name>;
DROP DATABASE <database_name>;
ALTER TABLE <table_name> ADD COLUMN <column_name> <DATA_TYPE>;
ALTER TABLE <table_name> DROP COLUMN <column_name>;
ALTER TABLE <table_name> RENAME <column_name> TO <new_column_name>;
ALTER TABLE <table_name> RENAME TO <new_table_name>;
DROP INDEX <index_name>;
SELECT * FROM <table_name>;
SELECT <column_name_1>, <column_name_2>, ... FROM <table_name>;
SELECT * FROM <table_name> WHERE <condition>;
SELECT * FROM <table_name> WHERE <column_name> LIKE '%value%';
SELECT * FROM <table_name> WHERE <column_name> IN (value_1, value2, ...);
SELECT * FROM <table_name> LIMIT <limit> OFFSET <offset> ORDER BY <column_name>;
-
INNER JOIN, LEFT JOIN, FULL OUTER JOIN, CROSS JOIN and NATURAL JOIN
SELECT * FROM <table_name_1> INNER JOIN <table_name_2> ON <conditions>;
SELECT * FROM <table_name_1> LEFT JOIN <table_name_2> ON <conditions>;
SELECT * FROM <table_name_1> FULL OUTER JOIN <table_name_2> ON <conditions>;
SELECT * FROM <table_name_1> CROSS JOIN <table_name_2> ON <conditions>;
SELECT * FROM <table_name_1> NATURAL JOIN <table_name_2> ON <conditions>;
SELECT COUNT (*) FROM <table_name>;
SELECT * FROM <table_name> ORDER BY <column_name_1>, <column_name_2>, ...;
SELECT * FROM <table_name> GROUP BY <column_name> HAVING <condition>;
SELECT * FROM <table_name_1> UNION SELECT * FROM <table_name_2>;
SELECT * FROM <table_name_1> EXCEPT SELECT * FROM <table_name_2>;
SELECT * FROM <table_name_1> INTERSECT SELECT * FROM <tabble_name_2>;
INSERT INTO <table_name> (<column_name_1>, <column_name_2, ...) VALUES (<value_1>, <value_2>, ...);
INSERT INTO <table_name> (<column_name_1>, <column_name_2>, ...) VALUES (<value_1>, <value_2>, ...), (<value_1>, <value_2>, ...), (<value_1>, <value_2>, ...) ...;
UPDATE <table_name> SET <column_name_1> = <value_1>, ...;
UPDATE <table_name> SET <column_name_1> = <value_1>, ... WHERE <conditions>;
DELETE FROM <table_name>;
DELETE FROM <table_name> WHERE <condition>;
- SELECT: choose the fields for query
- FROM: pick table(s) fro data source
- WHERE: filter data based upon conditions
- GROUP BY: segment data into groups
- ORDER BY: sort results
- LIMIT: limit the number of records returned
- JOIN Types:
- INNER vs. OUTER
- LEFT vs. RIGHT
-
Rename fields in your queries:
SELECT <field> AS <alias>
SELECT milliseconds/1000. AS seconds
-
Reference tables as abbreviations:
FROM <table> <alias>
FROM track t JOIN genre g ON t.genreid = g.genreid;
SELECT g.name, AVG(t.milliseconds) / 1000. / 60. AS minutes FROM track t JOIN mediatype m ON t.mediatypeid = m.mediatypeid JOIN genre g ON t.genreid = g.genreid WHERE m.name LIKE '%audio%' GROUP BY g.name ORDER BY minutes DESC;
-
Reference fields by position:
SELECT <field1>, <field2> FROM <table> ORDER BY 1, 2
SELECT name, milliseconds FROM track ORDER BY 2 DESC
SELECT g.name, AVG(t.milliseconds) / 1000. / 60. AS minutes FROM track t JOIN mediatype m ON t.mediatypeid = m.mediatypeid JOIN genre g ON t.genreid = g.genreid WHERE m.name LIKE '%audio%' GROUP BY g.name ORDER BY 2 DESC;
SELECT EXTRACT ( <date_component> FROM <field> )
SELECT EXTRACT ( month FROM invoicedate ) AS month
FROM invoice;
- Date/Time components
- Day, Month, Year
- Week, Month
- Hour, Minute, Second
- DOW, DOY, Quarter, Timezone
SELECT <field1>, <agg function>(<field2>)
FROM <table>
GROUP BY <field1>
HAVING <agg function>(<field2>) <operator> <value>;
SELECT genreid, COUNT(trackid)
FROM track
GROUP BY genreid
HAVING COUNT(*) > 50
ORDER BY genreid ASC;
genreid | count |
---|---|
1 | 1297 |
2 | 130 |
3 | 374 |
4 | 332 |
SELECT genre.name,
AVG(track.milliseconds) / 1000. / 60. AS minutes
FROM track
JOIN mediatype ON track.mediatypeid = mediatype.mediatypeid
JOIN genre ON track.genreid = genre.genreid
WHERE mediatype.name LIKE '%audio%'
GROUP BY genre.name
HAVING AVG(track.milliseconds) / 1000. / 60. >= 3
ORDER BY minutes DESC;
-
Just like & in Excel, we can use
||
to concatenate results in SQLSELECT e.firstname || ' ' || e.lastname AS "Employee Name", boss.firstname || ' ' || boss.lastname AS "Boss" FROM employee e FULL JOIN employee boss ON e.reportsto = boss.employeeid;
SELECT SUM(milliseconds) / 1000. / 60. / 60. / 24. AS days
FROM track;
-- 15.958079
-
ILIKE
= case insensitiveSELECT g.name, AVG(t.milliseconds) / 1000. / 60. AS minutes FROM track t JOIN mediatype m ON t.mediatypeid = m.mediatypeid JOIN genre g ON t.genreid = g.genreid WHERE m.name LIKE '%audio%' GROUP BY g.name ORDER BY 2 DESC;
-
LIKE
= case sensitiveSELECT genre.name, AVG(track.milliseconds) / 1000. / 60. AS minutes FROM track JOIN mediatype ON track.mediatypeid = mediatype.mediatypeid JOIN genre ON track.genreid = genre.genreid WHERE mediatype.name LIKE '%audio%' GROUP BY genre.name HAVING AVG(track.milliseconds) / 1000. / 60. >= 3 ORDER BY minutes DESC;
-
Just like a
switch/case
in JavaScriptSELECT CASE WHEN < condition_1 > THEN < result_1 > WHEN < condition_2 > THEN < result_2 > ELSE <result_3 > END;
SELECT CASE WHEN AGE < 18 THEN 'child' WHEN AGE >= 60 THEN 'senior' ELSE 'adult' END AS age_market_segment;
-
Merges data from two queries by stacking results on top of each other
-
Must have same number of columns and corresponding data types
-
Duplicate results are removed by default
-
UNION ALL
will include duplicatesSELECT * FROM< table1 > UNION SELECT * FROM< table2 >;
-
Picks first non-null value
COALESCE ([field1], [field2], [field3])
COALESCE(online.firstname, catalog.firstname) AS firstname
- ATTENTION: Not all SQL supports window function
- Perform calculation across a set of table rows that are somehow related to the current row + all the previous
- Applications:
- Cumulative sales
- Percentile rank
- Group level results
SELECT SUM(< field1 >) OVER (ORDER BY< field2 >)
FROM< TABLE>;
-
Example, In this case the SUM is calculated dynamically
SELECT invoicedate, SUM(total) OVER (ORDER BY invoicedate) FROM invoice;
SELECT NTILE(< # OF groups >) OVER (ORDER BY< field >)
FROM< TABLE>;
SELECT name,
milliseconds,
NTILE(100) OVER (ORDER BY milliseconds DESC) AS percentile
FROM track;
SELECT AVG(< field1 >) OVER (PARTITION BY< field2 >)
FROM< TABLE>;
SELECT name,
genreid,
milliseconds,
AVG(milliseconds) OVER (PARTITION BY genreid)
FROM track;
WITH < subquery_name > AS (
< code_for_subquery >
)
< code_for_parent_query >;
WITH top_customers AS (
SELECT customerid,
SUM(total) AS sales
FROM invoice
GROUP BY customerid
)
SELECT COUNT(*)
FROM top_customers
WHERE sales > 40;
-
When we create a view, this "table" will be permanently saved on the database. This way, anyone that has access to this table, will have access to this view
-
ATTENTION: You can't have duplicate field names in a view
-
IMPORTANT: This table is populated with new data
CREATE VIEW <view_name > AS <query_code >; COMMIT;
CREATE VIEW top_customers AS SELECT customerid, SUM(total) AS sales FROM invoice GROUP BY customerid ORDER BY sales DESC;
-
This will permanently create a copy of my data, but it's deleted after you logged out
-
ATTENTION: You can't have duplicate field names in a view
-
IMPORTANT: This table is not populated with new data that is coming in, only if we updated the temporary table
CREATE TEMP TABLE < view_name > AS < query_code >;
CREATE TEMP TABLE total_sales AS SELECT customerid, SUM(total) AS sales FROM invoice GROUP BY customerid ORDER BY sales DESC;
-
How many days of content are there in the library?
SELECT SUM(milliseconds) / 1000. / 60. / 60. / 24. AS days FROM track; -- 15.958079
-
What are the longest songs (excluding video)?
SELECT track.name, track.milliseconds / 1000. / 60. AS minutes, mediatype.name FROM track JOIN mediatype ON track.mediatypeid = mediatype.mediatypeid WHERE mediatype.name ILIKE '%AUDIO%' ORDER BY track.milliseconds DESC LIMIT 100;
-
What is the average length of a song grouped by genre (convert time to minutes)?
SELECT genre.name, AVG(track.milliseconds) / 1000. / 60. AS minutes FROM track JOIN mediatype ON track.mediatypeid = mediatype.mediatypeid JOIN genre ON track.genreid = genre.genreid WHERE mediatype.name LIKE '%audio%' GROUP BY genre.name ORDER BY minutes DESC;
-
Which customers have spent more than $40 (Use Group By and Having for the customer and invoice tables)?
SELECT customer.firstname, customer.lastname, invoice.customerid, SUM(invoice.total) FROM customer JOIN invoice ON invoice.customerid = customer.customerid GROUP BY customer.firstname, customer.lastname, invoice.customerid HAVING SUM(invoice.total) > 40;
-
What are the total sales by month (Use Extract and Group By and the invoice table)?
SELECT EXTRACT(month FROM invoicedate) AS month, SUM(invoice.total) AS total FROM invoice GROUP BY month ORDER BY total DESC; SELECT EXTRACT(month FROM invoicedate) AS month, SUM(invoice.total) FROM invoice GROUP BY month ORDER BY SUM(invoice.total) DESC;
-
Create a roster of employees with their bosses (Join the employee table to itself by using table aliases)
SELECT t1.firstname, t1.lastname, t2.firstname, t2.lastname FROM employee t1 LEFT JOIN employee t2 ON t1.reportsto = t2.employeeid;
- Different solution
SELECT e.firstname || ' ' || e.lastname AS "Employee Name", boss.firstname || ' ' || boss.lastname AS "Boss" FROM employee e FULL JOIN employee boss ON e.reportsto = boss.employeeid;
-
Using the iowa liquor products table, create an alcohol type label for whisky, vodka, tequila, rum, brandy, schnapps and any other liquor types (hint: use
CASE STATEMENT
andLIKE
)SELECT DISTINCT category_name, CASE WHEN category_name ILIKE '%schnapps%' THEN 'schnapps' WHEN category_name ILIKE '%wisk%' OR category_name ILIKE '%scotch%' THEN 'whiskey' WHEN category_name ILIKE '%vodka%' THEN 'vodka' WHEN category_name ILIKE '%rum%' THEN 'rum' WHEN category_name ILIKE '%brand%' THEN 'brandy' WHEN category_name ILIKE '%gin%' THEN 'gin' ELSE 'Other' END AS liquor_type FROM iowa_products ORDER BY liquor_type;
-
Adding a counter
SELECT category_name, CASE WHEN category_name ILIKE '%schnapps%' THEN 'schnapps' WHEN category_name ILIKE '%wisk%' OR category_name ILIKE '%scotch%' THEN 'whiskey' WHEN category_name ILIKE '%vodka%' THEN 'vodka' WHEN category_name ILIKE '%rum%' THEN 'rum' WHEN category_name ILIKE '%brand%' THEN 'brandy' WHEN category_name ILIKE '%gin%' THEN 'gin' ELSE 'Other' END AS liquor_type, COUNT(*) FROM iowa_products GROUP BY liquor_type, category_name ORDER BY liquor_type, category_name;
-
Using the catalog and online tables, create a customer list that combines the names from the catalog and online tables using UNION without creating duplicates.
SELECT customerid,
firstname AS "First Name",
lastname AS "Last Name"
FROM catalog
UNION
SELECT customerid,
firstname,
lastname
FROM online
ORDER BY "First Name",
"Last Name";
-
FULL JOIN
the catalog and online tables and inspect the results. Try adding the catalog sales and online sales totals together. Why do you get errors? -
Wrong answer:
SELECT *, c.catalogpurchases + o.onlinepurchases FROM catalog c FULL JOIN online o ON c.customerid = o.customerid;
-
Right answer
SELECT *, COALESCE(c.catalogpurchases + o.onlinepurchases) AS firstname, COALESCE(c.lastname,o.lastname) AS lastname, COALESCE(c.catalogpurchases,0) +COALESCE(o.onlinepurchases,0) AS total_sales FROM catalog c FULL JOIN online o ON c.customerid = o.customerid;
- How many iowa liquor vendors have more than $1 million in 2014 sales (hint: use subquery to group sales by vendor)?
WITH vendor_sales AS (
SELECT vendor,
vendor_no,
SUM(total)
FROM iowa_sales
WHERE EXTRACT(year FROM DATE) = 2014
GROUP BY vendor,
vendor_no
HAVING SUM(total) > 1000000
)
SELECT COUNT(*) FROM vendor_sales;
-
Group sales by month with a subquery and then calculate cumulative sales by month for 2014 (using iowa sales table)
WITH sale_by_month AS ( SELECT EXTRACT(YEAR FROM iowa_sales.date) AS YEAR, EXTRACT(MONTH FROM iowa_sales.date) AS MONTH, SUM(total) AS new_total FROM iowa_sales GROUP BY YEAR, MONTH ) SELECT SUM(new_total) OVER (ORDER BY month) FROM sale_by_month WHERE year = 2014;
-
Alternative
WITH monthly_sales AS ( SELECT EXTRACT(month FROM DATE) AS month, SUM(total) AS sales FROM iowa_sales WHERE EXTRACT(year FROM DATE) = 2014 GROUP BY month ) SELECT month, sales / 1000000. AS month_sales, TO_CHAR(sales,'999,999,999'), SUM(sales) OVER (ORDER BY month) / 1000000. AS cum_sales FROM monthly_sales ORDER BY month;
-
Alternative with monthly sales
WITH monthly_sales AS ( SELECT EXTRACT(month FROM DATE) AS month, EXTRACT(year FROM DATE) AS year, SUM(total) AS sales FROM iowa_sales GROUP BY month, year ) SELECT month, year, sales / 1000000. AS month_sales, SUM(sales) OVER (PARTITION BY year ORDER BY month) / 1000000. AS cum_sales FROM monthly_sales ORDER BY year, month;
-
Create a View that adds liquor type to the iowa product data. Don't forget to commit your changes.
IF EXISTS DROP TABLE new_iowa_products; CREATE VIEW new_iowa_products AS ( SELECT CASE WHEN i.category_name ILIKE '%whisky%' THEN 'whisky' WHEN i.category_name ILIKE '%vodka%' THEN 'vodka' WHEN i.category_name ILIKE '%tequila%' THEN 'tequila' WHEN i.category_name ILIKE '%rum%' THEN 'rum' WHEN i.category_name ILIKE '%brandy%' THEN 'brandy' WHEN i.category_name ILIKE '%schnapps%' THEN 'schnapps' ELSE 'other' END AS liquor_type, * FROM iowa_products AS i ); COMMIT; SELECT * FROM new_iowa_products;