Aggregation in SQL refers to the process of summarizing and combining multiple rows of data into a single result. It involves the use of aggregate functions to perform calculations on groups of rows, returning a single value for each group. Common aggregate functions include COUNT, SUM, AVG, MIN, and MAX.
-
COUNT(): Returns the number of rows in a result set or the number of non-null values in a column.
Example:
SELECT COUNT(*) FROM table_name;
-
SUM(): Calculates the sum of values in a numeric column.
Example:
SELECT SUM(sales_amount) FROM sales_data;
-
AVG(): Calculates the average of values in a numeric column.
Example:
SELECT AVG(salary) FROM employees;
-
MIN(): Returns the minimum value in a column.
Example:
SELECT MIN(order_date) FROM orders;
-
MAX(): Returns the maximum value in a column.
Example:
SELECT MAX(order_date) FROM orders;
Grouping data allows you to partition the result set into groups based on one or more columns and then perform aggregate calculations on each group.
GROUP BY Clause:
SELECT department, AVG(salary)
FROM employees
GROUP BY department;
You can further filter aggregated data using the HAVING clause. It works similarly to the WHERE clause but is used with aggregate functions.
HAVING Clause:
SELECT department, AVG(salary)
FROM employees
GROUP BY department
HAVING AVG(salary) > 50000;
-
Rollup: The ROLLUP operator generates subtotals and a grand total for each group in a result set.
Example:
SELECT department, city, SUM(sales_amount) FROM sales_data GROUP BY ROLLUP(department, city);
-
Cube: The CUBE operator produces a result set that represents all possible combinations of grouping sets.
Example:
SELECT department, city, SUM(sales_amount) FROM sales_data GROUP BY CUBE(department, city);
Window functions perform calculations across a set of rows related to the current row within a query result. They are used with an OVER clause to define the window of rows over which the calculation is performed.
Example:
SELECT department, salary,
AVG(salary) OVER (PARTITION BY department) AS avg_salary
FROM employees;
Aggregation operations in SQL are crucial for data analysis, reporting, and decision-making tasks, enabling users to derive meaningful insights from large datasets.
Top Earners
-
Questions
We define an employee's total earnings to be their monthly salary x months worked, and the maximum total earnings to be the maximum total earnings for any employee in the Employee table. Write a query to find the maximum total earnings for all employees as well as the total number of employees who have maximum total earnings. Then print these values as 2 space-separated integers.
Input Format
The Employee table containing employee data for a company is described as follows:
where employee_id is an employee's ID number, name is their name, months is the total number of months they've been working for the company, and salary is the their monthly salary.
-
Code
SELECT MONTHS * SALARY AS EARNINGS, COUNT(EMPLOYEE_ID) FROM EMPLOYEE GROUP BY EARNINGS ORDER BY EARNINGS DESC LIMIT 1
Weather Observation Station 2
-
Questions
Query the following two values from the STATION table:
- The sum of all values in LAT_N rounded to a scale of decimal places.
- The sum of all values in LONG_W rounded to a scale of decimal places.
Input Format The STATION table is described as follows:
where LAT_N is the northern latitude and LONG_W is the western longitude.
-
Code
SELECT ROUND(SUM(lat_n),2), ROUND(SUM(long_w),2) FROM STATION
Weather Observation Station 13
-
Code
SELECT ROUND(SUM(LAT_N), 4)FROM STATION WHERE LAT_N > 38.7880 AND LAT_N < 137.2345
Weather Observation Station 14
-
Code
SELECT ROUND(MAX(LAT_N), 4)FROM STATION WHERE LAT_N < 137.2345
Weather Observation Station 15
-
Code
SELECT ROUND(LONG_W, 4) FROM STATION WHERE LAT_N < 137.2345 ORDER BY LAT_N DESC LIMIT 1
Weather Observation Station 16
-
Code
SELECT ROUND(MIN(LAT_N), 4) FROM STATION WHERE LAT_N > 38.7780
Weather Observation Station 17
-
Questions
Query the Western Longitude (LONG_W)where the smallest Northern Latitude (LAT_N) in STATION that is greater than 38.7780. Round off your answer to 4 decimal places.
Input Format
The STATION table is described as follows:
where LAT_N is the northern latitude and LONG_W is the western longitude.
-
Code
SELECT ROUND(LONG_W, 4) FROM STATION WHERE LAT_N > 38.7780 ORDER BY LAT_N LIMIT 1
Weather Observation Station 18
-
Questions
Consider P1(a,b) and P2(c,d) to be two points on a 2D plane.
- a happens to equal the minimum value in Northern Latitude (LAT_N in STATION).
- b happens to equal the minimum value in Western Longitude (LONG_W in STATION).
- chappens to equal the maximum value in Northern Latitude (LAT_N in STATION).
- d happens to equal the maximum value in Western Longitude (LONG_W in STATION). Query the Manhattan Distance between points and and round it to a scale of 4 decimal places.
Input Format
The STATION table is described as follows:
where LAT_N is the northern latitude and LONG_W is the western longitude.
-
Code
SELECT ROUND(ABS(MAX(LAT_N) - MIN(LAT_N)) + ABS(MAX(LONG_W) - MIN(LONG_W)),4) FROM STATION
Weather Observation Station 19
-
Questions
Consider P1(a,b) and P2(c,d) to be two points on a 2D plane where (a,b) are the respective minimum and maximum values of Northern Latitude (LAT_N) and (c,d) are the respective minimum and maximum values of Western Longitude (LONG_W) in STATION.
Query the Euclidean Distance between points and and round it to a scale of 4 decimal places.
Input Format
The STATION table is described as follows:
where LAT_N is the northern latitude and LONG_W is the western longitude.
-
Code
Weather Observation Station 20
-
Questions
A median is defined as a number separating the higher half of a data set from the lower half. Query the median of the Northern Latitudes (LAT_N) from STATION and round your answer to 4 decimal places.
Input Format
The STATION table is described as follows:
where LAT_N is the northern latitude and LONG_W is the western longitude.
-
Code
SELECT ROUND(S1.LAT_N, 4) FROM STATION AS S1 WHERE (SELECT ROUND(COUNT(S1.ID)/2) - 1 FROM STATION) = (SELECT COUNT(S2.ID) FROM STATION AS S2 WHERE S2.LAT_N > S1.LAT_N)
OR
SELECT ROUND(MEDIAN(LAT_N), 4) AS MEDIAN_LAT FROM STATION;
Average Population
-
Code
SELECT FLOOR(AVG(POPULATION)) FROM CITY
Japan Population
-
Code
SELECT SUM(POPULATION) FROM CITY WHERE COUNTRYCODE = 'JPN';
Population Density Difference
-
Code
SELECT MAX(POPULATION) - MIN(POPULATION) FROM CITY
The Blunder
-
Questions
Samantha was tasked with calculating the average monthly salaries for all employees in the EMPLOYEES table, but did not realize her keyboard's 0 key was broken until after completing the calculation. She wants your help finding the difference between her miscalculation (using salaries with any zeros removed), and the actual average salary.
Write a query calculating the amount of error (i.e.: actual - miscalculated average monthly salaries), and round it up to the next integer.
Input Format The EMPLOYEES table is described as follows:
-
Code
SELECT CEIL(AVG(SALARY) - AVG(CAST(REPLACE(SALARY, '0', '') AS DECIMAL))) FROM EMPLOYEES