So far we’ve only been doing basic queries and selections. Now we’re getting into complex logic to find more detailed insights with our data. We do this by writing expressions, which is basically adding math to our queries.
We’re going back to the ‘movies’ and ‘box office’ tables. The movie’s columns are ID, title, director, year, and length_minutes. Box office’s columns are movie_id, rating, domestic_sales, and international_sales.
For this first task, we want to view all the movies and their combined sales in millions; that last part is important. Since the ‘box office’ table only has domestic and international, we’ll have to combine them in this query.
SELECT title, (domestic_sales + international_sales) / 1000000 AS gross_sales_millions
FROM movies
JOIN boxoffice
ON movies.id = boxoffice.movie_id;
Of course we want to select the movie title. Then we add the domestic and international’s sales in a parenthesized equation. Again, we want to see these sales in millions, meaning we don’t need to see the full-figure number, we just want a few digits that represent millions. So for that, we’re gonna divide that sales sum by 1 million. This way if a movie’s box office is 80,000,000, it will only show as 80 in this final table. Finally, we set the column for these numbers AS ‘gross_sales_millions’. To recap, we just added the numbers of 2 columns and created a new column out of their sums.
Next task, we want to view all the movies and their ratings in percentage form.
SELECT title, rating * 10 AS rating_percent
FROM movies
JOIN boxoffice
ON movies.id = boxoffice.movie_id;
Now in the ‘ratings’ column, the numbers are set on a 10-point scale in decimal form. To change these to percentage form, we simply multiply the ratings by 10, so if a rating is 8.2, that turns to 82%. And again we use the AS keyword to create a new column for our final result, which is ‘ratings_percent’ in this case.
Next, I want to view all the movies that were released on even-number years.
SELECT title, year
FROM movies
WHERE year % 2 = 0;
The percentage sign in SQL is a divisor. The 2 that follows tells it to only select year numbers that are divisible by 2. The 0 is for the remainder, or no remainder in this case, so it only looks for numbers divisible by 2. If we left the 2 by itself, it would still show odd numbers.
Aggregates
With aggregates, we’ll be learning how to use certain functions to find metrics within data.
We’re going back to the ’employee’ table. The columns are role, name, building, and years_employed.
I want to see the longest an employee has been at the company.
SELECT MAX(years_employed) as Max_years_employed
FROM employees;
Even though we’re selecting the ‘years_employed’ we first have to wrap it in a MAX function so it knows to look for the maximum number. And of course we set that result to a new column ‘max_years_employed’.
Next, for each role, i want to see the average number of years that employees have been in that role.
SELECT role, AVG(years_employed) as Average_years_employed
FROM employees
GROUP BY role;
Just how we did with MAX, we wrap our selected column into an AVG function, and set that result to a new column. Even though we selected the ‘role’ column, the GROUP BY ‘role’ is still necessary to view all the roles.
Next I want to find the total number of employee years for each building.
SELECT building, SUM(years_employed) as Total_years_employed
FROM employees
GROUP BY building;
Same process. Select the columns, wrap the column that has the metric you’re looking for into a function, and set that result to a new column.
The table has artists as one of its roles. So I want to view the total number of artists in the table.
SELECT role, COUNT(*) as Number_of_artists
FROM employees
WHERE role = "Artist";
Past selecting the ‘role’ column, we’re using the COUNT clause to simply count how many times the role appears. The asterisk after the COUNT is what tells it to look for all the times it appears. The WHERE on the last line is how we specify which role we want to see, which is artist in this case.
If I want to see the number of employees for all the roles, I’d leave the top line the same up to the new column name, and change the WHERE clause to simply GROUP BY the role.
SELECT role, COUNT(*)
FROM employees
GROUP BY role;
To close this out on a similar task, I want to see the total number of years employed by all engineers.
SELECT role, SUM(years_employed) as total_engineer_years_employed
FROM employees
WHERE role = "Engineer"
This is basically the same code as we did for total years in each building. Besides the resulting column’s name the only difference is the WHERE clause at the end which shows I want to specifically look at the engineers