As data gets more complex, not everything can be put into one table. So we use a process called normalization to split the data into multiple tables. This is mainly done to minimize redundancy and improve data integrity; plus otherwise you’d have a bunch of columns going all the way down.
Like an online store for example: you’d have a table for customers and their info, orders and their details, and inventory with quantities and other stuff. That’s one big ass table if you put all those together, and tedious to navigate through.
We’re still going to use the Pixar movies table, but we’re gonna add a box office table. The columns will be its movie_id (its place on the table), quality rating on a 10-point scale, and domestic and international sales.
So first task, I simply want to see each movie’s domestic and international sales. Remember, this data is in 2 different tables, I want to see it summarized into one without all the extra columns.
SELECT title, domestic_sales, international_sales
FROM movies
JOIN boxoffice
ON movies.id = boxoffice.movie_id;
As always, SELECT the columns I want to see. The data is FROM the ‘movies’ table and we JOIN the ‘boxoffice’ table. Strangely, without the last line, the ‘ON’ clause, it’ll show the sales numbers, but each movie is duplicated and set with all those numbers. Basically, out of the 14 movies and 14 sales on both tables, leaving that line out will create every possible combination of those movies and numbers. 14 x 14 that’s 196 entities on our summarized table, we only need to see the 14 once.
So with the ON clause we use the movies.id, in the context of the ‘boxoffice’ table, it uses the movies’ ID to set it with it’s sales, not the actual title. Then we match that with the ‘boxoffice.movie_id’ so it’ll show each movie as is. Remember we can’t see the ID numbers in this final because we didn’t SELECT to view it, but it still exists in the original tables we’re pulling from.
Next task, I want to only see the movies who did better internationally than they did domestically.
SELECT title, domestic_sales, international_sales
FROM movies
JOIN boxoffice
ON movies.id = boxoffice.movie_id
WHERE international_sales > domestic_sales;
The code is pretty much the same. We just add the ‘WHERE’ clause and set the international to greater than domestic.
And lastly, I want to list all the movies by rating in descending order.
SELECT title, domestic_sales, international_sales
FROM movies
JOIN boxoffice
ON movies.id = boxoffice.movie_id
ORDER BY rating DESC;
Top chunk is still the same, just add the ‘ORDER BY’ clause and set it. And this has to be at the bottom too, I tried doing it just between FROM and JOIN and it wouldn’t go.