"Every hour, every day, digital databases quietly store, cross-reference, and return information on every aspect of our lives."
https://www.computerhistory.org/revolution/memory-storage/8/265
What is a database? [1]
A database's primary purpose is to provide a mechanism for storing and retrieving information. There are many different types of databases but they all provides these two capabilities.
The easiest way to understand a database is as a collection of related files. Imagine a file (either paper or digital) of sales orders in a shop. Then there's another file of products, containing stock records. To fulfil an order, you'd need to look up the product in the order file and then look up and adjust the stock levels for that particular product in the product file. A database and the software that controls the database, called a database management system (DBMS), helps with this kind of task.
Most databases today are relational databases, named such because they deal with tables of data related by a common field. For example, Table 1 below shows the product table, and Table 2 shows the orders table. As you can see, the relation between the two tables is based on the common field product_code. Any two tables can relate to each other simply by having a field in common.
product_code |
description |
price |
|---|---|---|
| A416 | nails, box | €0.14 |
| C923 | drawing pins, box | €0.08 |
order_code |
order_item |
product_code |
quantity |
|---|---|---|---|
| 3804 | 1 | A416 | 10 |
| 3804 | 2 | C923 | 15 |
Important
Any two tables can relate to each other simply by creating a field they have in common. Table 1 and Table 2 share the product_code column.
Let's take a closer look at the previous two tables to see how they are organized:
- Each table consists of many rows and columns.
- Each new row contains data about one single entity (such as one product or one order line). This is also called a record. For example, the first row in Table 1 is a record; it describes the A416 product, which is a box of nails that costs fourteen cents.
Note
The terms row and record are interchangeable.
- Each column (also called an attribute) contains one piece of data that relates to the record. Examples of attributes are the quantity of an item sold (
quantity) or the price of a product (price).
Note
The terms column and attribute are interchangeable.
Given this kind of structure, the database gives you a way to manipulate this data: SQL. SQL (structured query language) is a powerful way to search for records or make changes.
Relational databases go hand-in-hand with the development of SQL. The simplicity of SQL - where even a novice can learn to perform basic queries in a short period of time - is a large part of the reason for the popularity of the relational model.
SQL is relatively simple to start learning because it's a declarative language. This means the programmer does not directly describe computations (like in Javascript), but instead describes the desired result. It is up to the database to interpret the statements into computations, to produce such a result.
Almost all DBMSs use SQL, although many have added their own enhancements to it. This means that when you learn using a specific DBMS like MySQL, almost all of it is not specific to MySQL and can be used with other relational databases as well, such as PostgreSQL, SQLite, MariaDB, and SQL Server.
Tip
SQL is pronounced "sequel"
There are many other types of databases besides relational databases, such as graph databases, non-relational databases (also called NoSQL databases), and key-value databases. In this module, we'll focus on relational databases (SQL), using MySQL for the exercises.
SQL (Structured Query Language) is a standard programming language designed for managing and manipulating relational databases. It provides a powerful set of commands for querying, updating, and managing data stored in a relational database management system (RDBMS). This comprehensive guide aims to provide a detailed understanding of SQL, covering its syntax, data manipulation capabilities, data definition commands, advanced querying techniques, and more.
Important
This sections assumes you have the world database setup in your computer. If it isn't, complete the setup guide before continuing.
In this section we expect you to follow along the commands. We'll be using an example database - world[3] - that contains data on cities, countries, and languages.
Querying Data with SQL[2]
In this section, we will be learning different SQL commands to query tables in a database.
One of the core purposes of the SQL language is to retrieve information stored in a database. This is commonly referred to as querying. SQL queries allow us to communicate with the database by asking questions in the form of a structured formal language and obtaining all matching data as a result.
Let’s get started! We should get familiar with the city table.
In DBeaver, open a SQL console and then type the following:
SELECT * FROM city;SELECT is used every time you want to query data from a database and * means all columns.
Suppose we are only interested in two of the columns. We can select individual columns by their names (separated by a comma):
SELECT column1, column2
FROM table_name;To make it easier to read, we moved FROM to another line.
Line breaks don’t mean anything specific in SQL. We could write this entire query in one line, and it would run just fine.
Let’s only select the name and district columns of the table:
SELECT name, district FROM city;
-- +----------------+---------------+
-- | name | district |
-- +----------------+---------------+
-- | Kabul | Kabol |
-- | Qandahar | Qandahar |
-- | Herat | Herat |
-- (...)
-- 4079 rows in setNow we want to include a third column.
Edit your query so that it returns the name, district, and population columns of the table.
-- Edit the query
SELECT ...View solution
SELECT name, district, population FROM city;
-- +----------------+---------------+------------+
-- | name | district | population |
-- +----------------+---------------+------------+
-- | Kabul | Kabol | 1780000 |
-- | Qandahar | Qandahar | 237500 |
-- | Herat | Herat | 186800 |
-- (...)
-- 4079 rows in setKnowing how SELECT works, suppose we have the code below:
SELECT name AS 'City Name' FROM city;Can you guess what AS does?
AS is a keyword in SQL that allows you to rename a column or table using an alias. The new name can be anything you want. Here we renamed the name column as City Name.
Important
When using AS, the columns are not being renamed in the table. The aliases only appear in the result.
To experiment with what the AS keyword does, select the population column and rename it with an alias of your choosing:
-- Return population column with an alias
SELECT ___ AS '______' FROM city;View solution
--- Example
SELECT population as 'Number of Habitants' FROM city;Now, edit the query so that instead of selecting and renaming population, select the countrycode column and rename it to ISO Country Code
-- Return countrycode column with an alias
SELECT ___ AS '______' FROM city;View solution
SELECT countrycode as 'ISO Country Code' FROM city;
-- +------------------+
-- | ISO Country Code |
-- +------------------+
-- | ABW |
-- | AFG |
-- (...)
-- 4079 rows in set
Tip
You can alias multiple columns in a single query:
SELECT name AS n, countrycode AS c FROM city;When we are examining data in a table, it can be helpful to know what distinct values exist in a particular column.
DISTINCT is used to return unique values in the output. It filters out all duplicate values in the specified column(s).
For instance,
SELECT name, countrycode from city;will produce:
name |
countrycode |
|---|---|
| Kabul | AFG |
| Qandahar | AFG |
| Herat | AFG |
| Mazar-e-Sharif | AFG |
| Amsterdam | NLD |
| Rotterdam | NLD |
| Haag | NLD |
| Utrecht | NLD |
| (...) | (...) |
You'll notice there are multiple cities from countries AFG, and NLD.
It is easier to see the different countrycodes in the city table after the data has been filtered than to scan every row in the table.
You can do that by adding a DISTINCT before the column name. Let's try it out:
SELECT DISTINCT countrycode FROM city;the result would now be:
countrycode |
|---|
| ABW |
| AFG |
| AGO |
| AIA |
| ALB |
| AND |
| (...) |
You should notice no countrycode is listed more than once.
Now, change the code so we return the unique values of the district column instead.
-- Return the unique values of the `district` column
SELECT ...View solution
SELECT DISTINCT district from city;
-- +---------------+
-- | district |
-- +---------------+
-- | Kabol |
-- | Qandahar |
-- | Herat |
-- (...)
-- 4079 rows in setIn your very first query:
SELECT * FROM citythe FROM command is used to specify which table(s) to select data from.
The world database also has a country table. We can retrieve data from that table by changing the FROM clause:
SELECT * from country;
-- +-------+-------------+---------------+---------------------------+-------+---------+-------+
-- | Code | Name | Continent | Region | (...) | Capital | Code2 |
-- +-------+-------------+---------------+---------------------------+-------+---------+-------|
-- | ABW | Aruba | North America | Caribbean | (...) | 129 | AW |
-- | AFG | Afghanistan | Asia | Southern and Central Asia | (...) | 1 | AF |
-- | AGO | Angola | Africa | Central Africa | (...) | 56 | AO |
-- | (...) | (...) | (...) | (...) | (...) | (...) | (...) |
-- +------ +-------------+---------------+---------------------------+-------+---------+-------|Tip
You can see which tables exist in the world database with a command:
SHOW TABLES;In DBeaver, open Databases > 'world' > Tables on the left sidebar.
Important
SELECT and FROM are the most basic SQL query operators. They allows to specify which tables (FROM) and columns (SELECT) we want to retrieve from the database.
We can restrict our query results using the WHERE clause in order to obtain only the information we want.
Following this format, the statement below filters the result set to only include the most populated cities (over 5 million habitants):
SELECT * FROM city WHERE population > 5000000;
-- +------+----------------+-------------+----------------+------------+
-- | ID | Name | CountryCode | District | Population |
-- +------+----------------+-------------+----------------+------------+
-- | 206 | São Paulo | BRA | São Paulo | 9968485 |
-- | 207 | Rio de Janeiro | BRA | Rio de Janeiro | 5598953 |
-- (...)
-- 24 rows in setHow does it work?
-
The
WHEREclause filters the result set to only include rows where the following condition is true. -
population > 5000000is the condition. Here, only rows with a value greater than 5000000 in thepopulationcolumn will be returned.
The > is an operator. Operators create a condition that can be evaluated as either true or false.
Comparison operators used with the WHERE clause are:
=equal to<>or!=not equal to>greater than<less than>=greater than or equal to<=less than or equal to
There are also some special operators that we will learn more about in the upcoming exercises.
Now, suppose we want to known which are the least populated cities (population under 1000):
-- Retrieve the cities with less than 1000 people. Select all columns
SELECT ...View solution
SELECT * FROM city WHERE population < 1000;
-- +------+---------------------+-------------+-------------+------------+
-- | ID | Name | CountryCode | District | Population |
-- +------+---------------------+-------------+-------------+------------+
-- | 61 | South Hill | AIA | – | 961 |
-- | 62 | The Valley | AIA | – | 595 |
-- | 1791 | Flying Fish Cove | CXR | – | 700 |
-- (...)
-- 11 rows in set[!TIP] You can compare the values of two columns in a
WHEREclause:SELECT x, y FROM coordinates WHERE x > y;
LIKE can be a useful operator when you want to compare similar values.
There are 2 cities in China with similar names: Anshun and Anshan.
How could we select all cities that start with ‘Ansh’ and end with ‘n’ and have exactly one character in between?
SELECT * FROM city WHERE name LIKE 'Ansh_n';
-- +------+--------+-------------+----------+------------+
-- | ID | Name | CountryCode | District | Population |
-- +------+--------+-------------+----------+------------+
-- | 1918 | Anshan | CHN | Liaoning | 1200000 |
-- | 2106 | Anshun | CHN | Guizhou | 174142 |
-- +------+--------+-------------+----------+------------+-
LIKEis a special operator used with theWHEREclause to search for a specific pattern in a column. -
name LIKE 'Se_en'is a condition evaluating the name column for a specific pattern. -
Ansh_nrepresents a pattern with a wildcard character.
The _ means you can substitute any individual character here without breaking the pattern. The names Anshan and Anshun both match this pattern.
The percentage sign % is another wildcard character that can be used with LIKE.
This statement below filters the result set to only include cities with names that begin with the word ‘San’:
SELECT *
FROM city
WHERE name LIKE 'San%';% is a wildcard character that matches zero or more missing characters in the pattern. For example:
A%matches all cities with names that begin with letter ‘A’%amatches all cities that end with ‘a’San %matches all cities with names that begin with the wordSan(mind the whitespace!)- ✅
San Felipe,San Francisco,San Marino... - ❌
Santa Monica
- ✅
%landmatches all cities with names that end withlandSunderland,Auckland,Portland...
We can also use % both before and after a pattern:
SELECT * FROM city WHERE name LIKE '%los%';
-- +------+----------------------------+-------------+--------------+------------+
-- | ID | Name | CountryCode | District | Population |
-- +------+----------------------------+-------------+--------------+------------+
-- | 117 | San Nicolás de los Arroyos | ARG | Buenos Aires | 119302 |
-- | 316 | São Carlos | BRA | São Paulo | 187122 |
-- | 357 | Ferraz de Vasconcelos | BRA | São Paulo | 139283 |
-- | 568 | Los Angeles | CHL | Bíobío | 158215 |
-- (...)Here, any city that contains the word ‘los’ in its name will be returned in the result.
LIKE is not case sensitive. ‘Los Angeles’ and ‘São Carlos’ both appear in the result of the query above.
How many cities contain the word saint?
-- Return the cities with names that contain the word 'saint'
SELECT ...View solution
SELECT * FROM city WHERE name LIKE '%saint%';
-- Answer: 13 citiesOne more: how many cities begin with the word New? Tip: you may need a space in there.
-- Return the cities with names that start with the word 'New'
SELECT ...View solution
SELECT * FROM city WHERE name LIKE 'New %';
-- +------+-------------+-------------+---------------+------------+
-- | ID | Name | CountryCode | District | Population |
-- +------+-------------+-------------+---------------+------------+
-- | 1106 | New Bombay | IND | Maharashtra | 307297 |
-- | 1109 | New Delhi | IND | Delhi | 301297 |
-- | 3793 | New York | USA | New York | 8008278 |
-- | 3823 | New Orleans | USA | Louisiana | 484674 |
-- | 3971 | New Haven | USA | Connecticut | 123626 |
-- | 4044 | New Bedford | USA | Massachusetts | 94780 |
-- +------+-------------+-------------+---------------+------------+
-- 6 rows in set (0,01 sec)Let's go over to the country table for a while.
This table has more columns, and some records contain a few missing values. More often than not, the data you encounter will have missing values.
- Unknown values are indicated by
NULL. - It is not possible to test for
NULLvalues with comparison operators, such as=and!=. - Instead, we will have to use these operators:
IS NULLIS NOT NULL
For example, we can use IS NOT NULL to retrieve countries for which an independence year is registered (indepyear column):
SELECT name, indepyear FROM country WHERE indepyear IS NOT NULL
-- +----------------------+-----------+
-- | name | indepyear |
-- +----------------------+-----------+
-- | Afghanistan | 1919 |
-- | Angola | 1975 |
-- | Albania | 1912 |
-- (...)Now let’s do the opposite. Write a query to find all countries for which there is no independence day registered:
-- Retrieve countries without an independence. Select only the 'name' column
SELECT ...View solution
SELECT name FROM country WHERE indepyear IS NULL;
-- +----------+
-- | name |
-- +----------+
-- | Aruba |
-- | Anguilla |
-- (...)Sometimes we want to combine multiple conditions in a WHERE clause to make the result set more specific and useful.
One way of doing this is to use the AND operator. Here, we use the AND operator to only return European countries with over 1 million of population:
SELECT *
FROM country
WHERE continent = 'Europe'
AND population > 1000000;
-- other columns omitted
-- +---------+------------+
-- | name | population |
-- +---------+------------+
-- | Albania | 3401200 |
-- | Austria | 8091800 |
-- | Belgium | 10239000 |
-- (...)
-- 35 rows in setcontinent = 'Europe'is the 1st conditionpopulation > 1000000is the 2nd conditionANDcombines the 2 conditions
With AND, both conditions must be true for the row to be included in the result.
Now, let's go back to the city table and practice! Retrieve every city from the United States (hint: also known as USA), that have few inhabitants (less than 90 thousand):
-- Retrieve cities in the USA with less than 90 thousand of population
SELECT ...View solution
SELECT * FROM city WHERE countrycode = 'USA' AND population < 90000;
-- +------+------------+-------------+----------------+------------+
-- | ID | Name | CountryCode | District | Population |
-- +------+------------+-------------+----------------+------------+
-- | 4062 | Kenosha | USA | Wisconsin | 89447 |
-- | 4063 | Elgin | USA | Illinois | 89408 |
-- (...)
-- 5 rows in setFinally, how many German cities in district of 'Nordrhein-Westfalen' have over 250 thousand inhabitants?
-- Retrieve cities in the district of Nordrhein-Westfalen with more than 250000 of population
SELECT ...View solution
SELECT * from city where district = 'Nordrhein-Westfalen' and population > 250000;
-- +------+-------------+-------------+---------------------+------------+
-- | ID | Name | CountryCode | District | Population |
-- +------+-------------+-------------+---------------------+------------+
-- | 3071 | Köln | DEU | Nordrhein-Westfalen | 962507 |
-- | 3073 | Essen | DEU | Nordrhein-Westfalen | 599515 |
-- (...)
-- 12 rows in setSimilar to AND, the OR operator can also be used to combine multiple conditions in WHERE, but there is a fundamental difference:
ANDoperator displays a row if all the conditions are true.ORoperator displays a row if any condition is true.
Suppose you're considering a vacation either in Portugal or Spain:
SELECT *
FROM city
WHERE countrycode = 'PRT' OR countrycode = 'ESP';
-- +-----+---------+-------------+----------+------------+
-- | ID | Name | CountryCode | District | Population |
-- +-----+---------+-------------+----------+------------+
-- | 653 | Madrid | ESP | Madrid | 2879052 |
-- (...)
-- | 2914 | Lisboa | PRT | Lisboa | 563210 |
-- (...)
-- 64 rows in setcontrycode = 'PRT'is the 1st conditioncontrycode = 'ESP'is the 2nd conditionORcombines the two conditions
With OR, if any of the conditions are true, then the row is added to the result.
It's your turn! Write a query that returns all cities in either Finland (FIN) or Norway (NOR)
-- Retrieve cities in either Finland or Norway
SELECT ...View solution
SELECT * FROM city WHERE countrycode = 'FIN' OR countrycode = 'NOR';
-- +------+------------------------+-------------+----------+------------+
-- | ID | Name | CountryCode | District | Population |
-- +------+------------------------+-------------+----------+------------+
-- | 3236 | Helsinki [Helsingfors] | FIN | Newmaa | 555474 |
-- (...)
-- | 2807 | Oslo | NOR | Oslo | 508726 |
-- (...)
-- 12 rows in setIt is often useful to list the data in our result set in a particular order.
We can sort the results using ORDER BY, either alphabetically or numerically. Sorting the results often makes the data more useful and easier to analyze.
For example, if we want to sort everything by the cities' name from A through Z:
SELECT * FROM city ORDER BY name;ORDER BYis a clause that indicates you want to sort the result set by a particular column.nameis the specified column.
Sometimes we want to sort things in a decreasing order. For example, if we want to select cities with a large population, sorted from highest to lowest:
SELECT *
FROM city
WHERE population > 9000000
ORDER BY population DESC;
-- +------+-----------------+-------------+--------------+------------+
-- | ID | Name | CountryCode | District | Population |
-- +------+-----------------+-------------+--------------+------------+
-- | 1024 | Mumbai (Bombay) | IND | Maharashtra | 10500000 |
-- | 2331 | Seoul | KOR | Seoul | 9981619 |
-- (...)
-- 6 rows in set-
DESCis a keyword used inORDER BYto sort the results indescendingorder (high to low or Z-A). -
ASCis a keyword used inORDER BYto sort the results in ascending order (low to high or A-Z).
The column that we ORDER BY doesn’t even have to be one of the columns that we’re displaying.
Note: ORDER BY always goes after WHERE (if WHERE is present).
Your turn! Write a query that retrieves the name and population columns of all cities, ordered by their name in descending alphabetical order.
-- Retrieve cities in descending alphabetical order by name. Retrieve `name` and `population`
SELECT ...View solution
SELECT name, population FROM city ORDER BY name DESC;
-- +---------+------------+
-- | name | population |
-- +---------+------------+
-- | Zytomyr | 297000 |
-- | Zwolle | 105819 |
-- | Zwickau | 104146 |
-- (...)
-- 4079 rows in setYou may have noticed that the city table contains thousands of records. Sometimes, it's useful to cap the numbers of rows in query results, to reduce noise.
For example, imagine that we just want to see a few examples of records:
SELECT *
FROM city
LIMIT 10;-
LIMITis a clause that lets you specify the maximum number of rows the result set will have. This saves space on our screen and makes our queries run faster. -
Here, we specify that the result set can’t have more than 10 rows.
-
LIMITalways goes at the very end of the query
Your turn! Combining your knowledge of LIMIT and ORDER BY, write a query that returns the top 3 highest populated cities.
-- Retrieve the 3 most populated cities. Select all columns
SELECT ...View solution
SELECT * FROM city ORDER by population DESC LIMIT 3;
-- +------+-----------------+-------------+-------------+------------+
-- | ID | Name | CountryCode | District | Population |
-- +------+-----------------+-------------+-------------+------------+
-- | 1024 | Mumbai (Bombay) | IND | Maharashtra | 10500000 |
-- | 2331 | Seoul | KOR | Seoul | 9981619 |
-- | 206 | São Paulo | BRA | São Paulo | 9968485 |
-- +------+-----------------+-------------+-------------+------------+
-- 3 rows in set[!TIP] If the number set in the
LIMITclause surpasses the number of rows available to select, then it will just return the rows that are present.
As introduced in the "What is a database?" section, in order to efficiently store data, we often spread related information across multiple tables.
For example, in the city table, let's assume we want more information about the country. We could add the country information to the city table:
city_id |
city_name |
city_district |
city_population |
country_code |
country_name |
country_continent |
country_surfacearea |
|---|---|---|---|---|---|---|---|
| 1 | Kabul | Kabol | 1780000 | AFG | Afghanistan | Asia | 652090.00 |
| 2 | Qandahar | Qandahar | 237500 | AFG | Afghanistan | Asia | 652090.00 |
| 3 | Herat | Herat | 186800 | AFG | Afghanistan | Asia | 652090.00 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 56 | Luanda | Luanda | 2022000 | AGO | Angola | Africa | 1246700.00 |
| 57 | Huambo | Huambo | 163100 | AGO | Angola | Africa | 1246700.00 |
| 58 | Lobito | Benguela | 130000 | AGO | Angola | Africa | 1246700.00 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 61 | South Hill | – | 961 | AIA | Anguilla | North America | 96.00 |
| 62 | The Valley | – | 595 | AIA | Anguilla | North America | 96.00 |
| 34 | Tirana | Tirana | 270000 | ALB | Albania | Europe | 28748.00 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 64 | Dubai | Dubai | 669181 | ARE | United Arab Emirates | Asia | 83600.00 |
| 65 | Abu Dhabi | Abu Dhabi | 398695 | ARE | United Arab Emirates | Asia | 83600.00 |
| 66 | Sharja | Sharja | 320095 | ARE | United Arab Emirates | Asia | 83600.00 |
| ... | ... | ... | ... | ... | ... | ... | ... |
| 69 | Buenos Aires | Distrito Federal | 2982146 | ARG | Argentina | South America | 2780400.00 |
| 70 | La Matanza | Buenos Aires | 1266461 | ARG | Argentina | South America | 2780400.00 |
| 71 | Córdoba | Córdoba | 1157507 | ARG | Argentina | South America | 2780400.00 |
| ... | ... | ... | ... | ... | ... | ... | ... |
However, a lot of this information would be repeated. Notice how for cities in the same country all the country_(...) information is the same. This will make our table big and unmanageable.
Instead, the data is split into 2 tables: city and country, the ones you've using so far.
citycontains information about the cities:id,name,countrycode,district, and population
countrycontains information about each country:code,name,continent,region,surfacearea, and more...
If we just look at the city table, we can't know more details about the respective country. However, if we refer to the country table, we can get a complete picture:
SELECT * FROM city;
-- +----+----------------+-------------+---------------+------------+
-- | ID | Name | CountryCode | District | Population |
-- +----+----------------+-------------+---------------+------------+
-- | 1 | Kabul | AFG | Kabol | 1780000 |
-- | 2 | Qandahar | AFG | Qandahar | 237500 |
-- (...)
-- | 6 | Rotterdam | NLD | Zuid-Holland | 593321 |
-- | 7 | Haag | NLD | Zuid-Holland | 440900 |
-- (...)
SELECT code, name, continent, region from country;
-- +------+-------------+---------------+---------------------------+
-- | code | name | continent | region |
-- +------+-------------+---------------+---------------------------+
-- | ABW | Aruba | North America | Caribbean |
-- | AFG | Afghanistan | Asia | Southern and Central Asia |
-- (...)
-- | NLD | Netherlands | Europe | Western Europe |
-- | NOR | Norway | Europe | Nordic Countries |
-- (...)To find out more about the country where Rotterdam is located, we can look in the country table for the record with code NLD. We can then see that Rotterdam is in the Netherlands, located in Europe.
Doing this kind of matching is called joining two tables.
Important
Notice how city.countrycode matches country.code values
Combining tables manually is time-consuming. Luckily, SQL gives us an easy sequence for this: it’s called a JOIN.
If we want to combine cities and countries, we would type:
SELECT * -- [1]
FROM city -- [2]
join country -- [3]
on city.countrycode = country.code; -- [4]Let’s break down this command:
- Selects all columns from our combined table. If we only want to select certain columns, we can specify which ones we want.
- Specifies the first table that we want to look in,
city - Uses
JOINto say that we want to combine information fromcitywithcountry - The fourth line tells us how to combine the two tables. We want to match
citytable’scountrycodecolumn withcountrytable’scodecolumn.
Important
Because column names are often repeated across multiple tables, we use the syntax table_name.column_name to be sure that our requests for columns are unambiguous.
When we perform a simple JOIN (often called an inner join) our result only includes rows that match our ON condition. The result will not include the non-matching entries.
Note
There are multiple types of JOIN clauses, for example LEFT JOIN. We'll cover those later in the module.
It's your turn! Write a query to return country names, their population, and languages spoken? Hint: you need to use the country and countrylanguage table.
-- Retrieve country names, their population, and languages spoken (one language per row)
SELECT * from ...View solution
SELECT country.name, country.population, countrylanguage.language
FROM country
JOIN countrylanguage ON country.code = countrylanguage.countryCode
-- +-------------+------------+------------+
-- | name | population | language |
-- +-------------+------------+------------+
-- | Aruba | 103000 | Dutch |
-- | Aruba | 103000 | English |
-- | Aruba | 103000 | Papiamento |
-- | Aruba | 103000 | Spanish |
-- | Afghanistan | 22720000 | Balochi |
-- | Afghanistan | 22720000 | Dari |
-- | Afghanistan | 22720000 | Pashto |
-- | Afghanistan | 22720000 | Turkmenian |
-- | Afghanistan | 22720000 | Uzbek |
-- (...)
-- 984 rows in setWe just learned how to query data from a database using SQL from real life world data. We also learned how to filter queries to make the information more specific and useful.
In summary:
SELECTis the clause we use every time we want to query information from a database.ASrenames a column or table.DISTINCTreturn unique values.WHEREis a popular command that lets you filter the results of the query based on conditions that you specify.LIKEis a special operator that accepts wildcards ( for example%s, and_).ANDandORcombines multiple conditions.ORDER BYsorts the result.LIMITspecifies the maximum number of rows that the query will return.JOINwill combine rows from different tables if the join condition is true.
Feel free to experiment a bit more with the world database before moving on!
That's it for Week 1's theoretical content. Next, you should focus on the exercises for this week.
[1] Adapted from MariaDB's "Introduction to Relational Databases"
[2] Adapted from Codecademy's "Learn SQL" course
[3] Sample world database from MySQL / Oracle



