🗃️ Databases
SQL databases
- Databases
- Being More Selective
- Communicating with the database using SQL
- Communicating with the database using SQL
- Creating a table
- CRUD operations with NodeJS and PostgreSQL
- Databases with Google Sheets
- Defining Primary and Foreign Keys
- Inserting, Updating and Deleting Rows
- Integration with NodeJS
- Introduction to databases
- Introduction to PostgreSQL
- Joining Tables
- Mac-OS setup ⚙️
- Ubuntu setup ⚙️
- Using Aggregate Functions
Databases block viewer
This block viewer lets you flick through all the existing blocks in the Databases folder so you can choose what parts to add to your pages and what parts you might want to create, revise, or leave out.
It's literally just an alphabetical list of whatever is in this folder.
Being More Selective
Learning Objectives
Returning all the customers is fine for a small table (even the 130+ customers) but if we have a large number of rows, thousands or millions, then that approach is very clumsy.
If we know the value of the id
column for the customer we can retrieve just that one row (we know there is only one row because id
is the Primary Key).
Create a new endpoint to get the customer by using the id value:
app.get("/customers/:id", function (req, res) {
// TODO - add code here
});
Here the endpoint includes :id
, which identifies an extra parameter in the URL that will provide the id value, as in http://localhost:3000/customers/3
. We just need a way to get the value from the URL.
Express provides a simple way to get such parameters from the request:
const custId = parseInt(req.params.id);
Note that the id
in req.params.id
must match the name after the colon in the endpoint "customers/:id"
.
Next we need to query the customers
table and provide the value of the id into a WHERE
clause so that we retrieve only the one row that matches:
db.query("SELECT * FROM customers WHERE id = $1", [custId])
.then((result) => {
console.log(result);
})
.catch((err) => {
console.log(err);
});
In the above code notice that:
- The
SELECT
now uses aWHERE
clause - The value we are wanting to look for appears as
$1
- this is a placeholder for the ‘id’ value - The second parameter is the array
[custId]
that provides the id value
Let’s complete the endpoint to return the retrieved values:
app.get("/customers/:id", function (req, res) {
const custId = parseInt(req.params.id);
db.query("SELECT * FROM customers WHERE id = $1", [custId])
.then((result) => {
console.log(result.rows);
})
.catch((error) => {
console.log(error);
});
});
In db.query
you can use placeholders $1, $2, … $9, $10, $11, … etc to mark the place where a parameter value should be used. The parameters to replace the placeholders are supplied in the second argument, the array of values. In this case there is only one value (but it must still be put into an array) so we have [custId]
as the replacement value for the first placeholder, $1
. If there is more than one placeholder there must be the same number of array elements and they must be in the order of the placeholder numbers.
String Placeholders
With String placeholders you don’t put apostrophes around the placeholder:
app.get("/customers/by_city/:city", (req, res) => {
const cityName = req.params.city;
db.query("SELECT * FROM customers WHERE city LIKE $1 || '%'", [cityName])
.then(() => {})
.catch((err) => {});
});
Use Placeholders to Avoid SQL Injection
What is SQL Injection
If an end user can enter unrestricted text into a data field on a form there is the possibility that they could put a nasty SQL command inside the data. For example, if the field is used to enter the customer name the user could type:
J Doe'; delete from customers; select 'x' from customers
as the name value. It looks a bit unlikely but any user with a little bit of SQL knowledge could eventually work out the format needed.
If the code used string concatenation to for the final SQL command it could be something like:
const myQuery = `UPDATE customers SET name = '${inputName}' WHERE id = + ${ID}`;
Then myQuery
would become:
UPDATE customers SET name = 'J Doe'; DELETE FROM customers; SELECT 'x' FROM customers WHERE id = 123
(Note that semicolon can be used between SQL commands in this context)
By using placeholders instead of string concatenation we can prevent this kind of attack. This is VERY important in web-facing apps that use SQL.
Define and test endpoints (20m)
- Make sure you have defined the endpoints for:
- Getting all customers
- Getting a single customer by id
- Using a method similar to the one used to get a customer by id, define an endpoint that can get customers by matching part of the name (e.g. /customers/by-name/:name). Remember there may be more than one matching row!
- Test all these endpoints with Postman and ensure the results are as expected.
Communicating with the database using SQL
Learning Objectives
👩🏽✈️👨🏾✈️Code along with video mentors
The PSQL utility
We use SQL to perform actions on the database and initially we can use a terminal-like utility to do this. The utility is named psql
and is run using the command:
psql <dbname> <username>
The command prompt from psql is the database name currently connected:
my_hotel=>
In psql, you can use the command help
to show the help menu. Within the command prompt, you can enter SQL statements and run them against PostgreSQL. To quit psql, enter the command \q
.
Download the following file to a directory on your computer. This file creates the sample data you can use for the following sections. To do this, click the file to open it in a github formatted page, then right click the Raw button in the bar just above the code and select Save As (or Save Link As or similar) to save it:
Once you have the file downloaded to a known directory, execute the file build-hotel.sql
from psql
as shown below (replace /your/sql/path/
with the path to the download directory used above):
\include /your/sql/path/build-hotel.sql
Check that you have built all the required tables:
\dt
You should see a listing of your tables as follows (with your own username as owner):
List of relations
Schema | Name | Type | Owner
--------+---------------+-------+-------
public | customers | table | keith
public | invoices | table | keith
public | reservations | table | keith
public | room_types | table | keith
public | rooms | table | keith
(5 rows)
Here is the table diagram of the hotel database:
The SELECT Statement
We are first going to look at retrieving data from the database so we can examine it and later, use it in our applications.
To get data out of a table you use the SELECT statement (or command):
SELECT ... FROM ...;
For example:
SELECT name, phone, country FROM customers;
SQL commands entered in the psql command line tool are terminated with a semicolon (;). The SQL command can extend across several lines, but each keyword, name or value cannot be split over more than one line. For example:
SELECT name,
phone,
country
FROM
customers;
is the same as the previous example.
You can use SELECT * FROM ...
to return all the columns of the table. For example:
SELECT * FROM rooms;
This is also a useful command to see what columns exist in a table. You can also use the \d <table_name>
psql command to describe the table.
Note that the use of UPPER/lower case is only to emphasise and differentiate the SQL keywords (upper case) from the other names (lower case) e.g. column and table names. SQL keywords are not case-sensitive.
Scrolling the Results
When you issue a SELECT
that returns a lot of data psql
displays it one screenful at a time. To get the next screenful just press the Space key. You can also use the Up and Down arrow keys for line-by-line control and you can go back one screen with the ‘B’ key.
When you have finished reading the output use the ‘Q’ key to quit the display manager.
Space Next screenful
'B' Previous screenful
Down Arrow Next line
Up Arrow Previous line
'Q' Quit back to prompt
Test Your Skills
Some Useful psql
Commands
The psql
commands are not SQL and are specific to PostgreSQL (although most other RDBMS’s have commands to perform similar jobs). These commands let you display information, execute system commands, etc. Use \? to display a summary of all the psql
commands.
Display a list of available tables in the database:
\dt
Display the definition of a table:
\d <table name>
Display help for SQL commands:
\h [command]
Display a summary of the psql (backslash) commands:
\?
Exit (quit) from psql:
\q
Note that psql
commands ARE case sensitive, unlike SQL commands.
Test Your Skills
- Display the definition of the
customers
table - Display the help for the SELECT command (Note: we will not be covering ALL of this syntax!)
- Read the psql command help and find out what \dS does then try it
Displaying More Than Just Columns
You can use expressions in SQL:
SELECT room_no, rate * 0.85 FROM rooms;
+---------+-------------+
| room_no | rate * 0.85 |
+---------+-------------+
| 101 | 72.2500 |
| 102 | 72.2500 |
| 103 | 72.2500 |
...
Use a column alias to give the expression a meaningful name:
SELECT room_no,
rate * 0.85 AS discounted_rate
FROM rooms;
+---------+-----------------+
| room_no | discounted_rate |
+---------+-----------------+
| 101 | 72.2500 |
| 102 | 72.2500 |
| 103 | 72.2500 |
Here, the query uses the alias as the column heading. Aliases can also be used in other contexts - more on this later…
Expressions in SQL
As with Javascript you can use a variety of ‘operators’ to define expressions in SQL.
Arithmetic:
* Multiply
/ Divide
+ Add
- Subtract
% Modulo (remainder)
(...) Parentheses (to override precedence)
String:
|| Concatenation
For example, to display the weekly rate for a room (with 10% weekly discount):
SELECT room_no, room_type, rate * 7 * 0.90 from rooms;
You can change the column heading using a column alias:
SELECT room_no, room_type, rate * 7 * 0.90 as weekly_rate from rooms;
Use string concatenation to glue character data together:
SELECT 'Customer name = ' || name FROM customers;
Choosing the Rows
You can choose which rows to display by specifying some condition that must be matched:
SELECT id, name, phone, email, country
FROM customers
WHERE country = 'France';
id | name | phone | email | country
-----+--------------------+------------------+-----------------------------+---------
9 | Laurence Lebihan | 91.24.4555 | laurence.lebihan@xmzx.net | France
12 | Carine Schmitt | 40.32.2555 | carine.schmitt@dftu.net | France
15 | Janine Labrune | 40.67.8555 | janine.labrune@dlsh.net | France
25 | Mary Saveley | 78.32.5555 | mary.saveley@yppl.net | France
34 | Martine Rancé | 20.16.1555 | martine.rancé@xeqs.net | France
35 | Marie Bertrand | (1) 42.34.2555 | marie.bertrand@glut.net | France
49 | Frédérique Citeaux | 88.60.1555 | frédérique.citeaux@vekn.net | France
59 | Annette Roulet | 61.77.6555 | annette.roulet@lgha.net | France
62 | Daniel Da Silva | +33 1 46 62 7555 | daniel.da.silva@hijy.net | France
63 | Daniel Tonini | 30.59.8555 | daniel.tonini@mxvw.net | France
91 | Laurence Lebihan | 91.24.4555 | laurence.lebihan@xmzx.net | France
92 | Paul Henriot | 26.47.1555 | paul.henriot@uwua.net | France
106 | Dominique Perrier | (1) 47.55.6555 | dominique.perrier@bdim.net | France
(13 rows)
You can use comparison operators =
, <
, >
, <=
, >=
, !=
(or <>
)
Note: use only one = (equals) symbol to test for equality
When comparing numbers no punctuation is needed around the value, for example, WHERE rate > 100
.
When comparing character data or dates you must enclose the values in single quotes (apostrophes), for example, WHERE name = 'Mary Saveley'
.
Only the rows that match the comparison test (called a predicate) are returned by the query. The predicate can use columns not returned by the query,
Combining Tests in a Predicate
Use AND and OR to combine tests:
SELECT * FROM reservations
WHERE room_no >= 200
AND room_no < 300
AND checkin_date >= '2018-01-01';
This lists reservations for rooms on the second floor (rooms 200 - 299) since the start of 2018. Note the format of the date value - this conforms to the ISO 8601 standard and should be used in preference to any other format to avoid ambiguity.
Another example - to find cheap or Premier rooms on floors 1 and 2 - we might try this to start with:
SELECT * FROM rooms
WHERE room_type = 'PREMIER'
OR rate < 100.00
AND room_no < 300;
This isn’t quite right - it returns rooms on the 3rd and 4th floors. Why?
Overriding Evaluation Order
Just like any programming language, SQL has an evaluation order (precedence). For example, multiply and divide take precedence over add and subtract, so that:
SELECT rate + 20 * 0.85 from rooms;
is not the same as:
SELECT (rate + 20) * 0.85 from rooms;
We can override the normal precedence by using parentheses (...)
around parts of the expression, just as in JavaScript.
With compound predicates AND takes precedence over OR, so that to make the query give the intended results we need to use:
SELECT * FROM rooms
WHERE (room_type = 'PREMIER'
OR rate < 100.00)
AND room_no < 300;
More Predicate Types
The BETWEEN operator has the form a BETWEEN b AND c
: checks that a is in the range b - c inclusive. For example:
SELECT ... WHERE price BETWEEN 100 AND 250 ...
Note that the AND in this case is not combining multiple predicates, it’s part of the BETWEEN operator.
The IN operator, a IN (b, c, d, ...)
checks if the value of a is equal to any of b, c, d, etc… For example:
SELECT ... WHERE room_no IN (201, 202, 204, 206) ...
Both the BETWEEN and the IN operators can be inverted using:
... a NOT BETWEEN b AND c ...
... a NOT IN (b, c, d, ...)
The LIKE operator tests for a match against a wildcard string as a LIKE b
where a is being tested and b is the wildcard string. The wildcard string contains text to be matched along with wildcard symbols ‘%’ and ‘_’.
%
(percent) matches any number of any characters_
(underscore) matches exactly one of any character
For example:
name LIKE 'A%'
matches names starting with ‘A’
name LIKE '_a%'
matches names that have ‘a’ as the 2nd character (note the initial underscore ‘_’)
name LIKE '%ow%'
matches names containing the sequence ‘ow’ anywhere in the name
LIKE can be inverted using a NOT LIKE b
If you need to match for a string that includes one of the wildard characters you can use the ’escape’ character, which defaults to ‘\’ (backslash). For example:
str LIKE '% discount = 5\% %'
matches any value in str that contains ‘discount = 5%’
LIKE is case sensitive in many SQL implementations so to make a case insensitive match you should either convert the tested value to either all upper or all lower case, for example:
lower(name) LIKE '%b%'
matches any name that contains the letters B or b
Note: PostgreSQL also has the non-standard operator ILIKE that can perform a case-insensitive comparison - but avoid this to make code more portable.
Test Your Skills
- Which customers are from Norway?
- Which rooms can accommodate more than two people?
- Which invoices are dated after one month ago?
- How would last month’s invoices change if we gave a discount of 15%
- List all customers whose second name starts with ‘M’ (hint: there’s a space before the second name)
Using SQL Functions
You can use the built-in functions of SQL just as you can in JavaScript, but note that they are different (this is true of most programming languages) but there are also differences between SQL implementations.
You use functions to change values, usually of columns, wherever you can use a column, for example, in the selected list of values:
SELECT name, length(name) AS namelen, upper(email)
FROM customers;
This query also uses a column alias (namelen) to provide a meaningful column heading.
Functions are available that operate on all different datatypes.
Country names are mixed case so to make sure we always match regardless of the stored case we can use the lower
function to find all customers from Manchester, UK:
SELECT * FROM customers
WHERE lower(country) = 'uk'
AND city = 'Manchester';
Assuming room rates include VAT at 20%, list room rates after VAT increases to 23.5% (from 20%), but round to the nearest pound:
SELECT room_no, room_type, rate AS old_rate,
round(rate * 100/120 * 123.5/100) AS new_rate
FROM rooms;
For further information on SQL functions see the official PostgreSQL documentation at https://www.postgresql.org/docs/12/functions.html (for version 12 - for other versions change 12 to the required version)
Date and Time in SQL
In SQL dates and times are held in an internal format but are represented externally (when entering values and displaying them) as strings;
- Text date format: ‘YYYY-MM-DD’ e.g. ‘2018-07-21’ = 21 July 2018
- Time format: ‘HH:mm:SS.ddd’ e.g. ‘14:32’
- Date/Time format: ‘YYYY-MM-DD HH:mm:SS.ddd’ e.g. ‘2018-07-21 15:26:04’
You can perform arithmetic on dates and times, for example:
SELECT cust_id, room_no, checkin_date,
checkout_date - checkin_date AS nights
FROM reservations
WHERE checkout_date = current_date + 1;
This query performs subtraction of one date from another (checkout_date - checkin_date
) to calculate the number of nights the customer has stayed. It also performs addition (current_date + 1
) to get tomorrow’s date so that it lists all reservations that will be checking out tomorrow.
Note: current_date
is a postgres function that returns the current date.
Also note that there are many ways to get the same result - you may explore those for yourself.
You can also represent time intervals but the representations can be complicated and we shall not cover them here.
Test Your Skills
- Write a query to check that all booking dates are before their checkin dates
- We plan to offer a discount of 10% on all Premier and Premier Plus rooms next month. How much would we gain on each room if occupancy rose by 5 nights over the month.
- List all reservations for this month and the number of nights booked.
Eliminating Duplicates
“Which nationalities visit our hotel?”:
SELECT country FROM customers;
But how many values do you see returned for each country? If two customers come from a particular country that country will appear twice in the output. If more than two come from the same country then… But we only need to know the different countries.
To see each country only once, use the keyword DISTINCT, as follows:
SELECT DISTINCT country FROM customers;
The keyword DISTINCT must appear immediately after the keyword SELECT. If more than one column is selected then DISTINCT applies to the combined values of those columns.
Ordering the Returned Rows
If you want to see the data in a specific order, e.g. “List all customers alphabetically by name within each country”:
SELECT id, name, phone, email, country
FROM customers
ORDER BY country, name;
You can can add ASC (ascending, the default) or DESC (descending) after each column name in the ORDER BY clause to control the direction of sorting.
For example:
SELECT id, name, country, city
FROM customers
ORDER BY country DESC, city;
This will sort the data into descending alphabetic order of country then ascending order of city name within each country. The output will look something like this:
id | name | country | city
-----+-------------------------+--------------+-------------------
28 | Kelvin Leong | USA | Allentown
96 | Juri Yoshido | USA | Boston
132 | Valarie Franco | USA | Boston
100 | Allen Nelson | USA | Brickhaven
46 | Miguel Barajas | USA | Brickhaven
43 | Leslie Taylor | USA | Brickhaven
37 | Julie King | USA | Bridgewater
130 | Sue Taylor | USA | Brisbane
124 | Steve Thompson | USA | Burbank
29 | Juri Hashimoto | USA | Burlingame
36 | Jerry Tseng | USA | Cambridge
70 | Marta Hernandez | USA | Cambridge
112 | Dan Lewis | USA | Glendale
52 | Mary Young | USA | Glendale
13 | Jean King | USA | Las Vegas
89 | Brian Chandler | USA | Los Angeles
97 | Dorothy Young | USA | Nashua
83 | William Brown | USA | Newark
120 | Violeta Benitez | USA | New Bedford
79 | Wing Huang | USA | New Bedford
116 | Leslie Murphy | USA | New Haven
. . .
Note: you can order by columns that are not returned by the query.
Limiting the Number of Rows
You can reduce the number of rows returned by using the LIMIT clause at the end of the query:
SELECT id, name, phone, email, country
FROM customers
ORDER BY country, name
LIMIT 20;
The LIMIT clause is not normally used without the ORDER BY clause - without the ORDER BY clause rows can be returned in any arbitrary sequence.
Not all SQL implementations of SQL support LIMIT, some use TOP while Oracle uses ROWNUM.
Test Your Skills
- List the different room types and rates for all rooms avoiding duplicates.
- List customers’ names addresses and phone numbers in alphabetic order of names.
- List customers’ names, addresses, city and country in ascending order of country then reverse order of city within country.
- List the room number, type and the cost of staying 5 nights in each of the top 15 most expensive rooms.
Summary
In this lesson you have learned the use of databases and how relational databases are structured. You’ve also learned how to use basic single-table query commands in SQL and some of the special ‘backslash’ commands in psql
. You have used the SELECT command to control the columns and values that are returned, the DISTINCT, ORDER BY and LIMIT clauses to control the order and numbers of rows returned and you’ve used the WHERE clause to choose the rows that you access. You have learned the INSERT command to add new data to the database
Next time we shall go on to more complex query constructs including joins, updates and deletes along with incorporating SQL into a node.js server.
Communicating with the database using SQL
Learning Objectives
Use the tables and data from your The Docs_hotel
database. If you need to start from a clean state for your database, run psql -d The Docs_hotel -f build_hotel.sql
.
Earlier, we created a new NodeJS project called The Docs-hotels-api
with a single API endpoint /customers
to get the list of all customers. Now, we will add other endpoints with more functionalities to interact with the The Docs_hotel
database.
You should have a server.js
file that looks something like this:
const express = require("express");
const app = express();
const { Pool } = require("pg");
const db = new Pool({
user: "keith",
host: "localhost",
database: "The Docs_hotel",
password: "",
port: 5432,
});
app.get("/customers", function (req, res) {
db.query("SELECT * FROM customers")
.then((result) => {
res.status(200).json({ customers: result.rows });
})
.catch((err) => {
console.log(err);
});
});
app.listen(3000, function () {
console.log("Server is listening on port 3000. Ready to accept requests!");
});
Before we move on, run the server.js
file in node and check the endpoint using your browser.
Using Postman to Test an API
You should have Postman already installed on your system. If not then please install it now.
Run Postman (it can be slow to start).
Check the method is GET, set the URL for your customers endpoint then click Send.
Creating a table
Learning Objectives
Use the CREATE TABLE command, which in the simplest case has the general form:
CREATE TABLE <tablename> <column definition>, <column definition>, ...);
To create an inventory
table for our hotel we might need:
CREATE TABLE inventory (
id SERIAL PRIMARY KEY,
description VARCHAR(30) NOT NULL,
cost NUMERIC(6,2)
);
Note: you may never need to do this. Database design is a task that requires specialist skills and considerable experience.
Naming Tables and Columns
In the CREATE TABLE
command you must give the name of the table (e.g. inventory
) and the names of each of the columns (in the parenthesised column definitions) (e.g. id
, description
, cost
).
Names of tables and columns (and any other objects in the database) must start with a letter, can contain letters, digits and the underscore symbol (_) up to 64 bytes (in PostgreSQL). Names are not case-sensitive so that NAME, name and NaMe are all the same.
Data Types of Columns
In the above example:
Column | Data Type | Other |
---|---|---|
id | SERIAL | PRIMARY KEY |
description | VARCHAR(30) | NOT NULL |
cost | NUMERIC(6,2) |
The id
column uses SERIAL as its data type, making it an autoincrementing integer that increments by 1, starting from 1, for each time a new row is inserted into the table. For this to work, the id
column must be omitted from the INSERT command. id
is also designated as the PRIMARY KEY of the table (note that SERIAL doesn’t make the column the primary key). PRIMARY KEY also implies that the column cannot be set to NULL.
The description
column is a variable length character value (VARCHAR) that can hold up to a maximum of 30 characters. The NOT NULL constraint means the value cannot be left empty, each row must have a description.
The cost
column is NUMERIC(6,2), a number that can accurately store up to 6 digits, two of which are the fractional part. For example, it can hold 1234.56, -23.45 or 0.01. Note that the NUMERIC data type stores and computes values using decimal values and does not lose accuracy in the same was as, say, floating point values. NUMERIC values take longer in calculations because they don’t use simple binary values - user either integer or floating point for speed with compute-heavy numbers.
NEVER use floating point for financial values.
Other Common Data Types
There are several more standard data types (plus a few non-standard ones), including:
Type | Notes |
---|---|
INTEGER | binary integer with 32 bits (range approx -2 x 109 – +2 x 109) |
DATE | dates with no time component |
TIMESTAMP | date and time (accurate to milliseconds) |
BOOLEAN | TRUE, FALSE or NULL |
TEXT | variable length text with no length limit (up to max allowed for the RDBMS - about 1Gb in PostgreSQL) |
You can read more about data types in the PostgreSQL documentation. Refer to https://www.postgresql.org/docs/12/datatype.html for more information.
Changing a Table Definition
Using the ALTER TABLE command to add and remove columns:
ALTER TABLE inventory ADD COLUMN room_no INTEGER;
ALTER TABLE customers DROP COLUMN phone;
There are some constraints on adding and removing columns, for example, you cannot add a NOT NULL column to a table that already contains some rows.
Exercise 7
- Create a table for charge points. This must record the hotel shops, bars, cafes and restaurants that a customer can use during their stay.
- Include an auto-incrementing primary key
- Include the charge point name, a description and maximum customer credit value
- Insert charge points for ‘Shop’, ‘Pool Bar’, ‘Elysium Restaurant’ and ‘Room Service’ with credit limits of £1000 for each.
- Create a table for charge items that records amounts charged to rooms by customers using our shop, bars, restaurants, etc. This must include the room number of the room charged, the charge point used, the amount, the date and time of the charge and any customer comments.
CRUD operations with NodeJS and PostgreSQL
Learning Objectives
👩🏽✈️👨🏾✈️Code along with video mentors
💾 CRUD operations
🌱 Creating data
In the following, we will add a new API endpoint to create a new customer in the table customers
of the The Docs_hotel
database. As a reminder, here is an example of a SQL insert statement to add a new customer:
INSERT INTO customers (name, email, phone, address, city, postcode, country)
VALUES ('Fred Bloggs', 'fred@bloggs.org', '07123456789', '1 Low Brow',
'Ashbottom', 'XY2 3ZA', 'UK');
Using body-parser for Form Data
When an endpoint can reasonably expect a large number of data values, for example, when we need to insert a new row into the customers
table, then using parameters in the URL is not practical. Instead we can use a ‘middleware’ package such as body-parser
, which can extract data from the body of the request.
As we create a new record in the database, we will add a new POST endpoint in the The Docs-hotels-api
project from last class. Moreover, we need to be able to pass data to this API endpoint such as the name, address, postcode, etcetera, so we can use this API to create different customers. The data can be sent in the body of the request. To access the parameters in the body of the request with Express.JS, we need to add the module body-parser
to the The Docs-hotels-api
project:
At the terminal command line:
npm install --save body-parser
Then include it in the server.js
:
const bodyParser = require("body-parser");
app.use(bodyParser.json());
📩 Inserting Rows Using Node.js
We can finally make an endpoint to create a new customer:
app.post("/customers", function (req, res) {
const newName = req.body.name;
const newEmail = req.body.email;
const newPhone = req.body.phone;
...
const newCountry = req.body.country;
const query =
`INSERT INTO customers (name, email, phone, address, city, postcode, country)
VALUES ($1, $2, $3, $4, $5, $6, $7)`;
db.query(query, [newName, newEmail, ..., newCountry])
.then(() => {
res.status(201).send("Created a new customer");
})
.catch(err => {
console.log(err);
})
});
📮 Using Postman to Test a POST
In Postman:
- Change the method to POST
- Ensure the URL address is your customers endpoint, for example: http://localhost:3000/customers
- Under the address in the Postman window select Body then select raw
- Ensure that JSON is selected at the right hand end
- Enter the data as JSON in the panel below these settings, for example:
{ "name": "Fred Bloggs", "email": "fred@bloggs.org", "phone": "07123456789" }
- Click send
You should see the response “Customer created.”
Exercise 2
- Install body-parser and enable it in your server.js
- Define a new endpoint to create new customer records For simplicity, only insert the name, phone & email values. Invent your own values.
- After successfully inserting the row send back a message, “New customer added.”
- Test your endpoint using Postman
- Check the data has appeared in the table
What Can We Do After an Insert?
- We must start checking for errors in the execution of the SQL. This applies to all commands, not just INSERT.
- If the table has an auto-incrementing primary key we can obtain the value for the new row and return it to the browser. This is often required for subsequent actions in the use case.
- The new data can be logged to the console (although this is not common practice).
- … and so forth …
🧪 Checking for Errors in SQL
The catch block in db.query
always has an error parameter. If the passed SQL query throws an error then the callback function passed to the catch
method is invoked with an error object generated by node postgres.
A typical situation might be:
db
.query(...);
.then(/* do things for success*/);
.catch((err) => res.status(500).json({error: err}));
👩🏽🔬 Validating
What could go wrong with our code so far? There is no validation of any user inputs which could result in errors, duplications or inconsistent data in the database! Here are few examples of things we can verify before creating the customer in the database:
- Check that the phone number contains only digits, ‘+’, ‘-’, ‘(’ or ‘)’
- Check that no customer with the same email already exists in the database
- …
Let’s start by validating that phone number the consists of only digits, +, -, (, ) or space and if it doesn’t, return an error. This check uses a regular expression to replace all valid characters with a ‘0’ (zero) symbol then check the result is made up of only ‘0’ symbols.
if (
newCustPhone.replace(/[+\-()0-9 ]/g, "0") != // replace all valid chars with 0
"0".padEnd(newCustPhone.length, "0")
) {
// compare with all zeros same length as phone
return res
.status(400)
.send("The phone number must only contain 0-9, +, -, (, ) or space.");
}
Note: regular expressions may be new to you but they are very powerful and also quite complicated. We don’t have time to teach you the intricacies of “regex” here but you can find numerous resources on the web and in the MDN documentation.
As you have already learned, this validation can also be performed in the browser because it doesn’t involve any database interaction. This is a faster and less costly (in terms of network traffic) approach and should be used where practical. It makes user correction of mistakes quicker and more natural and can be done on a per item basis.
Exercise 3
- Modify your endpoint to check for SQL errors. If an error is encountered then console.log it and return it to the browser.
- Add phone number validation to your endpoint to check that only digits, +, -, (, ) or space are present. Any other characters should send a suitable message back to the browser and abort the insert.
(You may use copy & paste to get the regular expression - it’s not easy to get right by hand.) - Test your endpoint with a variety of data including valid and invalid phone numbers.
📨 Check the Customer’s Email
We can also validate the new customer’s email doesn’t already exist in the database, to prevent duplicate data.
app.post("/customers", function (req, res) {
const newName = req.body.name;
const newEmail = req.body.email;
const newPhone = req.body.phone;
// et cetera ...
//
// Validate the new customer's email address by querying the existing customers table
// to return any rows that contain the same values
//
db.query("SELECT 1 FROM customers WHERE email=$1", [newEmail])
.then((result) => {
if (result.rowCount > 0) { // note the use of result.rowCount
return Promise.reject({error: 'Customer already exists'})
} else {
return db.query(`INSERT INTO customers (name, email, phone, address, city, postcode, country)
VALUES ($1, $2, $3, $4, $5, $6, $7)`, [newName, newEmail, ..., newCountry],
}
})
.then(() => {
res.status(201).send("Customer created")
})
.catch((error) => {
// any errors or manual rejections will be intercepted here
// handle error here
})
Note:
- The
SELECT
does not choose any column values, just the literal1
. We are only interested in the number of rows found (it must be 0 for success). - If a row is found, then a customer with this email exists so we call
Promise.reject
: this will then trigger the firstcatch
block in the promise chain where we can handle the error. - If a row is not found, then we perform the insertion query, taking care to return the call to
db.query
so we can access its resolve value in the next.then
block - This validation is NOT suitable for performing in the browser because it interacts with the database.
Exercise 4
- Before adding a new customer ensure there are none with the same email address in the customers table
- If a duplicate email is found send an appropriate message to the browser
- Use Postman to check that your new code works as expected by trying to insert duplicate emails as well as correct ones
- Check the data has been added to the database (use psql)
🗝️ Return the Primary Key Value for an Insert
If the table uses an auto-incrementing primary key then it is very useful to return the generated value after the INSERT
. PostgreSQL provides a very simple mechanism to do this (as do most RDBMS’s) by appending the RETURNING
clause to the INSERT
, as below:
INSERT INTO customers (name, email, phone)
VALUES ('Fred Bloggs', 'fred@bloggs.org', '07123456789')
RETURNING id;
The endpoint now uses the query call shown below:
db.query(`INSERT INTO customers (name, email, phone, address, city, postcode, country)
VALUES ($1, $2, $3, $4, $5, $6, $7)
RETURNING id`, [newName, newEmail, ..., newCountry])
.then(result => {
const newId = result.rows[0].id;
console.log(`New Customer id = ${newId}`);
res.status(200).json({lastId: newId});
})
.catch(err => {
res.status(500).json({error: err});
})
Exercise 5
- Further extend your
POST
endpoint so that it returns the new customer id value to the browser. - Use Postman to check that the new value is returned.
🆕 Updating data
We can now implement an endpoint to update a customer record in the database. For this, we will use a PUT endpoint.
app.put("/customers/:id", function (req, res) {
const custId = req.params.id;
const newEmail = req.body.email;
const newPhone = req.body.phone;
db.query("UPDATE customers SET email=$2, phone = $3 WHERE id=$1", [
custId,
newEmail,
newPhone,
])
.then(() => res.send(`Customer ${customerId} updated!`))
.catch((err) => {
console.error(err);
res.status(500).json({ error: err });
});
});
Note:
- This endpoint uses both a URL parameter (
:id
) to supply the customer id and data in the request body to provide the updated column values. - The order of the placeholders does not have to correspond to their sequence in the SQL. In this case we’ve put the
custId
value first but it appears last in the SQL (in theWHERE
clause).
We used URL parameters to query for a specific customer earlier and we used body data to insert a new customer.
What can go wrong in the code above? Again, there is no validation! The user could try to set an empty email or even a string which is not following the format of an email. Remember, validating data is very important to make sure you don’t end up with inconsistent data in your database!
Exercise 6
- Create a new endpoint to update reservations to provide the room number allocated to the guest when they check in. Use the reservation id value to identify the row.
- Use Postman to check that the endpoint works correctly and the row is updated.
🗑️ Deleting data
To delete a record from the database, we will use a DELETE
endpoint:
app.delete("/customers/:id", function (req, res) {
const custId = req.params.id;
db.query("DELETE FROM customers WHERE id=$1", [custId])
.then(() => res.send(`Customer ${custId} deleted!`))
.catch((err) => console.error(err));
});
However, if you try to delete a customer which already has some bookings, the previous endpoint will fail. Do you know why? You cannot delete a customer whose ID is used as a foreign key in another table (in this case, in the reservations
table). Let’s delete all the customer bookings first:
app.delete("/customers/:customerId", function (req, res) {
const customerId = req.params.customerId;
db.query("DELETE FROM reservations WHERE cust_id=$1", [customerId])
.then(() => {
return db.query("DELETE FROM customers WHERE id=$1", [customerId]);
})
.then(() => res.send(`Customer ${customerId} deleted!`))
.catch((e) => console.error(e));
});
Exercise 7
- Add an endpoint to delete from the reservations table using the value of the id to choose the row to be removed. Only allow deletion of reservations with future checking dates otherwise return an error.
- Check the endpoint works correctly using Postman (use psql to verify the row has been deleted).
⚠️ Warning!
⚠️ Update & Delete are not Safe
Optimistic Locking
The solution is known as “optimistic locking” but we just don’t have time in this course to cover it. So long as you are aware of the problem you can use the approach but add plenty of comments to make clear that the code is not yet safe.
Note also that optimistic locking requires changes not only to the code of your Node endpoints but also to the behaviour of the web pages that use and modify the data.
Databases with Google Sheets
Learning Objectives
Introduction to databases, using Google Sheets
What is a database?
A database is a structured set of data held in a computer. It stores, retrieves and organizes information.
You already use databases in your everyday life! A few examples:
- The messages you send on Slack are stored in a database.
- CYF stores your name and email on a database!
- Your phone contact list? That will be in a database too!
What else you do on your computer and phone? What information is there? And where do you think it’s stored? Write down five other databases you get information from.
How do they work?
Most databases arrange data into columns and rows.
column 1 | column 1 |
---|---|
row 1 | row 1 |
row 2 | row 2 |
A row is a single record of this information which is grouped together, for instance, information about one person.
Within a row, each column stores a property of that record. In the example of your phone’s contact list, this would be name and number. Each row (or record) has a name and a number. What other properties does your address book store? Open your phone and take a look.
First Name | Surname | Mobile Number |
---|---|---|
Tom | Jones | +447700000 |
Ali | Smith | +327700000 |
Tom and Ali’s details here are records in the database. The properties are listed as columns and the values are in rows. Each row is a new record.
If you’ve used tools such as Excel or Google Sheets, you will have seen this sort of structure before. We’ll take a closer look at these in the backlog exercises for this module. You’ll learn how to manipulate data in Google Sheets and use SQL syntax to retrieve information from databases.
Defining Primary and Foreign Keys
Learning Objectives
Defining Primary and Foreign Keys
Defining Primary Keys
Use the following templates to define a Primary Key.
For a single-column PK use:
CREATE TABLE <table name> (
...
<column name> <data type> PRIMARY KEY,
...
)
For example:
CREATE TABLE rooms (
room_no INTEGER PRIMARY KEY,
...
);
To define a multi-column primary key you must define a constraint separate from the column definitions, as below:
CREATE TABLE <table name> (
...
<pk col 1> <data type>,
<pk col 2> <data type>,
... ,
PRIMARY KEY (<pk col 1>, <pk col 2>),
...
);
For example:
CREATE TABLE invoice_items (
inv_id INTEGER REFERENCES invoices(id),
item_no INTEGER,
... ,
PRIMARY KEY (inv_id, item_no),
...
);
There can be only one primary key in a table definition. The PRIMARY KEY
definition implies NOT NULL so no column in a table’s PK can be set to NULL.
Note: a partial primary key can be a foreign key as well.
Defining Foreign Keys
To define foreign keys use either:
For a single-column foreign key:
<column name> <data type> REFERENCES <table name> (<column name>);
where the <column name> in the REFERENCES clause is the column name in the referenced table, not the one being defined at this point. For example, in the reservations
table:
...
cust_id INTEGER NOT NULL REFERENCES customers (id),
...
For multi-column foreign keys we must again use a separate constraint definition, as shown:
CREATE TABLE customer_challenges (
id SERIAL PRIMARY KEY,
inv_id INTEGER,
item_no INTEGER,
...
FOREIGN KEY (inv_id, item_no) REFERENCES invoice_items (inv_id, item_no),
...
);
Exercise 8
- Try to delete the customer Mary Saveley. What happens and why?
- Insert a new room, number 313 as room type ‘SUPER PREMIER’.
- Define primary and foreign keys required by the charge_items table
- Insert some rows into the charge_items table. You can invent the details.
Inserting, Updating and Deleting Rows
Learning Objectives
Inserting data
To add new data to a table use the INSERT command that has the following format:
INSERT INTO table_name (column_name, ...)
VALUES (value, ...)
For example:
INSERT INTO customers (name, email, address, city, postcode, country)
VALUES ('John Smith','j.smith@johnsmith.org',
'11 New Road','Liverpool','L10 2AB','UK');
Note:
- You do not need to supply the value for the automatically generated
id
column, it is populated from a sequence generator object. - The order of values in the
VALUES (...)
clause must correspond to the columns in the column name list. The first value is stored in the first named column, the second value in the second named column and so forth.
Hotel 3
- Insert yourself in the
customers
table. Query the table to check your new data. - Insert a new room type of PENTHOUSE with a default rate of 185.00.
Updating Existing Data
When you need to change values in a table, use the UPDATE
command. The general construction to update a row is:
UPDATE table
SET column1 = value1,
column2 = value2
WHERE condition;
Note that UPDATE
usually requires a WHERE
clause to specify the row or rows to be updated. As with SELECT
, if you don’t specify a condition to restrict the rows, the command applies to all the rows in the table.
For example, to update the name and country of the customer with ID 3:
UPDATE customers
SET name='Bob Marley',
country='Jamaica'
WHERE id=3;
Hotel 4
- Update the postcode of the customer named
Alice Evans
toM21 8UP
- Update room 107 to allow up to 3 guests
- For the customer named
Nadia Sethuraman
, update her address to2 Blue Street
, her city toGlasgow
and her postcode toG12 1AB
in one query - Update all the future bookings of customer with ID 96 to 3 nights (starting on the same check-in date) in one query
Deleting a row
The syntax to delete a row is:
DELETE FROM table WHERE condition;
For example, to delete the booking with ID 4:
DELETE FROM reservations WHERE id=4;
💡 NOTE
WHERE
clause with DELETE
or UPDATE
the command will be applied to all the rows in the table which is rarely what you want.Hotel 5
- Delete the bookings of customer ID
108
that do not have a room number assigned - Delete all the bookings of customer Juri Yoshido (customer id 96)
- Delete the customer details for Juri Yoshido
Integration with NodeJS
Learning Objectives
Introduction to node-postgres
“node-postgres is a collection of node.js modules for interfacing with your PostgreSQL database.” - https://node-postgres.com/
In the following, we will use node-postgres to…
- Connect to a database
- Send SQL queries to the database and get results
Loading data from a database with a GET endpoint
Let’s build a brand new NodeJS application with a single GET endpoint to load the list of customers that you already have in the customers
table of the The Docs_hotels
database.
First, create a new NodeJS application that we will call The Docs-hotels-api (enter server.js
when asked for the entry point):
mkdir The Docs-hotels-api
cd The Docs-hotels-api
npm init
As before, we will use the Express library to build our API, and the node-postgres library to connect with our database:
npm install --save express
npm install --save pg
Create a server.js
file, import express, initialise the server and start listening for requests:
const express = require("express");
const app = express();
app.listen(3000, function () {
console.log("Server is listening on port 3000. Ready to accept requests!");
});
Import pg library and create a new GET endpoint to load the list of customers:
const { Pool } = require("pg");
const db = new Pool({
user: "keith", // replace with you username
host: "localhost",
database: "The Docs_hotels",
password: "",
port: 5432,
});
app.get("/customers", function (req, res) {
db.query("SELECT id, name, city, phone FROM customers")
.then((result) => {
res.json(result.rows);
})
.catch((error) => {
console.log(error);
});
});
In the code above:
- We first import the
Pool
class from the pg library, which is used to connect to a database - We create a new connection (
db
) where we specify the credentials to connect to the The Docs_hotel database - We then create a new
/customers
endpoint where we use the methodquery()
to send a SQL query to load all the customers from the tablecustomers
and return the results withresult.rows
. Thequery
method returns a Promise: so we can access the returned rows using a.then
block. You can write any valid SQL query that you learned in thequery()
method!
Note: There is no semicolon to end the SQL statement.
Start your server with node server.js
and try to reach the /customers
endpoint to see the list of customers currently available in your The Docs_hotels
database. You can try to create/update/delete customers using psql to verify that your API always returns what is stored in your database.
Introduction to databases
Learning Objectives
A database is a structured set of data held in a computer. It provides ways to store, retrieve and organize information.
Why do we need them?
In the past few weeks, you stored and retrieved data using files. This is fine for simple data but it can quickly become an issue as your application becomes more complex and needs to store and manipulate more complicated data. For example, imagine you are asked to develop a booking system for a medium-sized hotel. You will need to store the list of rooms available for booking, and as you add more features, you will need to save users information, the reviews they post, the bookings each user makes and payment information. You can see that the data you need to handle can become very complicated, especially when you need to consider that data are not static, as they can be updated or deleted. To work more effectively with data, we can then use a database, which presents the following benefits:
- A database defines a structure for your data and the relationships between entities
- A database provides convenient and performant ways to safely store and retrieve data
- A database provides a mechanism to check the validity of your data
Different types of database
There are many different kinds of database and different implementations. Sometimes, a database type is a better fit to certain use case or certain problems. The most well-known database types include relational database, key/value database, graph database and document database (also known as NoSQL). For this class, we will focus specifically on relational database as they are the most widely used and supported. You can consult DB-Engines to see a ranking of the most used database, as you can see, there are a lot of them!
Introduction to PostgreSQL
Learning Objectives
“PostgreSQL is a powerful, open source object-relational database system that uses and extends the SQL language combined with many features that safely store and scale the most complicated data workloads. The origins of PostgreSQL date back to 1986 as part of the POSTGRES project at the University of California at Berkeley and has more than 30 years of active development on the core platform.” (source: postgresql.org)
What is SQL?
- Pronounced S-Q-L or sequel
- Stands for Structured Query Language
- SQL is the standard language used to communicate with relational database
- SQL statements are used to query, create, update, delete records in a database as well as many other tasks
- SQL statements are executed by a RDBMS.
What is an RDBMS?
- Stands for Relational Database Management System
- It is a program that processes SQL statements to manage a relational database
- PostgreSQL is an RDBMS.
What characterizes a relational database?
As mentioned previously, a relational database is a specific type of database. Data is stored in tables (relations) of rows (tuples) and columns (attributes) as per the example below:
Joining Tables
Learning Objectives
Introduction
So far we’ve only looked at one table in any query. Many problems require data from several tables - how do we do that?
For example, if I want to phone or email customers who have not yet paid their invoices, which tables do I need to look at?
Use joins to combine data from more than one table. Joins use column values to match rows in one table to rows in another.
The join columns are usually referred to as foreign keys and primary keys.
Foreign and Primary Keys
Each table should have a Primary Key. This is one or more columns whose values, which cannot be NULL, are combined to provide a unique identifying value for each row. Natural primary keys are often difficult to find so many tables use an arbitrary integer whose value is automatically generated when the row is created. When joining tables we need to match a single row to one or more other rows, usually in another table - for example, matching a customer to her/his reservations. The single row (customer) is usually identified by its primary key value.
Foreign Keys are the columns in a table that reference corresponding columns in another table (although self-referencing foreign keys can reference the same table). For example, the res_id
column in the invoices table references the id
column in the reservations table (see diagram above).
The referenced column is almost always the primary key of the referenced table because a foreign key must always reference exactly one row in the referenced table (primary keys guarantee that).
Using JOIN in SQL
To join reservations and invoices in SQL:
SELECT r.cust_id, r.room_no, i.invoice_date, i.total
FROM reservations r JOIN
invoices i ON (r.id = i.res_id);
Notice:
- The new keyword JOIN with ON (predicate)
- Table aliases (
r
andi
) used to qualify columns
The new syntax follows the following pattern:
SELECT ...
FROM ... [JOIN ... ON (...)]...
[WHERE ...]
[GROUP BY ... [HAVING ...] ]
[ORDER BY ...]
Use the JOIN to define the combined row source then you can use WHERE, DISTINCT, GROUP BY, ORDER BY, etc… as with single-table queries. For example:
SELECT r.cust_id, r.room_no, i.invoice_date, i.total
FROM reservations r JOIN
invoices i ON (i.res_id = r.id)
WHERE r.checkin_date > '2018-07-01'
AND i.total < 500
ORDER BY i.invoice_date DESC, r.cust_id;
There is no theoretical limit to the number of tables that can be joined in a query, although practical considerations like complexity and performance must be considered. It is quite common, though, to find up to seven or eight tables joined in a query.
Multi-table joins just extend the syntax to add more tables, as below:
SELECT c.name, c.phone, c.email, i.invoice_date, i.total
FROM customers c JOIN
reservations r ON (r.cust_id = c.id) JOIN
invoices i ON (r.id = i.res_id)
WHERE i.invoice_date < current_date - interval '1 month'
AND i.paid = FALSE
ORDER BY i.invoice_date DESC, c.id;
Note You have just learned about what is called the INNER JOIN, which is the most common kind of join. Indeed, you can use the keyword INNER in the JOIN syntax, as follows:
SELECT c.name, c.phone, c.email, i.invoice_date, i.total
FROM customers c INNER JOIN
reservations r ON (r.cust_id = c.id) INNER JOIN
invoices i ON (r.id = i.res_id)
WHERE i.invoice_date < current_date - interval '1 month'
AND i.paid = FALSE
ORDER BY i.invoice_date DESC, c.id;
The INNER keyword is not required (it’s the default) but some organisations might require it for the sake of coding standards.
There are other kinds of JOIN, specifically the OUTER JOIN and the CROSS JOIN but these are less frequently used in applications. If you want to find out about these kinds of JOIN refer to the PostgreSQL documentation.
Exercise 5
- Try and understand each of the queries above in your
psql
prompt - Which customers occupied room 111 and what are their details?
- List the customer name, room details (room number, type and rate), nights stay and departure dates for all UK customers.
- List name, phone and email along with all reservations and invoices for customer Mary Saveley.
The Vexing Question of NULL
A column can be assigned a NULL value to indicate it has no value. This can happen when the data for this column is unknown at the time the row is created, for example, employee leaving date, order shipment date, etc… It can also be used when the data is optional.
Be careful with expressions - any expression that includes a NULL value results in NULL as the expression value.
Because NULL is ’no value’ it cannot be compared to anything else. For example, you will never get any results from:
SELECT * FROM customers WHERE postcode = NULL;
nor will you get any from:
SELECT * FROM customers WHERE postcode != NULL;
Instead you must use:
... WHERE postcode IS NULL
or
... WHERE postcode IS NOT NULL
This behaviour has some impacts on operations like JOIN, where NULL values won’t match. You could work around this, but see the warning below, by using:
... ON (a.col = b.col OR
a.col IS NULL AND b.col IS NULL)
WARNING: However, be aware that this is not a sensible situation - join columns containing NULL should be expected to not match or should be disallowed (see Primary Keys above)
You can explicitly provide NULL as a value in INSERT and UPDATE statements, for example:
INSERT INTO rooms (room_no, rate, room_type, no_guests)
VALUES (213, 95.00, NULL, 2);
UPDATE rooms SET room_type = NULL, no_guests = NULL
WHERE room_no = 204;
In INSERT statements if you omit a column from the column list (following the table name) then that column will be given either:
- an autogenerated value (if it has datatype SERIAL)
- a default value if one has been specified in the CREATE TABLE command
- NULL if neither of the above apply
Functions to Handle NULL
There are some functions that can operate on NULL values, especially the coalesce(x, y)
function. This function looks at the first argument x
and if it is NULL returns the value of the second argument y
otherwise it returns the value of x
. For example:
SELECT room_no, rate, coalesce(room_type, 'None') type
FROM rooms
WHERE no_guests IS NULL;
Notes:
- The coalesce function can take more than two arguments and returns the first of these (from left to right) that is not null.
- This feature is provided by most SQL vendors but goes by different names, e.g. ifnull(x, y) in MySQL, nvl(x, y) in Oracle, etc…
Exercise 6
- Which customers have not yet provided a phone number?
- Update room 304 such that it does not have a room_type.
- List customers (name and city) qand their reservations replacing the room number with ‘Not Assigned’ if it is NULL.
Mac-OS setup ⚙️
Learning Objectives
With the Apple Mac you can choose either a graphical installation or a command line method. The graphical method mostly involves dragging and dropping icons with a little bit of command line to configure the tools we need. The command line method uses only command line but is fairly simple.
Graphical Installation
Note that you must be logged in as a user with Administrator privileges on OSX for all these steps.
From the Macintosh OSX system you can install PostgreSQL from the Postgres.app package. Download this from https://postgresapp.com but make sure you choose PostgreSQL version 12 and the appropriate platform for your Mac. Once you have downloaded Postgres you can open the disk image from your Downloads folder. Drag the Postgres icon to the Applications folder.
Open Applications in the Finder and start Postgres. When you receive the alert “Postgres” is an application downloaded from the internet… then click on the Open button. You may be prompted for other options (e.g. auto update, …) and you can choose your own preference.
When the PostgreSQL window appears then click on the Initialise button to create your local PostgreSQL database and user - they will both have the same name as your OSX username.
PostgreSQL initialises itself, creating the initial databases (template1
, postgres
and <your user name>
) along with corresponding users. These should then be displayed in the PostgreSQL window, as below:
Configure the Command Line Interface (CLI)
To use the PostgreSQL command line tools you need to add the directory to your path (used to find commands in the terminal). There are several ways to do this but the simplest is to create a file containing the directory path name. Open the terminal and enter the appropriate commands from the list below (note: you may not need to create the directory):
# First check the destination directory exists:
ls /etc/paths.d/
# If the directory doesn't exist then create it:
sudo mkdir /etc/paths.d
# Create the path entry for Postgres:
echo /Applications/Postgres.app/Contents/Versions/latest/bin | sudo tee /etc/paths.d/postgresapp
From the terminal you can now type the simple command:
psql
and this should produce something like the following:
psql (12.4)
Type "help" for help.
keith=#
The final line is the psql command prompt, waiting for you to enter a command. It comprises your database name (created with the same name as your username, in the example this is ‘keith’) followed by ‘=#’.
You are going to rename the database to my_hotels
for the purposes of this course, so from the psql prompt (as above) first disconnect from your database by connecting to database postgres
:
keith=# \connect postgres
Now, still at the psql prompt, rename the database:
keith=# alter database keith rename to my_hotels;
Clearly, you’ll use your own name as the name of the database to be renamed.
Note that to exit psql back to the terminal prompt use the command \q
.
MacOS Graphical Install Complete
MacOS Command Line Installation
You need Homebrew installed on your Mac to use this approach. If you don’t yet have it then you should probably use the graphical method. If you already have Homebrew installed then the following command installs PostgreSQL:
brew install postgresql
This will install the software, set up the postgres user and a user with the same name as your Mac OS user and create some databases. It should leave the PostgreSQL server running on your Mac but if you power off the machine you’ll need to restart it. To restart the server, type:
brew services start postgresql
This starts the server and ensures that it will be started each time you power up your Mac. Note you can stop the server at any time by using:
brew services stop postgresql
Ubuntu setup ⚙️
Learning Objectives
Install
If you have Ubuntu as your operating system you can install PostgreSQL using the following commands in the terminal (note: the $
is the system prompt):
sudo apt-get install postgresql postgresql-contrib
Enter your ubuntu password when asked for it (it’s needed for the sudo
command).
For this command you must confirm that you are happy to install the packages and any needed dependencies that the package manager identifies. Generally just type Y
or y
to proceed.
To verify your installation enter the following commands:
sudo -u postgres psql --version
Again, enter your Ubuntu user password when prompted. The command should display something like this:
psql (PostgreSQL) 12.2 (Ubuntu 12.2.4)
Create Your User and Database
Create the database user with the same name as your Ubuntu username (this makes logging in a bit easier):
sudo -u postgres createuser -P --createdb <username>
This will ask for your Ubuntu login password to run the sudo command. The other parts of the line are:
-u postgres
tells sudo to run createuser as user postgres (set up when you installed postgres) -createuser
the shell command to create a PostgeSQL user-P
(upper case) tells createuser to ask for the new user’s password--createdb
grants the user the privileges to create databases<username>
the new username (hint: make it the same as your o/s username)
You’ll be asked for the new database user’s password (twice) - you’ll need this later for when we connect to the database from Node.
Remember the new username and password details - you will need them later.
Creating a new database
In a terminal, create a new database named my_hotels
with the following command:
createdb my_hotels
Then connect to your database with:
psql my_hotels
Note that if you chose to use your Ubuntu username for the database you don’t need it (or the password) to log in to psql
. If you used a different username then you’ll need to log in with:
psql -U <username> my_hotels
and you’ll be prompted for the password.
The output from this command should look something like this:
psql (12.2 (Ubuntu 12.2.4))
Type "help" for help.
my_hotels=>
The last line (my_hotels=>
) is the psql
command prompt, made up of your database name plus ‘=>’.
Note that to exit psql back to the terminal prompt use the command \q
.
Using Aggregate Functions
Learning Objectives
Basic Aggregate Functions
How to calculate totals, averages, etc. over multiple rows.
You frequently need to get a single piece of information that is derived from multiple rows in a table. For example, when you need to know the total of all invoices for August 2018:
SELECT sum(total)
FROM invoices
WHERE invoice_date BETWEEN
'2018-08-01' AND '2018-08-31';
The aggregate functions are:
- sum : Calculate the total of the values in a column
- avg : Calculate the average (mean) of the values in a column
- min : Determine the mimimum value of a column
- max : Determine the maximum value of a column
- count : Count the number of values (non-null) in a column
All the above are in the SQL standard, most implementations provide others. SUM and AVG can only apply to numeric data, the others can apply to any datatype.
Further examples:
“What is the average length of stay at our hotel?” :
SELECT avg(checkout_date - checkin_date)
FROM reservations;
“What are the lowest and highest room rates we charge?” :
SELECT min(rate) AS lowest,
max(rate) AS highest
FROM rooms;
You can use the count(x) function to count non-null values:
SELECT count(id) AS id_ct, count(postcode) AS post_ct
FROM customers;
id_ct | post_ct
-------+---------
133 | 126
(1 row)
Notice that these two results show different values - there are NULL values for postcode but id is mandatory for all rows.
If you just want to count the number of rows, use count(*)
. This is often used to find how many rows match a WHERE
clause:
SELECT count(*) FROM customers WHERE country = 'Belgium';
Hotel Exercise 1
- Get the numbers of rows in each of the tables: rooms, room_types, customers and reservations.
- How many reservations do we have for next month?
- How many invoices are still unpaid from over a month ago and what is the total owed?
- What is the maximum gap in days between a customer booking a room and the checkin date for that booking?
Grouping Rows for Aggregation
You can calculate aggregates over subsets of rows using the GROUP BY
clause:
SELECT count(*) FROM rooms
GROUP BY room_type;
count
-------
14
14
8
10
2
(5 rows)
What do you notice?
The query calculated the counts correctly but we have no idea which room type each value represents. To solve this we are allowed to include the GROUP BY
expressions in the list of selected values, as below:
SELECT room_type, count(*) FROM rooms
GROUP BY room_type;
room_type | count
--------------+-------
PREMIUM | 14
PREMIER | 14
PREMIER PLUS | 8
PREMIUM PLUS | 10
FAMILY | 2
(5 rows)
Notice the room_type
used for GROUP BY
is also included in the SELECT
list of values.
We can group by multiple expressions, for example:
SELECT trunc(room_no/100) AS floor,
to_char(checkin_date, 'YYYY-MM') AS month,
count(*), sum(no_guests), avg(no_guests)
FROM reservations
GROUP BY floor, month;
Notice that the GROUP BY
is using the column aliases floor
and month
that have been defined in the select list. This works in many, but not all, SQL implementations. (In those that don’t allow aliases you must use the full expression, for example: trunc(room_no/100)
instead of floor
)
You can use a WHERE
clause to restrict the rows that are included in the aggregate function. For example, if we need the above query for only the 2nd and 3rd floors:
SELECT trunc(room_no/100) AS floor,
to_char(checkin_date, 'YYYY-MM') AS month,
count(*), sum(no_guests), avg(no_guests)
FROM reservations
WHERE room_no BETWEEN 200 AND 399
GROUP BY floor, month;
Note that it is NOT usually possible to use column aliases in the where condition.
A WHERE
clause is applied before any aggregation, if you need to restrict results using an aggregate function you can’t do that using the WHERE
clause.
In the above, to return only results with the number of reservations greater than, say, 4 we use the HAVING
clause:
SELECT trunc(room_no/100) AS floor,
to_char(checkin_date, 'YYYY-MM') AS month,
count(*), sum(no_guests), avg(no_guests)
FROM reservations
GROUP BY floor, month
HAVING count(*) > 4; --<< Note the HAVING keyword
The order of clauses in the SELECT
statement is:
SELECT ...
FROM ...
[WHERE ...]
[GROUP BY ...
[HAVING ...] ]
[ORDER BY ...]
The square brackets indicate optional clauses. Note that HAVING
is only relevant when you have a GROUP BY
and must follow it in the SELECT
statement.
It can be confusing at first knowing whether to use a WHERE
clause or a HAVING
clause with GROUP BY
.
Use the
WHERE
clause when values you want to test are available without having to use any aggregate functions (e.g. plain column values).Use
HAVING
when the values you want to test are the results of aggregate functions (e.g.count(*)
,sum(amount)
,min(x)
, etc…).
Hotel Exercise 2
- What is the grand total of all invoices for each month?
- How many guests could be accommodated at one time on each floor?
- Which rooms have been occupied for less than 10 nights and for how many nights have they been occupied?