SQL basic study notes

Notes from my SQL study

I took the weekend to learn the basics of SQL and stumbled upon this amazing video from the freeCodeCamp youtube channel. I can’t stress enought how good the content of freeCodeCamp is, it helped me so many times! Anyway, here are my notes on SQL language.

SQL stands for Structured Query Language and it`s the default language to interact with Relational Database Management Systems (RDBMS).

A RDBMS is a software application to interact with the Relational Database. Examples of RDBMS are:

mySQL
Oracle
PostgresQL
mariaDB

We can use SQL for:

Perform CRUD operations
Perform admin tasks such as user management, security, backup
Define tables and structures
RDBMS can implement SQL in different ways

Tables and Keys

student_id	name	major
1	Jack	Biology
2	Kate	Math
3	Claire	Sociology

When defining a table, we want always to have a special column called primary key. It uniquely defines a row, each row has a different primary key. For example, the student_id 2 defines Kate.

When the primary key is a number that doesn’t mean anything in the real world, like an id 100, it’s called surrogate key. When the primary key is something that has a meaning in the real world, is a natural key. Like a primary key that is your CPF for example.

A foreign key is a column in which we can map to another table. The foreign key stores the primary key of another row in another database.

EMPLOYEE TABLE

`employee_id`	first_name	last_name	birth_date	sex	salary	branch_id	supervisor_id
100	Jack	Kim	1992-02-04	M	100,000	1	NULL
101	Kate	Han	1996-06-12	F	75,000	2	100
102	Claire	Song	1992-02-04	F	90,000	3	101

In the case above, the branch_id is a foreign_key for another table.

BRANCH TABLE

`branch_id`	branch_name	manager_id
2	Corporate	101
3	Ohio	102
1	Chicago	100

Now in this table, the manager_id is the foreign_key that maps the branch table back to the employee table

So we can define relationship between tables using these keys!

In the employee table, we can also see another foreign key called supervisor_id, which maps to the same employee table!

BRANCH SUPPLIER TABLE

`branch_id`	`supplier_name`	supply_type
2	Hammer Mill	Paper
2	Patriot Paper	Paper
3	Uni-ball	Writing Utensils

Note in this table we have two primary keys. They are called composite keys.

Composite keys are needed because neither branch_id nor supplier_name uniquely identify the row, only together they are unique! So we combine them to create the primary_key.

CLIENT TABLE

`client_id`	client_name	branch_id
400	Hammer Mill	2
401	Patriot Paper	2
402	Uni-ball	3

WORKS WITH TABLE

`employee_id`	`client_id`	total_sales
101	400	55,000
102	401	123,000
103	402	22,500

So client table defines a client for a specific branch, and the works with table defines how much a employee has sold for a given client. The special thing about the works with table is that the primary key is a composite key formed by two foreign keys. Got it? Both employee_id and client_id are foreign keys and together they create a primary key

SQL Basics

SQL has different purposes as we said in the beginning, but it`s also a mix of 4 languages:

Data Query language (DQL): use to make queries to the database, get information already stored
Data Definition Language (DDL): Used to defined schemas, tables, columns..
Data Control Language (DCL): control access to database, define users and permissions for each user and tables
Data Manipulation Language (DML): for CRUD operations

We will write a lot of Queries, a set of instructions written in SQL, given to a RDBMS, to find the information we want to retrieve.

SELECT employee.name, employee.age FROM employee WHERE employee.salary > 30000;

Creating Tables

We have to create and define the layout of the tables, also called schema of the database.

But before, a few os most common data types to work with in SQL:

INT: whole numbers
DECIMAL(M, N): decimal numbers, M is total number of digits, N is the number of decimal digits
VARCHAR(l): we can store a string of text. l is the length of the string, the maximum
BLOB: Binary Large Objects, to store large amounts of data. People use this for images and files.
DATE: ‘YYYY-MM-DD’
TIMESTAMP: ‘YYYY-MM-DD HH:MM:SS’, usually for when something has happened

There are more datatypes.

Now we will create the following table:

`student_id`	name	major
1	Jack	Biology
2	Kate	Math
3	Claire	Sociology

CREATE TABLE student (
  student_id INT PRIMARY KEY,
  name VARCHAR(20),
  major VARCHAR(20)
);

DESCRIBE student; to display the table data

DROP student; to delete the table

ALTER TABLE student ADD gpa DECIMAL(3, 2); add another column to the student table

ALTER TABLE student DROP COLUMN gpa; delete gpa column

Inserting Data

INSERT INTO student VALUES(1, 'Jack', 'Biology');

We want to add the values in the same order they were defined when created, so student_id, name and major.

SELECT * FROM student; to see all the values inside the table

INSERT INTO student(student_id, name) VALUES(3, 'Claire'); the major field is NULL because we don’t specify the major and neither the value.

Constraints

Let’s create the student table again, but in a better way to help us insert things.

CREATE TABLE student (
  student_id INT PRIMARY KEY,
  name VARCHAR(20) NOT NULL,
  major VARCHAR(20) UNIQUE DEFAULT 'undecided'
);

The primary_key is automatically not_null and unique.

We can also auto increment the student_id, so then we don`t need to specify them.

CREATE TABLE student (
  student_id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(20),
  major VARCHAR(20)
);

Now we can create a student without student_id like: INSERT INTO student(name, major) VALUES('Jack', 'Biology');

Update and Delete Data

UPDATE student SET major = 'Bio' WHERE major = 'Biology';

UPDATE student SET major = 'Comp Sci' WHERE student_id = 4;

UPDATE student SET major = 'BioChemistry' WHERE major = 'Bio' OR major = 'Chemistry';

UPDATE student SET name = 'Tom', major = 'undecided' WHERE student_id = 1;

UPDATE student SET major = 'undecided' now every row will be affected because there is no WHERE here.

DELETE FROM student; deletes all the rows of the table

DELETE FROM student WHERE student_id = 5;

DELETE FROM student WHERE name = 'Tom' AND major = 'undecided';

Basic Queries

The special keyword here is SELECT, which means we are getting something from the db. After the SELECT keyword, we can specify the columns we want to select from the table.

SELECT name FROM student; SELECT name, major FROM student; or SELECT student.name, student.major FROM student;

SELECT name, major FROM student ORDER BY name; is alphabetical order based in the name SELECT name, major FROM student ORDER BY student_id DESC; is the descending order (reverse)

You can also order by columns.

SELECT * FROM student ORDER BY major, student_id; it will order first by major and if has same major, will order by student_id

We can also limit the amount of results we get

SELECT * FROM student LIMIT 2 SELECT * FROM student ORDER BY student_id DESC LIMIT 2

We can also filter

SELECT * FROM student WHERE major = 'Chemistry' SELECT * FROM student WHERE major = 'Chemistry' OR name = 'Kate'

The comparisson operators in SQL are:

>
<
<=
>=
=
<> not equal
AND
OR

SELECT * FROM student WHERE major <> 'Chemistry' SELECT * FROM student WHERE student_id <= 3 AND name <> 'Jack'

We can also compare a column to specific values

SELECT * FROM student WHERE name IN ('Claire', 'Kate', 'Mike')

Company Database Intro

Until now we only messed with one table. Let’s create and play with something more complex.

EMPLOYEE TABLE

`employee_id`	first_name	last_name	birth_date	sex	salary	*supervisor_id	*branch_id
100	David	Wallace	1967-11-17	M	250000	NULL	1
101	Jan	Levinson	1961-05-11	F	110000	100	1
102	Michael	Scott	1964-03-15	M	75000	100	2
103	Angela	Martin	1971-06-25	F	63000	102	2
104	Kelly	Kapoor	1980-02-05	F	55000	102	2
105	Stanley	Hudson	1958-02-19	M	69000	102	2
106	Josh	Porter	1969-09-05	M	78000	100	3
107	Andy	Bernard	1973-07-22	M	65000	106	3
108	Jim	Halpert	1978-10-01	M	71000	106	3

CREATE TABLE employee (
  employee_id INT PRIMARY KEY,
  first_name VARCHAR(40),
  last_name VARCHAR(40),
  birth_day DATE,
  sex VARCHAR(1),
  salary INT,
  supervisor_id INT,
  branch_id INT
);

ALTER TABLE employee
ADD FOREIGN KEY(branch_id)
REFERENCES branch(branch_id)
ON DELETE SET NULL;

ALTER TABLE employee
ADD FOREIGN KEY(supervisor_id)
REFERENCES employee(employee_id)
ON DELETE SET NULL;

We are adding the foreign_key in another query afterwards because we need to make sure the branch TABLE and employee TABLE exists so we can reference them as foreign_key

BRANCH TABLE

`branch_id`	branch_name	*manager_id	manager_start_date
1	Corporate	101	2006-02-09
2	Scranton	102	1992-04-06
3	Stamford	106	1998-02-13

CREATE TABLE branch (
  branch_id INT PRIMARY KEY,
  branch_name VARCHAR(40),
  manager_id INT,
  manager_start_date DATE,
  FOREIGN KEY(manager_id) REFERENCES employee(employee_id) ON DELETE SET NULL
);

CLIENT TABLE

`client_id`	client_name	*branch_id
400	Dunmore Highschool	2
401	Lackawana Country	2
402	FedEx	3
403	John Daly Law	3
404	Scranton Whitepages	2
405	Times Newspaper	3
406	FedEx	2

CREATE TABLE client (
  client_id INT PRIMARY KEY,
  client_name VARCHAR(40),
  branch_id INT,
  FOREIGN KEY(branch_id) REFERENCES branch(branch_id) ON DELETE SET NULL
);

WORKS WITH TABLE

`employee_id`	`client_id`	total_sales
102	401	267000
102	406	15000
105	400	55000
105	404	33000
105	406	130000
107	403	5000
107	405	26000
108	402	22500
108	403	12000

CREATE TABLE works_with (
  employee_id INT,
  client_id INT,
  total_sales INT,
  PRIMARY KEY(employee_id, client_id),
  FOREIGN KEY(employee_id) REFERENCES employee(employee_id) ON DELETE CASCADE,
  FOREIGN KEY(client_id) REFERENCES client(client_id) ON DELETE CASCADE
);

BRANCH SUPPLIER TABLE

`branch_id`	`supplier_name`	supply_type
2	Hammer Mill	Paper
2	J.T. Forms & Labels	Custom Forms
2	Uni-ball	Writing Utensils
3	Hammer Mill	Paper
3	Patriot Paper	Paper
3	Stamford Lables	Custom Forms
3	Uni-ball	Writing Utensils

CREATE TABLE branch_supplier (
  branch_id INT,
  supplier_name VARCHAR(40),
  supply_type VARCHAR(40),
  PRIMARY KEY(branch_id, supplier_name),
  FOREIGN KEY(branch_id) REFERENCES branch(branch_id) ON DELETE CASCADE
);

Now to insert data into these tables, it also has to be done in a certain way, because of the relations between tables using foreign_key, one may not exist yet. So we have to create them in a certain order.

-- Corporate
INSERT INTO employee VALUES(100, 'David', 'Wallace', '1967-11-17', 'M', 250000, NULL, NULL);

INSERT INTO branch VALUES(1, 'Corporate', 100, '2006-02-09');

UPDATE employee
SET branch_id = 1
WHERE emp_id = 100;

INSERT INTO employee VALUES(101, 'Jan', 'Levinson', '1961-05-11', 'F', 110000, 100, 1);

-- Scranton
INSERT INTO employee VALUES(102, 'Michael', 'Scott', '1964-03-15', 'M', 75000, 100, NULL);

INSERT INTO branch VALUES(2, 'Scranton', 102, '1992-04-06');

UPDATE employee
SET branch_id = 2
WHERE emp_id = 102;

INSERT INTO employee VALUES(103, 'Angela', 'Martin', '1971-06-25', 'F', 63000, 102, 2);
INSERT INTO employee VALUES(104, 'Kelly', 'Kapoor', '1980-02-05', 'F', 55000, 102, 2);
INSERT INTO employee VALUES(105, 'Stanley', 'Hudson', '1958-02-19', 'M', 69000, 102, 2);

-- Stamford
INSERT INTO employee VALUES(106, 'Josh', 'Porter', '1969-09-05', 'M', 78000, 100, NULL);

INSERT INTO branch VALUES(3, 'Stamford', 106, '1998-02-13');

UPDATE employee
SET branch_id = 3
WHERE emp_id = 106;

INSERT INTO employee VALUES(107, 'Andy', 'Bernard', '1973-07-22', 'M', 65000, 106, 3);
INSERT INTO employee VALUES(108, 'Jim', 'Halpert', '1978-10-01', 'M', 71000, 106, 3);


-- BRANCH SUPPLIER
INSERT INTO branch_supplier VALUES(2, 'Hammer Mill', 'Paper');
INSERT INTO branch_supplier VALUES(2, 'Uni-ball', 'Writing Utensils');
INSERT INTO branch_supplier VALUES(3, 'Patriot Paper', 'Paper');
INSERT INTO branch_supplier VALUES(2, 'J.T. Forms & Labels', 'Custom Forms');
INSERT INTO branch_supplier VALUES(3, 'Uni-ball', 'Writing Utensils');
INSERT INTO branch_supplier VALUES(3, 'Hammer Mill', 'Paper');
INSERT INTO branch_supplier VALUES(3, 'Stamford Lables', 'Custom Forms');

-- CLIENT
INSERT INTO client VALUES(400, 'Dunmore Highschool', 2);
INSERT INTO client VALUES(401, 'Lackawana Country', 2);
INSERT INTO client VALUES(402, 'FedEx', 3);
INSERT INTO client VALUES(403, 'John Daly Law, LLC', 3);
INSERT INTO client VALUES(404, 'Scranton Whitepages', 2);
INSERT INTO client VALUES(405, 'Times Newspaper', 3);
INSERT INTO client VALUES(406, 'FedEx', 2);

-- WORKS_WITH
INSERT INTO works_with VALUES(105, 400, 55000);
INSERT INTO works_with VALUES(102, 401, 267000);
INSERT INTO works_with VALUES(108, 402, 22500);
INSERT INTO works_with VALUES(107, 403, 5000);
INSERT INTO works_with VALUES(108, 403, 12000);
INSERT INTO works_with VALUES(105, 404, 33000);
INSERT INTO works_with VALUES(107, 405, 26000);
INSERT INTO works_with VALUES(102, 406, 15000);
INSERT INTO works_with VALUES(105, 406, 130000);

More Basic Queries

Find all employees ordered by salary

SELECT * FROM employee ORDER BY salary DESC;

Find all employees ordered by sex then name

SELECT * FROM employee ORDER BY sex, first_name, last_name;

Find the first 5 employees in the table

SELECT * FROM employee LIMIT 5;

Find the first and last names of all employees

SELECT first_name, last_name FROM employee;

Find the forename and surname of all employees

SELECT first_name AS forename, last_name AS surname FROM employee;

Find out all the different genders

SELECT DISTINCT sex FROM employee;

Functions

Find the number of employees

SELECT COUNT(employee_id) FROM employee;

Find the number of female employees bron after 1970

SELECT COUNT(employee_id) FROM employee WHERE sex = 'F' AND birth_date > '1971-01-01';

Find the average of all employee’s salaries

SELECT AVG(salary) FROM employee;

FInd the sum of all employee salaries

SELECT SUM(salary) FROM employee;

Now we will do AGREGATIONS, which are a way to use functions to display the data in a better way.

Find out how many males and females there are

SELECT COUNT(sex), sex FROM employee GROUP BY sex;

Find the total sales of each salesman

SELECT SUM(total_sales), emp_id FROM works_with GROUP BY emp_id;

Wildcards

It’s a way to grab data that matches a specific pattern. We can define a pattern, pretty much like regex, in combination with the LIKE keyword.

%: any number of characters
_: one character

Find any clients who are an LLC (they have LLC in their name)

SELECT * FROM client WHERE client_name LIKE '%LLC';

The statement above means: select any client with client_name that has any number of characters behind and then has LLC (ends with ‘LLC’)

Find any branch suppliers who are in the label business

SELECT * FROM branch_supplier WHERE supplier_name LIKE '% Label%';

Find any employee born in October

SELECT * FROM employee WHERE birth_day LIKE '%-10-%';

SELECT * FROM employee WHERE birth_day LIKE '____-10%';

The statement above means: there are any 4 characters, then a -, then a “10”, then any number of characters

Find any clients who are schools

SELECT * FROM client WHERE client_name LIKE '%school%';

Union

A Union is a special SQL operator to combine multiple SELECT statements into one.

Find a list of employee and branch names

SELECT first_name FROM employee UNION SELECT branch_name FROM branch;

A thing about UNION: we have to grab the same number of columns from each statement. They also have to have the same datatype.

The results will come in a single columns. We can modify the result’s name using AS keyword

SELECT first_name AS Company_Names FROM employee UNION SELECT branch_name FROM branch UNION SELECT client_name FROM client;

Find a list of all clients & branch suppliers names, and the associate branches

SELECT client_name, branch_id FROM client UNION SELECT supplier_name, branch_id FROM branch_supplier;

Find a list of all money spent or earned by the company

SELECT salary FROM employee UNION SELECT total_sales FROM works_with;

Joins

Joins are used to combine information from different tables, based on a row they have in common.

To work with JOINS, let’s insert another branch into our example

INSERT INTO branch VALUES(4, 'Buffalo', NULL, NULL);

Now let’s get started. There are 4 kinds of JOIN.

Find all branches and the names of their managers

SELECT employee.employee_id, employee.first_name, branch.branch_name
FROM employee
JOIN branch
ON employee.employee_id = branch.manager_id

This above is a inner JOIN, when you are joining 2 tables that share one column in common and share the values of a row. So only the employee.employee_id = branch.manager_id will be returned.

SELECT employee.employee_id, employee.first_name, branch.branch_name
FROM employee
LEFT JOIN branch
ON employee.employee_id = branch.manager_id

The left JOIN will return all the values for the left table, the first table, in this case the employee table.

SELECT employee.employee_id, employee.first_name, branch.branch_name
FROM employee
RIGHT JOIN branch
ON employee.employee_id = branch.manager_id

The right JOIN will return all the values for the right table, the second table, in this case the branch table.

There is a 4th type of JOIN called full outer JOIN, but is not available in mySQL. It is a combination of LEFT and RIGHT JOIN, will return every value from the tables.

Nested Queries

We basically use lots of SELECT statements in order to get a specific information. We use the results of one SELECT to get the result of another SELECT, like a pipe thing.

Find names of all employees who have sold over 30,000 to a single client.

SELECT employee.first_name, employee.last_name
FROM employee
WHERE employee.employee_id IN (
	SELECT works_with.employee_id
	FROM works_with
	WHERE works_with.total_sales > 30000
);

In the query above we are querying all the works_with.employee_id from the works_with table, who have more than 30000 in total sales. From this result, we are getting the employee.first_name and employee.last_name when the employee.employee_id matches the result inside the IN query.

Find all clients who are handled by the branch that Michael Scott manages (assume you know Michael’s ID)

SELECT client.client_name
FROM client
WHERE client.branch_id IN (
	SELECT branch.branch_id
  FROM branch
  WHERE branch.mgr_id = 102
);

The SQL will always execute the inner query first, then executes the outer query.

On Delete

Imagine a situation: if we delete an employee from the employee table, like Micahel Scott with ID 102, what happens to the branch he used to manage? Like, Scranton branch will be without a manager_id? Or it will be NULL?

We can do basically two things:

On Delete set NULL: if we delete the employee, the manager id will be NULL
On Delete Cascade: if we delete the employee, the whole row that depends on the deleted user will also be deleted.

CREATE TABLE branch (
  branch_id INT PRIMARY KEY,
  branch_name VARCHAR(40),
  manager_id INT,
  manager_start_date DATE,
  FOREIGN KEY(manager_id) REFERENCES employee(employee_id) ON DELETE SET NULL
);

In the statement above, we are saying: if we delete the employee(employee_id), set the manager_id to NULL

CREATE TABLE branch_supplier (
  branch_id INT,
  supplier_name VARCHAR(40),
  supply_type VARCHAR(40),
  PRIMARY KEY(branch_id, supplier_name),
  FOREIGN KEY(branch_id) REFERENCES branch(branch_id) ON DELETE CASCADE
);

Now above, if the branch(branch_id) gets deleted, delete the entire row referencing that branch_id

Triggers

Is a block of SQL code that defines a certain action when some operation happens in the database.

eg: any time a row gets deleted in this tables, do something.

Let’s create a new table to work with triggers

CREATE TABLE trigger_test (
	message VARCHAR(100)
);

Now let’s write a trigger

DELIMITER $$
CREATE
  TRIGGER my_trigger BEFORE INSERT
  ON employee
  FOR EACH ROW BEGIN
    INSERT INTO trigger_test VALUES('added new employee');
  END$$
DELIMITER ;

Trigger is what’s inside the DELIMITER.

We defined a trigegr by calling CREATE TRIGGER and then a name for it. Then we say that before inserting a new row to employee table, insert the value ‘added new employee’ to the trigger_test table.

This means we can automate actions whenever something happens in our database.

The DELIMITER keyword means we are changing the default delimiter of the SQL. The default delimiter is a ;, since we end up each statement with a ; in SQL. But in this case, we are already using ; inside the trigger, so we must change the delimiter for this block of code. So we changed the delimiter to $$ and then in the end, we changed the delimiter again to the default ;.

Inside mySQLBenchmark we have a Trigger tab for each table, so we can specify the trigger inside it. Using a regular command line, we have to open the SQL CLI and first define the DELIMITER $$, then the trigegr code and define the DELIMITER ; again.

DELIMITER $$
CREATE
  TRIGGER my_trigger BEFORE INSERT
  ON employee
  FOR EACH ROW BEGIN
    INSERT INTO trigger_test VALUES(NEW.first_name);
  END$$
DELIMITER ;

Now we are inserting the new employee first_name everytime a new employee is inserted into the employee table. We’re using the NEW keyword here.

DELIMITER $$
CREATE
  TRIGGER my_trigger BEFORE INSERT
  ON employee
  FOR EACH ROW BEGIN
    IF NEW.sex = 'M' THEN
      INSERT INTO trigger_test VALUES('added male employee');
    ELSEIF NEW.sex = 'F' THEN
      INSERT INTO trigger_test VALUES('added female');
    ELSE
      INSERT INTO trigger_test VALUES('added other employee');
    END IF;
  END$$
DELIMITER ;

This is more complex because it has IF ELSE statements.

In the mySQL CLI, we can drop a trigger like: DROP TRIGGER {{name of trigger}}

Entity Relationship Diagrams (ER Diagrams)

Using ER Diagrams is a good way to map and plan the database schema. We take the requirements and relationships between tables and turn into a diagram. So later we can create our schemas in a better way.

Entity: an object we want to model and store information about. eg: Student
Attributes: specific pieces of information about and entity. eg: gpa
Primary Key: attribute that uniquely identify an entry. eg: student_id
Composite attributes: attribute that can be broken further into sub-attributes. eg: first_name and last_name from name
Multi value attributes: attribute that canhave more than one value. eg: clubs
Derived attribute: attribute that can be derived from other attributes. eg: has_honors, come from the gpa attribute.
Relationships: define relation between two entities. Is a losangle in the diagram. eg: student and classes
Single participation: it’s a single line meaning not all entities need to relate to the other entity. eg: not all student need to be in a class
Total participation: it’s a double line meaning all entities should relate to the other entity. eg: all classes need to have students
Relationship cardinality: the number of instances a entity relates with another entity. eg: student can take multiple classes and a class is taken by any number of students. So it’s a N:M cadinality. It also exists the 1:1 and 1:N cardinality.
Weak entity: cannot be uniquely identified by it’s own attributes, meaning it depends on another entity to exist. eg: Exam depends on a Class to exist.