SQL basic study notes

Notes from my SQL study

I took the weekend to learn the basics of SQL and stumbled upon this amazing video from the freeCodeCamp youtube channel. I can’t stress enought how good the content of freeCodeCamp is, it helped me so many times! Anyway, here are my notes on SQL language.

SQL stands for Structured Query Language and it`s the default language to interact with Relational Database Management Systems (RDBMS).

A RDBMS is a software application to interact with the Relational Database. Examples of RDBMS are:

We can use SQL for:

Tables and Keys

student_idnamemajor
1JackBiology
2KateMath
3ClaireSociology

When defining a table, we want always to have a special column called primary key. It uniquely defines a row, each row has a different primary key. For example, the student_id 2 defines Kate.

When the primary key is a number that doesn’t mean anything in the real world, like an id 100, it’s called surrogate key. When the primary key is something that has a meaning in the real world, is a natural key. Like a primary key that is your CPF for example.

A foreign key is a column in which we can map to another table. The foreign key stores the primary key of another row in another database.

EMPLOYEE TABLE

employee_idfirst_namelast_namebirth_datesexsalarybranch_idsupervisor_id
100JackKim1992-02-04M100,0001NULL
101KateHan1996-06-12F75,0002100
102ClaireSong1992-02-04F90,0003101

In the case above, the branch_id is a foreign_key for another table.

BRANCH TABLE

branch_idbranch_namemanager_id
2Corporate101
3Ohio102
1Chicago100

Now in this table, the manager_id is the foreign_key that maps the branch table back to the employee table

So we can define relationship between tables using these keys!

In the employee table, we can also see another foreign key called supervisor_id, which maps to the same employee table!

BRANCH SUPPLIER TABLE

branch_idsupplier_namesupply_type
2Hammer MillPaper
2Patriot PaperPaper
3Uni-ballWriting Utensils

Note in this table we have two primary keys. They are called composite keys.

Composite keys are needed because neither branch_id nor supplier_name uniquely identify the row, only together they are unique! So we combine them to create the primary_key.

CLIENT TABLE

client_idclient_namebranch_id
400Hammer Mill2
401Patriot Paper2
402Uni-ball3

WORKS WITH TABLE

employee_idclient_idtotal_sales
10140055,000
102401123,000
10340222,500

So client table defines a client for a specific branch, and the works with table defines how much a employee has sold for a given client. The special thing about the works with table is that the primary key is a composite key formed by two foreign keys. Got it? Both employee_id and client_id are foreign keys and together they create a primary key

SQL Basics

SQL has different purposes as we said in the beginning, but it`s also a mix of 4 languages:

We will write a lot of Queries, a set of instructions written in SQL, given to a RDBMS, to find the information we want to retrieve.

SELECT employee.name, employee.age FROM employee WHERE employee.salary > 30000;

Creating Tables

We have to create and define the layout of the tables, also called schema of the database.

But before, a few os most common data types to work with in SQL:

There are more datatypes.

Now we will create the following table:

student_idnamemajor
1JackBiology
2KateMath
3ClaireSociology
CREATE TABLE student (
  student_id INT PRIMARY KEY,
  name VARCHAR(20),
  major VARCHAR(20)
);

DESCRIBE student; to display the table data

DROP student; to delete the table

ALTER TABLE student ADD gpa DECIMAL(3, 2); add another column to the student table

ALTER TABLE student DROP COLUMN gpa; delete gpa column

Inserting Data

INSERT INTO student VALUES(1, 'Jack', 'Biology');

We want to add the values in the same order they were defined when created, so student_id, name and major.

SELECT * FROM student; to see all the values inside the table

INSERT INTO student(student_id, name) VALUES(3, 'Claire'); the major field is NULL because we don’t specify the major and neither the value.

Constraints

Let’s create the student table again, but in a better way to help us insert things.

CREATE TABLE student (
  student_id INT PRIMARY KEY,
  name VARCHAR(20) NOT NULL,
  major VARCHAR(20) UNIQUE DEFAULT 'undecided'
);

The primary_key is automatically not_null and unique.

We can also auto increment the student_id, so then we don`t need to specify them.

CREATE TABLE student (
  student_id INT PRIMARY KEY AUTO_INCREMENT,
  name VARCHAR(20),
  major VARCHAR(20)
);

Now we can create a student without student_id like: INSERT INTO student(name, major) VALUES('Jack', 'Biology');

Update and Delete Data

UPDATE student SET major = 'Bio' WHERE major = 'Biology';

UPDATE student SET major = 'Comp Sci' WHERE student_id = 4;

UPDATE student SET major = 'BioChemistry' WHERE major = 'Bio' OR major = 'Chemistry';

UPDATE student SET name = 'Tom', major = 'undecided' WHERE student_id = 1;

UPDATE student SET major = 'undecided' now every row will be affected because there is no WHERE here.

DELETE FROM student; deletes all the rows of the table

DELETE FROM student WHERE student_id = 5;

DELETE FROM student WHERE name = 'Tom' AND major = 'undecided';

Basic Queries

The special keyword here is SELECT, which means we are getting something from the db. After the SELECT keyword, we can specify the columns we want to select from the table.

SELECT name FROM student; SELECT name, major FROM student; or SELECT student.name, student.major FROM student;

SELECT name, major FROM student ORDER BY name; is alphabetical order based in the name SELECT name, major FROM student ORDER BY student_id DESC; is the descending order (reverse)

You can also order by columns.

SELECT * FROM student ORDER BY major, student_id; it will order first by major and if has same major, will order by student_id

We can also limit the amount of results we get

SELECT * FROM student LIMIT 2 SELECT * FROM student ORDER BY student_id DESC LIMIT 2

We can also filter

SELECT * FROM student WHERE major = 'Chemistry' SELECT * FROM student WHERE major = 'Chemistry' OR name = 'Kate'

The comparisson operators in SQL are:

SELECT * FROM student WHERE major <> 'Chemistry' SELECT * FROM student WHERE student_id <= 3 AND name <> 'Jack'

We can also compare a column to specific values

SELECT * FROM student WHERE name IN ('Claire', 'Kate', 'Mike')

Company Database Intro

Until now we only messed with one table. Let’s create and play with something more complex.

EMPLOYEE TABLE

employee_idfirst_namelast_namebirth_datesexsalary*supervisor_id*branch_id
100DavidWallace1967-11-17M250000NULL1
101JanLevinson1961-05-11F1100001001
102MichaelScott1964-03-15M750001002
103AngelaMartin1971-06-25F630001022
104KellyKapoor1980-02-05F550001022
105StanleyHudson1958-02-19M690001022
106JoshPorter1969-09-05M780001003
107AndyBernard1973-07-22M650001063
108JimHalpert1978-10-01M710001063
CREATE TABLE employee (
  employee_id INT PRIMARY KEY,
  first_name VARCHAR(40),
  last_name VARCHAR(40),
  birth_day DATE,
  sex VARCHAR(1),
  salary INT,
  supervisor_id INT,
  branch_id INT
);

ALTER TABLE employee
ADD FOREIGN KEY(branch_id)
REFERENCES branch(branch_id)
ON DELETE SET NULL;

ALTER TABLE employee
ADD FOREIGN KEY(supervisor_id)
REFERENCES employee(employee_id)
ON DELETE SET NULL;

We are adding the foreign_key in another query afterwards because we need to make sure the branch TABLE and employee TABLE exists so we can reference them as foreign_key

BRANCH TABLE

branch_idbranch_name*manager_idmanager_start_date
1Corporate1012006-02-09
2Scranton1021992-04-06
3Stamford1061998-02-13
CREATE TABLE branch (
  branch_id INT PRIMARY KEY,
  branch_name VARCHAR(40),
  manager_id INT,
  manager_start_date DATE,
  FOREIGN KEY(manager_id) REFERENCES employee(employee_id) ON DELETE SET NULL
);

CLIENT TABLE

client_idclient_name*branch_id
400Dunmore Highschool2
401Lackawana Country2
402FedEx3
403John Daly Law3
404Scranton Whitepages2
405Times Newspaper3
406FedEx2
CREATE TABLE client (
  client_id INT PRIMARY KEY,
  client_name VARCHAR(40),
  branch_id INT,
  FOREIGN KEY(branch_id) REFERENCES branch(branch_id) ON DELETE SET NULL
);

WORKS WITH TABLE

employee_idclient_idtotal_sales
102401267000
10240615000
10540055000
10540433000
105406130000
1074035000
10740526000
10840222500
10840312000
CREATE TABLE works_with (
  employee_id INT,
  client_id INT,
  total_sales INT,
  PRIMARY KEY(employee_id, client_id),
  FOREIGN KEY(employee_id) REFERENCES employee(employee_id) ON DELETE CASCADE,
  FOREIGN KEY(client_id) REFERENCES client(client_id) ON DELETE CASCADE
);

BRANCH SUPPLIER TABLE

branch_idsupplier_namesupply_type
2Hammer MillPaper
2J.T. Forms & LabelsCustom Forms
2Uni-ballWriting Utensils
3Hammer MillPaper
3Patriot PaperPaper
3Stamford LablesCustom Forms
3Uni-ballWriting Utensils
CREATE TABLE branch_supplier (
  branch_id INT,
  supplier_name VARCHAR(40),
  supply_type VARCHAR(40),
  PRIMARY KEY(branch_id, supplier_name),
  FOREIGN KEY(branch_id) REFERENCES branch(branch_id) ON DELETE CASCADE
);

Now to insert data into these tables, it also has to be done in a certain way, because of the relations between tables using foreign_key, one may not exist yet. So we have to create them in a certain order.

-- Corporate
INSERT INTO employee VALUES(100, 'David', 'Wallace', '1967-11-17', 'M', 250000, NULL, NULL);

INSERT INTO branch VALUES(1, 'Corporate', 100, '2006-02-09');

UPDATE employee
SET branch_id = 1
WHERE emp_id = 100;

INSERT INTO employee VALUES(101, 'Jan', 'Levinson', '1961-05-11', 'F', 110000, 100, 1);

-- Scranton
INSERT INTO employee VALUES(102, 'Michael', 'Scott', '1964-03-15', 'M', 75000, 100, NULL);

INSERT INTO branch VALUES(2, 'Scranton', 102, '1992-04-06');

UPDATE employee
SET branch_id = 2
WHERE emp_id = 102;

INSERT INTO employee VALUES(103, 'Angela', 'Martin', '1971-06-25', 'F', 63000, 102, 2);
INSERT INTO employee VALUES(104, 'Kelly', 'Kapoor', '1980-02-05', 'F', 55000, 102, 2);
INSERT INTO employee VALUES(105, 'Stanley', 'Hudson', '1958-02-19', 'M', 69000, 102, 2);

-- Stamford
INSERT INTO employee VALUES(106, 'Josh', 'Porter', '1969-09-05', 'M', 78000, 100, NULL);

INSERT INTO branch VALUES(3, 'Stamford', 106, '1998-02-13');

UPDATE employee
SET branch_id = 3
WHERE emp_id = 106;

INSERT INTO employee VALUES(107, 'Andy', 'Bernard', '1973-07-22', 'M', 65000, 106, 3);
INSERT INTO employee VALUES(108, 'Jim', 'Halpert', '1978-10-01', 'M', 71000, 106, 3);


-- BRANCH SUPPLIER
INSERT INTO branch_supplier VALUES(2, 'Hammer Mill', 'Paper');
INSERT INTO branch_supplier VALUES(2, 'Uni-ball', 'Writing Utensils');
INSERT INTO branch_supplier VALUES(3, 'Patriot Paper', 'Paper');
INSERT INTO branch_supplier VALUES(2, 'J.T. Forms & Labels', 'Custom Forms');
INSERT INTO branch_supplier VALUES(3, 'Uni-ball', 'Writing Utensils');
INSERT INTO branch_supplier VALUES(3, 'Hammer Mill', 'Paper');
INSERT INTO branch_supplier VALUES(3, 'Stamford Lables', 'Custom Forms');

-- CLIENT
INSERT INTO client VALUES(400, 'Dunmore Highschool', 2);
INSERT INTO client VALUES(401, 'Lackawana Country', 2);
INSERT INTO client VALUES(402, 'FedEx', 3);
INSERT INTO client VALUES(403, 'John Daly Law, LLC', 3);
INSERT INTO client VALUES(404, 'Scranton Whitepages', 2);
INSERT INTO client VALUES(405, 'Times Newspaper', 3);
INSERT INTO client VALUES(406, 'FedEx', 2);

-- WORKS_WITH
INSERT INTO works_with VALUES(105, 400, 55000);
INSERT INTO works_with VALUES(102, 401, 267000);
INSERT INTO works_with VALUES(108, 402, 22500);
INSERT INTO works_with VALUES(107, 403, 5000);
INSERT INTO works_with VALUES(108, 403, 12000);
INSERT INTO works_with VALUES(105, 404, 33000);
INSERT INTO works_with VALUES(107, 405, 26000);
INSERT INTO works_with VALUES(102, 406, 15000);
INSERT INTO works_with VALUES(105, 406, 130000);

More Basic Queries

Find all employees ordered by salary

SELECT * FROM employee ORDER BY salary DESC;

Find all employees ordered by sex then name

SELECT * FROM employee ORDER BY sex, first_name, last_name;

Find the first 5 employees in the table

SELECT * FROM employee LIMIT 5;

Find the first and last names of all employees

SELECT first_name, last_name FROM employee;

Find the forename and surname of all employees

SELECT first_name AS forename, last_name AS surname FROM employee;

Find out all the different genders

SELECT DISTINCT sex FROM employee;

Functions

Find the number of employees

SELECT COUNT(employee_id) FROM employee;

Find the number of female employees bron after 1970

SELECT COUNT(employee_id) FROM employee WHERE sex = 'F' AND birth_date > '1971-01-01';

Find the average of all employee’s salaries

SELECT AVG(salary) FROM employee;

FInd the sum of all employee salaries

SELECT SUM(salary) FROM employee;

Now we will do AGREGATIONS, which are a way to use functions to display the data in a better way.

Find out how many males and females there are

SELECT COUNT(sex), sex FROM employee GROUP BY sex;

Find the total sales of each salesman

SELECT SUM(total_sales), emp_id FROM works_with GROUP BY emp_id;

Wildcards

It’s a way to grab data that matches a specific pattern. We can define a pattern, pretty much like regex, in combination with the LIKE keyword.

Find any clients who are an LLC (they have LLC in their name)

SELECT * FROM client WHERE client_name LIKE '%LLC';

The statement above means: select any client with client_name that has any number of characters behind and then has LLC (ends with ‘LLC’)

Find any branch suppliers who are in the label business

SELECT * FROM branch_supplier WHERE supplier_name LIKE '% Label%';

Find any employee born in October

SELECT * FROM employee WHERE birth_day LIKE '%-10-%';

SELECT * FROM employee WHERE birth_day LIKE '____-10%';

The statement above means: there are any 4 characters, then a -, then a “10”, then any number of characters

Find any clients who are schools

SELECT * FROM client WHERE client_name LIKE '%school%';

Union

A Union is a special SQL operator to combine multiple SELECT statements into one.

Find a list of employee and branch names

SELECT first_name FROM employee UNION SELECT branch_name FROM branch;

A thing about UNION: we have to grab the same number of columns from each statement. They also have to have the same datatype.

The results will come in a single columns. We can modify the result’s name using AS keyword

SELECT first_name AS Company_Names FROM employee UNION SELECT branch_name FROM branch UNION SELECT client_name FROM client;

Find a list of all clients & branch suppliers names, and the associate branches

SELECT client_name, branch_id FROM client UNION SELECT supplier_name, branch_id FROM branch_supplier;

Find a list of all money spent or earned by the company

SELECT salary FROM employee UNION SELECT total_sales FROM works_with;

Joins

Joins are used to combine information from different tables, based on a row they have in common.

To work with JOINS, let’s insert another branch into our example

INSERT INTO branch VALUES(4, 'Buffalo', NULL, NULL);

Now let’s get started. There are 4 kinds of JOIN.

Find all branches and the names of their managers

SELECT employee.employee_id, employee.first_name, branch.branch_name
FROM employee
JOIN branch
ON employee.employee_id = branch.manager_id

This above is a inner JOIN, when you are joining 2 tables that share one column in common and share the values of a row. So only the employee.employee_id = branch.manager_id will be returned.

SELECT employee.employee_id, employee.first_name, branch.branch_name
FROM employee
LEFT JOIN branch
ON employee.employee_id = branch.manager_id

The left JOIN will return all the values for the left table, the first table, in this case the employee table.

SELECT employee.employee_id, employee.first_name, branch.branch_name
FROM employee
RIGHT JOIN branch
ON employee.employee_id = branch.manager_id

The right JOIN will return all the values for the right table, the second table, in this case the branch table.

There is a 4th type of JOIN called full outer JOIN, but is not available in mySQL. It is a combination of LEFT and RIGHT JOIN, will return every value from the tables.

Nested Queries

We basically use lots of SELECT statements in order to get a specific information. We use the results of one SELECT to get the result of another SELECT, like a pipe thing.

Find names of all employees who have sold over 30,000 to a single client.

SELECT employee.first_name, employee.last_name
FROM employee
WHERE employee.employee_id IN (
	SELECT works_with.employee_id
	FROM works_with
	WHERE works_with.total_sales > 30000
);

In the query above we are querying all the works_with.employee_id from the works_with table, who have more than 30000 in total sales. From this result, we are getting the employee.first_name and employee.last_name when the employee.employee_id matches the result inside the IN query.

Find all clients who are handled by the branch that Michael Scott manages (assume you know Michael’s ID)

SELECT client.client_name
FROM client
WHERE client.branch_id IN (
	SELECT branch.branch_id
  FROM branch
  WHERE branch.mgr_id = 102
);

The SQL will always execute the inner query first, then executes the outer query.

On Delete

Imagine a situation: if we delete an employee from the employee table, like Micahel Scott with ID 102, what happens to the branch he used to manage? Like, Scranton branch will be without a manager_id? Or it will be NULL?

We can do basically two things:

CREATE TABLE branch (
  branch_id INT PRIMARY KEY,
  branch_name VARCHAR(40),
  manager_id INT,
  manager_start_date DATE,
  FOREIGN KEY(manager_id) REFERENCES employee(employee_id) ON DELETE SET NULL
);

In the statement above, we are saying: if we delete the employee(employee_id), set the manager_id to NULL

CREATE TABLE branch_supplier (
  branch_id INT,
  supplier_name VARCHAR(40),
  supply_type VARCHAR(40),
  PRIMARY KEY(branch_id, supplier_name),
  FOREIGN KEY(branch_id) REFERENCES branch(branch_id) ON DELETE CASCADE
);

Now above, if the branch(branch_id) gets deleted, delete the entire row referencing that branch_id

Triggers

Is a block of SQL code that defines a certain action when some operation happens in the database.

eg: any time a row gets deleted in this tables, do something.

Let’s create a new table to work with triggers

CREATE TABLE trigger_test (
	message VARCHAR(100)
);

Now let’s write a trigger

DELIMITER $$
CREATE
  TRIGGER my_trigger BEFORE INSERT
  ON employee
  FOR EACH ROW BEGIN
    INSERT INTO trigger_test VALUES('added new employee');
  END$$
DELIMITER ;

Trigger is what’s inside the DELIMITER.

We defined a trigegr by calling CREATE TRIGGER and then a name for it. Then we say that before inserting a new row to employee table, insert the value ‘added new employee’ to the trigger_test table.

This means we can automate actions whenever something happens in our database.

The DELIMITER keyword means we are changing the default delimiter of the SQL. The default delimiter is a ;, since we end up each statement with a ; in SQL. But in this case, we are already using ; inside the trigger, so we must change the delimiter for this block of code. So we changed the delimiter to $$ and then in the end, we changed the delimiter again to the default ;.

Inside mySQLBenchmark we have a Trigger tab for each table, so we can specify the trigger inside it. Using a regular command line, we have to open the SQL CLI and first define the DELIMITER $$, then the trigegr code and define the DELIMITER ; again.

DELIMITER $$
CREATE
  TRIGGER my_trigger BEFORE INSERT
  ON employee
  FOR EACH ROW BEGIN
    INSERT INTO trigger_test VALUES(NEW.first_name);
  END$$
DELIMITER ;

Now we are inserting the new employee first_name everytime a new employee is inserted into the employee table. We’re using the NEW keyword here.

DELIMITER $$
CREATE
  TRIGGER my_trigger BEFORE INSERT
  ON employee
  FOR EACH ROW BEGIN
    IF NEW.sex = 'M' THEN
      INSERT INTO trigger_test VALUES('added male employee');
    ELSEIF NEW.sex = 'F' THEN
      INSERT INTO trigger_test VALUES('added female');
    ELSE
      INSERT INTO trigger_test VALUES('added other employee');
    END IF;
  END$$
DELIMITER ;

This is more complex because it has IF ELSE statements.

In the mySQL CLI, we can drop a trigger like: DROP TRIGGER {{name of trigger}}

Entity Relationship Diagrams (ER Diagrams)

Using ER Diagrams is a good way to map and plan the database schema. We take the requirements and relationships between tables and turn into a diagram. So later we can create our schemas in a better way.