SQL#

  1. Basics

  2. Data Types

  3. Commands & Queries

  4. Stored Procedures

  5. ER Diagrams

  6. References & External Resources

back to top

Basics#

  • RDBMS Basics, SQL Categories

  • Structured Query Language, used to communicate with data in RDBMS (Relational Database Management System)

RDBMS Basics#

  • Table
    • structured list of specific data type

    • table names are unique and cannot use the same name in the same database, but can be reused in different ones

    • some databases use the owner name as part of the unique table name

    • Schema: describe database and table layout and properties

  • Column
    • define single attribute in a table, has an associated data type

    • tables are made up of one or more columns

    • data should broken up into different columns

    • combined data in one column makes it difficult to sort or filter

  • Row
    • a single entry/record in a table

  • Primary Key
    • a column or a set of columns to make a row unique, used to refer to a specific row

    • not mandatory, but each table should have a primary key column

    • cannot have duplicates, be NULL, be modified, and can never be reused even if a row is deleted

  • Surrogate Key: primary key that serves only as unique id, no mapping to the real-world data

  • Natural Key: primary key that has a mapping to the real-world data

  • Foreign Key: primary key in another table and relates to it, a table can have more than one foreign key, need to create table first before defining foreign keys

  • Composite Key: made up of two keys (primary or foreign), used when one column doesn’t uniquely identify each row

  • different RDBMS implement SQL differently

SQL Categories#

  • SQL code is not always portable to another without modification

  • DQL (Data Query Language): to query already present data

  • DDL (Data Definition Language): to define database schemas

  • DCL (Data Control Language): control access to the data, user and permission management

  • DML (Data Manipulation Language): to insert, update and delete data

  • TCL (Transaction Control Language): manage transactions within the database

back to top

Data Types#

Numeric#

  • INT
    • integer, whole numbers

    • e.g 1, 10, 999

  • DECIMAL(M,N)
    • M: total number of digits

    • N: total number of digits after the decimal point

    • e.g DECIMAL(7,3): 1234.567

String#

  • VARCHAR(l)
    • variable character, string

    • l: maximum number of characters

    • e.g VARCHAR(3): abc

Binary#

  • BLOB
    • binary large object, usually for images or files

Date & Time#

  • DATE
    • specific format of date

    • e.g ‘YYYY-MM-DD’

  • TIMESTAMP
    • specific date format with time, usually for records

    • e.g ‘YYYY-MM-DD HH:MM:SS’

back to top

Commands & Queries#

DDL#

  • Data Definition Language, CREATE, ALTER, DROP, TRUNCATE, USE, SHOW

  • CREATE
    • primary key is NOT NULL and UNIQUE by default

    • NOT NULL must be explicitly specified, otherwise NULL by default

    • can use DEFAULT values instead of NULL columns

    • some foreign keys must be set to NULL first, and then updates it, as the other table isn’t created or populated yet

    • some DBMSs need table location to be created

    CREATE DATABASE db_name;
    
    CREATE TABLE table_name
    (
        column_1 INT PRIMARY KEY NOT NULL,
        column_2 VARCHAR(50)     NULL
    );
    
    CREATE TABLE table_name
    (
        column_1 INT PRIMARY KEY,
        column_2 VARCHAR(50),
        column_3 INT DEFAULT 1,
        PRIMARY KEY(c1)
    );
    
    -- set multiple attributes as primary, composite key
    CREATE TABLE t1(PRIMARY KEY(c1,c2));
    
    CREATE TABLE t1 (c1 INT NOT NULL); -- c1 cannot be empty when insert
    CREATE TABLE t1 (c1 INT UNIQUE); -- cannot insert duplicate values
    
    -- add foreign key while creating table, referenced table must already exist
    CREATE TABLE t3(FOREIGN KEY(c1) REFERENCES t2(c1) ON DELETE SET NULL);
    CREATE TABLE t3(FOREIGN KEY(c1) REFERENCES t2(c1) ON DELETE CASCADE);
    
  • ALTER
    • update table definition by modifying columns

    • if possible, avoid altering the table after data is added

    • for complex changes, create a new table and copy old data with INSERT SELECT

    • always backup before making changes

    • SQLite does not allow to alter primary and foreign keys

    ALTER TABLE t1 ADD c3 DECIMAL;
    ALTER TABLE t1 DROP COLUMN c3;
    ALTER TABLE t1 ADD FOREIGN KEY(c1) REFERENCES t2(c1); -- add FK to existing table
    ALTER TABLE t1 RENAME TO t2; -- rename table in SQLite
    
  • DROP
    • delete entire table

    • most DBMSs prevent dropping of tables that are related to other tables

    DROP TABLE t1;
    
  • TRUNCATE

  • USE

    USE db_name;
    
  • SHOW
    • non-standardised but widely used to get information about database environment and structure

    SHOW DATABASES; -- list databases
    SHOW TABLES; -- list tables
    

DML#

  • Data Manipulation Language, INSERT, UPDATE, DELETE

  • INSERT
    • can insert a complete or partial row and the results of a query

    • use NULL if a column has no value

    • explicitly stating column names to insert is a safe way to write queries

    • without column names, insert values order must be the same as in table definition

    • cannot insert values with duplicate primary key

    • INSERT SELECT: import data from existing table to another

    • CREATE SELECT or SELECT INTO: import data from existing table to a new one, not supported by DB2

    INSERT INTO t1(c1) VALUES(x); -- insert only specific, safe way to write insert
    INSERT INTO t1 VALUES(x, y, NULL); -- order matters, not safe!
    
    INSERT INTO t1(c1, c2) SELECT c1, c2 FROM t2;
    
    CREATE TABLE t3 AS SELECT * FROM t1; -- some DBMS can overwrite the table if exists
    SELECT * INTO t3 FROM t1; -- syntax for SQL Server
    
  • UPDATE
    • update specific or all rows in a table

    • never forget WHERE, or all rows will be updated

    • can use subqueries to update with data retrieved from SELECT

    • some DBMSs allow FROM to update rows with data from another table

    • use NULL to delete column value, as empty string is still a value

    UPDATE t1 SET c1='new_value' WHERE c2='x';
    UPDATE t1 SET c1='new_value' WHERE c2='x' OR c2='y';
    UPDATE t1 SET c1='new_value', c2='new_value' WHERE c3='x';
    
    UPDATE t1 SET c1='new_value'; -- apply to every row
    
    UPDATE t1 SET c1=NULL WHERE c2='x'; -- delete column value
    
  • DELETE
    • can delete specific or all rows

    • never forget WHERE, or all rows will be deleted

    • rows needed for a relationship are prevented deletion

    • never delete the table itself

    • TRUNCATE TABLE: faster way to delete all rows as data changes are not logged

    DELETE FROM t1 WHERE c1='x';
    
    DELETE FROM t1; -- delete every row
    

DQL#

  • Data Query Language, SELECT, DESCRIBE, VIEW

  • different DBMSs and clients display data differently

  • data formatting is a presentation issue, not a retrieval issue

  • avoid wildcards if possible, as they can reduce the performance of retrieval

  • Clause Ordering: SELECT, FROM, WHERE or GROUP BY, HAVING or ORDER BY

  • Retrieve
    • the first row retrieved is row 0

    SELECT * FROM t1; -- retrieve all columns
    
    SELECT c1 FROM t1;
    SELECT t1.c1 FROM t1; -- prefix with table_name is useful for complex queries
    
    -- get only distinct attributes, multiple columns will be combined and get unique ones
    SELECT DISTINCT c1 FROM t1;
    
    SELECT c1 FROM t1 LIMIT 3; -- limit result row numbers
    SELECT c1 FROM t1 LIMIT 3 OFFSET 3; -- get limited rows starting from the offset
    SELECT c1 AS new_name FROM t1; -- change result column name
    
  • Sort
    • ASC by default, should be after WHERE clause

    • letter cases are same in dictionary sort order, but can be changed by DBMS admin

    • sorts generated output

    SELECT c1,c2 FROM t1 ORDER BY c2;
    SELECT c1,c2 FROM t1 ORDER BY c1, c2 DESC; -- descending only by c2
    SELECT c1,c2 FROM t1 ORDER BY c3; -- can order even if not included in query
    SELECT c1,c2 FROM t1 ORDER BY c2,c1; -- order by c2 first, order by c1 again if needed
    
    -- order by relative column position in the SELECT list
    SELECT c1, c2, c3 FROM t1 ORDER BY 2, 3
    
  • Filter
    • AND is processed before OR

    SELECT c1 FROM t1 WHERE c1='x' ORDER BY c2; -- filter and sort
    SELECT c1 FROM t1 WHERE c1 BETWEEN x AND y; -- check for a range of values
    SELECT c1 FROM t1 WHERE c1 IS NULL; -- check for NULL values
    SELECT c1 FROM t1 where c1 = 'x' AND c2 < 3; -- filter with multiple conditions
    SELECT c1 FROM t1 WHERE c1='x' OR c3='y'; -- filter with multiple conditions
    
    -- can group related operators with parentheses
    SELECT c1 FROM t1 WHERE (c1='x' OR c1='y') AND c2 = 3;
    
    SELECT c1 FROM t1 WHERE c1 IN ('v1', 'v2', 'v3'); -- can be any one of the values
    SELECT c1 FROM t1 where NOT c2='x';` -- [negate](negate) a condition, same as using `<>`
    
  • Wildcards
    • only for text fields, must use LIKE for wildcards

    • wildcard searches take longer than other search types

    • do not use wildcards unless really necessary

    • %: any number of characters

    • _: one character, not supported by DB2

    • []: specify a set of characters, not supported by all DBMSs

    SELECT c1 FROM t1 WHERE c1 LIKE '%og'; -- e.g. 'aog' or 'aaaaog'
    SELECT c1 FROM t1 WHERE c1 LIKE '%og%'; -- e.g. 'aaaaogaaaa'
    SELECT c1 FROM t1 WHERE c1 LIKE '__og'; -- e.g. 'aaog'
    SELECT c1 FROM t1 WHERE c1 LIKE '[AB]%'; -- e.g. 'Acd', 'Bcd'
    SELECT c1 FROM t1 WHERE c1 LIKE '[^AB]%'; -- any that does not begin with 'A' or 'B'
    
  • Calculated Fields
    • created within SELECT statement

    • it is optimal to convert and reformat data on database server than on the client

    • Concatenate: + or || based on DBMS

    • remove extra padded spaces: TRIM(), RTRIM(), LTRIM()

    • AS: give alternate name for a field or value, useful with calculated fields, cannot be accessed from a separate query

    • can also apply mathematical operations +, -, *, /

    SELECT c1 || '(' || c2 || ')' FROM t1;
    
    -- MySQL or MariaDB
    SELECT Concat(c1, '(', c2, ')') FROM t1;
    
    -- use alias
    SELECT c1 || '(' || c2 || ')' AS alias1 FROM t1;
    
    SELECT c1, c2, c1*c2 AS alias1 FROM T1;
    
  • Show Structure
    • DESCRIBE t1;: show structure of the table

  • Data Grouping
    • GROUP BY: divide data into logical groups and perform aggregate calculations, column cannot be aggregate function or aliases

    • HAVING: filter groups instead of rows, allow all WHERE options

    SELECT c1, COUNT(*) FROM t1 GROUP BY c1;
    
  • Views
    • virtual tables with queries that dynamically retrieve data when used

    • SQLite only supports read-only views

    • to simplify and reuse SQL statements, but complex views may reduce performance

    • ORDER BY is not allowed in views

    • use CREATE and DROP like tables, but must drop and recreate to update a view

    CREATE VIEW v1 AS SELECT c1, c2 FROM t1 INNER JOIN t2 ON t1.c3 = t2.c3;
    DROP VIEW v1;
    

DCL#

  • Data Control Language, GRANT, REVOKE

TCL#

  • Transaction Control Language, COMMIT, ROLLBACK, SAVEPOINT

  • Transaction: a block of SQL statements, must execute completely or not at all to maintain data integrity

  • Rollback: undo specified SQL statements

  • Commit: write unsaved SQL statements

  • Savepoint: temporary placeholder in a transaction set to be rollback to

  • can manage INSERT, UPDATE, and DELETE, but not SELECT, CREATE and DROP

  • a transaction exists until it is terminated

-- MSSQL
BEGIN TRANSACTION
    -- statements
COMMIT TRANSACTION

-- MariaDB & MySQL
START TRANSACTION

-- Oracle
SET TRANSACTION

-- PostgreSQL
BEGIN

DELETE FROM Orders;
ROLLBACK;

Constraints#

  • UNIQUE

  • DEFAULT: CREATE TABLE t1 (c1 INT DEFAULT 66);

  • AUTO_INCREMENT: data can be left when insert, will auto increase based on previous, default start at 1, use ALTER TABLE t1 AUTO_INCREMENT=100; to change

  • ON DELETE SET NULL: foreign key will be set to NULL if row on referenced table is deleted

  • CREATE TABLE t3(FOREIGN KEY(c1) REFERENCES t2(c1) ON DELETE SET NULL); t3.c1 will be “sql” resulted in an error at token: ‘$’NULL if t2.c1 is deleted

  • ON DELETE CASCADE: row on foreign key will be deleted if row on referenced table is deleted

  • use when foreign key will also be primary key of the table

  • CREATE TABLE t3(FOREIGN KEY(c1) REFERENCES t2(c1) ON DELETE CASCADE); t3.c1 will be deleted if t2.c1 is deleted

Functions#

  • special block of code used when querying data

  • not portable and most functions are DBMS specific

  • function names are not case sensitive

  • Text
    • SUBSTRING(), SUBSTR()

    • UPPER(), LOWER()

    • LEFT(): characters from left

    • RIGHT()

    • LENGTH(), DATALENGTH(), LEN()

    • LTRIM(), RTRIM()

    • SOUNDEX(): alphanumeric pattern of phonetic representation of text, e.g. searching names similar to the queried name

  • Date & Time
    • CURRENT_DATE, CURDATE(), SYSDATE, GETDATE(), DATE()

    • DATEPART(), STRFTIME(), DATE_PART(), EXTRACT(year FROM), BETWEEN to_date() AND to_date(), YEAR()

  • Numeric
    • ABS(), COS(), EXP(), PI(), SIN(), SQRT(), TAN()

  • Data Type Conversion
    • CAST(): DB2, PostgreSQL, SQL Server

    • CONVERT(): MariaDB, MySQL, SQLite

  • Aggregate
    • operate on a set of rows and return single value

    • mostly consistent in SQL implementations

    • AVG(): ignore rows with NULL

    • COUNT(): use * to include rows with NULL

    • MAX(): with text data, return the sorted last row by the given column

    • MIN(), SUM()

    • never use DISTINCT with COUNT(*)

Nested Queries#

  • using results from one query in another

  • will execute the innermost query and use that result to execute outer query

  • use fully qualified column name, table.column, when working with multiple tables

  • use constraints if necessary, as result from one query could be incompatible with other, especially when using equal

  • break the queries over multiple lines and indent properly for readability

SELECT t1.c1
FROM t1
WHERE t1.c2 IN (SELECT t2.c1
                FROM t2
                WHERE t2.c1 = x);

Joins#

  • join separate tables on specific common column horizontally

  • must specify all tables to be included, and use fully qualified column names

  • resource intensive as joins are processed at runtime

  • most DBMSs have restricted the maximum number of tables per join

  • Cross Join
    • Cartesian Product returned without join condition

    • number of rows in first table multiplied by the number of rows in the second table

    • can get incorrect data

    SELECT t1.c1, t1.c2, t2.c2 FROM t1, t2;
    
  • Inner Join
    • only common rows will be returned

    • will return duplicates of columns

    SELECT t1.c1, t2.c2 FROM t1 INNER JOIN t2 ON t1.c1 = t2.c1;
    
    -- multiple tables
    SELECT c1, c2
    FROM t1
    INNER JOIN t2
        ON t1.c3 = t2.c3
    INNER JOIN t3
        ON t2.c4 = t3.c4
    WHERE t3.c5 = 'x';
    
  • Equi Join
    • just INNER or OUTER JOIN with using only = as join condition

    -- ANSI SQL style
    SELECT t1.c1, t2.c2 FROM t1 JOIN t2 ON t1.c1 = t2.c1;
    
    -- older SQL-89 style, considered deprecated in most databases
    SELECT t1.c1, t2.c2 FROM t1, t2 WHERE t1.c1 = t2.c1;
    
  • Self Join
    • to get data from the same table

    • should use aliases to refer to the same table multiple times

    SELECT t1.c1, t1.c2, t1.c3
    FROM Table1 AS t1
    INNER JOIN Table1 AS t2
    ON t1.c1 = t2.c1
    WHERE t2.c3 = 'x';
    
  • Natural Join
    • selecting columns that are unique

    • only return one column each from the duplicates

    SELECT * FROM t1 NATURAL JOIN t2;
    
  • Outer Join
    • Left: also retrieve non-associated rows from left table

    • Right: also retrieve non-associated rows from right table

    • Full: also retrieve non-associated rows from both tables, same as left join and right join combined

    • has various form of syntax based on SQL implementations

    • cannot do in MySQL

    -- also retrieve non-associated rows from t1
    SELECT t1.c1, t2.c1 FROM t1 LEFT OUTER JOIN t2 ON t1.c3 = t2.c3;
    
    -- also retrieve non-associated rows from t2
    SELECT t1.c1, t2.c1 FROM t1 RIGHT OUTER JOIN t2 ON t1.c3 = t2.c3;
    
    -- also retrieve non-associated rows from both t1 and t2
    SELECT t1.c1, t2.c1 FROM t1 FULL OUTER JOIN t2 ON t1.c3 = t2.c3;
    

Union#

  • also called compound queries, combine columns from multiple SELECT statements as a single query result

  • used to get similarly structured data from different tables or multiple queries against a single table in a single query

  • combining two queries to the same table is the same as a single query with multiple WHERE

  • sometimes UNION query can be more complicated than using WHERE

  • duplicate rows are auto removed, and must use UNION ALL to include them

  • Rules
    • must have two or more SELECT

    • must have same columns, expressions, or aggregate functions

    • columns must be of same or compatible data type, different data types will be converted if possible

    • if different column names, the first name is used

    • can only use one ORDER BY, and it must be after the final SELECT

  • Additional UNION Types
    • EXCEPT or MINUS: retrieve only rows that exist in the first table

    • INTERSECT: retrieve only rows that exist in both tables

SELECT c1 FROM t1 UNION SELECT c2 FROM t2; -- result c1 and c2 in single column

SELECT c1 as new_name FROM t1 UNION SELECT c2 FROM t2; -- result column will be new_name

Triggers#

  • block of SQL that will be performed when certain action is operated

  • e.g add a row to t2 when certain data is deleted on t1

  • delete trigger: DROP TRIGGER trig1;

  • Create Trigger

    DELIMITER $$ -- change SQL delimiter from ';' to '$$'
    CREATE
        TRIGGER trig1 BEFORE INSERT -- can also use AFTER INSERT/DELETE/UPDATE
        ON t1
        FOR EACH ROW BEGIN
            INSERT INTO t2 VALUES('action triggered');
            INSERT INTO t2 VALUES(NEW.c1); -- add value of c1, to be added to t1, into t2 first
            IF NEW.c1 = 'x' THEN
                INSERT INTO t2 VALUES(NEW.c1);
            ELSEIF NEW.c1 = 'y' THEN
                INSERT INTO t2 VALUES('wrong value');
            ELSE
                INSERT INTO t2 VALUES('default value');
            END IF;
        END$$ -- CREATE command ends here
    DELIMITER ; -- change SQL delimiter back to ';'
    

back to top

Stored Procedures#

  • Oracle Syntax, MSSQL Syntax

  • collections of SQL statements to simplify complex operations

  • not supported by SQLite

  • syntax varies from one DBMS to another

  • can have optional parameters with default values, out-of-order parameters, update a parameter or return a value to the executing application

  • always comment when writing stored procedures

Oracle Syntax#

  • code is within BEGIN and END

  • IN: pass parameter to procedure

  • OUT: pass parameter from procedure

  • INOUT: pass parameters to and from the procedure

-- Creating
CREATE PROCEDURE MailingListCount (
    ListCount OUT INTEGER -- pass a value from the procedure
)
IS
v_rows INTEGER;
BEGIN
    SELECT COUNT(*) INTO v_rows
    FROM Customers
    WHERE NOT cust_email IS NULL;
    ListCount := v_rows;
END;

-- Executing
var ReturnValue NUMBER -- variable to hold return value
EXEC MailingListCount(:ReturnValue); -- execute the procedure
SELECT ReturnValue; -- display the return value

MSSQL Syntax#

  • all local variable names start with @

-- Creating
CREATE PROCEDURE MailingListCount -- no parameter
AS
DECLARE @cnt INTEGER -- local variable
SELECT @cnt = COUNT(*) -- store into variable
FROM Customers
WHERE NOT cust_email IS NULL;
RETURN @cnt

-- Executing
DECLARE @ReturnValue INT
EXECUTE @ReturnValue=MailingListCount;
SELECT @ReturnValue;

-- Procedure to insert into table
CREATE PROCEDURE NewOrder @cust_id CHAR(10) -- accept a parameter
AS
DECLARE @order_num INTEGER
SELECT @order_num=MAX(order_num)
FROM Orders
SELECT @order_num=@order_num+1
INSERT INTO Orders(order_num, order_date, cust_id)
VALUES(@order_num, GETDATE(), @cust_id)
RETURN @order_num

-- Different way to insert into table
CREATE PROCEDURE NewOrder @cust_id CHAR(10) -- accept a parameter
AS
INSERT INTO Orders(cust_id)
Values(@cust_id)
SELECT order_num = @@IDENTITY; -- generated by DBMS, GETDATE() is used by default
-- access @@IDENTITY to get generated order_num

back to top

ER Diagrams#

  • Entity, Attributes, Relationship, ER to Schema

  • Entity Relationship Diagrams, helps to convert storage/business requirements into database schema

  • uses shapes and arrows or graphs to define relationship

Entity#

  • object to be modeled and stored information about

  • use rectangle shape

  • can define more than one entity in a diagram

  • Weak Entity
    • cannot be uniquely identified only by its attributes

    • usually depends on other entities

    • use double-layered rectangle shape

    • always need to have total participation in the relationship

Attributes#

  • specific information about an entity

  • use oval shape

  • Primary Key
    • uniquely define an entry in the database table

    • use oval shape, text is underlined

  • Composite Attribute
    • can be broken into sub-attributes

    • e.g name can be separated into first_name and last_name

  • Multi-Valued Attribute
    • can have more than one value

    • use double-layered oval shape

    • e.g a person can have more than one phone numbers

  • Derived Attribute
    • can be derived from other attributes

    • use dotted-line oval shape

    • e.g a student’s attribute passed can be derived from attribute test_score

Relationship#

  • connect multiple entities in a single diagram

  • use diamond shape

  • connect with single line for partial participation and double line for total

  • Relationship Attribute
    • a relationship can have attributes

    • attributes are not stored on entity but on the relationship

  • Relationship Cardinality
    • maximum number of times an instance in one entity can relate to instances of another entity

    • e.g 1:1, 1:N, N:M

  • Identifying Relationship
    • use double-layered diamond shape

    • serves to uniquely identify the weak entity

ER to Schema#

  • binary: two entity participating

  • Create
    • create relation/table for each regular entity with all simple attributes

    • create relation/table for each weak entity with all simple attributes

    • primary key should be the key of weak entity plus the primary key of its owner

  • Map 1:1
    • add primary key of one entity as foreign key in the one that has total participation

    • if both are partial or total, use most convenient approach

  • Map 1:N
    • add 1 side primary key as foreign key on the N side

  • Map M:N
    • create new relation/table, with primary key as the combination of both entities’ primary keys, and include any relationship attributes

back to top

References & External Resources#

  • Forta, B. (2020). Sams Teach Yourself SQL in 10 Minutes, Fifth Edition. Hoboken, NJ: Sams.

back to top