SQL#
Basics#
Structured Query Language, used to communicate with data in RDBMS (Relational Database Management System)
RDBMS Basics#
- Table
structured list of specific data type
table names are unique and cannot use the same name in the same database, but can be reused in different ones
some databases use the owner name as part of the unique table name
Schema: describe database and table layout and properties
- Column
define single attribute in a table, has an associated data type
tables are made up of one or more columns
data should broken up into different columns
combined data in one column makes it difficult to sort or filter
- Row
a single entry/record in a table
- Primary Key
a column or a set of columns to make a row unique, used to refer to a specific row
not mandatory, but each table should have a primary key column
cannot have duplicates, be
NULL, be modified, and can never be reused even if a row is deletedSurrogate Key: primary key that serves only as unique id, no mapping to the real-world data
Natural Key: primary key that has a mapping to the real-world data
Foreign Key: primary key in another table and relates to it, a table can have more than one foreign key, need to create table first before defining foreign keys
Composite Key: made up of two keys (primary or foreign), used when one column doesn’t uniquely identify each row
different RDBMS implement SQL differently
SQL Categories#
SQL code is not always portable to another without modification
DQL (Data Query Language): to query already present data
DDL (Data Definition Language): to define database schemas
DCL (Data Control Language): control access to the data, user and permission management
DML (Data Manipulation Language): to insert, update and delete data
TCL (Transaction Control Language): manage transactions within the database
Data Types#
restrict the type of data to be stored in a column
some data types differ depending on DBMS
Numeric#
INT
integer, whole numbers
e.g 1, 10, 999
DECIMAL(M,N)
M: total number of digits
N: total number of digits after the decimal point
e.g DECIMAL(7,3): 1234.567
String#
VARCHAR(l)
variable character, string
l: maximum number of characters
e.g VARCHAR(3): abc
Binary#
BLOB
binary large object, usually for images or files
Date & Time#
DATE
specific format of date
e.g ‘YYYY-MM-DD’
TIMESTAMP
specific date format with time, usually for records
e.g ‘YYYY-MM-DD HH:MM:SS’
Commands & Queries#
DDL, DML, DQL, DCL, TCL, Constraints, Functions, Nested Queries, Joins, Union, Triggers
instructions to manipulate and fetch the database, upper case convention
comment:
-- single line,/* multi line */operators:
=, !=, <> (not equal), <, <=, !<, >, >=, !>, AND, OR, BETWEEN, IS NULL, NOT, IN
DDL#
Data Definition Language,
CREATE,ALTER,DROP,TRUNCATE,USE,SHOW
CREATE
primary key is
NOT NULLandUNIQUEby default
NOT NULLmust be explicitly specified, otherwiseNULLby defaultcan use
DEFAULTvalues instead ofNULLcolumnssome foreign keys must be set to
NULLfirst, and then updates it, as the other table isn’t created or populated yetsome DBMSs need table location to be created
CREATE DATABASE db_name; CREATE TABLE table_name ( column_1 INT PRIMARY KEY NOT NULL, column_2 VARCHAR(50) NULL ); CREATE TABLE table_name ( column_1 INT PRIMARY KEY, column_2 VARCHAR(50), column_3 INT DEFAULT 1, PRIMARY KEY(c1) ); -- set multiple attributes as primary, composite key CREATE TABLE t1(PRIMARY KEY(c1,c2)); CREATE TABLE t1 (c1 INT NOT NULL); -- c1 cannot be empty when insert CREATE TABLE t1 (c1 INT UNIQUE); -- cannot insert duplicate values -- add foreign key while creating table, referenced table must already exist CREATE TABLE t3(FOREIGN KEY(c1) REFERENCES t2(c1) ON DELETE SET NULL); CREATE TABLE t3(FOREIGN KEY(c1) REFERENCES t2(c1) ON DELETE CASCADE);
ALTER
update table definition by modifying columns
if possible, avoid altering the table after data is added
for complex changes, create a new table and copy old data with
INSERT SELECTalways backup before making changes
SQLite does not allow to alter primary and foreign keys
ALTER TABLE t1 ADD c3 DECIMAL; ALTER TABLE t1 DROP COLUMN c3; ALTER TABLE t1 ADD FOREIGN KEY(c1) REFERENCES t2(c1); -- add FK to existing table ALTER TABLE t1 RENAME TO t2; -- rename table in SQLite
DROP
delete entire table
most DBMSs prevent dropping of tables that are related to other tables
DROP TABLE t1;
TRUNCATE
USEUSE db_name;
SHOW
non-standardised but widely used to get information about database environment and structure
SHOW DATABASES; -- list databases SHOW TABLES; -- list tables
DML#
Data Manipulation Language,
INSERT,UPDATE,DELETE
INSERT
can insert a complete or partial row and the results of a query
use
NULLif a column has no valueexplicitly stating column names to insert is a safe way to write queries
without column names, insert values order must be the same as in table definition
cannot insert values with duplicate primary key
INSERT SELECT: import data from existing table to another
CREATE SELECTorSELECT INTO: import data from existing table to a new one, not supported by DB2INSERT INTO t1(c1) VALUES(x); -- insert only specific, safe way to write insert INSERT INTO t1 VALUES(x, y, NULL); -- order matters, not safe! INSERT INTO t1(c1, c2) SELECT c1, c2 FROM t2; CREATE TABLE t3 AS SELECT * FROM t1; -- some DBMS can overwrite the table if exists SELECT * INTO t3 FROM t1; -- syntax for SQL Server
UPDATE
update specific or all rows in a table
never forget
WHERE, or all rows will be updatedcan use subqueries to update with data retrieved from
SELECTsome DBMSs allow
FROMto update rows with data from another tableuse
NULLto delete column value, as empty string is still a valueUPDATE t1 SET c1='new_value' WHERE c2='x'; UPDATE t1 SET c1='new_value' WHERE c2='x' OR c2='y'; UPDATE t1 SET c1='new_value', c2='new_value' WHERE c3='x'; UPDATE t1 SET c1='new_value'; -- apply to every row UPDATE t1 SET c1=NULL WHERE c2='x'; -- delete column value
DELETE
can delete specific or all rows
never forget
WHERE, or all rows will be deletedrows needed for a relationship are prevented deletion
never delete the table itself
TRUNCATE TABLE: faster way to delete all rows as data changes are not loggedDELETE FROM t1 WHERE c1='x'; DELETE FROM t1; -- delete every row
DQL#
Data Query Language,
SELECT,DESCRIBE,VIEWdifferent DBMSs and clients display data differently
data formatting is a presentation issue, not a retrieval issue
avoid wildcards if possible, as they can reduce the performance of retrieval
Clause Ordering:
SELECT,FROM,WHEREorGROUP BY,HAVINGorORDER BY
- Retrieve
the first row retrieved is row 0
SELECT * FROM t1; -- retrieve all columns SELECT c1 FROM t1; SELECT t1.c1 FROM t1; -- prefix with table_name is useful for complex queries -- get only distinct attributes, multiple columns will be combined and get unique ones SELECT DISTINCT c1 FROM t1; SELECT c1 FROM t1 LIMIT 3; -- limit result row numbers SELECT c1 FROM t1 LIMIT 3 OFFSET 3; -- get limited rows starting from the offset SELECT c1 AS new_name FROM t1; -- change result column name
- Sort
ASC by default, should be after
WHEREclauseletter cases are same in dictionary sort order, but can be changed by DBMS admin
sorts generated output
SELECT c1,c2 FROM t1 ORDER BY c2; SELECT c1,c2 FROM t1 ORDER BY c1, c2 DESC; -- descending only by c2 SELECT c1,c2 FROM t1 ORDER BY c3; -- can order even if not included in query SELECT c1,c2 FROM t1 ORDER BY c2,c1; -- order by c2 first, order by c1 again if needed -- order by relative column position in the SELECT list SELECT c1, c2, c3 FROM t1 ORDER BY 2, 3
- Filter
ANDis processed beforeORSELECT c1 FROM t1 WHERE c1='x' ORDER BY c2; -- filter and sort SELECT c1 FROM t1 WHERE c1 BETWEEN x AND y; -- check for a range of values SELECT c1 FROM t1 WHERE c1 IS NULL; -- check for NULL values SELECT c1 FROM t1 where c1 = 'x' AND c2 < 3; -- filter with multiple conditions SELECT c1 FROM t1 WHERE c1='x' OR c3='y'; -- filter with multiple conditions -- can group related operators with parentheses SELECT c1 FROM t1 WHERE (c1='x' OR c1='y') AND c2 = 3; SELECT c1 FROM t1 WHERE c1 IN ('v1', 'v2', 'v3'); -- can be any one of the values SELECT c1 FROM t1 where NOT c2='x';` -- [negate](negate) a condition, same as using `<>`
- Wildcards
only for text fields, must use
LIKEfor wildcardswildcard searches take longer than other search types
do not use wildcards unless really necessary
%: any number of characters
_: one character, not supported by DB2
[]: specify a set of characters, not supported by all DBMSsSELECT c1 FROM t1 WHERE c1 LIKE '%og'; -- e.g. 'aog' or 'aaaaog' SELECT c1 FROM t1 WHERE c1 LIKE '%og%'; -- e.g. 'aaaaogaaaa' SELECT c1 FROM t1 WHERE c1 LIKE '__og'; -- e.g. 'aaog' SELECT c1 FROM t1 WHERE c1 LIKE '[AB]%'; -- e.g. 'Acd', 'Bcd' SELECT c1 FROM t1 WHERE c1 LIKE '[^AB]%'; -- any that does not begin with 'A' or 'B'
- Calculated Fields
created within
SELECTstatementit is optimal to convert and reformat data on database server than on the client
Concatenate:
+or||based on DBMSremove extra padded spaces:
TRIM(),RTRIM(),LTRIM()
AS: give alternate name for a field or value, useful with calculated fields, cannot be accessed from a separate querycan also apply mathematical operations
+, -, *, /SELECT c1 || '(' || c2 || ')' FROM t1; -- MySQL or MariaDB SELECT Concat(c1, '(', c2, ')') FROM t1; -- use alias SELECT c1 || '(' || c2 || ')' AS alias1 FROM t1; SELECT c1, c2, c1*c2 AS alias1 FROM T1;
- Show Structure
DESCRIBE t1;: show structure of the table
- Data Grouping
GROUP BY: divide data into logical groups and perform aggregate calculations, column cannot be aggregate function or aliases
HAVING: filter groups instead of rows, allow allWHEREoptionsSELECT c1, COUNT(*) FROM t1 GROUP BY c1;
- Views
virtual tables with queries that dynamically retrieve data when used
SQLite only supports read-only views
to simplify and reuse SQL statements, but complex views may reduce performance
ORDER BYis not allowed in viewsuse
CREATEandDROPlike tables, but must drop and recreate to update a viewCREATE VIEW v1 AS SELECT c1, c2 FROM t1 INNER JOIN t2 ON t1.c3 = t2.c3; DROP VIEW v1;
DCL#
Data Control Language,
GRANT,REVOKE
TCL#
Transaction Control Language,
COMMIT,ROLLBACK,SAVEPOINTTransaction: a block of SQL statements, must execute completely or not at all to maintain data integrity
Rollback: undo specified SQL statements
Commit: write unsaved SQL statements
Savepoint: temporary placeholder in a transaction set to be rollback to
can manage
INSERT,UPDATE, andDELETE, but notSELECT,CREATEandDROPa transaction exists until it is terminated
-- MSSQL BEGIN TRANSACTION -- statements COMMIT TRANSACTION -- MariaDB & MySQL START TRANSACTION -- Oracle SET TRANSACTION -- PostgreSQL BEGIN DELETE FROM Orders; ROLLBACK;
Constraints#
UNIQUE
DEFAULT:CREATE TABLE t1 (c1 INT DEFAULT 66);
AUTO_INCREMENT: data can be left when insert, will auto increase based on previous, default start at 1, use ALTER TABLE t1 AUTO_INCREMENT=100; to change
ON DELETE SET NULL: foreign key will be set toNULLif row on referenced table is deleted
CREATE TABLE t3(FOREIGN KEY(c1) REFERENCES t2(c1) ON DELETE SET NULL);t3.c1 will be “sql” resulted in an error at token: ‘$’NULL if t2.c1 is deleted
ON DELETE CASCADE: row on foreign key will be deleted if row on referenced table is deleteduse when foreign key will also be primary key of the table
CREATE TABLE t3(FOREIGN KEY(c1) REFERENCES t2(c1) ON DELETE CASCADE);t3.c1 will be deleted if t2.c1 is deleted
Functions#
special block of code used when querying data
not portable and most functions are DBMS specific
function names are not case sensitive
- Text
SUBSTRING(),SUBSTR()
UPPER(),LOWER()
LEFT(): characters from left
RIGHT()
LENGTH(),DATALENGTH(),LEN()
LTRIM(),RTRIM()
SOUNDEX(): alphanumeric pattern of phonetic representation of text, e.g. searching names similar to the queried name
- Date & Time
CURRENT_DATE,CURDATE(),SYSDATE,GETDATE(),DATE()
DATEPART(),STRFTIME(),DATE_PART(),EXTRACT(year FROM),BETWEEN to_date() AND to_date(),YEAR()
- Numeric
ABS(),COS(),EXP(),PI(),SIN(),SQRT(),TAN()
- Data Type Conversion
CAST(): DB2, PostgreSQL, SQL Server
CONVERT(): MariaDB, MySQL, SQLite
- Aggregate
operate on a set of rows and return single value
mostly consistent in SQL implementations
AVG(): ignore rows withNULL
COUNT(): use * to include rows withNULL
MAX(): with text data, return the sorted last row by the given column
MIN(),SUM()never use
DISTINCTwithCOUNT(*)
Nested Queries#
using results from one query in another
will execute the innermost query and use that result to execute outer query
use fully qualified column name,
table.column, when working with multiple tablesuse constraints if necessary, as result from one query could be incompatible with other, especially when using equal
break the queries over multiple lines and indent properly for readability
SELECT t1.c1 FROM t1 WHERE t1.c2 IN (SELECT t2.c1 FROM t2 WHERE t2.c1 = x);
Joins#
join separate tables on specific common column horizontally
must specify all tables to be included, and use fully qualified column names
resource intensive as joins are processed at runtime
most DBMSs have restricted the maximum number of tables per join
- Cross Join
Cartesian Product returned without join condition
number of rows in first table multiplied by the number of rows in the second table
can get incorrect data
SELECT t1.c1, t1.c2, t2.c2 FROM t1, t2;
- Inner Join
only common rows will be returned
will return duplicates of columns
SELECT t1.c1, t2.c2 FROM t1 INNER JOIN t2 ON t1.c1 = t2.c1; -- multiple tables SELECT c1, c2 FROM t1 INNER JOIN t2 ON t1.c3 = t2.c3 INNER JOIN t3 ON t2.c4 = t3.c4 WHERE t3.c5 = 'x';
- Equi Join
just INNER or OUTER JOIN with using only
=as join condition-- ANSI SQL style SELECT t1.c1, t2.c2 FROM t1 JOIN t2 ON t1.c1 = t2.c1; -- older SQL-89 style, considered deprecated in most databases SELECT t1.c1, t2.c2 FROM t1, t2 WHERE t1.c1 = t2.c1;
- Self Join
to get data from the same table
should use aliases to refer to the same table multiple times
SELECT t1.c1, t1.c2, t1.c3 FROM Table1 AS t1 INNER JOIN Table1 AS t2 ON t1.c1 = t2.c1 WHERE t2.c3 = 'x';
- Natural Join
selecting columns that are unique
only return one column each from the duplicates
SELECT * FROM t1 NATURAL JOIN t2;
- Outer Join
Left: also retrieve non-associated rows from left table
Right: also retrieve non-associated rows from right table
Full: also retrieve non-associated rows from both tables, same as left join and right join combined
has various form of syntax based on SQL implementations
cannot do in MySQL
-- also retrieve non-associated rows from t1 SELECT t1.c1, t2.c1 FROM t1 LEFT OUTER JOIN t2 ON t1.c3 = t2.c3; -- also retrieve non-associated rows from t2 SELECT t1.c1, t2.c1 FROM t1 RIGHT OUTER JOIN t2 ON t1.c3 = t2.c3; -- also retrieve non-associated rows from both t1 and t2 SELECT t1.c1, t2.c1 FROM t1 FULL OUTER JOIN t2 ON t1.c3 = t2.c3;
Union#
also called compound queries, combine columns from multiple
SELECTstatements as a single query resultused to get similarly structured data from different tables or multiple queries against a single table in a single query
combining two queries to the same table is the same as a single query with multiple
WHEREsometimes
UNIONquery can be more complicated than usingWHEREduplicate rows are auto removed, and must use
UNION ALLto include them
- Rules
must have two or more
SELECTmust have same columns, expressions, or aggregate functions
columns must be of same or compatible data type, different data types will be converted if possible
if different column names, the first name is used
can only use one
ORDER BY, and it must be after the finalSELECT
- Additional UNION Types
EXCEPTorMINUS: retrieve only rows that exist in the first table
INTERSECT: retrieve only rows that exist in both tablesSELECT c1 FROM t1 UNION SELECT c2 FROM t2; -- result c1 and c2 in single column SELECT c1 as new_name FROM t1 UNION SELECT c2 FROM t2; -- result column will be new_name
Triggers#
block of SQL that will be performed when certain action is operated
e.g add a row to t2 when certain data is deleted on t1
delete trigger:
DROP TRIGGER trig1;Create Trigger
DELIMITER $$ -- change SQL delimiter from ';' to '$$' CREATE TRIGGER trig1 BEFORE INSERT -- can also use AFTER INSERT/DELETE/UPDATE ON t1 FOR EACH ROW BEGIN INSERT INTO t2 VALUES('action triggered'); INSERT INTO t2 VALUES(NEW.c1); -- add value of c1, to be added to t1, into t2 first IF NEW.c1 = 'x' THEN INSERT INTO t2 VALUES(NEW.c1); ELSEIF NEW.c1 = 'y' THEN INSERT INTO t2 VALUES('wrong value'); ELSE INSERT INTO t2 VALUES('default value'); END IF; END$$ -- CREATE command ends here DELIMITER ; -- change SQL delimiter back to ';'
Stored Procedures#
collections of SQL statements to simplify complex operations
not supported by SQLite
syntax varies from one DBMS to another
can have optional parameters with default values, out-of-order parameters, update a parameter or return a value to the executing application
always comment when writing stored procedures
Oracle Syntax#
code is within
BEGINandEND
IN: pass parameter to procedure
OUT: pass parameter from procedure
INOUT: pass parameters to and from the procedure-- Creating CREATE PROCEDURE MailingListCount ( ListCount OUT INTEGER -- pass a value from the procedure ) IS v_rows INTEGER; BEGIN SELECT COUNT(*) INTO v_rows FROM Customers WHERE NOT cust_email IS NULL; ListCount := v_rows; END; -- Executing var ReturnValue NUMBER -- variable to hold return value EXEC MailingListCount(:ReturnValue); -- execute the procedure SELECT ReturnValue; -- display the return value
MSSQL Syntax#
all local variable names start with
@-- Creating CREATE PROCEDURE MailingListCount -- no parameter AS DECLARE @cnt INTEGER -- local variable SELECT @cnt = COUNT(*) -- store into variable FROM Customers WHERE NOT cust_email IS NULL; RETURN @cnt -- Executing DECLARE @ReturnValue INT EXECUTE @ReturnValue=MailingListCount; SELECT @ReturnValue; -- Procedure to insert into table CREATE PROCEDURE NewOrder @cust_id CHAR(10) -- accept a parameter AS DECLARE @order_num INTEGER SELECT @order_num=MAX(order_num) FROM Orders SELECT @order_num=@order_num+1 INSERT INTO Orders(order_num, order_date, cust_id) VALUES(@order_num, GETDATE(), @cust_id) RETURN @order_num -- Different way to insert into table CREATE PROCEDURE NewOrder @cust_id CHAR(10) -- accept a parameter AS INSERT INTO Orders(cust_id) Values(@cust_id) SELECT order_num = @@IDENTITY; -- generated by DBMS, GETDATE() is used by default -- access @@IDENTITY to get generated order_num
ER Diagrams#
Entity Relationship Diagrams, helps to convert storage/business requirements into database schema
uses shapes and arrows or graphs to define relationship
Entity#
object to be modeled and stored information about
use rectangle shape
can define more than one entity in a diagram
- Weak Entity
cannot be uniquely identified only by its attributes
usually depends on other entities
use double-layered rectangle shape
always need to have total participation in the relationship
Attributes#
specific information about an entity
use oval shape
- Primary Key
uniquely define an entry in the database table
use oval shape, text is underlined
- Composite Attribute
can be broken into sub-attributes
e.g name can be separated into first_name and last_name
- Multi-Valued Attribute
can have more than one value
use double-layered oval shape
e.g a person can have more than one phone numbers
- Derived Attribute
can be derived from other attributes
use dotted-line oval shape
e.g a student’s attribute
passedcan be derived from attributetest_score
Relationship#
connect multiple entities in a single diagram
use diamond shape
connect with single line for partial participation and double line for total
- Relationship Attribute
a relationship can have attributes
attributes are not stored on entity but on the relationship
- Relationship Cardinality
maximum number of times an instance in one entity can relate to instances of another entity
e.g 1:1, 1:N, N:M
- Identifying Relationship
use double-layered diamond shape
serves to uniquely identify the weak entity
ER to Schema#
binary: two entity participating
- Create
create relation/table for each regular entity with all simple attributes
create relation/table for each weak entity with all simple attributes
primary key should be the key of weak entity plus the primary key of its owner
- Map 1:1
add primary key of one entity as foreign key in the one that has total participation
if both are partial or total, use most convenient approach
- Map 1:N
add
1side primary key as foreign key on theNside
- Map M:N
create new relation/table, with primary key as the combination of both entities’ primary keys, and include any relationship attributes
References & External Resources#
Forta, B. (2020). Sams Teach Yourself SQL in 10 Minutes, Fifth Edition. Hoboken, NJ: Sams.