Def
Syntax: CASE, WHEN, THEN, ELSE structure
SELECT
col1,
col2,
CASE
WHEN condition1 THEN result1
WHEN condition2 THEN result2
ELSE resultN AS col3
FROM tbl;
start and end date, returning in terms of unitvalue to date, returning in terms of unit
str1 and str2 with str1 rendering firstLIKE Def
LIKE keyword when referencing string columns in conditionals:
SELECT * FROM example
WHERE txt_col LIKE sql_pattern
Exact Character Matching
SELECT * FROM example
WHERE txt_col LIKE 'hello'
-- searches example for rows that exactly equal 'hello' in their txt_col
% Operator
% operator matches “any character, any number of times”SELECT * FROM example
WHERE txt_col LIKE '%hello%'
-- searches example for rows that contain 'hello' in their txt_col
-- specifically checks for any number of chars, any num of times before 'hello' --> checks for the presence of 'hello' --> theb checks for any number of chars, any num of times after 'hello'
_ Operator
_ operator matches “any character, one time”SELECT * FROM example
WHERE txt_col LIKE 'J_hn'
-- searches example for rows that start with 'J' --> contain any single character --> end with 'hn' in their txt_col
-- so rows with 'John', 'Jehn', 'Juhn' in their txt_col would get returned
Forlooping Through Strings
SELECT SUBSTRING(str_or_col FROM start_idx FOR interval_length);
-- extracts a portion of a string starting from start_idx for interval_length number of characters
Regex Searching Through Strings
– eg, extracting rating from text description SELECT SUBSTRING(‘Rated PG-13 for content’ FROM ‘Rated ([\w-]+)’);
## Numerical Handling Methods
Operating on numerical cols together
- Helpful for numerical transformation (eg weighted averages)
```sql
SELECT AVERAGE(vals * weights) FROM tbl
GROUP BY col3
Why it’s a classic
CASE operator to only define the logic for the less numerous positive cases and leave the rest up to the ELSE statementSELECT
e.left_operand,
e.operator,
e.right_operand,
CASE -- use `CASE` keyword
WHEN e.operator = '>' AND v1.value > v2.value THEN 'true' -- don't try to directly pull operator
WHEN e.operator = '<' AND v1.value < v2.value THEN 'true' -- translate what it means in SQL
WHEN e.operator = '=' AND v1.value = v2.value THEN 'true' -- set the true conditions and else false all else
ELSE 'false' -- else false cuz there's more false cases than true ones
END AS value
FROM expressions e -- double join so we have
JOIN variables v1 ON e.left_operand = v1.name -- one col w/ left translated into nums
JOIN variables v2 ON e.right_operand = v2.name; -- one col w/ right translated into nums
Def
Syntax
Why it’s classic
MONTH(trans_date) since we ned to group by every month in every yearSELECT
-- we format YYYY-MM to avoid counting Jan 2024 and Jan 2025 as the same
DATE_FORMAT(trans_date, '%Y-%m') AS month,
country,
COUNT(*) AS trans_count,
-- using case statements wrapped in for conditional aggregation
SUM(CASE WHEN state = 'approved' THEN 1 ELSE 0 END) AS approved_count,
SUM(amount) AS trans_total_amount,
SUM(CASE WHEN state = 'approved' THEN amount ELSE 0 END) AS approved_total_amount
FROM Transactions
GROUP BY YEAR(trans_date), MONTH(trans_date), country;
Intuition
DISTINCT keywordWhy it’s classic
DISTINCT keyword in conjunctionSELECT c.customer_id FROM Customer as c
GROUP BY c.customer_id
HAVING COUNT(DISTINCT(c.product_key)) = (SELECT COUNT(DISTINCT(p.product_key)) FROM Product AS p)
Def
Syntax: EXISTS
shirts for rows with color being 'blue'.
shirts have color = 'blue' then the whole query returns FALSEshirts has color = 'blue' then the whole query returns TRUE
EXISTS (SELECT * FROM shirts WHERE color = 'blue')
Syntax: NOT EXISTS
houses for rows with size_sqft being at least 1000.
houses have size_sqft >= 1000 then the whole query returns TRUEhouses satisfies size_sqft >= 1000 then the whole query returns FALSE
NOT EXISTS (SELECT * FROM houses WHERE size_sqft >= 1000)
Def
SELECT-FROM-WHERE query inside an existing SELECT-FROM-WHERE SQL query to execute some logic.Use Cases
WHERE or HAVING clauses, useful for defining more complex conditionsEXISTS and NOT EXISTS strapped onto a subqeury that outputs the set A/B we’re interested in
SUM or COUNTIntuition
EXISTS or NOT EXISTSWhy it’s classic
EXISTS / NOT EXISTSSELECT c.customer_id, c.customer_name FROM customers AS c
WHERE
EXISTS (
SELECT * FROM orders AS o
WHERE o.customer_id = c.customer_id AND product_name = 'A'
)
AND
EXISTS (
SELECT * FROM orders AS o
WHERE o.customer_id = c.customer_id AND product_name = 'B'
)
AND
NOT EXISTS (
SELECT * FROM orders AS o
WHERE o.customer_id = c.customer_id AND product_name = 'C'
)
ORDER BY c.customer_name
Why it’s a classic
SELECT e1.name FROM Employee AS e1
WHERE (
SELECT COUNT(*) FROM Employee AS e2
WHERE e2.managerId = e1.id
) >= 5
Def
Syntax
CREATE VIEW then the view_name (in this case citation_stops)CREATE VIEW citation_stops AS (
SELECT gender, citation
FROM stops
WHERE citation = True
);
-- write new select statement to query on the view
SELECT * FROM citation_stops;
Def
Syntax
WITH statement defining and naming the CTESELECT statement after the WITH(<query>) statement to select the rows you want from your CTE
WITH temp_table_name AS (
SELECT col1, col2, FROM another_relation
WHERE condition
-- continue writing out any subqeury
)
SELECT col1, col2 FROM temp_table_name;
-- make sure you're not referencing cols not explicitly named in the query that defines the CTE
Def
Syntax
CREATE VIEW then the view_name (in this case citation_stops)CREATE MATERIALIZED VIEW citation_stops AS (
SELECT gender, citation
FROM stops
WHERE citation = True
);
-- write new select statement to query on the view
SELECT * FROM citation_stops;
Def
NULL values, etcCAST? Because with DDL we can change the base relation schema directly to avoid having recall CAST if we expect to be treating certain values a certain way repeatedly.Syntax: Creating Tables
CREATE TABLE cars (
id VARCHAR(20) PRIMARY KEY, --VARCHAR(n) means we allow alphanumeric characters making a string of length n
carName VARCHAR(100) NOT NULL,
factoryId VARCHAR(14) UNIQUE, --UNIQUE means each row in this table has a unique val under this col
price DECIMAL(10, 2), --DECIMAL(k, m) means this col can hold k digits with 2 of them being right of the decimal point
inStock BOOLEAN,
numSold INT,
popular BIT -- values are either 1 or 0, conversion to BIT means all non-zero values are converted to 1
);
Syntax: Dropping Tables
DROP TABLE erase_tbl; -- drops erase_tbl entirely from the database
DROP TABLE IF EXISTS erase_tbl; -- same as above but with error handling
Syntax: Dropping, Adding and Altering Columns
ALTER TABLE cars
ADD saleDate DATE,
DROP factoryId;
ALTER TABLE cars
ALTER COLUMN numSold TYPE FLOAT;
ALTER TABLE cars
MODIFY numSold FLOAT;
Def
Syntax: Adding Rows
CREATE TABLE cars (
id VARCHAR(20) PRIMARY KEY,
carName VARCHAR(100) NOT NULL,
factoryId VARCHAR(14) UNIQUE,
price DECIMAL(10, 2),
inStock BOOLEAN,
numSold INT,
popular BIT
);
INSERT INTO target_relation
VALUES
('AAa19AYha7', '2025 Honda Accord', 'io7186BH', 22925.60, TRUE, 190, 1),
('AAa18BHoau8s', '2026 Toyota Corolla Cross', 'jk71bah1s', 30952.57, TRUE, 171, 0);
-- we insert tuples where each value must correspond with the schema for the given target_relation or else we error
Syntax: Manipulating Rows
DELETE FROM target_relation;
DELETE FROM target_relation
WHERE <condition>;
UPDATE target_relation
SET col1 = col2
WHERE <condition>;