Ruby.CodeCompared.To/SQL

An interactive executable cheatsheet for Rubyists learning SQL

Ruby 4.0 SQLite 3.x
Basics
Run a simple expression
puts 2026 - 1995
SELECT 2026 - 1995 AS years_ago;
SQL can evaluate expressions without referencing any table. The AS keyword gives the result column a name. In Ruby you compute expressions directly.
Arithmetic and math functions
puts (-7).abs
puts 3.14159.round(2)
puts Math.sqrt(144)
SELECT abs(-7), round(3.14159, 2), sqrt(144);
SQL's built-in math functions (abs, round, sqrt, ceil, floor, pow, log) map closely to Ruby's Numeric methods and the Math module.
String functions
puts 'hello'.upcase
puts ' hello '.strip
puts 'hello world'.length
SELECT upper('hello'), trim(' hello '), length('hello world');
SQLite string functions (upper, lower, trim, length, substr, replace, instr) correspond directly to Ruby's String methods. SQL uses positional arguments; Ruby uses methods on the receiver.
String concatenation
puts 'Hello, ' + 'World!'
puts "Alice" + " " + "Smith"
SELECT 'Hello, ' || 'World!' AS greeting, 'Alice' || ' ' || 'Smith' AS full_name;
SQL uses || for string concatenation; Ruby uses + or string interpolation. The || operator is defined by the SQL standard; some databases use CONCAT() instead, but SQLite uses ||.
Date and time
require 'date'

puts Date.today
puts (Date.today >> 1).to_s
SELECT date('now') AS today, date('now', '+1 month') AS next_month;
SQLite stores dates as text in ISO 8601 format (YYYY-MM-DD). The date(), time(), and datetime() functions accept modifier strings like '+1 month', '-7 days', and 'start of year'. Ruby's Date and Time classes offer equivalent arithmetic.
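Ruby's Date arithmetic mirrors SQLite's modifier strings closely. A minimal sketch using only the stdlib date library (the fixed date and variable names are illustrative):

```ruby
require 'date'

today = Date.new(2026, 3, 15)             # fixed date so the output is stable

next_month   = today >> 1                 # like date(..., '+1 month')
week_earlier = today - 7                  # like date(..., '-7 days')
year_start   = Date.new(today.year, 1, 1) # like date(..., 'start of year')

puts next_month, week_earlier, year_start
```

Date#>> shifts by whole months and clamps the day when the target month is shorter, much as SQLite's '+1 month' modifier normalizes overflow dates.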
Reading Data
Select all rows
employees = [
  { id: 1, name: 'Alice', department: 'Engineering', salary: 90_000 },
  { id: 2, name: 'Bob', department: 'Marketing', salary: 75_000 },
  { id: 3, name: 'Carol', department: 'Engineering', salary: 85_000 },
]

employees.each { |employee| puts employee.values.join(', ') }

CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);

INSERT INTO employees VALUES
  (1, 'Alice', 'Engineering', 90000),
  (2, 'Bob', 'Marketing', 75000),
  (3, 'Carol', 'Engineering', 85000);

SELECT * FROM employees;
Employee.all in ActiveRecord generates SELECT * FROM employees. The * wildcard selects every column; in production code, prefer naming columns explicitly to avoid surprises when the schema changes.
Select specific columns
employees = [
  { id: 1, name: 'Alice', department: 'Engineering', salary: 90_000 },
  { id: 2, name: 'Bob', department: 'Marketing', salary: 75_000 },
  { id: 3, name: 'Carol', department: 'Engineering', salary: 85_000 },
]

employees.each { |employee| puts "#{employee[:name]}: #{employee[:salary]}" }

CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);

INSERT INTO employees VALUES
  (1, 'Alice', 'Engineering', 90000),
  (2, 'Bob', 'Marketing', 75000),
  (3, 'Carol', 'Engineering', 85000);

SELECT name, salary FROM employees;

Employee.select(:name, :salary) in ActiveRecord. Selecting only the columns you need reduces memory usage and network transfer, especially important when tables have many columns or large text/blob fields.
Computed column with alias
employees = [
  { name: 'Alice', salary: 90_000 },
  { name: 'Bob', salary: 75_000 },
  { name: 'Carol', salary: 85_000 },
]

employees.each do |employee|
  puts "#{employee[:name]}: #{(employee[:salary] / 12.0).round(2)}"
end

CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);

INSERT INTO employees VALUES (1, 'Alice', 90000), (2, 'Bob', 75000), (3, 'Carol', 85000);

SELECT name, salary / 12.0 AS monthly_salary FROM employees;

Employee.select("name, salary / 12.0 AS monthly_salary") in ActiveRecord. The AS keyword renames a column in the result set; it does not affect the underlying data. Any expression can be aliased: arithmetic, function calls, or CASE expressions.
Distinct values
employees = [
  { name: 'Alice', department: 'Engineering' },
  { name: 'Bob', department: 'Marketing' },
  { name: 'Carol', department: 'Engineering' },
]

puts employees.map { |e| e[:department] }.uniq

CREATE TABLE employees (id INTEGER, name TEXT, department TEXT);

INSERT INTO employees VALUES (1, 'Alice', 'Engineering'), (2, 'Bob', 'Marketing'), (3, 'Carol', 'Engineering');

SELECT DISTINCT department FROM employees;

Employee.distinct.pluck(:department) in ActiveRecord. DISTINCT removes duplicate rows from the result. It applies to the entire row, not just one column: SELECT DISTINCT first_name, last_name returns unique (first, last) pairs.
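The whole-row behavior maps to calling uniq on arrays of value pairs in Ruby. A small sketch (the people data is made up for illustration):

```ruby
people = [
  { first: 'Alice', last: 'Smith' },
  { first: 'Alice', last: 'Brown' },
  { first: 'Alice', last: 'Smith' },
]

# Like SELECT DISTINCT first_name, last_name: deduplicate the
# (first, last) pair as a unit, not each column independently.
pairs = people.map { |p| [p[:first], p[:last]] }.uniq

pairs.each { |first, last| puts "#{first} #{last}" }
```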
Filtering
Filter by exact value
employees = [
  { name: 'Alice', department: 'Engineering', salary: 90_000 },
  { name: 'Bob', department: 'Marketing', salary: 75_000 },
  { name: 'Carol', department: 'Engineering', salary: 85_000 },
]

engineers = employees.select { |e| e[:department] == 'Engineering' }
engineers.each { |e| puts "#{e[:name]}: #{e[:salary]}" }

CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);

INSERT INTO employees VALUES
  (1, 'Alice', 'Engineering', 90000),
  (2, 'Bob', 'Marketing', 75000),
  (3, 'Carol', 'Engineering', 85000);

SELECT name, salary FROM employees WHERE department = 'Engineering';
Employee.where(department: 'Engineering') in ActiveRecord. The WHERE clause filters rows before they are returned. String literals in SQL use single quotes; double quotes are for identifiers (table and column names).
Filter by comparison
employees = [
  { name: 'Alice', salary: 90_000 },
  { name: 'Bob', salary: 75_000 },
  { name: 'Carol', salary: 85_000 },
  { name: 'Dave', salary: 70_000 },
]

puts employees.select { |e| e[:salary] > 80_000 }.map { |e| "#{e[:name]}: #{e[:salary]}" }

CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);

INSERT INTO employees VALUES (1, 'Alice', 90000), (2, 'Bob', 75000), (3, 'Carol', 85000), (4, 'Dave', 70000);

SELECT name, salary FROM employees WHERE salary > 80000;

Employee.where('salary > ?', 80_000) in ActiveRecord. The SQL comparison operators >, <, >=, <=, = and != (or <>) work on numbers, dates, and strings.
Multiple conditions: AND / OR
employees = [
  { name: 'Alice', department: 'Engineering', salary: 90_000 },
  { name: 'Bob', department: 'Marketing', salary: 75_000 },
  { name: 'Carol', department: 'Engineering', salary: 85_000 },
  { name: 'Dave', department: 'Marketing', salary: 92_000 },
]

result = employees.select { |e| e[:department] == 'Engineering' || e[:salary] > 90_000 }
result.each { |e| puts "#{e[:name]} (#{e[:department]}): #{e[:salary]}" }

CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);

INSERT INTO employees VALUES
  (1, 'Alice', 'Engineering', 90000),
  (2, 'Bob', 'Marketing', 75000),
  (3, 'Carol', 'Engineering', 85000),
  (4, 'Dave', 'Marketing', 92000);

SELECT name, department, salary
FROM employees
WHERE department = 'Engineering' OR salary > 90000;
Employee.where(department: 'Engineering').or(Employee.where('salary > ?', 90_000)) in ActiveRecord. Use parentheses to control precedence when mixing AND and OR: (A OR B) AND C behaves differently from A OR (B AND C).
Filter against a list: IN
employees = [
  { name: 'Alice', department: 'Engineering' },
  { name: 'Bob', department: 'Marketing' },
  { name: 'Carol', department: 'Finance' },
  { name: 'Dave', department: 'Engineering' },
]

target_departments = ['Engineering', 'Finance']

result = employees.select { |e| target_departments.include?(e[:department]) }
result.each { |e| puts "#{e[:name]}: #{e[:department]}" }

CREATE TABLE employees (id INTEGER, name TEXT, department TEXT);

INSERT INTO employees VALUES
  (1, 'Alice', 'Engineering'),
  (2, 'Bob', 'Marketing'),
  (3, 'Carol', 'Finance'),
  (4, 'Dave', 'Engineering');

SELECT name, department FROM employees WHERE department IN ('Engineering', 'Finance');

Employee.where(department: ['Engineering', 'Finance']) in ActiveRecord. IN is equivalent to a series of OR equality checks. NOT IN excludes those values, but watch out: NOT IN (list) returns no rows if NULL is in the list.
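Ruby's negated include? is the analog of NOT IN, and it has no NULL surprise. A sketch (the data is made up; the SQL behavior is described in the comment):

```ruby
employees = [
  { name: 'Alice', department: 'Engineering' },
  { name: 'Bob',   department: 'Marketing' },
  { name: 'Carol', department: nil },
]

excluded = ['Engineering']

# In Ruby, nil is just another value: Bob and Carol both pass the filter.
# In SQL, WHERE department NOT IN ('Engineering', NULL) would return no
# rows at all, because every comparison against NULL yields UNKNOWN.
kept = employees.reject { |e| excluded.include?(e[:department]) }

kept.each { |e| puts e[:name] }
```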
Filter by range: BETWEEN
employees = [
  { name: 'Alice', salary: 90_000 },
  { name: 'Bob', salary: 75_000 },
  { name: 'Carol', salary: 85_000 },
  { name: 'Dave', salary: 70_000 },
]

result = employees.select { |e| (75_000..87_000).cover?(e[:salary]) }
result.each { |e| puts "#{e[:name]}: #{e[:salary]}" }

CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);

INSERT INTO employees VALUES (1, 'Alice', 90000), (2, 'Bob', 75000), (3, 'Carol', 85000), (4, 'Dave', 70000);

SELECT name, salary FROM employees WHERE salary BETWEEN 75000 AND 87000;

Employee.where(salary: 75_000..87_000) in ActiveRecord. BETWEEN low AND high is inclusive on both ends, equivalent to salary >= 75000 AND salary <= 87000. Ruby's Range#cover? is also inclusive by default for numeric ranges.
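Ruby can also express an end-exclusive range with three dots, something BETWEEN has no keyword for. A sketch contrasting the two (sample salaries are illustrative):

```ruby
salaries = [70_000, 75_000, 85_000, 87_000, 90_000]

# Two dots: inclusive on both ends, like BETWEEN 75000 AND 87000.
inclusive = salaries.select { |s| (75_000..87_000).cover?(s) }

# Three dots: excludes the upper bound; in SQL you would write
# salary >= 75000 AND salary < 87000 instead.
exclusive = salaries.select { |s| (75_000...87_000).cover?(s) }

puts inclusive.inspect
puts exclusive.inspect
```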
Pattern matching: LIKE
employees = [
  { name: 'Alice Smith', email: 'alice@example.com' },
  { name: 'Bob Jones', email: 'bob@example.com' },
  { name: 'Alice Brown', email: 'alice.b@example.com' },
]

result = employees.select { |e| e[:name].start_with?('Alice') }
result.each { |e| puts "#{e[:name]} - #{e[:email]}" }

CREATE TABLE employees (id INTEGER, name TEXT, email TEXT);

INSERT INTO employees VALUES
  (1, 'Alice Smith', 'alice@example.com'),
  (2, 'Bob Jones', 'bob@example.com'),
  (3, 'Alice Brown', 'alice.b@example.com');

SELECT name, email FROM employees WHERE name LIKE 'Alice%';

Employee.where("name LIKE ?", "Alice%") in ActiveRecord. In LIKE patterns, % matches any sequence of characters and _ matches exactly one character. SQLite's LIKE is case-insensitive for ASCII letters by default; in PostgreSQL, use ILIKE (or LIKE with lower()) for case-insensitive matching.
NULL checks: IS NULL / IS NOT NULL
employees = [
  { name: 'Alice', manager_id: nil },
  { name: 'Bob', manager_id: 1 },
  { name: 'Carol', manager_id: 1 },
  { name: 'Dave', manager_id: nil },
]

top_level = employees.select { |e| e[:manager_id].nil? }
top_level.each { |e| puts e[:name] }

CREATE TABLE employees (id INTEGER, name TEXT, manager_id INTEGER);

INSERT INTO employees VALUES (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Carol', 1), (4, 'Dave', NULL);

SELECT name FROM employees WHERE manager_id IS NULL;

Employee.where(manager_id: nil) in ActiveRecord. In SQL, NULL represents an unknown or absent value. Never use = NULL or != NULL; they always evaluate to UNKNOWN, not TRUE. Always use IS NULL or IS NOT NULL. In Ruby, nil behaves like a normal falsy value, so there is no such gotcha.
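The contrast is easy to see from the Ruby side: nil compares like an ordinary value, while SQL NULL propagates UNKNOWN. A tiny sketch:

```ruby
manager_id = nil

# Ruby: nil == nil is simply true, so equality checks against nil work.
equal_check = (manager_id == nil)
nil_check   = manager_id.nil?

# The SQL analogs behave differently:
#   manager_id = NULL   -> UNKNOWN, so the WHERE clause matches nothing
#   manager_id IS NULL  -> TRUE for NULL rows, which is why IS NULL exists
puts equal_check, nil_check
```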
Sorting & Limiting
Sort results: ORDER BY
employees = [
  { name: 'Alice', salary: 90_000 },
  { name: 'Bob', salary: 75_000 },
  { name: 'Carol', salary: 85_000 },
  { name: 'Dave', salary: 92_000 },
]

sorted = employees.sort_by { |e| -e[:salary] }
sorted.each { |e| puts "#{e[:name]}: #{e[:salary]}" }

CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);

INSERT INTO employees VALUES (1, 'Alice', 90000), (2, 'Bob', 75000), (3, 'Carol', 85000), (4, 'Dave', 92000);

SELECT name, salary FROM employees ORDER BY salary DESC;

Employee.order(salary: :desc) in ActiveRecord. Without ORDER BY, the row order returned by a SQL query is undefined; do not rely on insertion order. ASC (ascending) is the default; DESC must be stated explicitly.
Sort by multiple columns
employees = [
  { name: 'Carol', department: 'Engineering', salary: 85_000 },
  { name: 'Alice', department: 'Engineering', salary: 90_000 },
  { name: 'Bob', department: 'Marketing', salary: 75_000 },
  { name: 'Dave', department: 'Marketing', salary: 80_000 },
]

sorted = employees.sort_by { |e| [e[:department], -e[:salary]] }
sorted.each { |e| puts "#{e[:department]}: #{e[:name]} (#{e[:salary]})" }

CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);

INSERT INTO employees VALUES
  (1, 'Carol', 'Engineering', 85000),
  (2, 'Alice', 'Engineering', 90000),
  (3, 'Bob', 'Marketing', 75000),
  (4, 'Dave', 'Marketing', 80000);

SELECT name, department, salary FROM employees ORDER BY department ASC, salary DESC;

Employee.order(department: :asc, salary: :desc) in ActiveRecord. Multiple ORDER BY columns are evaluated left to right: rows are first sorted by department, then within each department by salary descending.
Limit results: LIMIT
employees = [
  { name: 'Dave', salary: 92_000 },
  { name: 'Alice', salary: 90_000 },
  { name: 'Carol', salary: 85_000 },
  { name: 'Bob', salary: 75_000 },
]

top_three = employees.sort_by { |e| -e[:salary] }.first(3)
top_three.each { |e| puts "#{e[:name]}: #{e[:salary]}" }

CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);

INSERT INTO employees VALUES (1, 'Dave', 92000), (2, 'Alice', 90000), (3, 'Carol', 85000), (4, 'Bob', 75000);

SELECT name, salary FROM employees ORDER BY salary DESC LIMIT 3;

Employee.order(salary: :desc).limit(3) in ActiveRecord. LIMIT should almost always be used with ORDER BY; without ordering, which rows you get is undefined and can vary between queries.
Pagination with OFFSET
employees = [
  { name: 'Dave', salary: 92_000 },
  { name: 'Alice', salary: 90_000 },
  { name: 'Carol', salary: 85_000 },
  { name: 'Eve', salary: 82_000 },
  { name: 'Bob', salary: 75_000 },
  { name: 'Frank', salary: 71_000 },
]

page_size = 2
page_number = 2 # 1-based

page = employees.sort_by { |e| -e[:salary] }
                .drop((page_number - 1) * page_size)
                .first(page_size)

page.each { |e| puts "#{e[:name]}: #{e[:salary]}" }

CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);

INSERT INTO employees VALUES
  (1, 'Dave', 92000),
  (2, 'Alice', 90000),
  (3, 'Carol', 85000),
  (4, 'Eve', 82000),
  (5, 'Bob', 75000),
  (6, 'Frank', 71000);

SELECT name, salary FROM employees ORDER BY salary DESC LIMIT 2 OFFSET 2;
Employee.order(salary: :desc).limit(2).offset(2) in ActiveRecord. OFFSET n skips the first n rows. This "skip-limit" pagination works well for small datasets but becomes slow on large tables because the database still scans all skipped rows. For large datasets, keyset pagination (WHERE id > last_seen_id) is more efficient.
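Keyset pagination carries a cursor (the last id seen) instead of an offset. An in-memory sketch of the idea, with illustrative names (employees, last_seen_id):

```ruby
employees = (1..10).map { |i| { id: i, name: "Employee #{i}" } }

page_size    = 3
last_seen_id = 3 # cursor returned with the previous page

# Keyset pagination: WHERE id > last_seen_id ORDER BY id LIMIT page_size.
# With an index on id, the database can seek straight to the cursor
# instead of scanning and discarding all the skipped rows.
page = employees
  .select { |e| e[:id] > last_seen_id }
  .sort_by { |e| e[:id] }
  .first(page_size)

page.each { |e| puts e[:name] }
```

The client then remembers the last id of this page (here 6) as the cursor for the next request.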
Aggregates
Count rows
employees = [
  { name: 'Alice', department: 'Engineering' },
  { name: 'Bob', department: 'Marketing' },
  { name: 'Carol', department: 'Engineering' },
]

puts "Total: #{employees.count}"
puts "Unique departments: #{employees.map { |e| e[:department] }.uniq.count}"

CREATE TABLE employees (id INTEGER, name TEXT, department TEXT);

INSERT INTO employees VALUES (1, 'Alice', 'Engineering'), (2, 'Bob', 'Marketing'), (3, 'Carol', 'Engineering');

SELECT count(*) AS total, count(DISTINCT department) AS unique_departments FROM employees;
Employee.count and Employee.distinct.count(:department) in ActiveRecord. count(*) counts all rows including those with NULL values. count(column) counts only non-NULL values in that column.
Sum and average
salaries = [90_000, 75_000, 85_000, 70_000]

puts "Total payroll: #{salaries.sum}"
puts "Average salary: #{(salaries.sum.to_f / salaries.count).round(2)}"

CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);

INSERT INTO employees VALUES (1, 'Alice', 90000), (2, 'Bob', 75000), (3, 'Carol', 85000), (4, 'Dave', 70000);

SELECT sum(salary) AS total_payroll, avg(salary) AS average_salary FROM employees;

Employee.sum(:salary) and Employee.average(:salary) in ActiveRecord. avg() ignores NULL values: a row with NULL salary is not counted. This differs from dividing sum by count(*), which would treat missing salaries as zero.
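The difference can be modeled in Ruby with compact, which drops nils the way avg() ignores NULLs. A sketch (the salary data, including the nil, is made up):

```ruby
salaries = [90_000, nil, 85_000, 70_000] # one employee has no recorded salary

known = salaries.compact

# Like avg(salary): sum of non-NULL values / count of non-NULL values.
avg_like = known.sum.to_f / known.count

# Like sum(salary) / count(*): the missing salary drags the average down,
# as if it were zero.
naive = known.sum.to_f / salaries.count

puts avg_like.round(2)
puts naive.round(2)
```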
Minimum and maximum
employees = [
  { name: 'Alice', salary: 90_000, hire_date: '2020-03-15' },
  { name: 'Bob', salary: 75_000, hire_date: '2019-07-01' },
  { name: 'Carol', salary: 85_000, hire_date: '2021-11-20' },
]

puts "Lowest salary: #{employees.min_by { |e| e[:salary] }[:salary]}"
puts "Highest salary: #{employees.max_by { |e| e[:salary] }[:salary]}"
puts "Earliest hire: #{employees.min_by { |e| e[:hire_date] }[:hire_date]}"

CREATE TABLE employees (id INTEGER, name TEXT, salary REAL, hire_date TEXT);

INSERT INTO employees VALUES
  (1, 'Alice', 90000, '2020-03-15'),
  (2, 'Bob', 75000, '2019-07-01'),
  (3, 'Carol', 85000, '2021-11-20');

SELECT min(salary) AS lowest, max(salary) AS highest, min(hire_date) AS earliest_hire FROM employees;

Employee.minimum(:salary) and Employee.maximum(:salary) in ActiveRecord. min() and max() work on strings and dates too: SQLite compares date strings lexicographically, which works correctly for ISO 8601 format (YYYY-MM-DD).
Group rows: GROUP BY
employees = [
  { name: 'Alice', department: 'Engineering', salary: 90_000 },
  { name: 'Bob', department: 'Marketing', salary: 75_000 },
  { name: 'Carol', department: 'Engineering', salary: 85_000 },
  { name: 'Dave', department: 'Marketing', salary: 80_000 },
]

groups = employees.group_by { |e| e[:department] }

groups.each do |department, members|
  avg = members.sum { |e| e[:salary] } / members.count.to_f
  puts "#{department}: #{members.count} employees, avg $#{avg.round}"
end

CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);

INSERT INTO employees VALUES
  (1, 'Alice', 'Engineering', 90000),
  (2, 'Bob', 'Marketing', 75000),
  (3, 'Carol', 'Engineering', 85000),
  (4, 'Dave', 'Marketing', 80000);

SELECT department, count(*) AS headcount, avg(salary) AS avg_salary
FROM employees
GROUP BY department;

Employee.group(:department).count and Employee.group(:department).average(:salary) in ActiveRecord. Every column in the SELECT list must either be in the GROUP BY clause or wrapped in an aggregate function; this rule trips up many SQL newcomers.
Filter groups: HAVING
employees = [
  { name: 'Alice', department: 'Engineering', salary: 90_000 },
  { name: 'Bob', department: 'Marketing', salary: 75_000 },
  { name: 'Carol', department: 'Engineering', salary: 85_000 },
  { name: 'Dave', department: 'Marketing', salary: 80_000 },
  { name: 'Eve', department: 'Finance', salary: 95_000 },
]

groups = employees.group_by { |e| e[:department] }

groups.select { |_, members| members.count > 1 }.each do |dept, members|
  avg = members.sum { |e| e[:salary] } / members.count.to_f
  puts "#{dept}: #{members.count} employees, avg $#{avg.round}"
end

CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);

INSERT INTO employees VALUES
  (1, 'Alice', 'Engineering', 90000),
  (2, 'Bob', 'Marketing', 75000),
  (3, 'Carol', 'Engineering', 85000),
  (4, 'Dave', 'Marketing', 80000),
  (5, 'Eve', 'Finance', 95000);

SELECT department, count(*) AS headcount, avg(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING count(*) > 1;

Employee.group(:department).having("count(*) > 1").count in ActiveRecord. HAVING filters after grouping; it is for aggregate conditions what WHERE is for row conditions. You cannot use WHERE to filter on count(*) because WHERE runs before grouping.
Conditional aggregation with CASE
employees = [
  { name: 'Alice', department: 'Engineering', salary: 90_000 },
  { name: 'Bob', department: 'Marketing', salary: 75_000 },
  { name: 'Carol', department: 'Engineering', salary: 85_000 },
  { name: 'Dave', department: 'Marketing', salary: 80_000 },
]

puts "Total: #{employees.count}"
puts "High earners (>80k): #{employees.count { |e| e[:salary] > 80_000 }}"
puts "Engineering payroll: #{employees.select { |e| e[:department] == 'Engineering' }.sum { |e| e[:salary] }}"

CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL);

INSERT INTO employees VALUES
  (1, 'Alice', 'Engineering', 90000),
  (2, 'Bob', 'Marketing', 75000),
  (3, 'Carol', 'Engineering', 85000),
  (4, 'Dave', 'Marketing', 80000);

SELECT
  count(*) AS total,
  count(CASE WHEN salary > 80000 THEN 1 END) AS high_earners,
  sum(CASE WHEN department = 'Engineering' THEN salary ELSE 0 END) AS engineering_payroll
FROM employees;

A CASE WHEN ... THEN ... END expression inside an aggregate function is a powerful pattern: it counts or sums only rows that satisfy a condition, all in a single pass over the data. This is sometimes called "pivot aggregation" and has no direct ActiveRecord equivalent; it requires raw SQL or Arel.
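The single-pass character of the SQL query can be mirrored in Ruby with each_with_object, accumulating all three results in one iteration instead of three separate passes. A sketch over the same sample data:

```ruby
employees = [
  { name: 'Alice', department: 'Engineering', salary: 90_000 },
  { name: 'Bob',   department: 'Marketing',   salary: 75_000 },
  { name: 'Carol', department: 'Engineering', salary: 85_000 },
  { name: 'Dave',  department: 'Marketing',   salary: 80_000 },
]

# One pass over the rows, like the CASE-inside-aggregate query:
totals = employees.each_with_object({ total: 0, high_earners: 0, engineering_payroll: 0 }) do |e, acc|
  acc[:total] += 1
  acc[:high_earners] += 1 if e[:salary] > 80_000
  acc[:engineering_payroll] += e[:salary] if e[:department] == 'Engineering'
end

puts totals.inspect
```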
Joins
INNER JOIN
departments = [{ id: 1, name: 'Engineering' }, { id: 2, name: 'Marketing' }]
employees = [
  { name: 'Alice', department_id: 1, salary: 90_000 },
  { name: 'Bob', department_id: 2, salary: 75_000 },
  { name: 'Carol', department_id: 1, salary: 85_000 },
]

result = employees.filter_map do |employee|
  dept = departments.find { |d| d[:id] == employee[:department_id] }
  "#{employee[:name]}: #{dept[:name]} - #{employee[:salary]}" if dept
end

puts result

CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER, name TEXT, department_id INTEGER, salary REAL);

INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Marketing'), (3, 'Finance');
INSERT INTO employees VALUES (1, 'Alice', 1, 90000), (2, 'Bob', 2, 75000), (3, 'Carol', 1, 85000);

SELECT employees.name, departments.name AS department, employees.salary
FROM employees
INNER JOIN departments ON employees.department_id = departments.id;

Employee.joins(:department).select('employees.*, departments.name AS dept_name') in ActiveRecord. An INNER JOIN returns only rows where the join condition matches in both tables; employees with no matching department (or with a NULL department_id) are excluded from the result.
LEFT JOIN (include unmatched rows)
departments = [{ id: 1, name: 'Engineering' }, { id: 2, name: 'Marketing' }, { id: 3, name: 'Finance' }]
employees = [
  { name: 'Alice', department_id: 1 },
  { name: 'Bob', department_id: 2 },
  { name: 'Carol', department_id: 1 },
  { name: 'Dave', department_id: nil },
]

result = employees.map do |employee|
  dept = departments.find { |d| d[:id] == employee[:department_id] }
  "#{employee[:name]}: #{dept ? dept[:name] : 'NULL'}"
end

puts result

CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER, name TEXT, department_id INTEGER);

INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Marketing'), (3, 'Finance');
INSERT INTO employees VALUES (1, 'Alice', 1), (2, 'Bob', 2), (3, 'Carol', 1), (4, 'Dave', NULL);

SELECT employees.name, departments.name AS department
FROM employees
LEFT JOIN departments ON employees.department_id = departments.id;

Employee.left_outer_joins(:department) in ActiveRecord. A LEFT JOIN keeps all rows from the left table even when there is no match in the right table; the unmatched right-side columns are NULL. Use a LEFT JOIN combined with WHERE right.id IS NULL to find records that lack an association.
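The LEFT JOIN plus WHERE ... IS NULL "anti-join" pattern corresponds to rejecting rows that have any match in Ruby. A sketch finding departments with no employees (sample data is illustrative):

```ruby
departments = [{ id: 1, name: 'Engineering' }, { id: 2, name: 'Marketing' }, { id: 3, name: 'Finance' }]
employees = [
  { name: 'Alice', department_id: 1 },
  { name: 'Bob',   department_id: 2 },
]

# The SQL equivalent:
#   SELECT departments.name FROM departments
#   LEFT JOIN employees ON employees.department_id = departments.id
#   WHERE employees.id IS NULL;
empty_departments = departments.reject do |dept|
  employees.any? { |e| e[:department_id] == dept[:id] }
end

empty_departments.each { |dept| puts dept[:name] }
```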
JOIN with aggregation
departments = [{ id: 1, name: 'Engineering' }, { id: 2, name: 'Marketing' }]
employees = [
  { name: 'Alice', department_id: 1, salary: 90_000 },
  { name: 'Bob', department_id: 2, salary: 75_000 },
  { name: 'Carol', department_id: 1, salary: 85_000 },
  { name: 'Dave', department_id: 2, salary: 80_000 },
]

departments.each do |department|
  members = employees.select { |e| e[:department_id] == department[:id] }
  avg = members.empty? ? 0 : members.sum { |e| e[:salary] } / members.count.to_f
  puts "#{department[:name]}: #{members.count} employees, avg $#{avg.round}"
end

CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE employees (id INTEGER, name TEXT, department_id INTEGER, salary REAL);

INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Marketing');
INSERT INTO employees VALUES (1, 'Alice', 1, 90000), (2, 'Bob', 2, 75000), (3, 'Carol', 1, 85000), (4, 'Dave', 2, 80000);

SELECT departments.name, count(employees.id) AS headcount, avg(employees.salary) AS avg_salary
FROM departments
LEFT JOIN employees ON employees.department_id = departments.id
GROUP BY departments.id, departments.name;

Joining and then grouping is one of the most common SQL patterns. Using a LEFT JOIN here ensures that departments with no employees still appear in the results with a count of zero; an INNER JOIN would silently omit them.
Self join (hierarchical data)
employees = [
  { id: 1, name: 'Alice', manager_id: nil },
  { id: 2, name: 'Bob', manager_id: 1 },
  { id: 3, name: 'Carol', manager_id: 1 },
  { id: 4, name: 'Dave', manager_id: 2 },
]

employees.each do |employee|
  manager = employees.find { |e| e[:id] == employee[:manager_id] }
  puts "#{employee[:name]} - manager: #{manager ? manager[:name] : 'none'}"
end

CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);

INSERT INTO employees VALUES (1, 'Alice', NULL), (2, 'Bob', 1), (3, 'Carol', 1), (4, 'Dave', 2);

SELECT workers.name AS employee, managers.name AS manager
FROM employees AS workers
LEFT JOIN employees AS managers ON workers.manager_id = managers.id;

A self join joins a table to itself using table aliases. It is the standard way to query hierarchical data (org charts, category trees, threaded comments) stored as an adjacency list in a single table. In ActiveRecord: belongs_to :manager, class_name: 'Employee', optional: true (optional, because top-level employees have no manager) and has_many :reports, class_name: 'Employee', foreign_key: :manager_id.
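A single self join only reaches one level up. To walk the whole chain of command, SQL needs a recursive query (WITH RECURSIVE); in Ruby the same adjacency list can be walked with plain recursion. A sketch (the chain helper is a name invented for this example):

```ruby
employees = [
  { id: 1, name: 'Alice', manager_id: nil },
  { id: 2, name: 'Bob',   manager_id: 1 },
  { id: 3, name: 'Carol', manager_id: 1 },
  { id: 4, name: 'Dave',  manager_id: 2 },
]

by_id = employees.to_h { |e| [e[:id], e] }

# Follow manager links upward until an employee with no manager is reached.
def chain(employee, by_id)
  return [employee[:name]] if employee[:manager_id].nil?
  [employee[:name]] + chain(by_id[employee[:manager_id]], by_id)
end

puts chain(by_id[4], by_id).join(' -> ')
```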
Writing Data
Insert a row: INSERT INTO
employees = []

employees << { id: 1, name: 'Alice', department: 'Engineering', salary: 90_000 }
employees << { name: 'Bob', department: 'Marketing', salary: 75_000 }

employees.each { |e| puts e.values.join(', ') }

CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL);

INSERT INTO employees VALUES (1, 'Alice', 'Engineering', 90000);
INSERT INTO employees (name, department, salary) VALUES ('Bob', 'Marketing', 75000);

SELECT * FROM employees;

Employee.create(name: 'Bob', department: 'Marketing', salary: 75_000) in ActiveRecord. The column-list form of INSERT INTO table (col1, col2) VALUES (...) is preferred in production; it remains correct when columns are added or reordered later.
Insert multiple rows
catalog = []

new_products = [
  { id: 1, name: 'Widget', price: 9.99, category: 'Hardware' },
  { id: 2, name: 'Gadget', price: 24.99, category: 'Electronics' },
  { id: 3, name: 'Doohickey', price: 4.99, category: 'Hardware' },
]

catalog.concat(new_products)
catalog.each { |product| puts "#{product[:name]}: $#{product[:price]}" }

CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, price REAL, category TEXT);

INSERT INTO products VALUES
  (1, 'Widget', 9.99, 'Hardware'),
  (2, 'Gadget', 24.99, 'Electronics'),
  (3, 'Doohickey', 4.99, 'Hardware');

SELECT * FROM products;

Product.insert_all([...]) in ActiveRecord inserts multiple rows in a single SQL statement, far more efficient than calling .create in a loop (note that insert_all skips model validations and callbacks). A multi-row INSERT sends one round-trip to the database instead of one per record.
Update rows: UPDATE … SET … WHERE
employees = [
  { id: 1, name: 'Alice', salary: 90_000 },
  { id: 2, name: 'Bob', salary: 75_000 },
  { id: 3, name: 'Carol', salary: 85_000 },
]

employees.each do |employee|
  employee[:salary] = (employee[:salary] * 1.1).round if employee[:salary] < 85_000
end

employees.each { |e| puts "#{e[:name]}: #{e[:salary]}" }

CREATE TABLE employees (id INTEGER, name TEXT, salary REAL);

INSERT INTO employees VALUES (1, 'Alice', 90000), (2, 'Bob', 75000), (3, 'Carol', 85000);

UPDATE employees SET salary = salary * 1.1 WHERE salary < 85000;

SELECT name, salary FROM employees;
Employee.where('salary < ?', 85_000).update_all('salary = salary * 1.1') in ActiveRecord. UPDATE ... SET ... WHERE modifies every matching row in one SQL statement without loading records into memory. Omitting the WHERE clause updates every row in the table.
Delete rows: DELETE FROM … WHERE
employees = [
  { id: 1, name: 'Alice', active: true },
  { id: 2, name: 'Bob', active: false },
  { id: 3, name: 'Carol', active: true },
  { id: 4, name: 'Dave', active: false },
]

employees.select! { |employee| employee[:active] }
employees.each { |employee| puts "#{employee[:id]}: #{employee[:name]}" }

CREATE TABLE employees (id INTEGER, name TEXT, active INTEGER);

INSERT INTO employees VALUES (1, 'Alice', 1), (2, 'Bob', 0), (3, 'Carol', 1), (4, 'Dave', 0);

DELETE FROM employees WHERE active = 0;

SELECT * FROM employees;

Employee.where(active: false).delete_all in ActiveRecord skips callbacks and deletes directly. Use destroy_all when before_destroy / after_destroy callbacks must run, but note that destroy_all loads every record into memory first, making it much slower on large tables.
Upsert: INSERT OR REPLACE
settings = { 'theme' => 'light', 'language' => 'en' }
updates = { 'theme' => 'dark', 'timezone' => 'UTC' }

settings.merge!(updates)
settings.each { |key, value| puts "#{key}: #{value}" }

CREATE TABLE settings (key TEXT PRIMARY KEY, value TEXT);

INSERT INTO settings VALUES ('theme', 'light'), ('language', 'en');
INSERT OR REPLACE INTO settings VALUES ('theme', 'dark'), ('timezone', 'UTC');

SELECT * FROM settings;
Setting.upsert({ key: 'theme', value: 'dark' }, unique_by: :key) in ActiveRecord. SQLite's INSERT OR REPLACE deletes the conflicting row and inserts a new one (resetting auto-increment IDs and all columns). The SQL standard form, INSERT INTO ... ON CONFLICT(key) DO UPDATE SET value = excluded.value, updates only specific columns and is supported in SQLite 3.24+.
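Hash#merge with and without a block is a handy way to model the two conflict policies. Without a block the incoming value wins (like DO UPDATE); with a block that keeps the old value, conflicts are ignored (like ON CONFLICT DO NOTHING). A sketch:

```ruby
settings = { 'theme' => 'light', 'language' => 'en' }
updates  = { 'theme' => 'dark', 'timezone' => 'UTC' }

# ON CONFLICT ... DO UPDATE: the incoming value wins on a key conflict.
do_update = settings.merge(updates)

# ON CONFLICT ... DO NOTHING: keep the existing value on a key conflict.
do_nothing = settings.merge(updates) { |_key, old_value, _new_value| old_value }

puts do_update.inspect
puts do_nothing.inspect
```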
Schema
Create a table
class CreateEmployees < ActiveRecord::Migration[8.0]
  def change
    create_table :employees do |table|
      table.string :name, null: false
      table.string :department, null: false
      table.decimal :salary, null: false
      table.date :hire_date, null: false, default: -> { 'CURRENT_DATE' }
      table.boolean :active, null: false, default: true
    end
  end
end

CREATE TABLE employees (
  id INTEGER PRIMARY KEY AUTOINCREMENT,
  name TEXT NOT NULL,
  department TEXT NOT NULL,
  salary REAL NOT NULL,
  hire_date TEXT NOT NULL DEFAULT (date('now')),
  active INTEGER NOT NULL DEFAULT 1
);

Rails migrations abstract the SQL DDL: create_table generates the CREATE TABLE statement for whatever database is configured. Rails automatically adds the id column (INTEGER PRIMARY KEY AUTOINCREMENT) unless you specify id: false.
SQLite data types
# SQLite's five storage classes map directly to Ruby types:
#   INTEGER -> Ruby Integer
#   REAL    -> Ruby Float
#   TEXT    -> Ruby String
#   BLOB    -> Ruby String (binary encoding)
#   NULL    -> Ruby nil

puts [42, 3.14, "hello", nil].map(&:class)

CREATE TABLE type_examples (integer_col INTEGER, real_col REAL, text_col TEXT);

INSERT INTO type_examples VALUES (42, 3.14, 'hello');

SELECT typeof(integer_col), typeof(real_col), typeof(text_col) FROM type_examples;

SQLite uses dynamic "type affinity": a column declared as TEXT can store an integer. SQLite 3.37+ supports STRICT tables that enforce declared types. Rails maps database types to Ruby types automatically (e.g., :decimal to BigDecimal, :boolean to true/false).
Constraints: NOT NULL, UNIQUE, CHECK, DEFAULT
class CreateUsers < ActiveRecord::Migration[8.0]
  def change
    create_table :users do |table|
      table.string :email, null: false, index: { unique: true }
      table.string :username, null: false
      table.string :role, null: false, default: 'user'
      table.integer :score, null: false, default: 0
      table.check_constraint 'score >= 0', name: 'score_non_negative'
    end
  end
end

CREATE TABLE users (
  id INTEGER PRIMARY KEY,
  email TEXT NOT NULL UNIQUE,
  username TEXT NOT NULL,
  role TEXT NOT NULL DEFAULT 'user',
  score INTEGER NOT NULL DEFAULT 0 CHECK (score >= 0)
);

Database constraints are enforced regardless of which application (or which language) accesses the database. ActiveRecord model validations (like validates :score, numericality: { greater_than_or_equal_to: 0 }) are application-level only; a direct SQL INSERT bypasses them. Rely on database constraints for true data integrity. Since Rails 6.1 the migration DSL exposes CHECK constraints via check_constraint.
Foreign key constraint
class Department < ApplicationRecord
  has_many :employees, dependent: :restrict_with_exception
end

class Employee < ApplicationRecord
  belongs_to :department
end

# Migration:
#   add_foreign_key :employees, :departments

CREATE TABLE departments (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL
);

CREATE TABLE employees (
  id INTEGER PRIMARY KEY,
  name TEXT NOT NULL,
  department_id INTEGER NOT NULL REFERENCES departments(id)
);
In SQLite, foreign key enforcement is disabled by default and must be enabled with PRAGMA foreign_keys = ON at the start of each connection. Rails 5+ sets this pragma automatically via the SQLite3 adapter. Without it, the REFERENCES declaration is parsed but not enforced.
Modify schema: ALTER TABLE
class AddFieldsToEmployees < ActiveRecord::Migration[8.0] def change add_column :employees, :salary, :decimal rename_column :employees, :salary, :annual_salary change_column :employees, :annual_salary, :decimal, null: false, default: 0 remove_column :employees, :legacy_field, :string end end
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT); ALTER TABLE employees ADD COLUMN salary REAL DEFAULT 0; ALTER TABLE employees RENAME COLUMN salary TO annual_salary; -- SQLite 3.35+ only: -- ALTER TABLE employees DROP COLUMN legacy_field;
SQLite's ALTER TABLE is more limited than other databases: before SQLite 3.25.0 you could not rename columns, before 3.35.0 you could not drop them, and changing a column's type or constraints still requires recreating the table. Rails migrations handle this complexity automatically — the migration DSL generates the correct SQL for each database adapter, including table rebuilds when necessary.
Indexes
Create an index
class AddIndexesToEmployees < ActiveRecord::Migration[8.0] def change add_index :employees, :department add_index :employees, :salary # Composite index: add_index :employees, [:department, :salary] end end
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT, salary REAL); CREATE INDEX idx_employees_department ON employees (department); CREATE INDEX idx_employees_salary ON employees (salary); -- Composite index (covers WHERE department = ? ORDER BY salary): CREATE INDEX idx_employees_dept_salary ON employees (department, salary);
Indexes dramatically speed up queries that filter, sort, or join on the indexed column. The trade-off is slightly slower writes and more storage. Index every foreign key column and any column that frequently appears in WHERE or ORDER BY clauses. A composite index (a, b) can also serve queries on a alone but not queries on b alone.
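A rough mental model in plain Ruby: an index behaves like a precomputed lookup structure keyed on the column, so a query can fetch matching rows without inspecting every one (illustrative only; a real B-tree index also supports range scans and ordering).

```ruby
employees = [
  { id: 1, name: 'Alice', department: 'Engineering' },
  { id: 2, name: 'Bob', department: 'Marketing' },
  { id: 3, name: 'Carol', department: 'Engineering' },
]

# Full table scan: inspect every row on every query.
scan = employees.select { |e| e[:department] == 'Engineering' }

# "Index": build the lookup table once, then fetch matching rows directly.
index = employees.group_by { |e| e[:department] }
lookup = index.fetch('Engineering', [])

puts scan == lookup
```

The trade-off is visible here too: the index costs memory and must be updated on every write, which is why indexing every column is counterproductive.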
Unique index
class AddUniqueIndexToUsersEmail < ActiveRecord::Migration[8.0] def change add_index :users, :email, unique: true # Partial unique index (unique among active users only): # execute "CREATE UNIQUE INDEX idx_users_email_active ON users (email) WHERE active = 1" end end
CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, active INTEGER DEFAULT 1); CREATE UNIQUE INDEX idx_users_email ON users (email); -- Partial unique index — unique only among active users: CREATE UNIQUE INDEX idx_users_email_active ON users (email) WHERE active = 1;
A unique index is both a performance index and a data integrity constraint. ActiveRecord's validates :email, uniqueness: true checks uniqueness at the application level, but it has a race condition in concurrent environments. The database-level unique index is the only reliable guarantee — it is enforced atomically by the database engine.
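The race can be simulated deterministically in plain Ruby (no real threads; the interleaving is spelled out by hand): both requests pass the application-level check before either one inserts.

```ruby
users = []
email = 'dup@example.com'

# Request A and request B both run the uniqueness check first...
check_a = users.none? { |u| u[:email] == email }
check_b = users.none? { |u| u[:email] == email }

# ...then both insert, because each saw an empty table at check time.
users << { email: email } if check_a
users << { email: email } if check_b

duplicates = users.count { |u| u[:email] == email }
puts "copies of #{email}: #{duplicates}"
```

A database unique index makes the check and the insert a single atomic step, so the second insert would fail instead.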
Query analysis: EXPLAIN QUERY PLAN
# In Rails: Employee.where(department: 'Engineering').explain # Output (with index): # QUERY PLAN # `--SEARCH employees USING INDEX idx_employees_department (department=?) # Output (without index): # QUERY PLAN # `--SCAN employees puts "Run Employee.where(...).explain in the Rails console to see the query plan"
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, department TEXT); CREATE INDEX idx_employees_department ON employees (department); EXPLAIN QUERY PLAN SELECT * FROM employees WHERE department = 'Engineering';
EXPLAIN QUERY PLAN shows whether SQLite uses an index (SEARCH ... USING INDEX) or performs a full table scan (SCAN employees). A full scan on a large table with a selective WHERE clause usually indicates a missing index. ActiveRecord's .explain method runs the equivalent and formats the output.
Transactions
Atomic batch: BEGIN / COMMIT
accounts = { 'Alice' => 1000.0, 'Bob' => 500.0 } begin transfer = 200 accounts['Alice'] -= transfer accounts['Bob'] += transfer puts "Transfer committed" rescue => error puts "Transfer failed: #{error.message}" end accounts.each { |owner, balance| puts "#{owner}: #{balance}" }
CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL); INSERT INTO accounts VALUES (1,'Alice',1000),(2,'Bob',500); BEGIN TRANSACTION; UPDATE accounts SET balance = balance - 200 WHERE owner = 'Alice'; UPDATE accounts SET balance = balance + 200 WHERE owner = 'Bob'; COMMIT; SELECT owner, balance FROM accounts;
ActiveRecord::Base.transaction { ... } wraps the block in BEGIN/COMMIT. Any exception raised inside the block automatically triggers ROLLBACK. Transactions guarantee atomicity — either all operations succeed together or none of them take effect. Without a transaction, a server crash between the two UPDATEs would leave the accounts in an inconsistent state.
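A toy version of that pattern in plain Ruby (a hypothetical helper, not the ActiveRecord API): snapshot the state, apply the block, and restore the snapshot if anything raises.

```ruby
def transaction(state)
  snapshot = Marshal.load(Marshal.dump(state)) # deep copy as the "before" image
  yield state
rescue => error
  state.replace(snapshot) # rollback: restore the snapshot
  raise error
end

accounts = { 'Alice' => 1000.0, 'Bob' => 500.0 }
transaction(accounts) do |acc|
  acc['Alice'] -= 200
  acc['Bob'] += 200
end
puts accounts # both updates applied together, or neither
```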
Undo a transaction: ROLLBACK
accounts = { 'Alice' => 1000.0, 'Bob' => 500.0 } original = accounts.dup begin accounts['Alice'] -= 1500 # more than Alice has raise "Insufficient funds" if accounts['Alice'] < 0 accounts['Bob'] += 1500 puts "Transfer committed" rescue => error accounts.replace(original) puts "Rolled back: #{error.message}" end accounts.each { |owner, balance| puts "#{owner}: #{balance}" }
CREATE TABLE accounts (id INTEGER PRIMARY KEY, owner TEXT, balance REAL); INSERT INTO accounts VALUES (1,'Alice',1000),(2,'Bob',500); BEGIN TRANSACTION; UPDATE accounts SET balance = balance - 1500 WHERE owner = 'Alice'; ROLLBACK; SELECT owner, balance FROM accounts;
Inside a Rails transaction block, raise ActiveRecord::Rollback to cancel the transaction without propagating the exception to the caller. Any other exception both rolls back the transaction and propagates upward. After a ROLLBACK, the database is exactly as it was before the BEGIN.
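The two exception behaviors can be sketched in plain Ruby, with a hypothetical Rollback class standing in for ActiveRecord::Rollback:

```ruby
class Rollback < StandardError; end

def transaction(state)
  snapshot = state.dup
  yield
rescue Rollback
  state.replace(snapshot) # silent cancel: the caller never sees an exception
rescue => error
  state.replace(snapshot)
  raise error # any other error rolls back AND propagates
end

accounts = { 'Alice' => 1000.0 }
transaction(accounts) do
  accounts['Alice'] -= 1500
  raise Rollback, 'insufficient funds' if accounts['Alice'].negative?
end
puts accounts['Alice'] # restored, and no exception escaped the block
```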
Partial rollback: SAVEPOINT
log_entries = ['Starting import'] checkpoint = log_entries.dup log_entries << 'Risky step executed' # Roll back to checkpoint β€” discard the risky step log_entries.replace(checkpoint) log_entries << 'Risky step skipped β€” safe path taken' log_entries.each_with_index { |entry, index| puts "#{index + 1}: #{entry}" }
CREATE TABLE log_entries (id INTEGER PRIMARY KEY AUTOINCREMENT, message TEXT); INSERT INTO log_entries (message) VALUES ('Starting import'); SAVEPOINT before_risky_step; INSERT INTO log_entries (message) VALUES ('Risky step executed'); ROLLBACK TO SAVEPOINT before_risky_step; INSERT INTO log_entries (message) VALUES ('Risky step skipped β€” safe path taken'); RELEASE SAVEPOINT before_risky_step; SELECT id, message FROM log_entries;
Nested ActiveRecord::Base.transaction { ... } blocks are fused into the parent transaction by default; pass requires_new: true to the inner block to get a real savepoint. With requires_new: true, raising ActiveRecord::Rollback inside the nested block rolls back only that nested scope without affecting the outer transaction. RELEASE SAVEPOINT (like committing a nested transaction) merges the savepoint's work into the surrounding transaction.
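Savepoints amount to a stack of snapshots, one per nesting level. A minimal plain-Ruby sketch of the same log example:

```ruby
log = ['Starting import']
savepoints = []

savepoints.push(log.dup)        # SAVEPOINT before_risky_step
log << 'Risky step executed'
log.replace(savepoints.pop)     # ROLLBACK TO SAVEPOINT: discard only that step
log << 'Safe path taken'        # work before and after the savepoint survives

puts log
```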
Subqueries
Scalar subquery (inline value)
employees = [ { name: 'Alice', salary: 90_000 }, { name: 'Bob', salary: 75_000 }, { name: 'Carol', salary: 85_000 }, { name: 'Dave', salary: 70_000 }, ] average = employees.sum { |e| e[:salary] } / employees.count.to_f result = employees.map { |e| { **e, above_avg: (e[:salary] - average).round } } result.sort_by { |e| -e[:above_avg] }.each do |e| puts "#{e[:name]}: #{e[:above_avg] >= 0 ? '+' : ''}#{e[:above_avg]}" end
CREATE TABLE employees (id INTEGER, name TEXT, salary REAL); INSERT INTO employees VALUES (1,'Alice',90000),(2,'Bob',75000),(3,'Carol',85000),(4,'Dave',70000); SELECT name, salary, round(salary - (SELECT avg(salary) FROM employees)) AS above_avg FROM employees ORDER BY above_avg DESC;
A scalar subquery is a SELECT that returns exactly one row and one column, used inline as a value expression. The database evaluates an uncorrelated scalar subquery once and substitutes the result. Most databases raise a runtime error when a scalar subquery returns more than one row; SQLite instead silently uses only the first row, so keep scalar subqueries aggregate-based (as here) or add LIMIT 1 to make the intent explicit.
Subquery in IN clause
departments = [ { id: 1, name: 'Engineering', profitable: true }, { id: 2, name: 'Marketing', profitable: false }, { id: 3, name: 'Sales', profitable: true }, ] employees = [ { name: 'Alice', department_id: 1 }, { name: 'Bob', department_id: 2 }, { name: 'Carol', department_id: 1 }, { name: 'Dave', department_id: 3 }, ] profitable_ids = departments.select { |d| d[:profitable] }.map { |d| d[:id] } puts employees.select { |e| profitable_ids.include?(e[:department_id]) }.map { |e| e[:name] }
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT, profitable INTEGER); CREATE TABLE employees (id INTEGER, name TEXT, department_id INTEGER); INSERT INTO departments VALUES (1,'Engineering',1),(2,'Marketing',0),(3,'Sales',1); INSERT INTO employees VALUES (1,'Alice',1),(2,'Bob',2),(3,'Carol',1),(4,'Dave',3); SELECT name FROM employees WHERE department_id IN (SELECT id FROM departments WHERE profitable = 1);
Employee.where(department_id: Department.where(profitable: true).select(:id)) in ActiveRecord composes the two queries into a single efficient SQL statement. The inner SELECT is executed by the database engine without loading intermediate results into Ruby memory.
EXISTS subquery
departments = [ { id: 1, name: 'Engineering' }, { id: 2, name: 'Marketing' }, { id: 3, name: 'Finance' }, ] employees = [ { name: 'Alice', department_id: 1 }, { name: 'Bob', department_id: 2 }, ] staffed_departments = departments.select { |dept| employees.any? { |e| e[:department_id] == dept[:id] } } puts staffed_departments.map { |d| d[:name] }
CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT); CREATE TABLE employees (id INTEGER, name TEXT, department_id INTEGER); INSERT INTO departments VALUES (1,'Engineering'),(2,'Marketing'),(3,'Finance'); INSERT INTO employees VALUES (1,'Alice',1),(2,'Bob',2); SELECT name FROM departments WHERE EXISTS ( SELECT 1 FROM employees WHERE employees.department_id = departments.id );
EXISTS stops scanning as soon as it finds one matching row — it is often more efficient than IN for large subsets because it short-circuits. NOT EXISTS finds records that lack an association. The conventional SELECT 1 inside EXISTS is idiomatic; any expression works since only the existence of a row matters.
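Ruby's any? short-circuits the same way. A counter makes the early exit visible:

```ruby
employees = [
  { name: 'Alice', department_id: 1 },
  { name: 'Bob', department_id: 2 },
  { name: 'Carol', department_id: 1 },
]

inspected = 0
found = employees.any? do |e|
  inspected += 1          # count how many rows the predicate actually sees
  e[:department_id] == 1
end

puts "found: #{found} after inspecting #{inspected} of #{employees.size} rows"
```

The first row matches, so the remaining rows are never inspected.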
Correlated subquery
employees = [ { name: 'Alice', department: 'Engineering', salary: 90_000 }, { name: 'Bob', department: 'Marketing', salary: 75_000 }, { name: 'Carol', department: 'Engineering', salary: 85_000 }, { name: 'Dave', department: 'Marketing', salary: 80_000 }, ] dept_max = employees.group_by { |e| e[:department] } .transform_values { |members| members.max_by { |e| e[:salary] }[:salary] } top_earners = employees.select { |e| e[:salary] == dept_max[e[:department]] } top_earners.each { |e| puts "#{e[:name]} (#{e[:department]}): #{e[:salary]}" }
CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL); INSERT INTO employees VALUES (1,'Alice','Engineering',90000),(2,'Bob','Marketing',75000), (3,'Carol','Engineering',85000),(4,'Dave','Marketing',80000); SELECT name, department, salary FROM employees AS outer_row WHERE salary = ( SELECT max(salary) FROM employees AS inner_row WHERE inner_row.department = outer_row.department );
A correlated subquery references a column from the outer query (outer_row.department here). It runs once for each outer row. For the "top earner per group" pattern, a window function — RANK() OVER (PARTITION BY department ORDER BY salary DESC) — is often cleaner and more efficient.
CTEs
Basic CTE: WITH … AS
employees = [ { name: 'Alice', department: 'Engineering', salary: 90_000 }, { name: 'Bob', department: 'Marketing', salary: 75_000 }, { name: 'Carol', department: 'Engineering', salary: 85_000 }, { name: 'Dave', department: 'Marketing', salary: 80_000 }, { name: 'Eve', department: 'Engineering', salary: 95_000 }, ] dept_avgs = employees.group_by { |e| e[:department] } .transform_values { |m| m.sum { |e| e[:salary] } / m.count.to_f } above_average = employees.select { |e| e[:salary] > dept_avgs[e[:department]] } above_average.each { |e| puts "#{e[:name]}: #{e[:salary]} (dept avg: #{dept_avgs[e[:department]].round})" }
CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL); INSERT INTO employees VALUES (1,'Alice','Engineering',90000),(2,'Bob','Marketing',75000), (3,'Carol','Engineering',85000),(4,'Dave','Marketing',80000),(5,'Eve','Engineering',95000); WITH department_averages AS ( SELECT department, avg(salary) AS avg_salary FROM employees GROUP BY department ) SELECT e.name, e.salary, round(da.avg_salary) AS dept_avg FROM employees AS e JOIN department_averages AS da USING (department) WHERE e.salary > da.avg_salary;
A CTE (Common Table Expression) names a subquery so it can be referenced like a table elsewhere in the query. CTEs make complex queries readable by giving intermediate results meaningful names — like assigning a computed value to a variable before using it. ActiveRecord 7.1+ supports CTEs directly with Model.with(department_averages: subquery); earlier versions need raw SQL via .from(Arel.sql("WITH ... SELECT ...")) or the activerecord-cte gem.
Chained CTEs
employees = [ { name: 'Alice', department: 'Engineering', salary: 90_000 }, { name: 'Bob', department: 'Marketing', salary: 75_000 }, { name: 'Carol', department: 'Engineering', salary: 85_000 }, { name: 'Dave', department: 'Marketing', salary: 80_000 }, { name: 'Eve', department: 'Finance', salary: 95_000 }, ] totals = employees.group_by { |e| e[:department] }.map do |dept, members| { department: dept, headcount: members.count, avg_salary: members.sum { |e| e[:salary] } / members.count.to_f } end ranked = totals.sort_by { |t| -t[:avg_salary] }.each_with_index.map { |t, i| { **t, rank: i + 1 } } ranked.each { |row| puts "#{row[:rank]}. #{row[:department]}: #{row[:headcount]} people, avg $#{row[:avg_salary].round}" }
CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL); INSERT INTO employees VALUES (1,'Alice','Engineering',90000),(2,'Bob','Marketing',75000), (3,'Carol','Engineering',85000),(4,'Dave','Marketing',80000),(5,'Eve','Finance',95000); WITH totals AS ( SELECT department, count(*) AS headcount, avg(salary) AS avg_salary FROM employees GROUP BY department ), ranked AS ( SELECT *, rank() OVER (ORDER BY avg_salary DESC) AS rank FROM totals ) SELECT rank, department, headcount, round(avg_salary) AS avg_salary FROM ranked;
Multiple CTEs are separated by commas and evaluated in order. A later CTE can reference earlier ones by name. Think of each CTE as a named pipeline stage — totals computes department summaries, then ranked adds rank numbers. This "step-by-step assembly" style is far more readable than equivalent deeply nested subqueries.
Recursive CTE (tree traversal)
employees = [ { id: 1, name: 'Alice', manager_id: nil }, { id: 2, name: 'Bob', manager_id: 1 }, { id: 3, name: 'Carol', manager_id: 1 }, { id: 4, name: 'Dave', manager_id: 2 }, { id: 5, name: 'Eve', manager_id: 2 }, ] def build_org(employees, manager_id, depth, path_prefix) employees.select { |e| e[:manager_id] == manager_id }.flat_map do |employee| path = path_prefix.empty? ? employee[:name] : "#{path_prefix} → #{employee[:name]}" [{ name: employee[:name], depth: depth, path: path }] + build_org(employees, employee[:id], depth + 1, path) end end build_org(employees, nil, 0, '').sort_by { |r| r[:path] } .each { |r| puts "#{' ' * r[:depth]}#{r[:name]}" }
CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER); INSERT INTO employees VALUES (1,'Alice',NULL),(2,'Bob',1),(3,'Carol',1),(4,'Dave',2),(5,'Eve',2); WITH RECURSIVE org_chart AS ( SELECT id, name, manager_id, 0 AS depth, name AS path FROM employees WHERE manager_id IS NULL UNION ALL SELECT e.id, e.name, e.manager_id, oc.depth + 1, oc.path || ' → ' || e.name FROM employees AS e JOIN org_chart AS oc ON e.manager_id = oc.id ) SELECT name, depth, path FROM org_chart ORDER BY path;
A recursive CTE has two parts joined by UNION ALL: the anchor (the base case — top-level rows where manager_id IS NULL) and the recursive member (which references the CTE itself to add one more level). This is the standard SQL way to walk tree structures stored as adjacency lists — org charts, category hierarchies, folder trees.
Window Functions
ROW_NUMBER: rank within a group
employees = [ { name: 'Alice', department: 'Engineering', salary: 90_000 }, { name: 'Bob', department: 'Marketing', salary: 75_000 }, { name: 'Carol', department: 'Engineering', salary: 85_000 }, { name: 'Dave', department: 'Marketing', salary: 80_000 }, { name: 'Eve', department: 'Engineering', salary: 95_000 }, ] result = employees.group_by { |e| e[:department] }.flat_map do |dept, members| members.sort_by { |e| -e[:salary] }.each_with_index.map { |employee, index| { **employee, rank_in_dept: index + 1 } } end result.sort_by { |e| [e[:department], e[:rank_in_dept]] } .each { |e| puts "#{e[:department]} ##{e[:rank_in_dept]}: #{e[:name]} (#{e[:salary]})" }
CREATE TABLE employees (id INTEGER, name TEXT, department TEXT, salary REAL); INSERT INTO employees VALUES (1,'Alice','Engineering',90000),(2,'Bob','Marketing',75000), (3,'Carol','Engineering',85000),(4,'Dave','Marketing',80000),(5,'Eve','Engineering',95000); SELECT name, department, salary, row_number() OVER (PARTITION BY department ORDER BY salary DESC) AS rank_in_dept FROM employees;
Window functions compute a value for each row using a set of related rows — without collapsing them into groups the way GROUP BY does. PARTITION BY divides rows into independent windows (one per department here); ORDER BY inside the window function determines the ranking order. ActiveRecord has no direct equivalent; use .select(Arel.sql(...)) or the window_functions gem.
RANK and DENSE_RANK
scores = [ { player: 'Alice', score: 95 }, { player: 'Bob', score: 87 }, { player: 'Carol', score: 95 }, { player: 'Dave', score: 82 }, { player: 'Eve', score: 87 }, ] sorted = scores.sort_by { |s| -s[:score] } sorted.each do |entry| rank = sorted.index { |s| s[:score] == entry[:score] } + 1 dense_rank = sorted.map { |s| s[:score] }.uniq.index(entry[:score]) + 1 puts "#{entry[:player]}: score #{entry[:score]}, rank #{rank}, dense_rank #{dense_rank}" end
CREATE TABLE scores (player TEXT, score INTEGER); INSERT INTO scores VALUES ('Alice',95),('Bob',87),('Carol',95),('Dave',82),('Eve',87); SELECT player, score, rank() OVER (ORDER BY score DESC) AS rank, dense_rank() OVER (ORDER BY score DESC) AS dense_rank FROM scores;
RANK() skips numbers after ties — two players tied at rank 1 means the next player is rank 3. DENSE_RANK() never skips — the next distinct score gets the next consecutive rank. ROW_NUMBER() assigns unique sequential numbers with no ties, even among equal values.
Running total and moving average
sales = [ { month: '2026-01', amount: 12_000 }, { month: '2026-02', amount: 15_000 }, { month: '2026-03', amount: 9_500 }, { month: '2026-04', amount: 18_000 }, { month: '2026-05', amount: 14_000 }, ] running_total = 0 sales.each_with_index do |sale, index| running_total += sale[:amount] window = sales[[0, index - 2].max..index] moving_avg = (window.sum { |s| s[:amount] } / window.count.to_f).round puts "#{sale[:month]}: $#{sale[:amount]} | running: $#{running_total} | 3-mo avg: $#{moving_avg}" end
CREATE TABLE sales (id INTEGER, month TEXT, amount REAL); INSERT INTO sales VALUES (1,'2026-01',12000),(2,'2026-02',15000),(3,'2026-03',9500), (4,'2026-04',18000),(5,'2026-05',14000); SELECT month, amount, sum(amount) OVER (ORDER BY month) AS running_total, round(avg(amount) OVER (ORDER BY month ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)) AS moving_avg_3mo FROM sales;
SUM(amount) OVER (ORDER BY month) accumulates the sum row by row — a running total. The ROWS BETWEEN 2 PRECEDING AND CURRENT ROW frame limits the window to the current row and the two rows before it, producing a 3-month moving average. Expressing these patterns with ActiveRecord alone requires raw SQL — they are impractical with the query DSL.
LAG and LEAD: look backward/forward
sales = [ { month: '2026-01', amount: 12_000 }, { month: '2026-02', amount: 15_000 }, { month: '2026-03', amount: 9_500 }, { month: '2026-04', amount: 18_000 }, { month: '2026-05', amount: 14_000 }, ] sales.each_with_index do |sale, index| prev = index > 0 ? sales[index - 1] : nil pct_change = prev ? ((sale[:amount] - prev[:amount]) / prev[:amount].to_f * 100).round(1) : nil change_str = pct_change ? "#{pct_change >= 0 ? '+' : ''}#{pct_change}%" : 'n/a' puts "#{sale[:month]}: $#{sale[:amount]} | MoM change: #{change_str}" end
CREATE TABLE sales (id INTEGER, month TEXT, amount REAL); INSERT INTO sales VALUES (1,'2026-01',12000),(2,'2026-02',15000),(3,'2026-03',9500), (4,'2026-04',18000),(5,'2026-05',14000); SELECT month, amount, lag(amount, 1) OVER (ORDER BY month) AS prev_month, round( (amount - lag(amount,1) OVER (ORDER BY month)) / lag(amount,1) OVER (ORDER BY month) * 100, 1 ) AS pct_change FROM sales;
LAG(column, n) accesses the value n rows back; LEAD(column, n) accesses n rows ahead. Both return NULL when the offset falls outside the window. Month-over-month percentage change is one of the most common analytics patterns — it requires a window function or a self join and has no clean ActiveRecord equivalent.
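For LAG specifically, Ruby's each_cons(2) is an idiomatic stand-in: it yields each row together with its predecessor, which is exactly the pairing the month-over-month calculation needs.

```ruby
sales = [
  { month: '2026-01', amount: 12_000 },
  { month: '2026-02', amount: 15_000 },
  { month: '2026-03', amount: 9_500 },
]

# Each consecutive pair is (previous row, current row), like LAG(amount, 1).
changes = sales.each_cons(2).map do |prev, curr|
  pct = ((curr[:amount] - prev[:amount]) / prev[:amount].to_f * 100).round(1)
  { month: curr[:month], pct: pct }
end

changes.each { |c| puts "#{c[:month]}: #{c[:pct] >= 0 ? '+' : ''}#{c[:pct]}%" }
```

Like LAG returning NULL for the first row, each_cons simply produces no pair for it.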