# SELECT
The SELECT statement is at the heart of most SQL queries. It defines what result set should be returned by the query, and is almost always used in conjunction with the FROM clause, which defines what part(s) of the database should be queried.
# Using the wildcard character to select all columns in a query.
Consider a database with the following two tables.
Employees table:
Id | FName | LName | DeptId |
---|---|---|---|
1 | James | Smith | 3 |
2 | John | Johnson | 4 |
Departments table:
Id | Name |
---|---|
1 | Sales |
2 | Marketing |
3 | Finance |
4 | IT |
# Simple select statement
*
is the wildcard character used to select all available columns in a table.
When used as a substitute for explicit column names, it returns all columns in all tables that a query is selecting FROM
. This effect applies to all tables the query accesses through its JOIN
clauses.
Consider the following query:
SELECT * FROM Employees
It will return all fields of all rows of the Employees
table:
Id | FName | LName | DeptId |
---|---|---|---|
1 | James | Smith | 3 |
2 | John | Johnson | 4 |
# Dot notation
To select all values from a specific table, the wildcard character can be applied to the table with dot notation.
Consider the following query:
SELECT
Employees.*,
Departments.Name
FROM
Employees
JOIN
Departments
ON Departments.Id = Employees.DeptId
This will return a data set with all fields on the Employee
table, followed by just the Name
field in the Departments
table:
|Id|FName|LName|DeptId|Name |---|---|---|--- |1|James|Smith|3|Finance |2|John|Johnson|4|IT
Warnings Against Use
It is generally advised that using *
is avoided in production code where possible, as it can cause a number of potential problems including:
- Excess IO, network load, memory use, and so on, due to the database engine reading data that is not needed and transmitting it to the front-end code. This is particularly a concern where there might be large fields such as those used to store long notes or attached files.
- Further excess IO load if the database needs to spool internal results to disk as part of the processing for a query more complex than
SELECT <columns> FROM <table>
. - Extra processing (and/or even more IO) if some of the unneeded columns are:
- computed columns in databases that support them
- in the case of selecting from a view, columns from a table/view that the query optimiser could otherwise optimise out
- The potential for unexpected errors if columns are added to tables and views later that results ambiguous column names. For example
SELECT * FROM orders JOIN people ON people.id = orders.personid ORDER BY displayname
- if a column column calleddisplayname
is added to the orders table to allow users to give their orders meaningful names for future reference then the column name will appear twice in the output so theORDER BY
clause will be ambiguous which may cause errors ("ambiguous column name" in recent MS SQL Server versions), and if not in this example your application code might start displaying the order name where the person name is intended because the new column is the first of that name returned, and so on.
# When Can You Use *
, Bearing The Above Warning In Mind?
While best avoided in production code, using *
is fine as a shorthand when performing manual queries against the database for investigation or prototype work.
Sometimes design decisions in your application make it unavoidable (in such circumstances, prefer tablealias.*
over just *
where possible).
When using EXISTS
, such as SELECT A.col1, A.Col2 FROM A WHERE EXISTS (SELECT * FROM B where A.ID = B.A_ID)
, we are not returning any data from B. Thus a join is unnecessary, and the engine knows no values from B are to be returned, thus no performance hit for using *
. Similarly COUNT(*)
is fine as it also doesn't actually return any of the columns, so only needs to read and process those that are used for filtering purposes.
# SELECT Using Column Aliases
Column aliases are used mainly to shorten code and make column names more readable.
Code becomes shorter as long table names and unnecessary identification of columns (e.g., there may be 2 IDs in the table, but only one is used in the statement) can be avoided. Along with table aliases (opens new window) this allows you to use longer descriptive names in your database structure while keeping queries upon that structure concise.
Furthermore they are sometimes required, for instance in views, in order to name computed outputs.
# All versions of SQL
Aliases can be created in all versions of SQL using double quotes ("
).
SELECT
FName AS "First Name",
MName AS "Middle Name",
LName AS "Last Name"
FROM Employees
# Different Versions of SQL
You can use single quotes ('
), double quotes ("
) and square brackets ([]
) to create an alias in Microsoft SQL Server.
SELECT
FName AS "First Name",
MName AS 'Middle Name',
LName AS [Last Name]
FROM Employees
Both will result in:
First Name | Middle Name | Last Name |
---|---|---|
James | John | Smith |
John | James | Johnson |
Michael | Marcus | Williams |
This statement will return FName
and LName
columns with a given name (an alias). This is achieved using the AS
operator followed by the alias, or simply writing alias directly after the column name. This means that the following query has the same outcome as the above.
SELECT
FName "First Name",
MName "Middle Name",
LName "Last Name"
FROM Employees
First Name | Middle Name | Last Name |
---|---|---|
James | John | Smith |
John | James | Johnson |
Michael | Marcus | Williams |
However, the explicit version (i.e., using the AS
operator) is more readable.
If the alias has a single word that is not a reserved word, we can write it without single quotes, double quotes or brackets:
SELECT
FName AS FirstName,
LName AS LastName
FROM Employees
FirstName | LastName |
---|---|
James | Smith |
John | Johnson |
Michael | Williams |
A further variation available in MS SQL Server amongst others is <alias> = <column-or-calculation>
, for instance:
SELECT FullName = FirstName + ' ' + LastName,
Addr1 = FullStreetAddress,
Addr2 = TownName
FROM CustomerDetails
which is equivalent to:
SELECT FirstName + ' ' + LastName As FullName
FullStreetAddress As Addr1,
TownName As Addr2
FROM CustomerDetails
Both will result in:
FullName | Addr1 | Addr2 |
---|---|---|
James Smith | 123 AnyStreet | TownVille |
John Johnson | 668 MyRoad | Anytown |
Michael Williams | 999 High End Dr | Williamsburgh |
Some find using =
instead of As
easier to read, though many recommend against this format, mainly because it is not standard so not widely supported by all databases. It may cause confusion with other uses of the =
character.
# All Versions of SQL
Also, if you need to use reserved words, you can use brackets or quotes to escape:
SELECT
FName as "SELECT",
MName as "FROM",
LName as "WHERE"
FROM Employees
# Different Versions of SQL
Likewise, you can escape keywords in MSSQL with all different approaches:
SELECT
FName AS "SELECT",
MName AS 'FROM',
LName AS [WHERE]
FROM Employees
SELECT | FROM | WHERE |
---|---|---|
James | John | Smith |
John | James | Johnson |
Michael | Marcus | Williams |
Also, a column alias may be used any of the final clauses of the same query, such as an ORDER BY
:
SELECT
FName AS FirstName,
LName AS LastName
FROM
Employees
ORDER BY
LastName DESC
However, you may not use
SELECT
FName AS SELECT,
LName AS FROM
FROM
Employees
ORDER BY
LastName DESC
To create an alias from these reserved words (SELECT
and FROM
).
This will cause numerous errors on execution.
# Select Individual Columns
SELECT
PhoneNumber,
Email,
PreferredContact
FROM Customers
This statement will return the columns PhoneNumber
, Email
, and PreferredContact
from all rows of the Customers
(opens new window) table. Also the columns will be returned in the sequence in which they appear in the SELECT
clause.
The result will be:
PhoneNumber | PreferredContact | |
---|---|---|
3347927472 | william.jones@example.com | PHONE |
2137921892 | dmiller@example.net | |
NULL | richard0123@example.com |
If multiple tables are joined together, you can select columns from specific tables by specifying the table name before the column name: [table_name].[column_name]
SELECT
Customers.PhoneNumber,
Customers.Email,
Customers.PreferredContact,
Orders.Id AS OrderId
FROM
Customers
LEFT JOIN
Orders ON Orders.CustomerId = Customers.Id
*AS OrderId
means that the Id
field of Orders
table will be returned as a column named OrderId
. See selecting with column alias (opens new window) for further information.
To avoid using long table names, you can use table aliases. This mitigates the pain of writing long table names for each field that you select in the joins. If you are performing a self join (a join between two instances of the same table), then you must use table aliases to distinguish your tables. We can write a table alias like Customers c
or Customers AS c
. Here c
works as an alias for Customers
and we can select let's say Email
like this: c.Email
.
SELECT
c.PhoneNumber,
c.Email,
c.PreferredContact,
o.Id AS OrderId
FROM
Customers c
LEFT JOIN
Orders o ON o.CustomerId = c.Id
# Selecting specified number of records
The SQL 2008 standard (opens new window) defines the FETCH FIRST
clause to limit the number of records returned.
SELECT Id, ProductName, UnitPrice, Package
FROM Product
ORDER BY UnitPrice DESC
FETCH FIRST 10 ROWS ONLY
This standard is only supported in recent versions of some RDMSs. Vendor-specific non-standard syntax is provided in other systems. Progress OpenEdge 11.x also supports the FETCH FIRST <n> ROWS ONLY
syntax.
Additionally, OFFSET <m> ROWS
before FETCH FIRST <n> ROWS ONLY
allows skipping rows before fetching rows.
SELECT Id, ProductName, UnitPrice, Package
FROM Product
ORDER BY UnitPrice DESC
OFFSET 5 ROWS
FETCH FIRST 10 ROWS ONLY
The following query is supported in SQL Server (opens new window) and MS Access:
SELECT TOP 10 Id, ProductName, UnitPrice, Package
FROM Product
ORDER BY UnitPrice DESC
To do the same in MySQL (opens new window) or PostgreSQL the LIMIT
keyword must be used:
SELECT Id, ProductName, UnitPrice, Package
FROM Product
ORDER BY UnitPrice DESC
LIMIT 10
In Oracle the same can be done with ROWNUM
:
SELECT Id, ProductName, UnitPrice, Package
FROM Product
WHERE ROWNUM <= 10
ORDER BY UnitPrice DESC
Results: 10 records.
Id ProductName UnitPrice Package
38 Côte de Blaye 263.50 12 - 75 cl bottles
29 Thüringer Rostbratwurst 123.79 50 bags x 30 sausgs.
9 Mishi Kobe Niku 97.00 18 - 500 g pkgs.
20 Sir Rodney's Marmalade 81.00 30 gift boxes
18 Carnarvon Tigers 62.50 16 kg pkg.
59 Raclette Courdavault 55.00 5 kg pkg.
51 Manjimup Dried Apples 53.00 50 - 300 g pkgs.
62 Tarte au sucre 49.30 48 pies
43 Ipoh Coffee 46.00 16 - 500 g tins
28 Rössle Sauerkraut 45.60 25 - 825 g cans
Vendor Nuances:
It is important to note that the TOP
in Microsoft SQL operates after the WHERE
clause and will return the specified number of results if they exist anywhere in the table, while ROWNUM
works as part of the WHERE
clause so if other conditions do not exist in the specified number of rows at the beginning of the table, you will get zero results when there could be others to be found.
# Selecting with Condition
The basic syntax of SELECT with WHERE clause is:
SELECT column1, column2, columnN
FROM table_name
WHERE [condition]
The [condition] can be any SQL expression, specified using comparison or logical operators like >, <, =, <>, >=, <=, LIKE, NOT, IN, BETWEEN etc.
The following statement returns all columns from the table 'Cars' where the status column is 'READY':
SELECT * FROM Cars WHERE status = 'READY'
See WHERE and HAVING (opens new window) for more examples.
# Selecting with CASE
When results need to have some logic applied 'on the fly' one can use CASE statement to implement it.
SELECT CASE WHEN Col1 < 50 THEN 'under' ELSE 'over' END threshold
FROM TableName
also can be chained
SELECT
CASE WHEN Col1 < 50 THEN 'under'
WHEN Col1 > 50 AND Col1 <100 THEN 'between'
ELSE 'over'
END threshold
FROM TableName
one also can have CASE
inside another CASE
statement
SELECT
CASE WHEN Col1 < 50 THEN 'under'
ELSE
CASE WHEN Col1 > 50 AND Col1 <100 THEN Col1
ELSE 'over' END
END threshold
FROM TableName
# Select columns which are named after reserved keywords
When a column name matches a reserved keyword, standard SQL requires that you enclose it in double quotation marks:
SELECT
"ORDER",
ID
FROM ORDERS
Note that it makes the column name case-sensitive.
Some DBMSes have proprietary ways of quoting names. For example, SQL Server uses square brackets for this purpose:
SELECT
[Order],
ID
FROM ORDERS
while MySQL (and MariaDB) by default use backticks:
SELECT
`Order`,
id
FROM orders
# Selecting with table alias
SELECT e.Fname, e.LName
FROM Employees e
The Employees table is given the alias 'e' directly after the table name. This helps remove ambiguity in scenarios where multiple tables have the same field name and you need to be specific as to which table you want to return data from.
SELECT e.Fname, e.LName, m.Fname AS ManagerFirstName
FROM Employees e
JOIN Managers m ON e.ManagerId = m.Id
Note that once you define an alias, you can't use the canonical table name anymore. i.e.,
SELECT e.Fname, Employees.LName, m.Fname AS ManagerFirstName
FROM Employees e
JOIN Managers m ON e.ManagerId = m.Id
would throw an error.
It is worth noting table aliases -- more formally 'range variables' -- were introduced into the SQL language to solve the problem of duplicate columns caused by INNER JOIN
. The 1992 SQL standard corrected this earlier design flaw by introducing NATURAL JOIN
(implemented in mySQL, PostgreSQL and Oracle but not yet in SQL Server), the result of which never has duplicate column names. The above example is interesting in that the tables are joined on columns with different names (Id
and ManagerId
) but are not supposed to be joined on the columns with the same name (LName
, FName
), requiring the renaming of the columns to be performed before the join:
SELECT Fname, LName, ManagerFirstName
FROM Employees
NATURAL JOIN
( SELECT Id AS ManagerId, Fname AS ManagerFirstName
FROM Managers ) m;
Note that although an alias/range variable must be declared for the dervied table (otherwise SQL will throw an error), it never makes sense to actually use it in the query.
# Selection with sorted Results
SELECT * FROM Employees ORDER BY LName
This statement will return all the columns from the table Employees
(opens new window).
Id | FName | LName | PhoneNumber |
---|---|---|---|
2 | John | Johnson | 2468101214 |
1 | James | Smith | 1234567890 |
3 | Michael | Williams | 1357911131 |
SELECT * FROM Employees ORDER BY LName DESC
Or
SELECT * FROM Employees ORDER BY LName ASC
This statement changes the sorting direction.
One may also specify multiple sorting columns. For example:
SELECT * FROM Employees ORDER BY LName ASC, FName ASC
This example will sort the results first by LName
and then, for records that have the same LName
, sort by FName
. This will give you a result similar to what you would find in a telephone book.
In order to save retyping the column name in the ORDER BY
clause, it is possible to use instead the column's number. Note that column numbers start from 1.
SELECT Id, FName, LName, PhoneNumber FROM Employees ORDER BY 3
You may also embed a CASE
statement in the ORDER BY
clause.
SELECT Id, FName, LName, PhoneNumber FROM Employees ORDER BY CASE WHEN LName='Jones` THEN 0 ELSE 1 END ASC
This will sort your results to have all records with the LName
of "Jones" at the top.
# Selecting with Aggregate functions
# Average
SELECT AVG(Salary) FROM Employees
SELECT AVG(Salary) FROM Employees where DepartmentId = 1
If employee is categorized with multiple department and we want to find avg salary for every department then we can use following query.
SELECT AVG(Salary) FROM Employees GROUP BY DepartmentId
# Minimum
SELECT MIN(Salary) FROM Employees
# Maximum
SELECT MAX(Salary) FROM Employees
# Count
SELECT Count(*) FROM Employees
SELECT Count(*) FROM Employees where ManagerId IS NOT NULL
Select Count(ManagerId) from Employees
Select Count(DISTINCT DepartmentId) from Employees
# Sum
SELECT SUM(Salary) FROM Employees
# Selecting without Locking the table
Sometimes when tables are used mostly (or only) for reads, indexing does not help anymore and every little bit counts, one might use selects without LOCK to improve performance.
SQL Server
SELECT * FROM TableName WITH (nolock)
MySQL
SET SESSION TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM TableName;
SET SESSION TRANSACTION ISOLATION LEVEL REPEATABLE READ;
Oracle
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT * FROM TableName;
DB2
SELECT * FROM TableName WITH UR;
where UR
stands for "uncommitted read".
If used on table that has record modifications going on might have unpredictable results.
# Select with condition of multiple values from column
SELECT * FROM Cars WHERE status IN ( 'Waiting', 'Working' )
This is semantically equivalent to
SELECT * FROM Cars WHERE ( status = 'Waiting' OR status = 'Working' )
i.e. value IN ( <value list> )
is a shorthand for disjunction (logical OR
).
# Get aggregated result for row groups
Counting rows based on a specific column value:
SELECT category, COUNT(*) AS item_count
FROM item
GROUP BY category;
Getting average income by department:
SELECT department, AVG(income)
FROM employees
GROUP BY department;
The important thing is to select only columns specified in the GROUP BY
(opens new window) clause or used with aggregate functions (opens new window).
There WHERE
clause can also be used with GROUP BY
, but WHERE
filters out records before any grouping is done:
SELECT department, AVG(income)
FROM employees
WHERE department <> 'ACCOUNTING'
GROUP BY department;
If you need to filter the results after the grouping has been done, e.g, to see only departments whose average income is larger than 1000, you need to use the HAVING
(opens new window) clause:
SELECT department, AVG(income)
FROM employees
WHERE department <> 'ACCOUNTING'
GROUP BY department
HAVING avg(income) > 1000;
# Selecting with more than 1 condition.
The AND
keyword is used to add more conditions to the query.
Name | Age | Gender |
---|---|---|
Sam | 18 | M |
John | 21 | M |
Bob | 22 | M |
Mary | 23 | F |
SELECT name FROM persons WHERE gender = 'M' AND age > 20;
This will return:
Name |
---|
John |
Bob |
using OR
keyword
SELECT name FROM persons WHERE gender = 'M' OR age < 20;
This will return:
name |
---|
Sam |
John |
Bob |
These keywords can be combined to allow for more complex criteria combinations:
SELECT name
FROM persons
WHERE (gender = 'M' AND age < 20)
OR (gender = 'F' AND age > 20);
This will return:
name |
---|
Sam |
Mary |
# Select rows from multiple tables
SELECT *
FROM
table1,
table2
SELECT
table1.column1,
table1.column2,
table2.column1
FROM
table1,
table2
This is called cross product in SQL it is same as cross product in sets
These statements return the selected columns from multiple tables in one query.
There is no specific relationship between the columns returned from each table.
# Selecting with null
SELECT Name FROM Customers WHERE PhoneNumber IS NULL
Selection with nulls take a different syntax. Don't use =
, use IS NULL
or IS NOT NULL
instead.
# Select distinct (unique values only)
SELECT DISTINCT ContinentCode
FROM Countries;
This query will return all DISTINCT
(unique, different) values from ContinentCode
column from Countries
(opens new window) table
ContinentCode |
---|
OC |
EU |
AS |
NA |
AF |
SQLFiddle Demo (opens new window)
# Syntax
SELECT [DISTINCT] [column1] [, [column2] ... ]
FROM [table]
[ WHERE condition ]
[ GROUP BY [column1] [, [column2] ... ]
# Remarks
SELECT determines which columns' data to return and in which order FROM a given table (given that they match the other requirements in your query specifically - where and having filters and joins).
SELECT Name, SerialNumber
FROM ArmyInfo
will only return results from the Name
and Serial Number
columns, but not from the column called Rank
, for example
SELECT *
FROM ArmyInfo
indicates that all columns will be returned. However, please note that it is poor practice to SELECT *
as you are literally returning all columns of a table.