Sub-Queries

mysql

What is a subquery?

A subquery is a SELECT statement that is placed within parentheses inside another SQL statement.

When should we use a sub-query and when should we not use a sub-query?

Using subqueries in a WHERE clause enable you to write extremely powerful and flexible SQL statements (combine several statements into one statement and have it executed at once on the server). There is no limit imposed on the number of sub-queries that can be nested, although in practice, you will find that performance tells you when you are nesting too deeply.

When using a sub-query in a WHERE clause, make sure that the sub-select statement have the same number of columns as in the WHERE clause. Usually, a single column will be returned by the sub-query and matched against a single column, but multiple columns may be used if needed.

The following query displays name and contact of customers who ordered productID='TNT2':

SELECT name, contact FROM customer WHERE id IN (SELECT customerID FROM order WHERE order.id IN 
 (SELECT orderID FROM orderitems WHERE productID='TNT2'));

Although usually used in conjunction with the IN operator, sub-queries can also be used to test for equality, and non-equality, and so on.

Using sub-queries is not always the most efficient way to perform this type of data retrieval, although it might be.

How can we use a sub-query as a calculated field?

SELECT name,state, (SELECT COUNT(*) FROM orders WHERE order.customerID = customer.id) AS orders 
 FROM customer ORDER BY name;

The sub-query above is executed once for every customer retrieved.

What is a correlated sub-query?

A Correlated sub-query is a query that refers to the table of the outer query. For example:

SELECT name,state, (SELECT COUNT(*) FROM orders WHERE order.customerID = customer.id) AS orders 
 FROM customer ORDER BY name;

What is a scalar sub-query?

Scalar subqueries returns a single value (one row with one column of data)

What is a row sub-query?

Row subqueries returns a single row with one or more column of data.

What is a column sub-query?

Column subqueries return a single column with one or more rows of data.

What is a table sub-query?

Table subqueries return a result with one or more rows containing one or more columns of data.

SELECT Language FROM CountryLanguage WHERE CountryCode = 
 (SELECT Code FROM Country WHERE Name='Finland');

The following statement uses a sub-query to determine which country has the most populous city in the world:

SELECT Country.Name FROM Country, City WHERE Country.Code = City.CountryCode AND City.Population = 
 (SELECT MAX(Population) FROM City);

Can we rewrite a sub-query as a normal query using joins?

Yes. Sub-queries can be rewritten to completely equivalent (and often more efficient) queries using joins. Nonetheless, sub-queries are preferred by many as an alternative way of specifying relations that otherwise requires complex joins or unions.

Where can we use scalar sub-query?

Scalar sub-queries can appear almost anywhere that a scalar value is allowed by SQL syntax. This means that you can use sub-queries as a function parameters, use mathematical operators on sub-queries that contain numeric values, etc.

SELECT CONCAT('The country code for Finland is: ', (SELECT Code FROM Country WHERE Name='Finland')) AS s1;
SELECT (SELECT SUM(Population) FROM City) / (SELECT SUM(Population) FROM Country) AS ratio;

The result of a scalar sub-query can be assigned to a user variable for later use:

SET @city_pop = (SELECT SUM(Population) FROM City);
SET @country_pop = (SELECT SUM(Population) FROM Country);
SELECT @city_pop / @country_pop;

Can we use scalar sub-query when a literal value is required, such as for an argument in a LIMIT clause.?

You cannot use a scalar sub-query when a literal value is required, such as for an argument in a LIMIT clause.

A non-correlated sub-query contains no references to the outer query and is not dependent on it. As a result, a non-correlated sub-query could be evaluated as a completely separated statement. A correlated sub-query contains references to the values in the outer query and cannot be evaluated independently.

In the following correlated sub-query, we calculate which country on each continent has the largest population. The value of the column Continent, which appears in the outer query, is used to limit which rows to consider for the MAX() calculation in the sub-query:

SELECT Continent, Name, Population FROM Country c WHERE Population = 
    (SELECT MAX(Population) FROM Country c2 WHERE c.Continent = c2.Continent AND Population > 0);

You are not limited to using the = operator. When comparing the values in the outer query with those returned by a scalar sub-query, you can use any of the usual comparison operator.

When a comparison requires a scalar sub-query, it is an error if the sub-query returns more than a single value.

To perform a comparison between a scalar value and the result of a column sub-query, we must use a quantified comparison. The quantifier keywords ALL, ANY, and SOME allow comparison to multiple-row results.

Using the ALL keyword in comparison with a column sub-query limits the result set to only those records where the comparison is true for all values produced by the sub-query. The following query all countries where the population is larger than the average country population of all the continents:

SELECT Name, Population FROM Country WHERE Population > ALL 
 (SELECT AVG(Population) FROM Country GROUP BY Continent)
 ORDER BY Name

Comparison using quantifier keyword ANY will succeed if any values in the result of the column sub-query succeed in the comparison. The following query finds the countries on the European continent, and for each one, tests whether the country is among the worldwide list of countries where Spanish is spoken:

SELECT Name FROM Country WHERE Continent = 'Europe' AND Code = ANY 
 (SELECT CountryCode FROM CountryLanguage WHERE Language='Spanish')
 ORDER BY Name;

The word SOME is an alias for ANY.

ALL, ANY, SOME can be used with any comparison operator. IN is equivalent to = ANY. IN cannot be combined with any comparison operator. NOT IN is not an alias of <> ANY. It is an alias for <> ALL

The EXISTS predicate performs a simple test which tells you whether the sub-query finds any rows. It does not return the actual values found in any of the rows, it merely return TRUE if any rows were found. The following query finds countries on the European continent where Spanish is spoken:

SELECT Code c, Name FROM Country WHERE Continent = 'Europe' AND EXIST 
 (SELECT * FROM CountryLanguage WHERE CountryCode = c AND Language='Spanish');

For row sub-query, we can perform an equality comparison for all columns. This method of comparison is not often used, but can provide some convenience for certain comparison operations:

SELECT City.Name FROM City WHERE (City.ID, City.CountryCode) = 
 (SELECT Capital, Code FROM Country WHERE Name='Finland');

Notice the use of the construct (City.ID, City.CountryCode). This creates a tuple of values and is known as a "row constructor". An equivalent method of defining a row is using ROW() to underscore the fact that the values are used to construct a row of data for comparison: ROW(City.ID, City.CountryCode).

Comparison of row sub-query can only be done using the = operator.

Row constructors are commonly used with row sub-query, it can be used in other context:

SELECT Name, Population FROM Country WHERE (Continent, Region) = ('Europe', 'Western Europe');

In practice, row constructors are often inefficient when used like this.

Sub-queries can be used in the FROM clause:

SELECT AVG(cont_sum) FROM 
 (SELECT Continent, SUM(Population) AS cont_sum FROM Country GROUP BY Continent)
 AS t;

Every table that appears in a FROM clause must have a name, so a sub-query in the FROM clause must be followed by a table alias. The SELECT in the FROM clause can be a table sub-query, even if not all columns are used by the outer query.

Sub-queries in FROM clause cannot be correlated with the outer statement.

Converting sub-queries to inner joins:

Sub-queries that find matches between tables can be converted to inner joins. The result of the join might not be the same as that of the sub-query (due to pairing). You may need to use DISTINCT:

SELECT DISTINCT Name FROM Country, CountryLanguage WHERE Code = CountryCode

The output from the join lists each matched country once for each time its country code occurs in the CountryLanguage table.

Sub-queries that find non-matches can be rewritten as outer joins. For example, a NOT IN sub-query that finds countries which has no entries in CountryLanguage table:

SELECT Name FROM Country WHERE Code NOT IN 
 (SELECT CountryCode FROM CountryLanguage);

can be rewritten as:

SELECT Name FROM Country LEFT JOIN CountryLanguage ON Code = CountryCode WHERE CountryCode IS NULL;

Any LEFT JOIN can be rewritten as a RIGHT JOIN.

The use of sub-queries is not limited to SELECT statements. Any SQL statement that includes a WHERE clause or a scalar expression may use sub-queries:

CREATE TABLE NACities
 SELECT * FROM Cities WHERE CountryCode IN 
 (SELECT Code FROM Country WHERE Continent = 'North America');
DELETE FROM NACities WHERE CountryCode IN 
 (SELECT Code FROM Country WHERE LifeExpectancy < 70.0);
update cc_member_question set answersCount=(SELECT COUNT(*) FROM cc_member_qauestion_reply WHERE cc_member_qauestion_reply.iQuestionId = cc_member_question.iQuestionId GROUP BY cc_member_qauestion_reply.iQuestionId);

Multicolumn UPDATE with Subquery the MySQL way

Although sub-queries can be used to retrieve or aggregate data from other tables for use in statements that modify tables (UPDATE, DELETE, INSERT, and REPLACE), MySQL does not allow a table that is being updated in the outer query to be selected from in any sub-query.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License