SQL

https://drill.apache.org/ - Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage
SPARSQL
https://webflow-blog.periscopedata.com/blog/polygon-area-from-latitude-and-longitude-using-sql
http://ithare.com/understanding-correlated-and-uncorrelated-sub-queries-in-sql/
https://aws.amazon.com/blogs/database/amazon-aurora-under-the-hood-quorum-and-correlated-failure
http://www.kdnuggets.com/2017/08/write-better-sql-queries-definitive-guide-part-1.html

http://sql.learncodethehardway.org/book/
https://zeroturnaround.com/rebellabs/sql-cheat-sheet/

http://www.khankennels.com/blog/index.php/archives/2007/04/20/getting-joins/ - done reading
http://blog.codinghorror.com/a-visual-explanation-of-sql-joins/ - done reading
https://en.wikipedia.org/wiki/Join_%28SQL%29 - done reading
http://gplivna.blogspot.com/2007/10/natural-joins-are-evil-motto-if-you.html - done reading
http://www.sitepoint.com/rethinkdb-ruby-map-reduce-joins/

oracle
mysql

Why should we avoid using the IN clause especially when it contains a lot of values?

Oracle and MySQL recommend us to avoid using the IN clause because it is not optimized especially if we have a lot of values in the IN clause. Use a sub-select as a table may help to avoid the limitation with the IN clause.

What are the 5 ANSI types of joins?

  1. INNER
  2. LEFT OUTER
  3. RIGHT OUTER
  4. FULL OUTER
  5. CROSS

What is an INNER JOIN?

It is the regular join.

venn1.png

In the above image, the red region represent the inner join. The inner join represent a subset of data that is common to both table.

SELECT * FROM TableA
INNER JOIN TableB
ON TableA.name = TableB.name
SELECT *
FROM employee 
INNER JOIN department ON employee.DepartmentID = department.DepartmentID;

What is an OUTER JOIN?

An outer join is the inverse of the inner join. It only returns those records not in T1 and T2. “Give me the records that DON’T have a match.” In programming logic – think in terms of NOT AND.

outervenn.png
SELECT * FROM TableA
FULL OUTER JOIN TableB
ON TableA.name = TableB.name

Full outer join produces the set of all records in Table A and Table B, with matching records from both sides where available. If there is no match, the missing side will contain null.

SELECT * FROM TableA
FULL OUTER JOIN TableB
ON TableA.name = TableB.name
WHERE TableA.id IS null
OR TableB.id IS null

To produce the set of records unique to Table A and Table B, we perform the same full outer join, then exclude the records we don't want from both sides via a where clause.

SELECT * FROM TableA
FULL OUTER JOIN TableB
ON TableA.name = TableB.name
WHERE TableA.id IS null
OR TableB.id IS null

What is a LEFT JOIN?

A left join returns all the records in the “left” table (T1) whether they have a match in the right table or not. If, however, they do have a match in the right table – give me the “matching” data from the right table as well. If not – fill in the holes with null.

left_venn.png

It should be noted that the same thing is possible with a right join – most people just use a left one.

What is a LEFT OUTER JOIN?

A left outer join combines the ideas behind a left join and an outer join. Basically – if you use a left outer join you will get the records in the left table that DO NOT have a match in the right table.

leftOutervenn.png

Again it is noted that the same thing is possible with a right outer join – most people just use a left one.

SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name

Left outer join produces a complete set of records from Table A, with the matching records (where available) in Table B. If there is no match, the right side will contain null.

To produce the set of records only in Table A, but not in Table B, we perform the same left outer join, then exclude the records we don't want from the right side via a where clause:

SELECT * FROM TableA
LEFT OUTER JOIN TableB
ON TableA.name = TableB.name
WHERE TableB.id IS null

The result of a left outer join (or simply left join) for tables A and B always contains all rows of the "left" table (A), even if the join-condition does not find any matching row in the "right" table (B). This means that if the ON clause matches 0 (zero) rows in B (for a given row in A), the join will still return a row in the result (for that row)—but with NULL in each column from B. A left outer join returns all the values from an inner join plus all values in the left table that do not match to the right table, including rows with NULL (empty) values in the link column.

For example, this allows us to find an employee's department, but still shows employees that have not been assigned to a department (contrary to the inner-join example above, where unassigned employees were excluded from the result). Example of a left outer join (the OUTER keyword is optional), with the additional result row (compared with the inner join) italicized:

SELECT *
FROM employee 
LEFT OUTER JOIN department ON employee.DepartmentID = department.DepartmentID;

Oracle supports the deprecated[8] syntax:

SELECT *
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID(+)

Sybase supports the syntax (Microsoft SQL Server deprecated this syntax since version 2000):

SELECT *
FROM employee, department
WHERE employee.DepartmentID *= department.DepartmentID

IBM Informix supports the syntax:

SELECT *
FROM employee, OUTER department
WHERE employee.DepartmentID = department.DepartmentID

What is a Theta JOIN?

A theta join is the Cartesian product of the 2 tables and not normally what people are looking for – but what they sometimes get by mistake.

SELECT t1.*, t2.*
FROM table1 t1, table2 t2
WHERE t1.id = 5;
thetavenn.png

What is a cross join?

It is another name for Theta join.

SELECT * FROM TableA
CROSS JOIN TableB

This joins "everything to everything", resulting in 4 x 4 = 16 rows, far more than we had in the original sets. If you do the math, you can see why this is a very dangerous join to run against large tables.

SELECT * FROM employee CROSS JOIN department;

Example of an implicit cross join:

SELECT * FROM employee, department;

An implicit cross join is the result when we either forget to specify the join condition or deliberately omit the join condition. The cross join does not itself apply any predicate to filter rows from the joined table. The results of a cross join can be filtered by using a WHERE clause which may then produce the equivalent of an inner join.

What is a NATURAL JOIN?

It is another name for the regular inner join, however with NATURAL JOIN, we do not specify the join conditions and it is determined automatically based on column names that are common in both tables.

SELECT surname, first_name, country
FROM persons
NATURAL JOIN addresses;

Natural joins between two tables without at least one common column results in cartesian product.

With different data types but the same column names there might be type conversion problems and quite strange error messages (see that it is NATURAL OUTER LEFT or RIGHT JOIN).

The natural join is a special case of equi-join. A natural join is a type of equi-join where the join predicate arises implicitly by comparing all columns in both tables that have the same column-names in the joined tables. The resulting joined table contains only one column for each pair of equally named columns. In the case that no columns with the same names are found, the result is a cross join.

Most experts agree that NATURAL JOINs are dangerous and therefore strongly discourage their use.[7] The danger comes from inadvertently adding a new column, named the same as another column in the other table. An existing natural join might then "naturally" use the new column for comparisons, making comparisons/matches using different criteria (from different columns) than before. Thus an existing query could produce different results, even though the data in the tables have not been changed, but only augmented. The use of column names to automatically determine table links is not an option in large databases with hundreds or thousands of tables where it would place an unrealistic constraint on naming conventions. Real world databases are commonly designed with foreign key data that is not consistently populated (NULL values are allowed), due to business rules and context. It is common practice to modify column names of similar data in different tables and this lack of rigid consistency relegates natural joins to a theoretical concept for discussion.

In many database environments the column names are controlled by an outside vendor, not the query developer. A natural join assumes stability and consistency in column names which can change during vendor mandated version upgrades.

What is a hash join or sort-merge join?

Inner join can be defined as the outcome of first taking the Cartesian product (or Cross join) of all rows in the tables (combining every row in table A with every row in table B) and then returning all rows which satisfy the join predicate. Actual SQL implementations normally use other approaches, such as hash joins or sort-merge joins, since computing the Cartesian product is slower and would often require a prohibitively large amount of memory to store.

What is an Equi-join?

It is a join that uses only equality comparisons in the join predicate / condition. Using other comparison operators (such as <) disqualifies a join as an equi-join. The query shown above has already provided an example of an equi-join:

SELECT *
FROM employee JOIN department
  ON employee.DepartmentID = department.DepartmentID;

SELECT *
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID;

If columns in an equi-join have the same name, SQL-92 provides an optional shorthand notation for expressing equi-joins, by way of the USING construct:

SELECT *
FROM employee INNER JOIN department USING (DepartmentID);

What is the purpose of the USING clause in a JOIN?

If columns in an equi-join have the same name, SQL-92 provides an optional shorthand notation for expressing equi-joins, by way of the USING construct:

SELECT *
FROM employee INNER JOIN department USING (DepartmentID);

is the same as:

SELECT *
FROM employee JOIN department
  ON employee.DepartmentID = department.DepartmentID;

SELECT *
FROM employee, department
WHERE employee.DepartmentID = department.DepartmentID;

The USING construct is more than mere syntactic sugar, however, since the result set differs from the result set of the version with the explicit predicate. Specifically, any columns mentioned in the USING list will appear only once, with an unqualified name, rather than once for each table in the join. In the case above, there will be a single DepartmentID column and no employee.DepartmentID or department.DepartmentID.

The USING clause is not supported by MS SQL Server and Sybase.

How can we simulate a full outer join?

Some database systems do not support the full outer join functionality directly, but they can emulate it through the use of an inner join and UNION ALL selects of the "single table rows" from left and right tables respectively. The same example can appear as follows:

SELECT employee.LastName, employee.DepartmentID,
       department.DepartmentName, department.DepartmentID
FROM employee
INNER JOIN department ON employee.DepartmentID = department.DepartmentID

UNION ALL

SELECT employee.LastName, employee.DepartmentID,
       cast(NULL as varchar(20)), cast(NULL as integer)
FROM employee
WHERE NOT EXISTS (
    SELECT * FROM department
             WHERE employee.DepartmentID = department.DepartmentID)

UNION ALL

SELECT cast(NULL as varchar(20)), cast(NULL as integer),
       department.DepartmentName, department.DepartmentID
FROM department
WHERE NOT EXISTS (
    SELECT * FROM employee
             WHERE employee.DepartmentID = department.DepartmentID)

What is a self-join?

A self-join is joining a table to itself. If there were two separate tables for employees and a query which requested employees in the first table having the same country as employees in the second table, a normal join operation could be used to find the answer table. However, all the employee information is contained within a single large table.

SELECT F.EmployeeID, F.LastName, S.EmployeeID, S.LastName, F.Country
FROM Employee F INNER JOIN Employee S ON F.Country = S.Country
WHERE F.EmployeeID < S.EmployeeID
ORDER BY F.EmployeeID, S.EmployeeID;

What are the available join algorithms?

  1. Nested loop-in
  2. Sort-merge join
  3. Hash join

What is a straight join?

Some database systems allow the user to force the system to read the tables in a join in a particular order. This is used when the join optimizer chooses to read the tables in an inefficient order. For example, in MySQL the command STRAIGHT_JOIN does an inner join but reads the tables in exactly the order listed in the query

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License