MySQL - Joins

mysql

http://www.sitepoint.com/understanding-sql-joins-mysql-database/
http://net.tutsplus.com/tutorials/databases/quick-tip-working-with-mysql-and-inner-join/
http://blogs.sitepoint.com/understanding-sql-joins-mysql-database/

Inner Joins: The basic purpose of "inner joins" is to return matching records

Outer Joins: Outer joins do not require each record to have a matching records.

Left outer join: A left outer join of tables A and B will return all matching records of A and B, as well as any non-matching records from the left table, in this case, A

Right outer join: A right outer join of tables A and B will return all matching records of A and B, as well as any non-matching records from the right table, in this case, B.

Full outer join: A full outer join of tables A and B will return all matching records of A and B, as well as any non-matching records from both tables.

Understanding Relational Tables

Relational tables are designed so information is split into multiple tables, one for each data type. The tables are then related to each other through common values (and thus the relational in relational design).

Suppose that you had a database table containing a product catalog, with each catalog item in its own row. The kind of information you would stored with each item would include product description, price, along with vendor information about the company that create the product.

Now, suppose that you had multiple catalog items created by the same vendor. Where would you store the vendor information (things such as vendor name, address, and contact information)? You would not want to store that data along with the products for several reasons:

  1. Because the vendor information is the same for each product that vendor produces, repeating information for each product is a waste of time and storage space.
  2. If vendor information change (vendor moves, or his area code changes), you need to update multiple records (every occurrence of vendor information)
  3. When data is repeated (the vendor information is used with each product), there is a high probability that the data will not be entered exactly the same way each time. Inconsistent data is extremely difficult to use in reporting.

The key here is that having multiple occurrence of the same data is never a good thing, and that principle is the basis for relational database design. Relational tables are designed so information is split into multiple tables, one for each data type. The tables are then related to each other through common values (and thus the relational in relational design).

In our example, we can create 2 tables, one store product information (product), and one store vendor information (vendor). The vendor table contains all the vendor information, one table row per vendor, along with a unique identifier for each vendor. This value, called a primary key, can be vendor ID, or any other unique value. The product table stores only product information and no vendor specific information other than the vendor ID (the primary key of the vendor table). This key, called a foreign key, relates the vendor table to the product table.

Foreign key: A column in one table that contains the primary key values from another table, thus defining the relationship between tables.

Nothing in the table definitions tells MySQL how to join the table. You have to do it yourself.

Joining multiple tables can be very resource intensive, so becareful not to join tables unnecessarily. The more tables you join, the more performance degrades.

SIMPLE JOIN:

SELECT vendor.name, product.name, price FROM vendor, product
 WHERE vendor.id = product.vendorID ORDER BY vendor.name, product.name;

It might seem strange to use a WHERE clause to set the join relationship, but there is a very good reason for this. When you join two tables, what you are actually doing is pairing every row in the first table with every row in the second table. The WHERE clause acts as a filter to include only rows that match the specified filter condition. Without the WHERE clause, every row in the first table is paired with every row in the second table regardless of if they logically go together or not.

Cartesian Product: The results of a join (table relationship), without a join condition. The number of rows retrieved is the number of rows in the first table multiplied by the number of rows in the second table.

Cross Joins: the type of join that return a Cartesian Product.

INNER JOIN: The SIMPLE JOIN that you've seen is called an equijoin because it is based on testing of equality between two tables. This kind of join is also called an INNER JOIN. In fact, you may use a slightly different syntax:

SELECT vendor.name, product.name, price FROM vendor INNER JOIN product ON vendor.id = product.vendorID;

Per ANSI SQL specification, use of the INNER JOIN syntax is preferable. Also, using explicit join syntax ensure that you will never forget the join condition, and it can affect performance (in some cases).

SQL imposes no limit to the number of tables that may be joined in a SELECT statement. MySQL can join up to 63 tables. First list all the tables, and then defined the relationship between each:

SELECT product.name, vendor.name, price, quality FROM orderitems, products, vendor 
 WHERE product.vendorID = vendor.id AND orderitems.productID = product.id AND orderitems.orderID = 2005;

JOIN vs Sub-query:

SELECT customer.name, customer.contact FROM customer WHERE customer.id IN (
 SELECT order.customerID FROM order WHERE order.id IN (
  SELECT orderitems.orderID FROM orderitems WHERE orderitems.productID = 'TNT2'));

Here is the same query using join:

SELECT customer.name, customer.contact FROM customer, order, orderitems 
 WHERE customer.id = orders.customerID AND orderitems.orderID = order.id AND orderitems.productID='TNT2';

It pays to experiment. As you can see, there is often more than one way to perform any given SQL operation, and there is rarely a definite right or wrong way. Performance can be affected by the type of operation, the amount of data in the table, whether indexes and keys are present, and a whole slew of other criteria. Therefore, it is often worth experimenting with different selection mechanism to find the one that works best for you.

SELF JOIN: One of the primary reason to use table alias is to be able to refer to the same table more than once in a single SELECT statement. Suppose that a problem was found with a product, and you therefore want to know all the products made by the same vendor so as to determine if the problem applied to them too. This query requires that you first find out which vendor create item DTNTR, and next find which other products are made by the same vendor. The following is one way to approach this problem:

SELECT id, name FROM product WHERE vendorID = (SELECT vendorID FROM product WHERE id='DTNTR');

Another solution using join:

SELECT p1.id, p1.name FROM product AS p1, product AS p2 WHERE p1.vendorID = p2.vendorID AND p2.id = 'DTNTR';

Another example of self-join: To find all countries that have the same year of independence as some given country. To address this naming issue, create an alias for one or both table references and refer to the aliases. The aliases give you alternative unambiguous names to refer to each instance of the table in the query:

SELECT t1.IndepYear, t1.Name, t2.Name FROM Country AS t1, Country AS t2 WHERE t1.IndepYear = t2.IndepYear AND t1.Name = 'Qatar';

NATURAL JOIN: Standard join (INNER JOIN) return all column, even multiple occurrences of the same column. A natural join eliminates those multiple occurrence so only one of each column is returned. A natural join is a join in which you select only columns that are unique. This is typically done using the wildcard (SELECT *) for one table, and explicitly subsets of the columns for all other tables. The truth is, every inner join you have created thus var is actually a natural join, and you will probably never need an inner join that is not a natural join.

OUTER JOIN: Most joins relate rows in one table with rows in another table. But, occasionally, you want to include rows that have no related rows. For example, you might use joins to accomplish the following tasks:

  1. How many orders each customer placed, including customers who have yet to place an order.
  2. List all products with order quantities, including products not ordered by anyone
  3. Calculate average sale size, taking into account customers who have not yet placed an order.

In each of these examples, the join includes table rows that have no associated rows in the related table. This type of join is called an outer join.

SELECT c.id AS customerID, o.id AS orderID FROM customer AS c LEFT OUTER JOIN order AS o ON c.id = o.customerID;

When using OUTER JOIN syntax, you must use the RIGHT or LEFT keywords to specify the table from which to include all rows (RIGHT for the one on the right of OUTER JOIN, and LEFT for the one on the left).

OUTER JOIN finds both matches and non-matches.

A left join is a type of outer join. A left join treats the left table (the first one named) as a reference table and produces output for each row selected from it, whether or not the row is matched by rows in the right table. Like a join written with the INNER JOIN keywords, a LEFT JOIN is written using either ON or USING().

If a row from the left table matches any right table rows, the result includes for each match a row containing the left table columns and the right table columns. These are rows that an inner join would produce.

If the left table row doesn't match any right tables, the result includes a row containing the left table column values and NULL for any columns from the right table. These are rows that an outer join will produce but an inner join will not.

To find only non-matches, write a LEFT JOIN with a WHERE clause that requires a row combinations to have NULL value in the right table column:

SELECT Name, Language FROM Country LEFT JOIN CountryLanguage ON Code = CountryCode WHERE CountryCode IS NULL;

With a LEFT JOIN, the reference table should be listed on the left, and the table from which rows might be missing should be listed on the right.

If you're only looking for matches, you can do so with either an inner join or an outer join. In such cases, it's better to use an inner join because that allows MySQL optimizer to choose the most efficient order for processing the tables. Outer joins require that the reference table be processed first, which might not be the most efficient order.

Using joins with aggregate functions:

SELECT customer.name, customer.id, COUNT(order.id) AS num_order FROM customer 
 INNER JOIN order ON customer.id = order.customerID GROUP BY customer.id;

MySQL allows the use of join syntax in UPDATE and DELETE statements to enable updates or deletes that involve multiple tables.

http://dev.mysql.com/doc/refman/5.1/en/join.html

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License