William Liu

SQL Queries for Mere Mortals

‘SQL Queries for Mere Mortals’ shows you how to write queries using a variety of databases, including SQL Server, MySQL, and PostgreSQL

Overview

The book is broken into six sections:

  1. How Databases are mathematical models - Chapter 1
  2. SQL Basics (e.g. select, where, order by) - Chapter 4
  3. Working with multiple tables (e.g. joins, subqueries) - Chapter 7
  4. Summarizing and Grouping Data (e.g. group by, having) - Chapter 12
  5. Modifying sets of data (e.g. update, insert, delete) - Chapter 15
  6. Complex problems (e.g. not, and, case) - Chapter 18

Entity Relationship Diagram (ERD)

An Entity Relationship Diagram (aka ERD, ER Diagram, ER Model) is a structural diagram for use in database design. An ERD contains two different symbols and connectors that visualize two important information: the major entities within the system scope and the inter-relationships among these entities.

Sample Databases

Get the code samples here: http://www.informit.com/store/sql-queries-for-mere-mortals-a-hands-on-guide-to-data-9780134858333

Schemas

A database schema is the skeleton structure that represents the logical view of the entire database. If defines how the data is organized and how the relations among them are associated. It formulates all the constraints that are to be applied on the data.

Here are some example Schemas we’ll be using from the book:

Bowling League

This database tracks bowling teams, team members, the matches they played, and the results.

Bowler Scores

Table Bowler_Scores

Field           Data Type       Size        Required    Default     Primary Key     Foreign Key
----------------------------------------------------------------------------------------------------------
MatchID         int             4           Yes         0           Yes             Match_Games.MatchID
GameNumber      smallint        2           Yes         0           Yes             Match_Games.GameNumber
BowlerID        int             4           Yes         0           Yes             Bowlers.BowlerID
RawScore        smallint        2                       0
HandiCapScore   smallint        2                       0
WonGame         bit             1

Indexes for Bowler_Scores

Index Name                  Attributes/Field Names
--------------------------------------------------
BowlersBowler_Scores
                            BowlerID

Match_GamesBowler_Scores
                            MatchID
                            GameNumber

PrimaryKey                  Primary
                            MatchID
                            GameNumber
                            BowlerID

Bowlers

Table Bowlers

Field               Data Type       Size        Required    Default     Primary Key     Foreign Key
----------------------------------------------------------------------------------------------------------
BowlerID            int             4           Yes         0           Yes
BowlerLastName      nvarchar        50
BowlerFirstName     nvarchar        50
BowlerMiddleInit    nvarchar        1
BowlerAddress       nvarchar        50
BowlerCity          nvarchar        50
BowlerState         nvarchar        2
BowlerZip           nvarchar        10
BowlerPhoneNumber   nvarchar        14
TeamID              int             4                                                   Teams.TeamID

Indexes for Bowlers

Index Name                  Attributes/Field Names
--------------------------------------------------
BowlerLastName
                            BowlerLastName

PrimaryKey                  Primary
                            BowlerID

TeamBowlers
                            TeamID

Match Games

Table Match_Games

Field               Data Type       Size        Required    Default     Primary Key     Foreign Key
----------------------------------------------------------------------------------------------------------
MatchID             int             4           Yes         0           Yes             Tourney_Matches.MatchID
GameNumber          smallint        2           Yes         0           Yes
WinningTeamID       int             4                       0

Indexes for Match_Games

Index Name                  Attributes/Field Names
--------------------------------------------------
PrimaryKey                  Primary
                            MatchID
                            GameNumber

Tourney_MatchesMatch_Games
                            MatchID

Teams

Table Teams

Field               Data Type       Size        Required    Default     Primary Key     Foreign Key
----------------------------------------------------------------------------------------------------------
TeamID              int             4           Yes                     Yes
TeamName            nvarchar        50          Yes
CaptainID           int             4                                                   Bowlers.BowlerID

Indexes for Teams

Index Name                  Attributes/Field Names
--------------------------------------------------
BowlersTeams                Unique
                            CaptainID

CaptainID                   Unique
                            CaptainID

PrimaryKey                  Primary
                            TeamID

TeamID                      Unique
                            TeamID

Tournaments

Table Tournaments

Field               Data Type       Size        Required    Default     Primary Key     Foreign Key
----------------------------------------------------------------------------------------------------------
TourneyID           int             4           Yes         0           Yes
TourneyDate         date            8
TourneyLocation     nvarchar        50

Indexes for Tournaments

Index Name                  Attributes/Field Names
--------------------------------------------------
PrimaryKey                  Primary
                            TourneyID

Tourney Matches

Table Tourney_Matches

Field               Data Type       Size        Required    Default     Primary Key     Foreign Key
----------------------------------------------------------------------------------------------------------
MatchID             int             4           Yes         0           Yes
TourneyID           int             4                       0                           Tournaments.TourneyID
Lanes               nvarchar        5
OddLaneTeamID       int             4                       0                           Teams.TeamID
EvenLaneTeamID      int             4                       0                           Teams.TeamID

Indexes for Tourney_Matches

Index Name                  Attributes/Field Names
--------------------------------------------------
PrimaryKey                  Primary
                            MatchID

TeamsTourney_Matches
                            OddLaneTeamID

TeamsTourney_Matches1
                            EvenLaneTeamID

TournamentsTourney_Matches
                            TourneyID

ztblBowlerRatings

Table ztblBowlerRatings

Field               Data Type       Size        Required    Default     Primary Key     Foreign Key
----------------------------------------------------------------------------------------------------------
BowlerRating        nvarchar        15          Yes
BowlerLowAvg        smallint        2
BowlerHighAvg       smallint        2

Indexes for ztblBowlerRatings

Index Name                  Attributes/Field Names
--------------------------------------------------
PrimaryKey                  Primary
                            BowlerRating

ztblSkipLabels

Table ztblSkipLabels

Field               Data Type       Size        Required    Default     Primary Key     Foreign Key
----------------------------------------------------------------------------------------------------------
LabelCount          int             4           Yes                     Yes

Indexes for ztblSkipLabels

Index Name                  Attributes/Field Names
--------------------------------------------------
PrimaryKey                  Primary
                            LabelCount

ztblWeeks

Table ztblWeeks

Field               Data Type       Size        Required    Default     Primary Key     Foreign Key
----------------------------------------------------------------------------------------------------------
WeekStart           date            8           Yes                     Yes
WeekEnd             date            8

Indexes for ztblWeeks

Index Name                  Attributes/Field Names
--------------------------------------------------
PrimaryKey                  Primary
                            WeekStart

Relationship Constraints

Name                            Parent          Parent Fields       Child           Child Fields
------------------------------------------------------------------------------------------------
BowlersBowler_Scores            Bowlers         BowlerID            Bowler_Scores   BowlerID
BowlersTeams                    Bowlers         BowlerID            Teams           CaptainID
Match_GamesBowler_Scores        Match_Games     MatchID_GameNumber  Bowler_Scores   MatchID_GameNumber
TeamsBowlers                    Teams           TeamID              Bowlers         TeamID
TeamsTourney_Matches            Teams           TeamID              Tourney_Matches OddLaneTeamID
TeamsTourney_Matches1           Teams           TeamID              Tourney_Matches EvenLaneTeamID

Types of Databases

A database is an organized collection of data used to model some type of organization or organizational process.

There are two types of databases:

Relational Database Systems

A relational database management system (RDBMS) is a software application program used to create, maintain, modify, and manipulate a relational database.

A data warehouse is a large store of data accumulated from a wide range of sources and would enable organizations to access data stored in any number of nonrelational databases.

Anatomy of a Relational Database

Data in a relational database is stored in relations, which appear as tables. Each relation is made up of tuples (records of rows) and attributes (fields or columns).

Tables

Tables are the main structures in the database. Each table represents a single, specific object. Each table contains at least one column that uniquely identifies each of its rows (aka primary key).

The subject that a given table represents is usually either an object or an event.

Table representing Objects

When the subject of a table is an object, the table represents something tangible; a person, place, or thing. Example objects are Students, Buildings, Equipment, Customers

CustomemrID     FirstName   LastName    StreetAddress   City        State   ZipCode
---------------------------------------------------------------------------------
1010            Angel       Kennedy     667 Red Rd      Austin      TX      78710
1011            Alaina      Hallmark    2114 Longview   San Diego   CA      92199

Table representing Events

When the subject of a table is an event, the table represents something that occurs at a given point in time and has characteristics that you wish to record. Example events are Judicial Hearings, Lab Test Results, Patient Visits, and Geological Surveys.

PatientID   VisitDate   VisitTime   Physician   BloodPressure   Temperature
---------------------------------------------------------------------------
92001       2006-05-01  10:30       Ehrlich     120/80          98.8
96106       2006-05-02  11:00       Hallmark    160/90          99.1

Columns and Rows

A column is the smallest structure in the database. It represents a characteristic of the subject of the table. Every column in a properly designed database contains one and only one value. The name of the column identifies the type of value it holds, that way entering data is intuitive. Example column names might be FirstName, LastName, City, State, ZipCode

A row represents a unique instance of the subject of a table. It is composed of the entire set of columns in a table, regardless of whether or not the columns contain any values. Because of how a table is defined, each row is identified throughout the database by a unique value in the primary key column(s) of that row.

Keys

Keys are special columns that play very specific roles in a table; depending on the type of key, there’s a different purpose. The two most important ones help describe the relationships between tables using the primary key and the foreign key.

Primary Key

A primary key is one or more columns that uniquely identify each row within a table. When a primary key is made up of two or more columns, it is known as a composite primary key.

A primary key is important for two reasons:

Primary keys also enforce table-level integrity and help establish relationships with other tables. Every table should have a primary key.

Here’s an example:

'Agents' Table
AgentID     AgentFirstName      AgentLastName   DateHired   AgentHomePhone
--------------------------------------------------------------------------
1           William             Thompson        15-May-01   555-2681
2           Scott               Bishop          10-Feb-03   555-2666
3           Carol               Viescas         09-Sep-00   555-2571

'Entertainers' Table
EntertainerID   AgentID     EntertainerName     EntertainerPhone
----------------------------------------------------------------
1001            1           Carol Peacock Trio  555-2691
1002            3           Topazz              555-2591
1003            3           JV & the Deep Six   555-2511

In the ‘Agents’ table, we have the Primary Key of ‘AgentID’. In the ‘Entertainers’ table, we have the Primary Key of ‘EntertainerID’ and Foreign Key of ‘AgentID’ that references ‘Agents’ table ‘AgentID’.

Foreign Key

When you determine that a pair of tables have a relationship to each other, you establish the relationship by taking a copy of the primary key from the first table and insert it into the second table, where it becomes the foreign key.

The term foreign key comes from the fact that the second table already has a primary key of its own, and the primary key you are introducing from the first table is foreign to the second table.

Foreign keys are important for establishing relationships between pairs of tables and also for ensuring relationship-level integrity. This means that the rows in both tables will always be properly related because the values of a foreign key MUST be drawn from the values of the primary key from which it refers (i.e. helps you avoid ‘orphaned rows’)

Views

A view is a virtual table composed of columns from one or more tables in the database. The tables that comprise the view are known as base tables. The view is virtual because it draws data from base tables rather than storing any data on its own.

Relationships

If rows in a given table can be associated in some way with rows in another table, the tables have a relationship between them. There’s three types of relationships that can exist between a pair of tables:

One-to-One

A pair of tables has a one-to-one relationship when a single row in the first table is related to only one row in the second table AND a single row in the second table is related to only one row in the first table.

The first table is usually referred to as the primary table. The second table is usually referred to as the secondary table.

'Agents' Table (the primary table)
AgentID     AgentFirstName      AgentLastName   DateHired   AgentHomePhone
--------------------------------------------------------------------------
1           William             Thompson        15-May-01   555-2681
2           Scott               Bishop          10-Feb-03   555-2666
3           Carol               Viescas         09-Sep-00   555-2571

'Compenstation' Table (the secondary table)
AgentID     Salary      CommissionRate
--------------------------------------
1           $35,000     4.00%
2           $27,000     4.00%
3           $30,000     5.00%

‘Compensation’ is the secondary table because it doesn’t make sense to add a salary without having an associated ‘Agent’. You cannot add a row to the secondary table (‘Compensation’) unless a row exists in the primary table. These one-to-one types of tables are uncommon. You might split this data into two parts for confidentiality purposes.

One-to-Many

When a pair of tables has a one-to-many relationship, a single row in the first table can be related to many rows in the second table, but a single row in the second table can only be related to only one row in the first table.

So how does this work? You take the primary key of the primary table and insert it into the secondary table as a foreign key.

'Entertainers' Table
EntertainerID   AgentID     EntertainerName     EntertainerPhone
----------------------------------------------------------------
1001            1           Carol Peacock Trio  555-2691
1002            3           Topazz              555-2591
1003            3           JV & the Deep Six   555-2511

'Engagements' Table
EngagementID    EntertainerID   CustomerID  StartDate   EndDate
------------------------------------------------------------------
5               1003            10006       2007-09-11  2007-09-14
7               1002            10004       2007-09-11  2007-09-19
10              1003            10005       2007-09-17  2007-09-26
12              1001            10014       2007-09-18  2007-09-26

In the above example, a single row in the Entertainers table can be related to many rows in the Engagements table, but a single row in the Engagements table can be related to only one row in the Entertainers table. EntertainerID is a foreign key in the Engagements table.

Many-to-Many

A pair of tables is a many-to-many relationship when a single row in the first table can be related to many rows in the second table, and a single row in the second table can be related to many rows in the first table.

In order to establish this many-to-many relationship correctly, you must create a linking table, which provides an easy way to associate rows from one table with those of the other. You define a linking table by taking a copy of the primary key of each table and using them to form the structure of the new table. Together, they form the composite primary key of the linking table, and separately they each serve as a foreign key.

Example:

'Customers' Table
CustomerID      CustFirstName       CustLastName        CustPhone
10001           Doris               Hartwig             555-2671
10002           Deb                 Waldal              555-2496
10003           Peter               Brehm               555-2501

'Entertainers' Table
EntertainerID   AgentID     EntertainerName     EntertainerPhone
----------------------------------------------------------------
1001            1           Carol Peacock Trio  555-2691
1002            3           Topazz              555-2591
1003            3           JV & the Deep Six   555-2511

The above many-to-many relationship is unresolved because the many-to-many relationship has not been properly established. The issue is: How do you associate rows from the first table to the second table? The solution is to create and use a linking table.

Engagements Table (as a 'linking table')
EngagementID    CustomerID      EntertainerID   StartDate
---------------------------------------------------------
43              10001           1001            2007-10-21
58              10001           1002            2007-12-01
62              10003           1005            2007-12-09
71              10002           1003            2007-12-22
125             10001           1003            2008-02-23

In the above, we create a new table ‘Engagements’ as a linking table. Here you can store additional information in the linking table like the StartDate or Cost.

Database Theory vs Database Design

Database Theory is the principles and rules that formulate the basis of the relational database model. Database theory guarantees that the relational database is structurally sound and that all actions taken on the data in the database have predictable results.

Database Design is the structured, organized set of processes used to design a relational database. A good database design methodology helps ensure the integrity, consistency, and accuracy of the data in the database. Consider books like ‘Database Design for Mere Mortals’ or ‘An Introduction to Database Systems’.

Database Structure

In order to make sure that your database structure is sound, let’s take a look at:

Database Structure (Columns)

Since columns are our smallest data structure, let’s look at them first:

Multipart Columns

Multipart columns occur where you can say yes to the question: “Can I take the current value of this column and break it up into smaller, more distinct parts?”. Here’s an example:

CustomerID      CustomerName        StreetAddress
----------------------------------------------------------------------------
1001            Suzanne Viescas     15127 NE 24th, #383 Redmond, WA 98052
1002            William Thompson    122 Spring River Drive, Duvall, WA 98019
1003            Gary Hallmark       Route 2, Box 203B, Auburn, WA 98002

In the above, you can see that columns CustomerName and StreetAddress can be broken up into more distinct columns. CustomerName can be CustFirstName and CustLastName while StreetAddress can be broken up into CustAddress, CustCity, CustZipCode, CustState, etc.

More complicated multipart examples might be:

'Instruments' Table
InstrumentID    Manufacturer    InstrumentDescription
-----------------------------------------------------
GUIT2201        Fender          Fender Stratocaster
MFX3349         Zoom            Player 2100 Multi-Effects
AMP1001         Marshall        JCM 2000 Tube Super Lead

In the above, you can see that InstrumentID has an ID as well as the type (e.g. GUIT for Guitar). To resolve this, split InstrumentID to an InstrumentID as well as InstrumentType column.

Multivalued Columns

To identify a multivalued column, you can see that the data is stored in our columns with some commas, semicolons, or some other delimiter.

'Pilots' Table
PilotID     PilotFirstName      PilotLastName       Certifications
-----------------------------------------------------------------------
25100       Sam                 Alborous            727, 737, 757, MD80
25101       Jim                 Wilson              737, 747, 757
25102       David               Smith               757, MD80, DC9

In the above, the Certifications column will cause data integrity issues (e.g. difficult to update). The values in multivalued columns have a many-to-many relationship with every row in its parent table: one specific value (e.g. 737) can be associated with any number of rows in the parent table and a single row in the parent table can be associated with any number of rows in the multivalued column. In order to solve any many-to-many relationships, you need a linking table.

'Pilots' Table
PilotID     PilotFirstName      PilotLastName
---------------------------------------------
25100       Sam                 Alborous
25101       Jim                 Wilson
25102       David               Smith

'Pilot_Certifications' Table (the 'linking table')
PilotID     CertificationID
---------------------------
25100       8102
25100       8103
25100       8105
25100       8106
25101       8103
25101       8104
25101       8105

'Certifications' Table
CertificationID     TypeofAircraft
----------------------------------
8102                Boeing 727
8103                Boeing 737
8104                Boeing 747
8105                Boeing 757

So that’s how you resolve a multivalued column using a linking table.

Database Structure (Tables)

Tables are the basis for every SQL query you create. Poorly designed tables will create data integrity problems. Look out for:

Remove unnecessary Duplicate Columns

You don’t want duplicate columns because this will cause data integrity errors.

Example 1:

Staff Table
StaffID     StaffFirstName      StaffLastName
---------------------------------------------
98014       Peter               Brehm
98109       Mariya              Sergienko

Classes Table
ClassID     Class       StaffID     StaffLastName   StaffFirstName
------------------------------------------------------------------
1031        Art History 98014       Brehm           Peter
1030        Art History 98014       Brehm           Peter

Here we have duplicate columns of ‘StaffLastName’ and ‘StaffFirstName’. Remove these from Classes Table because we can use StaffID to establish the relationship (and since Staff Names belong better with the Staff Table).

Example 2:

Another example is if you have duplicate columns as different column names. For example:

'Employees' Table
EmployeeID  EmpLastName     EmpFirstName    Committee1  Committee2  Committee3
------------------------------------------------------------------------------
7004        Gehring         Darren          Steering
7005        Kennedy         John            ISO 9000    Safety

The issue is that what if there’s a Committee4 or what if you need to search through all these columns? You’ll realize that instead we should have a many-to-many relationship between employees and committees and that is solved with a linking table.

'Employees' Table
EmployeeID  EmpLastName     EmpFirstName
----------------------------------------
7004        Gehring         Darren
7005        Kennedy         John
7006        Thompson        Sarah

'Committee_Members' Table
EmployeeId      CommitteeID
---------------------------
7004            103
7005            104
7005            102
7006            102

'Committees' Table
CommitteeID     CommitteeName       MeetingRoom     MeetingDay
--------------------------------------------------------------
100             Budget              11-C            Tuesday
101             Christmas           9-F             Monday
102             Safety              12-B            Monday
103             Steering            12-D            Tuesday
104             ISO 9000            Main-South      Wednesday

Example 3:

We might also run into issues of repeating phone numbers when we have multiple home phones or work phones.

'Employees' Table
EmployeeID  EmpLastName     EmpFirstName    EmpHomePhone    EmpWorkPhone    EmpCellPhone
----------------------------------------------------------------------------------------
7004        Gehring         Darren          555-1234        556-1234
7005        Kennedy         John            555-2345
7006        Thompson        Sarah           555-3456

We can solve that with having a one-to-many relationship:

‘Employees’ Table EmployeeID EmpLastName EmpFirstName —————————————- 7004 Gehring Darren 7005 Kennedy John 7006 Thompson Sarah

'Phone_Numbers' Table
EmployeeID  PhoneID     PhoneType       PhoneNumber
---------------------------------------------------
7004        1           Home            555-1234
7005        2           Home            555-2345
7006        3           Home            555-3456

Identification is the Key

Primary Key

Every table in your database needs a primary key, which helps with:

A primary key is either:

Define a simple primary key over a composite primary key whenever you can; it’s more efficient and it’s easier to use when establishing a table relationship

To check if your primary key columns are sound:

If you don’t have column(s) for a primary key, you can create an artificial primary key (e.g. an EmployeeID that auto increments).

Establish Solid Relationships

You can learn how to diagram relationships in: Database Design for Mere Mortals

Diagram One-to-One Relationship

Employees                                 Employee_Confidential
----------------                          ---------------------
|EmployeeID  PK| -|-------------------|-  |EmployeeID    PK   |
----------------                          ---------------------

Diagram One-to-Many Relationship

Students                                  Instruments
----------------                          -------------------
|StudentID   PK| -|-------------------    |InstrumentID   PK|
----------------                     |  / |                 |
                                     |--- |StudentID      FK|
                                        \ |------------------

Diagram Many-to-Many Relationship

Pilots                                                         Certifications
---------------                                                ---------------------
|PilotID    PK| -|--|                                   |---|- |CertificationID PK |
---------------     |     Pilot_Certifications          |      ---------------------
                    |   / |----------------------|      |
                    |---- |PilotID          CPK  |      |
                        \ |                      | \    |
                          |CertificationID  CPK  | ------
                          ------------------------ /

Note: Remember that these relationships need the matchings keys to have the same data type.

So in order to make sure that your database has referential integrity, we need to make sure that there aren’t any orphaned data. There’s a few things we can do:

Establish a Deletion Rule

A deletion rule says that when a user makes a request to delete a row in the primary table, then we don’t leave any orphaned rows in the secondary table. There are two types of deletion rules:

Basically, ask yourself “If a row in my primary / one-sided table is deleted, should related rows in the secondary / many side table be deleted as well?

In our above diagrams, a (C) is usually next to cascading and (R) for restrict

Type of Participation

When you establish a relationship between a pair of tables, each table participates in a particular manner. The type of participation assigned to a table determines whether a row must exist in that table before you can enter a row into another table. These two types are:

Degree of Participation

After you determine how each table will participate in the relationship, you need to then figure out to what degree each will participate. You do this by determining the min and max number of rows in one table that can be related to a single row in the other table. This is called the table’s degree of participation.

You can see the degree of participation indicated by values like (x, y) where x can be associated with a minimum and maximum number of associations. E.g. (1, 1) means can only be associated with one and only one agent. (0, 6) means there doesn’t have to be an association, but cannot have more than six associations.

Automatic key rules

Some database systems allow you to define a rule that cascades the key value from the ‘one’ table to the ‘many’ table if a value of the primary key in the ‘one’ table changes. Other systems allow automatic deletion of rows in the ‘many’ table when you change the value of the primary key in the ‘one’ table.

SQL Standards

ANSI/ISO SQL Standard is the most widely accepted standard to date.

ODBC is an Open Database Connectivity (ODBC) specification, based off the Call-Level Interface (CLI) specification published. It’s basically a way to bind an SQL Database to a user-interface language.

Part II - Chapter 4 (Create a Simple Query)

Data vs Information

There is a distinct difference between data and information.

This will help you keep things in perspective. Remember that a database is designed to provide meaningful information to someone in your organization. However, the information can only be provided if the data is structured to support that information.

The values that you store in the database are data. This data is static (remains in the same state until you modify it by some manual or automated process). Say we have the following data:

Katherine   Ehrlich 89931   Active  79915

The above data is useless until it’s been proecssed. After you process the data so that it’s meaningful, we get information. You can present this data in a variety of ways. The main idea is that you must process your data in a way that returns meaningful information.

First Name: Katherine
Last Name: Ehrlich
ID: 89931
Status: Active
Zip: 79915

SQL SELECT

The SELECT operation in SQL retrieves data and manipulates that into information.

The SELECT operation can be broken down into three smaller operations:

  1. the SELECT statement
  2. the SELECT expression
  3. the SELECT query

Each of the above smaller operations has its own set of keywords (aka clauses).

Saving a Select

You can save a SELECT statement as a:

Results of a SELECT

When you execute a SELECT statement, it usually retrieves one or more rows of information, which we call a result set.

So what happens when you request information from the database?

Building your SELECT Statement

SQL Sorting / Ordering

By default, the result set of a SQL statement is unordered. You can use the ORDER BY clause to specify the sequence of rows in the final result set.

Chapter 5 - Getting More Than Simple Columns

Expressions

If you want to select more than simple columns, you’ll need to create an expression, which is some form of operation involving numbers, strings, or dates and times. Expressions are used to broaden or limit the scope of the information you want to retrieve.

Data Types

Every column in the database has an assigned data type, which determines the type of values the column can store. There are seven general categories of types of data, which include:

CAST function

The CAST function converts a literal value or the value of a column into a specific data type. This helps ensure that the data types of the values in the expression are compatible.

When you do convert a value in one column into another, make sure you:

Types of Literals

Examples:

CAST('2019-01-21' AS DATE)
CAST('03:30:25' AS TIME)
CAST('2019-01-22 14:25:00' AS DATETIME)

Types of Expressions

There are three types of SQL Expressions:

AS

Most calculated columns require a name. You an use the AS keyword to supply an alias for a real column name.

SELECT expression AS my_column_name

SELECT FirstName || ' ' || LastName AS MyName FROM People

NULL

A NULL represents a missing or unknown value. Null does NOT represent a zero, a character string of one or more blank spaces, or a zero-length character string.

A Null might occur legitimately for a number reasons, including:

Problem with Nulls

Nulls do not work nicely with mathematical operations. Any operation involving a Null evaluates to Null. It kinda makes sense; if a number is unknown, then the result of that operation is unknown.

Examples:

(Null * 3) + 4 = Null

Chapter 6 - Filtering your Data

WHERE

Use a WHERE clause in a SELECT statement to filter the data. You can use basic predicates like:

ESCAPE

The ESCAPE option allows you to designate a single character string literal to be the escape character. It tells the database how to interpret a percent sign or an underscore character.

SELECT ProductName, ProductCode
FROM Products
WHERE ProductCode LIKE 'G\_00' ESCAPE '\'

NOT

Exclude rows from a result set by using the NOT operator.

SELECT StaffID, Title
FROM Faculty
WHERE Title
NOT IN ('Professor', Associate Professor')

You can apply NOT before a search condition, e.g.

SELECT FirstName, LastName, City
FROM People
WHERE NOT City = 'Reedley'

AND and OR

You can combine two or more conditions using the AND operator. All conditions must be met in order for a row to be included in a result set.

You can combine two or more conditions using the OR__operator. __Either conditions must be met in order for a row to be included in a result set.

SQL Order of Precedence

Use parenthesis to combine and prioritize certain conditions. Otherwise, the evaluation order is:

Evaluation Order    |   Type of Operator
1                   |   Positive sign (+), Negative sign (-)
2                   |   Multiplication (*), division (/)
3                   |   Addition (+), subtraction (-)
4                   |   =, <>, <, >, <=, >=, BETWEEN, IN, LIKE
5                   |   NOT
6                   |   AND
7                   |   OR

Part III - Working with Multiple Tables

Set

What is a set? Each table in your database is a set of information about one subject. A set of data can be as small as one column from one row in one table. There can even be empty sets.

Each row in a result set is a member of the set. The values in the columns are specific attributes of each member.

We use result sets of information to solve more complex problems that require linking data from two or more tables.

Set Operations

The three most common set operations are:

Think of a Venn diagram (aka Euler diagram).

Intersection

One of the limitations of using a pure intersection is that the values must match in all the columns in each result set. If you only want to match on one or only a few column values from each set, then you will need a JOIN.

Use the INTERSECT keyword to do set intersection and find the common values

Example:

E.g. Show me orders with a specific product number that has a Quantity greater than zero.

SELECT DISTINCT OrderNumber
FROM Order_Details
WHERE ProductNumber IN (1, 2, 6, 11)
INTERSECT
SELECT DISTINCT OrderNumber
FROM Supplier_Details
WHERE Quantity > 0;

Difference

Remove from the first set all the matching members you find in the second set, and the result is the difference.

Use the EXCEPT keyword to do set difference and find the not common values

E.g. Show me the orders that contain a bike but not a helmet.

SELECT DISTINCT OrderNumber
FROM Order_Details
WHERE ProductNumber IN (1, 2, 6, 11)
  AND ProductNumber NOT IN (10, 25, 26)

#### Union

A __UNION__ lets you select the _rows_ from two or more similar result sets and combine them into a single
result set. Notice that we said _rows_ and not columns.

If you run a __UNION__, you combine two sets of similar information into one set. Duplicates are lost.
If you want to keep duplicates, use a __UNION ALL__.

In order to run a UNION, you need to make sure that:

* Each of the two SELECT statements that you are linking with a UNION must have the same number of output columns
* Each corresponding column must be 'comparable'.

UNION and UNION ALL combines the result of two or more SELECT statements (with each statement having the same number of columns, data types, and order).

* UNION selects distinct values only.
* UNION ALL allows duplicates.

```SQL
SELECT column_name(s) FROM table1 UNION SELECT column_name(s) FROM table2;

SELECT column_name(s) FROM table1 UNION ALL SELECT column_name(s) FROM table2;

Chapter 8 - INNER JOINs

JOINS

Use JOIN to link multiple tables together. A Join is like an INTERSECT, except instead of matching on all the columns, you specify the columns you want to join on.

You specify a JOIN as part of the FROM clause in a SQL statement. A JOIN defines a ‘logical table’ that is the result of linking two tables or results sets.

Since we are joining multiple tables that might have the same column name, we need to provide a column reference that includes the table name. For example, here is a single table select that includes the table name:

SELECT Employees.FirstName, Employees.LastName, Employees.PhoneNumber
FROM Employees

To solve the JOIN, the database system combines every row in the first table with every row in the second table; this combination of all rows from one table with every row in the second talbe is called a Cartesian product.

Correlation Name

You can create a correlation name to any table you list in your FROM clause. Follow the table name with the optional keyword AS and then the correlation name you want to assign.

This can be confusing because you usually write the SELECT clause before you write the FROM clause. If you plan to give a table an alias in the FROM clause, you have to use that alias when you qualify columnn names in the SELECT clause.

Example:

SELECT R.RecipeTitle, R.Preparation, RC.RecipeClassDescription
FROM Recipe_Classes AS RC
  INNER JOIN Recipes AS R
ON RC.RecipeClassID = R.RecipeClassID

INNER JOIN

INNER JOIN is the default join. It is the same as saying JOIN, but say INNER to be explicit. Examples:

Displaying bowling teams and the name of each team captain Return only students that have registered for a class and classes for which a student has registered

Note: Remember that when you join, provide a column reference that includes the table name. E.g.

SELECT Employees.FirstName, Employees.LastName, Employees.PhoneNumber
FROM Employees

Derived Tables

An embedded SELECT statement is a derived table, meaning you can substitute an entire SELECT statement for any table name in your FROM clause. What we’re doing is deriving a subset of data from one or more tables. You need to assign a correlation name so that the result of evaluating your embedded query has a name.

SELECT R.RecipeTitle, R.Preparation, RCFiltered.ClassName
FROM
  (SELECT RecipeClassID, RecipeClassDescription AS ClassName
   FROM Recipe_Classes AS RC
   WHERE RC.ClassName = 'Main course' OR
    RC.ClassName = 'Dessert') AS RCFiltered
INNER JOIN Recipes AS R
  ON RCFiltered.RecipeClassID = R.RecipeClassID

If your database does not let you embed a SELECT statement inside a FROM clause, consider using a VIEW.

Chapter 9 - OUTER JOINs

Examples:

List all the classes and the students who have registered for those classes

Use an OUTER JOIN with a test for Null values is an alternate way of discovering the difference between two sets. Examples: The rows with a Null value in the columns from the Classes table represent the difference between the set of all students and the set of students who have registered for a class

SELECT column_name(s) FROM table1 FULL OUTER JOIN table2 ON table1.column_name=table2.column_name;

LEFT / RIGHT OUTER JOIN

The first table you name is the one on the ‘left’. The second table you name is the one on the ‘right’.

An outer join returns the unmatched rows from either the ‘left’ or ‘right’ table.

SELECT Recipe_Classes.RecipeClassDescription, Recipes.RecipeTitle
FROM Recipe_Classes
LEFT OUTER JOIN Recipes
ON Recipe_Classes.RecipeClassID = Recipes.RecipeClassID

JOIN WITH Nulls

List the recipe classes that do not yet have any recipes.

SELECT Recipe_Classes.RecipeClassDescription
FROM Recipe_Classes
LEFT OUTER JOIN Recipes
 ON Recipe_Classes.RecipeClassID =
Recipes.RecipeClassID
WHERE Recipes.RecipeID IS NULL

This results in doing a difference or EXCEPT operation using a JOIN. We get back ‘all the Recipe_Classes except the ones that already appear in the Recipes table’. We created an EXCEPT!

FULL OUTER JOIN

The FULL OUTER JOIN includes all the rows from both of the tables. When no matching rows exist for rows on the ‘left’ side of the JOIN, you see Null values from the result set on the ‘right’.

Chapter 11 - Subqueries

A subquery is a SELECT expression that you embed inside one of the clauses of a SELECT statement to form your final query statement.

Why would you need this? You can build complex filters that do not rely on the tables in your FROM clause. Using a subquery in a WHERE clause is the only way to get the correct number of rows in your answer when you want rows from one table based on the filtered contents from other related tables.

There are three different types of subqueries:

You can use subqueries to generate an output column or to perform a complex comparision in a WHERE clause. Subqueries can often be replaced more effectively with a join, but are often used to fetch the results of a function calculation (aggregate). E.g.

Row Value Constructor

You can use a row subquery to build a row value constructor. What happens is that when you create a WHERE clause, you build a search condition that is typically some sort of comparison of one column from one of your tables with another column or a literal. However, you can build a search condition that compares multiple values as a logical row with another set of values as a logical row (two row value constructors).

Example

SELECT SKUClass, SKUNumber, ProductName
FROM Products
WHERE
(SKUClass, SKUNumber)
>= ('DSK', 9775)

This is equivalent to:

SELECT SKUClass, SKUNumber, ProductName
FROM Products
WHERE
(SKUClass > 'DSK')
OR ((SKUClass = 'DSK')
 AND (SKUNumber >= 9775))

Scalar Subqueries

A scalar subquery lets you fetch a single column or calculated expression from another table that does not have to be in the FROM clause of the main query. You can use this single value fetched by a scalar subquery in the list of columns you request in a SELECT clause or you can use it as a comparison value in a WHERE clause.

So what is happening? You are substituting the subquery where you would normally enter a single column name or expression that results in a single column. That is the reason why a scalar subquery needs to return exactly one column and no more than one row.

This lets you pluck a single value from some other table or query to include in the output of your query.

Example 1:

Subquery is in the SELECT

SELECT Orders.OrderNumber, Orders.OrderDate, Orders.ShipDate,
 (SELECT Customers.CustLastName FROM Customers
  WHERE Customers.CustomerID = Orders.CustomerID)
FROM Orders
WHERE Orders.ShipDate = '2017-10-03'

Example 2:

Subquery is the WHERE clause

SELECT name, listed_price
FROM paintings
WHERE listed_price > (
    SELECT AVG(listed_price)
    FROM paintings
);

Special Predicate Keywords for Subqueries

There are special predicate keywords for use in a WHERE clause with a subquery, including:

Table or Multirow Subqueries

Example 1 (subquery returning a list):

The inner query returns a list of all manager IDs. The outer query filters only those sales agents who are not in the managers list and calculates an average agency fee. The query returns a single value.

SELECT AVG(agency_fee)
FROM sales_agents
WHERE id NOT IN (SELECT id FROM managers);

Example 2 (subquery returning a table):

When a subquery returns a table with multiple rows and multiple columns, that subquery is usually found in the FROM or JOIN clause. This allows you to get a table with data that was not readily available in the database (e.g. grouped data) and then join this table with another one from your database.

Say we wanted to see the total amount of sales for each artist who sold at least one painting in our gallery. Our inner query calculates the sale for each artist from the sales table. Our outer query combines this information with the artists’ first names and last names to get the info we want.

SELECT
    artists.first_name,
    artists.last_name,
    artists_sales.sales
FROM artists
JOIN (
    SELECT artist_id, SUM(sales_price) AS sales
    FROM sales
    GROUP BY artist_id
    ) as artist_sales
  ON artists.id = artist_sales.artist_id;


first_name, last_name, sales
William, Liu, 344
Kate, Wein, 3454

Row Subquery

A row subquery returns a single row. These are often used in comparison/correlations (aka correlated subquery, synchronized subquery). These are often slow because the inner subquery might be evaluated once for each row processed by the outer query. Correlated subqueries are commonly used in the SELECT, WHERE, and FROM statements.

Example 1 (correlated subquery that returns a scalar value)

We want to calculate the number of paintings found in each of our galleries. The inner query returns a scalar value with the total number of paintings from the corresponding gallery. Note that the inner query depends on the outer query. We pull in the gallery_id from the galleries table (located on the outer query). You cannot run the inner query by itself.

SELECT city,
    (SELECT COUNT(*)
     FROM paintings p
     WHERE g.id = p.gallery_id) total_paintings
FROM galleries g;

# city, total paintings
# London, 2
# New York, 2
# Munich, 1

The equivalent of a join (note: joins are usually faster than subqueries) would be:

SELECT g.city, count(p.name) AS total_paintings
FROM galleries g
JOIN paintings p
ON g.id = p.gallery_id
GROUP BY g.city;

Example 2 (correlated subquery in the WHERE statement)

Say we want to get information about sales agents whose agency fees are higher than the average fee for their gallery. The inner query returns the average agency fee for the gallery of the sales agent. The outer query returns the information about only those sale agents who satisfy the condition included in the WHERE statement (i.e. an agency fee greater than their gallery average). Notice that the subquery is a correlated subquery because it can’t be run independently of the outer query.

SELECT
    last_name,
    first_name,
    agency_fee
FROM sales_agents sa1
WHERE sa1.agency_fee > (SELECT AVG(agency_fee)
                        FROM sales_agents sa2
                        WHERE sa2.gallery_id = sa1.gallery_id);

Set Membership: IN

Select the recipe title from the recipes where the recipe ID is in the recipe ingredients from the inner joined with the ingredient classes that have a recipe class of ‘Seafood’.

SELECT RecipeTitle
FROM Recipes
WHERE Recipes.RecipeID IN
  (SELECT RecipeID
   FROM Recipe_Ingredients
   WHERE Recipie_Ingredients.IngredientID IN
   (SELECT IngredientID
    FROM Ingredients
    INNER JOIN Ingredient_Classes
      ON Ingredients.IngredientClassID = Ingredient_Classes.IngredientClassID
    WHERE Ingredient_classes.IngredientClassDescription = 'Seafood'
   )
  )

Notice how we do not want to use a complex JOIN in the outer query since we might get duplicates (more than one row per recipe that has more than one seafood ingredient). You can use a DISTINCT, but your database will have to do more work then. DISTINCT to remove

Quantified Predicates (ALL, SOME, ANY)

You have to SELECT as a table subquery that returns exactly one column and zero or more rows. If the subquery returns more than one row, the values in the row make up a list.

If the subquery returns no rows, then any comparison predicate with the ALL keyword is true, any with the SOME or ANY keyword is false.

SELECT Recipes.RecipeTitle
FROM Recipes
WHERE Recipes.RecipeID IN
  (SELECT Recipe_Ingredients.RecipeID
   FROM Recipe_Ingredients
   WHERE Recipe_Ingredients.IngredientID = ANY
   (SELECT Ingredients.IngredientID
    FROM Ingredients
    WHERE Ingredients.IngredientName
    IN ('Beef', 'Garlic')
   )
  )

Existence: EXISTS

Use EXISTS to check if a related row exists in the result set returned by a subquery.

E.g. Find all the customers who ordered a bicycle

SELECT Customers.CustomerID, Customers.CustFirstName, Customers.CustLastName
FROM Customers
WHERE EXISTS
  (SELECT *
   FROM (Orders
         INNER JOIN Order_Details
           ON Orders.OrderNumber = Order_Details.OrderNumber
        )
   INNER JOIN Products
     ON Products.ProductNumber = Order_Details.ProductNumber
   WHERE Products.CategoryID = 2
     AND Orders.CustomerID = Customers.CustomerID)

Subqueries as Filters

HAVING

Use a HAVING clause to filter out groups of information from a subquery.

Part IV - Summarizing and Grouping Data

Chapter 12 - Simple Totals

Aggregates have a simple syntax.

COUNT() <DISTINCT> expression AS alias
SUM
AVG
MAX
MIN

Each aggregate calculates a single value from the rows in a result set.

Except COUNT(*), all aggregate functions automatically disregard Null values.

Aggregate Functions

Standard SQL defines many functions that calculate values in a query. A subclass of functions are aggregate functions, which lets you calculate a single value for a group of rows in a result set. You can use an aggregate function to count the rows, find the largest or smallest value within a set of rows, or calculcate the average or total of some value or expression across a result set.

COUNT

Use COUNT to determine the number of rows or the number of non-Null values in a result set. Use COUNT(*) to find out how many rows are in the entire set. Note: this is the only aggregate function to count Nulls. Use COUNT(my_column_name) to count the number of rows with non-Null values in that column. Use COUNT DISTINCT to count only the unique values.

e.g. Select the number of students

SELECT COUNT(*) AS CountOfStudents
FROM Students

e.g. Count the number of non-Null county values

SELECT COUNT(CustCounty) AS NumberOfKnownCounties
FROM Customers

e.g. How many unique county names are there in the customers table

SELECT COUNT(DISTINCT CustCounty) AS NumberOfUniqueCounties
FROM Customers

Using Aggregates and subqueries

E.g. “List the name of every bowler whose personal average score is greater than the overall average score”


SELECT Bowlers.BowlerLastName, Bowlers.BowlerFistName
FROM Bowlers
WHERE (SELECT AVG(RawScore)
FROM Bowler_Scores AS BS
WHERE BS.BowlerID = Bowlers.BowlerID)
> (SELECT AVG(RawScore) FROM Bowler_Scores)
ORDER BY Bowlers.BowlerLastName, Bowlers.BowlerFirstName

Chapter 13 - Grouping Data

Grouping data is usually done with aggregates

E.g. “Show me each entertainment group’s name, count of contracts for the group, total price of contracts”

SELECT Entertainers.EntStageName,
  COUNT(*) AS NumContracts,
  SUM(Engagements.ContractPrice) AS TotalPrice
FROM Entertainers
INNER JOIN Engagements
ON Entertainers.EntertainerID = Engagements.EntertainerID
GROUP BY Entertainers.EntStageName

GROUP BY

Get a subtotal across groups using aggregate functions and GROUP BY.

Chapter 14 - Filtering Grouped Data

HAVING

Use HAVING after a GROUP BY to filter out additional data. Note that you can filter on WHERE or HAVING.

Chapter 15 - Modifying Sets of Data

UPDATE