‘SQL Queries for Mere Mortals’ shows you how to write queries using a variety of databases, including SQL Server, MySQL, and PostgreSQL
The book is broken into six sections:
An Entity Relationship Diagram (aka ERD, ER Diagram, ER Model) is a structural diagram for use in database design. An ERD contains two different symbols and connectors that visualize two important information: the major entities within the system scope and the inter-relationships among these entities.
Get the code samples here: http://www.informit.com/store/sql-queries-for-mere-mortals-a-hands-on-guide-to-data-9780134858333
A database schema is the skeleton structure that represents the logical view of the entire database. If defines how the data is organized and how the relations among them are associated. It formulates all the constraints that are to be applied on the data.
Here are some example Schemas we’ll be using from the book:
This database tracks bowling teams, team members, the matches they played, and the results.
Table Bowler_Scores
Field Data Type Size Required Default Primary Key Foreign Key
----------------------------------------------------------------------------------------------------------
MatchID int 4 Yes 0 Yes Match_Games.MatchID
GameNumber smallint 2 Yes 0 Yes Match_Games.GameNumber
BowlerID int 4 Yes 0 Yes Bowlers.BowlerID
RawScore smallint 2 0
HandiCapScore smallint 2 0
WonGame bit 1
Indexes for Bowler_Scores
Index Name Attributes/Field Names
--------------------------------------------------
BowlersBowler_Scores
BowlerID
Match_GamesBowler_Scores
MatchID
GameNumber
PrimaryKey Primary
MatchID
GameNumber
BowlerID
Table Bowlers
Field Data Type Size Required Default Primary Key Foreign Key
----------------------------------------------------------------------------------------------------------
BowlerID int 4 Yes 0 Yes
BowlerLastName nvarchar 50
BowlerFirstName nvarchar 50
BowlerMiddleInit nvarchar 1
BowlerAddress nvarchar 50
BowlerCity nvarchar 50
BowlerState nvarchar 2
BowlerZip nvarchar 10
BowlerPhoneNumber nvarchar 14
TeamID int 4 Teams.TeamID
Indexes for Bowlers
Index Name Attributes/Field Names
--------------------------------------------------
BowlerLastName
BowlerLastName
PrimaryKey Primary
BowlerID
TeamBowlers
TeamID
Table Match_Games
Field Data Type Size Required Default Primary Key Foreign Key
----------------------------------------------------------------------------------------------------------
MatchID int 4 Yes 0 Yes Tourney_Matches.MatchID
GameNumber smallint 2 Yes 0 Yes
WinningTeamID int 4 0
Indexes for Match_Games
Index Name Attributes/Field Names
--------------------------------------------------
PrimaryKey Primary
MatchID
GameNumber
Tourney_MatchesMatch_Games
MatchID
Table Teams
Field Data Type Size Required Default Primary Key Foreign Key
----------------------------------------------------------------------------------------------------------
TeamID int 4 Yes Yes
TeamName nvarchar 50 Yes
CaptainID int 4 Bowlers.BowlerID
Indexes for Teams
Index Name Attributes/Field Names
--------------------------------------------------
BowlersTeams Unique
CaptainID
CaptainID Unique
CaptainID
PrimaryKey Primary
TeamID
TeamID Unique
TeamID
Table Tournaments
Field Data Type Size Required Default Primary Key Foreign Key
----------------------------------------------------------------------------------------------------------
TourneyID int 4 Yes 0 Yes
TourneyDate date 8
TourneyLocation nvarchar 50
Indexes for Tournaments
Index Name Attributes/Field Names
--------------------------------------------------
PrimaryKey Primary
TourneyID
Table Tourney_Matches
Field Data Type Size Required Default Primary Key Foreign Key
----------------------------------------------------------------------------------------------------------
MatchID int 4 Yes 0 Yes
TourneyID int 4 0 Tournaments.TourneyID
Lanes nvarchar 5
OddLaneTeamID int 4 0 Teams.TeamID
EvenLaneTeamID int 4 0 Teams.TeamID
Indexes for Tourney_Matches
Index Name Attributes/Field Names
--------------------------------------------------
PrimaryKey Primary
MatchID
TeamsTourney_Matches
OddLaneTeamID
TeamsTourney_Matches1
EvenLaneTeamID
TournamentsTourney_Matches
TourneyID
Table ztblBowlerRatings
Field Data Type Size Required Default Primary Key Foreign Key
----------------------------------------------------------------------------------------------------------
BowlerRating nvarchar 15 Yes
BowlerLowAvg smallint 2
BowlerHighAvg smallint 2
Indexes for ztblBowlerRatings
Index Name Attributes/Field Names
--------------------------------------------------
PrimaryKey Primary
BowlerRating
Table ztblSkipLabels
Field Data Type Size Required Default Primary Key Foreign Key
----------------------------------------------------------------------------------------------------------
LabelCount int 4 Yes Yes
Indexes for ztblSkipLabels
Index Name Attributes/Field Names
--------------------------------------------------
PrimaryKey Primary
LabelCount
Table ztblWeeks
Field Data Type Size Required Default Primary Key Foreign Key
----------------------------------------------------------------------------------------------------------
WeekStart date 8 Yes Yes
WeekEnd date 8
Indexes for ztblWeeks
Index Name Attributes/Field Names
--------------------------------------------------
PrimaryKey Primary
WeekStart
Name Parent Parent Fields Child Child Fields
------------------------------------------------------------------------------------------------
BowlersBowler_Scores Bowlers BowlerID Bowler_Scores BowlerID
BowlersTeams Bowlers BowlerID Teams CaptainID
Match_GamesBowler_Scores Match_Games MatchID_GameNumber Bowler_Scores MatchID_GameNumber
TeamsBowlers Teams TeamID Bowlers TeamID
TeamsTourney_Matches Teams TeamID Tourney_Matches OddLaneTeamID
TeamsTourney_Matches1 Teams TeamID Tourney_Matches EvenLaneTeamID
A database is an organized collection of data used to model some type of organization or organizational process.
There are two types of databases:
A relational database management system (RDBMS) is a software application program used to create, maintain, modify, and manipulate a relational database.
A data warehouse is a large store of data accumulated from a wide range of sources and would enable organizations to access data stored in any number of nonrelational databases.
Data in a relational database is stored in relations, which appear as tables. Each relation is made up of tuples (records of rows) and attributes (fields or columns).
Tables are the main structures in the database. Each table represents a single, specific object. Each table contains at least one column that uniquely identifies each of its rows (aka primary key).
The subject that a given table represents is usually either an object or an event.
When the subject of a table is an object, the table represents something tangible; a person, place, or thing. Example objects are Students, Buildings, Equipment, Customers
CustomemrID FirstName LastName StreetAddress City State ZipCode
---------------------------------------------------------------------------------
1010 Angel Kennedy 667 Red Rd Austin TX 78710
1011 Alaina Hallmark 2114 Longview San Diego CA 92199
When the subject of a table is an event, the table represents something that occurs at a given point in time and has characteristics that you wish to record. Example events are Judicial Hearings, Lab Test Results, Patient Visits, and Geological Surveys.
PatientID VisitDate VisitTime Physician BloodPressure Temperature
---------------------------------------------------------------------------
92001 2006-05-01 10:30 Ehrlich 120/80 98.8
96106 2006-05-02 11:00 Hallmark 160/90 99.1
A column is the smallest structure in the database. It represents a characteristic of the subject of the table. Every column in a properly designed database contains one and only one value. The name of the column identifies the type of value it holds, that way entering data is intuitive. Example column names might be FirstName, LastName, City, State, ZipCode
A row represents a unique instance of the subject of a table. It is composed of the entire set of columns in a table, regardless of whether or not the columns contain any values. Because of how a table is defined, each row is identified throughout the database by a unique value in the primary key column(s) of that row.
Keys are special columns that play very specific roles in a table; depending on the type of key, there’s a different purpose. The two most important ones help describe the relationships between tables using the primary key and the foreign key.
A primary key is one or more columns that uniquely identify each row within a table. When a primary key is made up of two or more columns, it is known as a composite primary key.
A primary key is important for two reasons:
Primary keys also enforce table-level integrity and help establish relationships with other tables. Every table should have a primary key.
Here’s an example:
'Agents' Table
AgentID AgentFirstName AgentLastName DateHired AgentHomePhone
--------------------------------------------------------------------------
1 William Thompson 15-May-01 555-2681
2 Scott Bishop 10-Feb-03 555-2666
3 Carol Viescas 09-Sep-00 555-2571
'Entertainers' Table
EntertainerID AgentID EntertainerName EntertainerPhone
----------------------------------------------------------------
1001 1 Carol Peacock Trio 555-2691
1002 3 Topazz 555-2591
1003 3 JV & the Deep Six 555-2511
In the ‘Agents’ table, we have the Primary Key of ‘AgentID’. In the ‘Entertainers’ table, we have the Primary Key of ‘EntertainerID’ and Foreign Key of ‘AgentID’ that references ‘Agents’ table ‘AgentID’.
When you determine that a pair of tables have a relationship to each other, you establish the relationship by taking a copy of the primary key from the first table and insert it into the second table, where it becomes the foreign key.
The term foreign key comes from the fact that the second table already has a primary key of its own, and the primary key you are introducing from the first table is foreign to the second table.
Foreign keys are important for establishing relationships between pairs of tables and also for ensuring relationship-level integrity. This means that the rows in both tables will always be properly related because the values of a foreign key MUST be drawn from the values of the primary key from which it refers (i.e. helps you avoid ‘orphaned rows’)
A view is a virtual table composed of columns from one or more tables in the database. The tables that comprise the view are known as base tables. The view is virtual because it draws data from base tables rather than storing any data on its own.
If rows in a given table can be associated in some way with rows in another table, the tables have a relationship between them. There’s three types of relationships that can exist between a pair of tables:
A pair of tables has a one-to-one relationship when a single row in the first table is related to only one row in the second table AND a single row in the second table is related to only one row in the first table.
The first table is usually referred to as the primary table. The second table is usually referred to as the secondary table.
'Agents' Table (the primary table)
AgentID AgentFirstName AgentLastName DateHired AgentHomePhone
--------------------------------------------------------------------------
1 William Thompson 15-May-01 555-2681
2 Scott Bishop 10-Feb-03 555-2666
3 Carol Viescas 09-Sep-00 555-2571
'Compenstation' Table (the secondary table)
AgentID Salary CommissionRate
--------------------------------------
1 $35,000 4.00%
2 $27,000 4.00%
3 $30,000 5.00%
‘Compensation’ is the secondary table because it doesn’t make sense to add a salary without having an associated ‘Agent’. You cannot add a row to the secondary table (‘Compensation’) unless a row exists in the primary table. These one-to-one types of tables are uncommon. You might split this data into two parts for confidentiality purposes.
When a pair of tables has a one-to-many relationship, a single row in the first table can be related to many rows in the second table, but a single row in the second table can only be related to only one row in the first table.
So how does this work? You take the primary key of the primary table and insert it into the secondary table as a foreign key.
'Entertainers' Table
EntertainerID AgentID EntertainerName EntertainerPhone
----------------------------------------------------------------
1001 1 Carol Peacock Trio 555-2691
1002 3 Topazz 555-2591
1003 3 JV & the Deep Six 555-2511
'Engagements' Table
EngagementID EntertainerID CustomerID StartDate EndDate
------------------------------------------------------------------
5 1003 10006 2007-09-11 2007-09-14
7 1002 10004 2007-09-11 2007-09-19
10 1003 10005 2007-09-17 2007-09-26
12 1001 10014 2007-09-18 2007-09-26
In the above example, a single row in the Entertainers table can be related to many rows in the Engagements table, but a single row in the Engagements table can be related to only one row in the Entertainers table. EntertainerID is a foreign key in the Engagements table.
A pair of tables is a many-to-many relationship when a single row in the first table can be related to many rows in the second table, and a single row in the second table can be related to many rows in the first table.
In order to establish this many-to-many relationship correctly, you must create a linking table, which provides an easy way to associate rows from one table with those of the other. You define a linking table by taking a copy of the primary key of each table and using them to form the structure of the new table. Together, they form the composite primary key of the linking table, and separately they each serve as a foreign key.
Example:
'Customers' Table
CustomerID CustFirstName CustLastName CustPhone
10001 Doris Hartwig 555-2671
10002 Deb Waldal 555-2496
10003 Peter Brehm 555-2501
'Entertainers' Table
EntertainerID AgentID EntertainerName EntertainerPhone
----------------------------------------------------------------
1001 1 Carol Peacock Trio 555-2691
1002 3 Topazz 555-2591
1003 3 JV & the Deep Six 555-2511
The above many-to-many relationship is unresolved because the many-to-many relationship has not been properly established. The issue is: How do you associate rows from the first table to the second table? The solution is to create and use a linking table.
Engagements Table (as a 'linking table')
EngagementID CustomerID EntertainerID StartDate
---------------------------------------------------------
43 10001 1001 2007-10-21
58 10001 1002 2007-12-01
62 10003 1005 2007-12-09
71 10002 1003 2007-12-22
125 10001 1003 2008-02-23
In the above, we create a new table ‘Engagements’ as a linking table. Here you can store additional information in the linking table like the StartDate or Cost.
Database Theory is the principles and rules that formulate the basis of the relational database model. Database theory guarantees that the relational database is structurally sound and that all actions taken on the data in the database have predictable results.
Database Design is the structured, organized set of processes used to design a relational database. A good database design methodology helps ensure the integrity, consistency, and accuracy of the data in the database. Consider books like ‘Database Design for Mere Mortals’ or ‘An Introduction to Database Systems’.
In order to make sure that your database structure is sound, let’s take a look at:
Since columns are our smallest data structure, let’s look at them first:
Multipart columns occur where you can say yes to the question: “Can I take the current value of this column and break it up into smaller, more distinct parts?”. Here’s an example:
CustomerID CustomerName StreetAddress
----------------------------------------------------------------------------
1001 Suzanne Viescas 15127 NE 24th, #383 Redmond, WA 98052
1002 William Thompson 122 Spring River Drive, Duvall, WA 98019
1003 Gary Hallmark Route 2, Box 203B, Auburn, WA 98002
In the above, you can see that columns CustomerName
and StreetAddress
can be broken up into
more distinct columns. CustomerName can be CustFirstName and CustLastName while StreetAddress can
be broken up into CustAddress, CustCity, CustZipCode, CustState, etc.
More complicated multipart examples might be:
'Instruments' Table
InstrumentID Manufacturer InstrumentDescription
-----------------------------------------------------
GUIT2201 Fender Fender Stratocaster
MFX3349 Zoom Player 2100 Multi-Effects
AMP1001 Marshall JCM 2000 Tube Super Lead
In the above, you can see that InstrumentID
has an ID as well as the type (e.g. GUIT for Guitar).
To resolve this, split InstrumentID to an InstrumentID as well as InstrumentType column.
To identify a multivalued column, you can see that the data is stored in our columns with some commas, semicolons, or some other delimiter.
'Pilots' Table
PilotID PilotFirstName PilotLastName Certifications
-----------------------------------------------------------------------
25100 Sam Alborous 727, 737, 757, MD80
25101 Jim Wilson 737, 747, 757
25102 David Smith 757, MD80, DC9
In the above, the Certifications
column will cause data integrity issues (e.g. difficult to
update). The values in multivalued columns have a many-to-many relationship with every row
in its parent table: one specific value (e.g. 737
) can be associated with any number of rows
in the parent table and a single row in the parent table can be associated with any number of
rows in the multivalued column. In order to solve any many-to-many relationships, you need a
linking table.
'Pilots' Table
PilotID PilotFirstName PilotLastName
---------------------------------------------
25100 Sam Alborous
25101 Jim Wilson
25102 David Smith
'Pilot_Certifications' Table (the 'linking table')
PilotID CertificationID
---------------------------
25100 8102
25100 8103
25100 8105
25100 8106
25101 8103
25101 8104
25101 8105
'Certifications' Table
CertificationID TypeofAircraft
----------------------------------
8102 Boeing 727
8103 Boeing 737
8104 Boeing 747
8105 Boeing 757
So that’s how you resolve a multivalued column using a linking table.
Tables are the basis for every SQL query you create. Poorly designed tables will create data integrity problems. Look out for:
Employee_Record
You don’t want duplicate columns because this will cause data integrity errors.
Example 1:
Staff Table
StaffID StaffFirstName StaffLastName
---------------------------------------------
98014 Peter Brehm
98109 Mariya Sergienko
Classes Table
ClassID Class StaffID StaffLastName StaffFirstName
------------------------------------------------------------------
1031 Art History 98014 Brehm Peter
1030 Art History 98014 Brehm Peter
Here we have duplicate columns of ‘StaffLastName’ and ‘StaffFirstName’. Remove these from Classes Table because we can use StaffID to establish the relationship (and since Staff Names belong better with the Staff Table).
Example 2:
Another example is if you have duplicate columns as different column names. For example:
'Employees' Table
EmployeeID EmpLastName EmpFirstName Committee1 Committee2 Committee3
------------------------------------------------------------------------------
7004 Gehring Darren Steering
7005 Kennedy John ISO 9000 Safety
The issue is that what if there’s a Committee4 or what if you need to search through all these columns? You’ll realize that instead we should have a many-to-many relationship between employees and committees and that is solved with a linking table.
'Employees' Table
EmployeeID EmpLastName EmpFirstName
----------------------------------------
7004 Gehring Darren
7005 Kennedy John
7006 Thompson Sarah
'Committee_Members' Table
EmployeeId CommitteeID
---------------------------
7004 103
7005 104
7005 102
7006 102
'Committees' Table
CommitteeID CommitteeName MeetingRoom MeetingDay
--------------------------------------------------------------
100 Budget 11-C Tuesday
101 Christmas 9-F Monday
102 Safety 12-B Monday
103 Steering 12-D Tuesday
104 ISO 9000 Main-South Wednesday
Example 3:
We might also run into issues of repeating phone numbers when we have multiple home phones or work phones.
'Employees' Table
EmployeeID EmpLastName EmpFirstName EmpHomePhone EmpWorkPhone EmpCellPhone
----------------------------------------------------------------------------------------
7004 Gehring Darren 555-1234 556-1234
7005 Kennedy John 555-2345
7006 Thompson Sarah 555-3456
We can solve that with having a one-to-many relationship:
‘Employees’ Table EmployeeID EmpLastName EmpFirstName —————————————- 7004 Gehring Darren 7005 Kennedy John 7006 Thompson Sarah
'Phone_Numbers' Table
EmployeeID PhoneID PhoneType PhoneNumber
---------------------------------------------------
7004 1 Home 555-1234
7005 2 Home 555-2345
7006 3 Home 555-3456
Every table in your database needs a primary key, which helps with:
A primary key is either:
Define a simple primary key over a composite primary key whenever you can; it’s more efficient and it’s easier to use when establishing a table relationship
To check if your primary key columns are sound:
If you don’t have column(s) for a primary key, you can create an artificial primary key (e.g. an EmployeeID that auto increments).
You can learn how to diagram relationships in: Database Design for Mere Mortals
Diagram One-to-One Relationship
Employees Employee_Confidential
---------------- ---------------------
|EmployeeID PK| -|-------------------|- |EmployeeID PK |
---------------- ---------------------
Diagram One-to-Many Relationship
Students Instruments
---------------- -------------------
|StudentID PK| -|------------------- |InstrumentID PK|
---------------- | / | |
|--- |StudentID FK|
\ |------------------
Diagram Many-to-Many Relationship
Pilots Certifications
--------------- ---------------------
|PilotID PK| -|--| |---|- |CertificationID PK |
--------------- | Pilot_Certifications | ---------------------
| / |----------------------| |
|---- |PilotID CPK | |
\ | | \ |
|CertificationID CPK | ------
------------------------ /
Note: Remember that these relationships need the matchings keys to have the same data type.
So in order to make sure that your database has referential integrity, we need to make sure that there aren’t any orphaned data. There’s a few things we can do:
A deletion rule says that when a user makes a request to delete a row in the primary table, then we don’t leave any orphaned rows in the secondary table. There are two types of deletion rules:
Basically, ask yourself “If a row in my primary / one-sided table is deleted, should related rows in the secondary / many side table be deleted as well?
In our above diagrams, a (C)
is usually next to cascading and (R)
for restrict
When you establish a relationship between a pair of tables, each table participates in a particular manner. The type of participation assigned to a table determines whether a row must exist in that table before you can enter a row into another table. These two types are:
|
in a diagram)O
in a diagram)After you determine how each table will participate in the relationship, you need to then figure out to what degree each will participate. You do this by determining the min and max number of rows in one table that can be related to a single row in the other table. This is called the table’s degree of participation.
You can see the degree of participation indicated by values like (x, y)
where x can be associated with a minimum
and maximum number of associations. E.g. (1, 1)
means can only be associated with one and only one agent.
(0, 6)
means there doesn’t have to be an association, but cannot have more than six associations.
Some database systems allow you to define a rule that cascades the key value from the ‘one’ table to the ‘many’ table if a value of the primary key in the ‘one’ table changes. Other systems allow automatic deletion of rows in the ‘many’ table when you change the value of the primary key in the ‘one’ table.
ANSI/ISO SQL Standard is the most widely accepted standard to date.
ODBC is an Open Database Connectivity (ODBC) specification, based off the Call-Level Interface (CLI) specification published. It’s basically a way to bind an SQL Database to a user-interface language.
There is a distinct difference between data and information.
This will help you keep things in perspective. Remember that a database is designed to provide meaningful information to someone in your organization. However, the information can only be provided if the data is structured to support that information.
The values that you store in the database are data. This data is static (remains in the same state until you modify it by some manual or automated process). Say we have the following data:
Katherine Ehrlich 89931 Active 79915
The above data is useless until it’s been proecssed. After you process the data so that it’s meaningful, we get information. You can present this data in a variety of ways. The main idea is that you must process your data in a way that returns meaningful information.
First Name: Katherine
Last Name: Ehrlich
ID: 89931
Status: Active
Zip: 79915
The SELECT
operation in SQL retrieves data and manipulates that into information.
The SELECT
operation can be broken down into three smaller operations:
Each of the above smaller operations has its own set of keywords (aka clauses).
You can save a SELECT statement as a:
When you execute a SELECT statement, it usually retrieves one or more rows of information, which we call a result set.
So what happens when you request information from the database?
Select <item> from the <source data>
Select city from customers
*
to specify all columns (only use this as a ‘quick and dirty’ way to get data; explicit better than implicit)DISTINCT
as an optional keyword to eliminate duplicate rows; this evaluates the values of all the columns as
a single unit on a row-by-row basis and eliminates any redundant rows it finds (otherwise shows every occurrence)
Add DISTINCT
before the list of columns specified in a SELECT
clauseBy default, the result set of a SQL statement is unordered. You can use the ORDER BY
clause to specify the sequence
of rows in the final result set.
ASC
or DSC
; default is ascendingIf you want to select more than simple columns, you’ll need to create an expression, which is some form of operation involving numbers, strings, or dates and times. Expressions are used to broaden or limit the scope of the information you want to retrieve.
Every column in the database has an assigned data type, which determines the type of values the column can store. There are seven general categories of types of data, which include:
CHAR
(fixed length characters, better performance, roughly +50%) or VARCHAR
(varying length character).
Most common is varchar
. Maximum this data type can hold is also 65,535
characters per row (and not just 255 char per column).
If larger than say 255 or 1024 chars, then use TEXT
(aka character large object, clob)
This is the one I’ve seen used most commonly (varchar
for small fields, text
for large).INT
, DECIMAL
, SMALLINT
). With approximate numeric we store numbers with decimal
places and exponential numbers that don’t have a precision and scale per se (e.g. FLOAT
, REAL
, DOUBLE PRECISION
)BIT
, INT
, TINYINT
)INTERVAL
data type. For MySQL, I usually use something like:
DATE(NOW()) - INTERVAL 1 MONTH
The CAST
function converts a literal value or the value of a column into a specific data type. This helps
ensure that the data types of the values in the expression are compatible.
When you do convert a value in one column into another, make sure you:
Examples:
CAST('2019-01-21' AS DATE)
CAST('03:30:25' AS TIME)
CAST('2019-01-22 14:25:00' AS DATETIME)
There are three types of SQL Expressions:
Concatentation - combine two or more columns into one (e.g. ‘Mike’ | ‘Liu’ will retur ‘MikeLiu’ |
Most calculated columns require a name. You an use the AS
keyword to supply an alias for a real column name.
SELECT expression AS my_column_name
SELECT FirstName || ' ' || LastName AS MyName FROM People
A NULL represents a missing or unknown value. Null does NOT represent a zero, a character string of one or more blank spaces, or a zero-length character string.
A Null might occur legitimately for a number reasons, including:
Nulls do not work nicely with mathematical operations. Any operation involving a Null evaluates to Null. It kinda makes sense; if a number is unknown, then the result of that operation is unknown.
Examples:
(Null * 3) + 4 = Null
Use a WHERE clause in a SELECT statement to filter the data. You can use basic predicates like:
The ESCAPE option allows you to designate a single character string literal to be the escape character. It tells the database how to interpret a percent sign or an underscore character.
SELECT ProductName, ProductCode
FROM Products
WHERE ProductCode LIKE 'G\_00' ESCAPE '\'
Exclude rows from a result set by using the NOT operator.
SELECT StaffID, Title
FROM Faculty
WHERE Title
NOT IN ('Professor', Associate Professor')
You can apply NOT before a search condition, e.g.
SELECT FirstName, LastName, City
FROM People
WHERE NOT City = 'Reedley'
You can combine two or more conditions using the AND operator. All conditions must be met in order for a row to be included in a result set.
You can combine two or more conditions using the OR__operator. __Either conditions must be met in order for a row to be included in a result set.
Use parenthesis to combine and prioritize certain conditions. Otherwise, the evaluation order is:
Evaluation Order | Type of Operator
1 | Positive sign (+), Negative sign (-)
2 | Multiplication (*), division (/)
3 | Addition (+), subtraction (-)
4 | =, <>, <, >, <=, >=, BETWEEN, IN, LIKE
5 | NOT
6 | AND
7 | OR
What is a set? Each table in your database is a set of information about one subject. A set of data can be as small as one column from one row in one table. There can even be empty sets.
Each row in a result set is a member of the set. The values in the columns are specific attributes of each member.
We use result sets of information to solve more complex problems that require linking data from two or more tables.
The three most common set operations are:
Think of a Venn diagram (aka Euler diagram).
One of the limitations of using a pure intersection is that the values must match in all the columns in each result set. If you only want to match on one or only a few column values from each set, then you will need a JOIN.
Use the INTERSECT keyword to do set intersection and find the common values
Example:
E.g. Show me orders with a specific product number that has a Quantity greater than zero.
SELECT DISTINCT OrderNumber
FROM Order_Details
WHERE ProductNumber IN (1, 2, 6, 11)
INTERSECT
SELECT DISTINCT OrderNumber
FROM Supplier_Details
WHERE Quantity > 0;
Remove from the first set all the matching members you find in the second set, and the result is the difference.
Use the EXCEPT keyword to do set difference and find the not common values
E.g. Show me the orders that contain a bike but not a helmet.
SELECT DISTINCT OrderNumber
FROM Order_Details
WHERE ProductNumber IN (1, 2, 6, 11)
AND ProductNumber NOT IN (10, 25, 26)
#### Union
A __UNION__ lets you select the _rows_ from two or more similar result sets and combine them into a single
result set. Notice that we said _rows_ and not columns.
If you run a __UNION__, you combine two sets of similar information into one set. Duplicates are lost.
If you want to keep duplicates, use a __UNION ALL__.
In order to run a UNION, you need to make sure that:
* Each of the two SELECT statements that you are linking with a UNION must have the same number of output columns
* Each corresponding column must be 'comparable'.
UNION and UNION ALL combines the result of two or more SELECT statements (with each statement having the same number of columns, data types, and order).
* UNION selects distinct values only.
* UNION ALL allows duplicates.
```SQL
SELECT column_name(s) FROM table1 UNION SELECT column_name(s) FROM table2;
SELECT column_name(s) FROM table1 UNION ALL SELECT column_name(s) FROM table2;
Use JOIN to link multiple tables together. A Join is like an INTERSECT, except instead of matching on all the columns, you specify the columns you want to join on.
You specify a JOIN as part of the FROM clause in a SQL statement. A JOIN defines a ‘logical table’ that is the result of linking two tables or results sets.
Since we are joining multiple tables that might have the same column name, we need to provide a column reference that includes the table name. For example, here is a single table select that includes the table name:
SELECT Employees.FirstName, Employees.LastName, Employees.PhoneNumber
FROM Employees
To solve the JOIN, the database system combines every row in the first table with every row in the second table; this combination of all rows from one table with every row in the second talbe is called a Cartesian product.
You can create a correlation name to any table you list in your FROM clause.
Follow the table name with the optional keyword AS
and then the correlation name you want to assign.
This can be confusing because you usually write the SELECT clause before you write the FROM clause. If you plan to give a table an alias in the FROM clause, you have to use that alias when you qualify columnn names in the SELECT clause.
Example:
SELECT R.RecipeTitle, R.Preparation, RC.RecipeClassDescription
FROM Recipe_Classes AS RC
INNER JOIN Recipes AS R
ON RC.RecipeClassID = R.RecipeClassID
INNER JOIN is the default join. It is the same as saying JOIN, but say INNER to be explicit. Examples:
Displaying bowling teams and the name of each team captain Return only students that have registered for a class and classes for which a student has registered
Note: Remember that when you join, provide a column reference that includes the table name. E.g.
SELECT Employees.FirstName, Employees.LastName, Employees.PhoneNumber
FROM Employees
An embedded SELECT statement is a derived table, meaning you can substitute an entire SELECT statement for any table name in your FROM clause. What we’re doing is deriving a subset of data from one or more tables. You need to assign a correlation name so that the result of evaluating your embedded query has a name.
SELECT R.RecipeTitle, R.Preparation, RCFiltered.ClassName
FROM
(SELECT RecipeClassID, RecipeClassDescription AS ClassName
FROM Recipe_Classes AS RC
WHERE RC.ClassName = 'Main course' OR
RC.ClassName = 'Dessert') AS RCFiltered
INNER JOIN Recipes AS R
ON RCFiltered.RecipeClassID = R.RecipeClassID
If your database does not let you embed a SELECT statement inside a FROM clause, consider using a VIEW.
Examples:
List all the classes and the students who have registered for those classes
Use an OUTER JOIN with a test for Null values is an alternate way of discovering the difference between two sets. Examples: The rows with a Null value in the columns from the Classes table represent the difference between the set of all students and the set of students who have registered for a class
SELECT column_name(s) FROM table1 FULL OUTER JOIN table2 ON table1.column_name=table2.column_name;
The first table you name is the one on the ‘left’. The second table you name is the one on the ‘right’.
An outer join returns the unmatched rows from either the ‘left’ or ‘right’ table.
SELECT Recipe_Classes.RecipeClassDescription, Recipes.RecipeTitle
FROM Recipe_Classes
LEFT OUTER JOIN Recipes
ON Recipe_Classes.RecipeClassID = Recipes.RecipeClassID
List the recipe classes that do not yet have any recipes.
SELECT Recipe_Classes.RecipeClassDescription
FROM Recipe_Classes
LEFT OUTER JOIN Recipes
ON Recipe_Classes.RecipeClassID =
Recipes.RecipeClassID
WHERE Recipes.RecipeID IS NULL
This results in doing a difference or EXCEPT operation using a JOIN. We get back ‘all the Recipe_Classes
except
the ones that already appear in the Recipes
table’. We created an EXCEPT!
The FULL OUTER JOIN includes all the rows from both of the tables. When no matching rows exist for rows on the ‘left’ side of the JOIN, you see Null values from the result set on the ‘right’.
A subquery is a SELECT expression that you embed inside one of the clauses of a SELECT statement to form your final query statement.
Why would you need this? You can build complex filters that do not rely on the tables in your FROM
clause.
Using a subquery in a WHERE
clause is the only way to get the correct number of rows in your answer when you
want rows from one table based on the filtered contents from other related tables.
There are three different types of subqueries:
You can use subqueries to generate an output column or to perform a complex comparision in a WHERE
clause.
Subqueries can often be replaced more effectively with a join, but are often used to fetch the results of a function
calculation (aggregate). E.g.
You can use a row subquery to build a row value constructor. What happens is that when you create a WHERE clause, you build a search condition that is typically some sort of comparison of one column from one of your tables with another column or a literal. However, you can build a search condition that compares multiple values as a logical row with another set of values as a logical row (two row value constructors).
Example
SELECT SKUClass, SKUNumber, ProductName
FROM Products
WHERE
(SKUClass, SKUNumber)
>= ('DSK', 9775)
This is equivalent to:
SELECT SKUClass, SKUNumber, ProductName
FROM Products
WHERE
(SKUClass > 'DSK')
OR ((SKUClass = 'DSK')
AND (SKUNumber >= 9775))
A scalar subquery lets you fetch a single column or calculated expression from another table that does not have to be in the FROM clause of the main query. You can use this single value fetched by a scalar subquery in the list of columns you request in a SELECT clause or you can use it as a comparison value in a WHERE clause.
So what is happening? You are substituting the subquery where you would normally enter a single column name or expression that results in a single column. That is the reason why a scalar subquery needs to return exactly one column and no more than one row.
This lets you pluck a single value from some other table or query to include in the output of your query.
Example 1:
Subquery is in the SELECT
SELECT Orders.OrderNumber, Orders.OrderDate, Orders.ShipDate,
(SELECT Customers.CustLastName FROM Customers
WHERE Customers.CustomerID = Orders.CustomerID)
FROM Orders
WHERE Orders.ShipDate = '2017-10-03'
Example 2:
Subquery is the WHERE
clause
SELECT name, listed_price
FROM paintings
WHERE listed_price > (
SELECT AVG(listed_price)
FROM paintings
);
There are special predicate keywords for use in a WHERE
clause with a subquery, including:
IN
- use the IN
keyword in a WHERE
clause to compare a column or expression to a list of values.
Each value in the IN
list could be a scalar subquery.ALL
SOME
ANY
EXISTS
Example 1 (subquery returning a list):
The inner query returns a list of all manager IDs. The outer query filters only those sales agents who are not in the managers list and calculates an average agency fee. The query returns a single value.
SELECT AVG(agency_fee)
FROM sales_agents
WHERE id NOT IN (SELECT id FROM managers);
Example 2 (subquery returning a table):
When a subquery returns a table with multiple rows and multiple columns, that subquery is usually found in the
FROM
or JOIN
clause. This allows you to get a table with data that was not readily available in the database
(e.g. grouped data) and then join this table with another one from your database.
Say we wanted to see the total amount of sales for each artist who sold at least one painting in our gallery. Our inner query calculates the sale for each artist from the sales table. Our outer query combines this information with the artists’ first names and last names to get the info we want.
SELECT
artists.first_name,
artists.last_name,
artists_sales.sales
FROM artists
JOIN (
SELECT artist_id, SUM(sales_price) AS sales
FROM sales
GROUP BY artist_id
) as artist_sales
ON artists.id = artist_sales.artist_id;
first_name, last_name, sales
William, Liu, 344
Kate, Wein, 3454
A row subquery returns a single row.
These are often used in comparison/correlations (aka correlated subquery, synchronized subquery).
These are often slow because the inner subquery might be evaluated once for each row processed by the outer query.
Correlated subqueries are commonly used in the SELECT
, WHERE
, and FROM
statements.
Example 1 (correlated subquery that returns a scalar value)
We want to calculate the number of paintings found in each of our galleries.
The inner query returns a scalar value with the total number of paintings from the corresponding gallery.
Note that the inner query depends on the outer query. We pull in the gallery_id from the galleries
table (located
on the outer query). You cannot run the inner query by itself.
SELECT city,
(SELECT COUNT(*)
FROM paintings p
WHERE g.id = p.gallery_id) total_paintings
FROM galleries g;
# city, total paintings
# London, 2
# New York, 2
# Munich, 1
The equivalent of a join (note: joins are usually faster than subqueries) would be:
SELECT g.city, count(p.name) AS total_paintings
FROM galleries g
JOIN paintings p
ON g.id = p.gallery_id
GROUP BY g.city;
Example 2 (correlated subquery in the WHERE statement)
Say we want to get information about sales agents whose agency fees are higher than the average fee for their gallery.
The inner query returns the average agency fee for the gallery of the sales agent.
The outer query returns the information about only those sale agents who satisfy the condition included in the WHERE
statement
(i.e. an agency fee greater than their gallery average).
Notice that the subquery is a correlated subquery because it can’t be run independently of the outer query.
SELECT
last_name,
first_name,
agency_fee
FROM sales_agents sa1
WHERE sa1.agency_fee > (SELECT AVG(agency_fee)
FROM sales_agents sa2
WHERE sa2.gallery_id = sa1.gallery_id);
IN
Select the recipe title from the recipes where the recipe ID is in the recipe ingredients from the inner joined with the ingredient classes that have a recipe class of ‘Seafood’.
SELECT RecipeTitle
FROM Recipes
WHERE Recipes.RecipeID IN
(SELECT RecipeID
FROM Recipe_Ingredients
WHERE Recipie_Ingredients.IngredientID IN
(SELECT IngredientID
FROM Ingredients
INNER JOIN Ingredient_Classes
ON Ingredients.IngredientClassID = Ingredient_Classes.IngredientClassID
WHERE Ingredient_classes.IngredientClassDescription = 'Seafood'
)
)
Notice how we do not want to use a complex JOIN in the outer query since we might get duplicates (more than one row per recipe
that has more than one seafood ingredient). You can use a DISTINCT
, but your database will have to do more work then.
DISTINCT
to remove
ALL
, SOME
, ANY
)You have to SELECT as a table subquery that returns exactly one column and zero or more rows. If the subquery returns more than one row, the values in the row make up a list.
ALL
comparison must be true for all the values returned by the subquerySOME
or ANY
then the comparison only needs to be true for only one value in the listIf the subquery returns no rows, then any comparison predicate with the ALL
keyword is true, any with the SOME
or ANY
keyword is false.
SELECT Recipes.RecipeTitle
FROM Recipes
WHERE Recipes.RecipeID IN
(SELECT Recipe_Ingredients.RecipeID
FROM Recipe_Ingredients
WHERE Recipe_Ingredients.IngredientID = ANY
(SELECT Ingredients.IngredientID
FROM Ingredients
WHERE Ingredients.IngredientName
IN ('Beef', 'Garlic')
)
)
EXISTS
Use EXISTS
to check if a related row exists in the result set returned by a subquery.
E.g. Find all the customers who ordered a bicycle
SELECT Customers.CustomerID, Customers.CustFirstName, Customers.CustLastName
FROM Customers
WHERE EXISTS
(SELECT *
FROM (Orders
INNER JOIN Order_Details
ON Orders.OrderNumber = Order_Details.OrderNumber
)
INNER JOIN Products
ON Products.ProductNumber = Order_Details.ProductNumber
WHERE Products.CategoryID = 2
AND Orders.CustomerID = Customers.CustomerID)
HAVING
Use a HAVING
clause to filter out groups of information from a subquery.
Aggregates have a simple syntax.
COUNT() <DISTINCT> expression AS alias
SUM
AVG
MAX
MIN
Each aggregate calculates a single value from the rows in a result set.
Except COUNT(*)
, all aggregate functions automatically disregard Null values.
Standard SQL defines many functions that calculate values in a query. A subclass of functions are aggregate functions, which lets you calculate a single value for a group of rows in a result set. You can use an aggregate function to count the rows, find the largest or smallest value within a set of rows, or calculcate the average or total of some value or expression across a result set.
Use COUNT to determine the number of rows or the number of non-Null values in a result set.
Use COUNT(*)
to find out how many rows are in the entire set. Note: this is the only aggregate function to count Nulls.
Use COUNT(my_column_name)
to count the number of rows with non-Null values in that column.
Use COUNT DISTINCT
to count only the unique values.
e.g. Select the number of students
SELECT COUNT(*) AS CountOfStudents
FROM Students
e.g. Count the number of non-Null county values
SELECT COUNT(CustCounty) AS NumberOfKnownCounties
FROM Customers
e.g. How many unique county names are there in the customers table
SELECT COUNT(DISTINCT CustCounty) AS NumberOfUniqueCounties
FROM Customers
E.g. “List the name of every bowler whose personal average score is greater than the overall average score”
SELECT Bowlers.BowlerLastName, Bowlers.BowlerFistName
FROM Bowlers
WHERE (SELECT AVG(RawScore)
FROM Bowler_Scores AS BS
WHERE BS.BowlerID = Bowlers.BowlerID)
> (SELECT AVG(RawScore) FROM Bowler_Scores)
ORDER BY Bowlers.BowlerLastName, Bowlers.BowlerFirstName
Grouping data is usually done with aggregates
E.g. “Show me each entertainment group’s name, count of contracts for the group, total price of contracts”
SELECT Entertainers.EntStageName,
COUNT(*) AS NumContracts,
SUM(Engagements.ContractPrice) AS TotalPrice
FROM Entertainers
INNER JOIN Engagements
ON Entertainers.EntertainerID = Engagements.EntertainerID
GROUP BY Entertainers.EntStageName
GROUP BY
Get a subtotal across groups using aggregate functions and GROUP BY
.
HAVING
Use HAVING
after a GROUP BY
to filter out additional data.
Note that you can filter on WHERE
or HAVING
.
WHERE
to filter rows before your database groups them. Eliminate here first so you don’t do extra workWHERE
where you exclude Nulls / zero results, that’s when you want to group and use HAVING
UPDATE