Summary
SQL (Structured Query Language) is the standard language for communicating with relational database management systems. There are a few variants, including Microsoft's SQL Server, MySQL, PostgreSQL, and SQLite; the instructions below use MySQL. We'll set up a MySQL server, then use a client to connect to the server and query for data.
On a Mac, use Homebrew to install the MySQL server: brew install mysql
For a GUI client, you can download MySQL Workbench, which lets you browse the data visually (as opposed to command line only).
On Ubuntu, use your package manager to install the server with sudo apt-get install mysql-server
and the workbench with sudo apt-get install mysql-workbench-community
Then run sudo /usr/bin/mysql_secure_installation, where you can set the
root password and other configuration.
You can find the MySQL config under /etc/my.cnf (it may be in another location); the file shows you useful configuration info.
Instead of the regular mysql terminal client, I actually like mycli: https://github.com/dbcli/mycli/
For the MySQL Prompt in mycli, I changed it to:
# MySQL prompt
# \t - Product type (Percona, MySQL, Mariadb)
# \u - Username
# \h - Hostname of the server
# \d - Database name
# \n - Newline
#prompt = '\t \u@\h:\d> '
prompt = "(\u@\h) [\d]> "
sudo /usr/bin/mysql_secure_installation
mysql -uroot -psome_password
$systemctl status mysql # check status of server
$systemctl start mysql # start the server
$systemctl stop mysql # stop the server
$systemctl help mysql # get help commands
$mysql < myscript.sql
$mysql # enters into the shell from bash
$mysql -u root -p # enters into the shell as root user and prompts for password
mysql> # This is what you'll see when in shell
mysql> DROP DATABASE IF EXISTS my_database_name;
mysql> CREATE DATABASE my_database_name;
mysql> USE my_database_name;
See what Users you have available:
SELECT User FROM mysql.user;
To show the list of users, we want to look at the mysql database.
mysql> use mysql;
mysql> show tables;
We’re interested in the user table on the mysql database.
mysql> describe mysql.user;
For older versions of MySQL (below 5.7), you might want to look at these fields: mysql> select host, user, password from mysql.user;
For more recent versions of MySQL (5.7+), you might want to look at these fields: mysql> select host, user, authentication_string from mysql.user;
To create a User, you can run:
CREATE USER 'custom'@'localhost' IDENTIFIED BY 'password';
You can then log in with that user, say with: mycli --user custom --password password
To update the user password:
mysql> update user set authentication_string=password('1111') where user = 'root';
To grant all privileges to a user on host ‘localhost’ with a database called ‘db_test’ do:
GRANT ALL ON db_test.* to 'tester'@'localhost';
To apply the privilege changes:
FLUSH PRIVILEGES;
mysql> status # Shows current database, current user
--------------
mysql Ver 14.14 Distrib 5.7.18, for Linux (x86_64) using EditLine wrapper
Connection id: 10
Current database:
Current user: root@localhost
SSL: Not in use
Current pager: stdout
Using outfile: ''
Using delimiter: ;
Server version: 5.7.18-0ubuntu0.16.04.1 (Ubuntu)
Protocol version: 10
Connection: Localhost via UNIX socket
Server characterset: latin1
Db characterset: latin1
Client characterset: utf8
Conn. characterset: utf8
UNIX socket: /var/run/mysqld/mysqld.sock
Uptime: 13 min 24 sec
Threads: 1 Questions: 10 Slow queries: 0 Opens: 115 Flush tables: 1 Open tables: 34 Queries per second avg: 0.012
List all the available databases
mysql> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
| willdb |
+--------------------+
List all the available tables in a database
mysql> use willdb;
mysql> show tables;
mysql> show tables;
+------------------+
| Tables_in_willdb |
+------------------+
| iris |
+------------------+
See the table structure
mysql> describe iris;
+-------------+-------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| sepallength | float | YES | | NULL | |
| sepalwidth | float | YES | | NULL | |
| petallength | float | YES | | NULL | |
| petalwidth | float | YES | | NULL | |
| class | varchar(30) | YES | | NULL | |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)
Indexes in general speed up database reads (e.g. search) and slow down database writes. If you index a column, you create an ordering (e.g. alphabetical) through the index: instead of searching through all n rows, you might do a binary search with log-n index entries.
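The read/write tradeoff can be sketched outside SQL: a sorted index turns a full scan (O(n)) into a binary search (O(log n)), but every write must also maintain the sorted structure. A minimal Python sketch using the stdlib bisect module (the table data here is made up):

```python
import bisect

# Unindexed: rows sit in arbitrary order, so a lookup scans every row, O(n)
rows = [("Raul", 9), ("Ronald", 0), ("Messi", 0)]
match = [r for r in rows if r[0] == "Messi"]

# "Indexed": a sorted list of (key, row_position) pairs enables binary search
index = sorted((name, i) for i, (name, _) in enumerate(rows))
keys = [k for k, _ in index]
pos = bisect.bisect_left(keys, "Messi")   # O(log n) lookup
found = rows[index[pos][1]]

# The write cost: every INSERT must also insert into the index structure
rows.append(("Zlatan", 1))
bisect.insort(index, ("Zlatan", 3))       # extra maintenance work on writes
```

This is only an analogy; InnoDB actually uses B-Tree indexes (see below), but the scan-versus-search tradeoff is the same.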
List all the indexes on a table
mysql> show index from iris;
See below on topics of ‘lock contention’, ‘deadlock’ caused by indexes.
SHOW INDEX returns fields including Table, Non_unique, Key_name, Seq_in_index, Column_name, Cardinality, and Index_type.
Show database triggers:
mysql> show triggers like '%something%';
explain is often used with complex queries, query optimization, and indexes. Put EXPLAIN in front of a query that uses SELECT, UPDATE, DELETE, INSERT, or REPLACE.
mysql> explain select * from my_db.my_table;
+------+---------------+-----------------+--------+-----------------+--------+-----------+--------+--------+---------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|------+---------------+-----------------+--------+-----------------+--------+-----------+--------+--------+---------|
| 1 | SIMPLE | my_table | ALL | <null> | <null> | <null> | <null> | 343848 | |
+------+---------------+-----------------+--------+-----------------+--------+-----------+--------+--------+---------+
Here’s an example that uses an index:
explain select property_id, mls_id from my_db.my_table;
+------+---------------+-----------------+--------+-----------------+---------+-----------+--------+--------+-------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|------+---------------+-----------------+--------+-----------------+---------+-----------+--------+--------+-------------|
| 1 | SIMPLE | my_table | index | <null> | ix_cpid | 5 | <null> | 343848 | Using index |
+------+---------------+-----------------+--------+-----------------+---------+-----------+--------+--------+-------------+
Here’s an example with a join:
explain select count(*), t1.loc_city from idx.mlslv4_master t1 join
enterprise.community t2 where t1.loc_city = t2.name and t2.owner_id=44 group by t1.loc_city;
+------+---------------+---------+--------+-----------------+-------------+-----------+--------------------+--------+---------------------------------+
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
|------+---------------+---------+--------+-----------------+-------------+-----------+--------------------+--------+---------------------------------|
| 1 | SIMPLE | t2 | ref | company_id | company_id | 4 | const | 179 | Using temporary; Using filesort |
| 1 | SIMPLE | t1 | ref | ix_loc_city | ix_loc_city | 33 | enterprise.t2.name | 10 | Using where; Using index |
+------+---------------+---------+--------+-----------------+-------------+-----------+--------------------+--------+---------------------------------+
You can see currently running processes a few different ways:
show full processlist
SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST
Say we want to load the iris dataset. We do a curl to get the dataset, then run the below sql query.
curl http://mlr.cs.umass.edu/ml/machine-learning-databases/iris/iris.data > iris.csv
|sepallength|sepalwidth|petallength|petalwidth|class |
|-----------|----------|-----------|----------|-----------|
|5.1 |3.5 |1.4 |0.2 |Iris-setosa|
|4.9 |3 |1.4 |0.2 |Iris-setosa|
|4.7 |3.2 |1.3 |0.2 |Iris-setosa|
First create the table:
create table iris(
sepallength float,
sepalwidth float,
petallength float,
petalwidth float,
class nvarchar(30)
);
Then load the csv into it:
LOAD DATA LOCAL
#EDIT THIS PATH:
INFILE "/Users/williamliu/Desktop/iris.csv"
INTO TABLE sampdb.iris #db sampdb, table iris
FIELDS
TERMINATED BY ','
#ENCLOSED BY '"'
LINES
TERMINATED BY '\n';
#IGNORE 1 LINES;
In MySQL 5.5, you could only have one TIMESTAMP column per table that has the current timestamp as the default value for initializing the column.
From MySQL 5.6.5 and on, the above restriction is lifted so any timestamp column definition can have any combination
of default current_timestamp
and on update current_timestamp
clauses. E.g.
alter table my_table add column created_datetime timestamp default now(),
add column modified_datetime timestamp default '0000-00-00 00:00:00' ON UPDATE now();
This looks like:
show columns from my_table like '%datetime%';
+-------------------+-----------+--------+-------+---------------------+-----------------------------+
| Field | Type | Null | Key | Default | Extra |
|-------------------+-----------+--------+-------+---------------------+-----------------------------|
| created_datetime | timestamp | NO | | CURRENT_TIMESTAMP | |
| modified_datetime | timestamp | NO | | 0000-00-00 00:00:00 | on update CURRENT_TIMESTAMP |
+-------------------+-----------+--------+-------+---------------------+-----------------------------+
2 rows in set
Assuming data (say mytable from mydatabase) looks like:
# last_name, first_name, suffix, city, state, birth, death
Adams, John, , Braintree, MA, 1735-10-30, 1826-07-04
Adams, John Quincy, , Braintree, MA, 1767-07-11, 1848-02-23
Arthur, Chester A., , Fairfield, VT, 1829-10-05, 1886-11-18
Buchanan, James, , Mercersburg, PA, 1791-04-23, 1868-06-01
Bush, George H.W., , Milton, MA, 1924-06-12,
We can do the following query:
SELECT last_name, first_name, state
FROM mydatabase.mytable;
To get a running row number in the results, you can set a user variable and increment it in the SELECT. Note that you can set variables and return them in the results.
SET @n=0;
SELECT @n := @n+1 AS 'row_num', sepallength FROM sampdb.iris;
SELECT COUNT(*) FROM sampdb.president
SELECT COUNT(first_name), COUNT(death) FROM sampdb.president # Returns 42 and 38
SELECT COUNT(DISTINCT state, first_name) FROM sampdb.president
With DISTINCT, we get 42 instead of 43 total presidents, meaning there is one duplicate state and first_name combination.
SELECT AVG(death) FROM sampdb.president
SELECT SUM(age) FROM sampdb.president
SELECT MIN(age) FROM sampdb.president
SELECT MAX(age) FROM sampdb.president
SELECT ... score * 2 AS doubled_score
SELECT ... DATEDIFF(death, birth)
SELECT ... CONCAT(first_name, ' ', last_name)
SELECT ... UNIQUE(last_name, middle_name, first_name)
AS creates an alias (required for subqueries). CONCAT combines strings into one. DATEDIFF gets the difference in days. (For distinct value combinations, MySQL uses SELECT DISTINCT; there is no UNIQUE() function.)
SELECT LastName, FirstName FROM Persons WHERE LastName IS NULL
SELECT LastName, FirstName FROM Persons WHERE LastName IS NOT NULL
SELECT UnitPrice*(UnitsInStock+IFNULL(UnitsOnOrder,0)) FROM Products
SELECT COUNT(first_name) FROM sampdb.president WHERE last_name="Adams"
You can aggregate per group with GROUP BY (examples below). IN checks membership against a list of single values or tuples:
WHERE state IN ('MA', 'VA')
WHERE (last_name, state) IN (('Adams', 'MA'))
SELECT state, COUNT(*) FROM sampdb.president GROUP BY state ORDER BY state
SELECT state, COUNT(*) FROM sampdb.president
WHERE birth <= '1900-01-01'
GROUP BY state
HAVING COUNT(*) >=2
ORDER BY COUNT(*);
Sample Output:
state, count(*)
MA, 2
VT, 2
NC, 2
NY, 4
OH, 7
VA, 8
You’ll often need to CAST, e.g. from a String to a Datetime
CAST('2009-01-01' AS DATETIME)
Often you’ll need datetime functions. Check out the reference here:
https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html
Get the Current Date
SELECT CURDATE();
+------------+
| CURDATE() |
+------------+
| 2019-01-26 |
+------------+
SELECT CURDATE() - INTERVAL 7 DAY;
+----------------------------+
| CURDATE() - INTERVAL 7 DAY |
+----------------------------+
| 2019-01-19 |
+----------------------------+
mysql> SELECT DATE_SUB(NOW(), INTERVAL 30 DAY) AS 30DaysAgo;
+---------------------+
| 30DaysAgo |
+---------------------+
| 2018-12-27 14:09:38 |
+---------------------+
You can select a string into a date or datetime.
SELECT whatever
FROM yourtable
WHERE STR_TO_DATE(yourdatefield, '%m/%d/%Y') > CURDATE() - INTERVAL 7 DAY
Create ‘buildings’ table
CREATE TABLE buildings (
building_no INT PRIMARY KEY AUTO_INCREMENT,
building_name VARCHAR(255) NOT NULL,
address VARCHAR(255) NOT NULL
);
CREATE TABLE rooms (
room_no INT PRIMARY KEY AUTO_INCREMENT,
room_name VARCHAR(255) NOT NULL,
building_no INT NOT NULL,
FOREIGN KEY (building_no)
REFERENCES buildings (building_no)
ON DELETE CASCADE
);
Notice that we add the ON DELETE CASCADE
clause at the end of the foreign key constraint definition.
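To see the cascade in action without a MySQL server, here's a sketch using Python's built-in sqlite3. SQLite accepts the same ON DELETE CASCADE clause, but only enforces it once foreign keys are switched on with a PRAGMA (InnoDB enforces them by default); the table columns are trimmed down from the example above:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: FK enforcement is off by default

conn.execute("""CREATE TABLE buildings (
    building_no INTEGER PRIMARY KEY,
    building_name TEXT NOT NULL)""")
conn.execute("""CREATE TABLE rooms (
    room_no INTEGER PRIMARY KEY,
    room_name TEXT NOT NULL,
    building_no INTEGER NOT NULL,
    FOREIGN KEY (building_no) REFERENCES buildings (building_no) ON DELETE CASCADE)""")

conn.execute("INSERT INTO buildings VALUES (1, 'HQ')")
conn.executemany("INSERT INTO rooms VALUES (?, ?, 1)", [(1, "Lobby"), (2, "Lab")])

# Deleting the building cascades to its rooms
conn.execute("DELETE FROM buildings WHERE building_no = 1")
remaining = conn.execute("SELECT COUNT(*) FROM rooms").fetchone()[0]
print(remaining)  # 0
```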
SELECT name FROM city WHERE pincode IN (SELECT pincode FROM pin WHERE zone = 'west')
SELECT name FROM city WHERE pincode IN ...
SELECT pincode FROM pin WHERE zone = 'west'
An alias (AS) is required for subqueries. There are two varieties of subqueries: correlated and non-correlated.
Non-correlated subquery means that the inner query doesn’t depend on the outer query and can run as a stand alone query
SELECT company FROM stock WHERE listed_on_exchange = (SELECT ric FROM market WHERE country='japan')
- The inner query executes first, then the outer query.
- non-correlated subqueries usually use IN
or NOT IN
Correlated subquery means that the inner query depends on the outer query
SELECT student_id, score FROM sampdb.score AS scr WHERE event_id = 3 AND score > (SELECT AVG(score) FROM sampdb.score WHERE event_id = scr.event_id GROUP BY event_id)
- Requires use of alias (AS
)
- The outer query executes first, then the inner query; this is because the inner query depends on the output of the outer query
- These queries are slower than non-correlated queries (think about doing a join instead)
- correlated subqueries usually use EXISTS
or NOT EXISTS
SELECT * FROM mysuppliers WHERE NOT EXISTS (SELECT * FROM myorders WHERE mysuppliers.supplier_id = myorders.supplier_id);
Suppose that a website contains two tables, the Customers table and the Orders table. Write a SQL query to find all customers who never order anything.
Table: Customers.
+----+-------+
| Id | Name |
+----+-------+
| 1 | Joe |
| 2 | Henry |
| 3 | Sam |
| 4 | Max |
+----+-------+
Table: Orders.
+----+------------+
| Id | CustomerId |
+----+------------+
| 1 | 3 |
| 2 | 1 |
+----+------------+
Using the above tables as example, return the following:
+-----------+
| Customers |
+-----------+
| Henry |
| Max |
+-----------+
SELECT c.Name AS Customers FROM Customers c
WHERE NOT EXISTS (SELECT * FROM Orders o WHERE o.CustomerId = c.Id);
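You can check the two equivalent formulations (NOT EXISTS, and LEFT JOIN filtered on IS NULL) against the sample data using Python's sqlite3, which accepts the same SQL:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("CREATE TABLE Orders (Id INTEGER PRIMARY KEY, CustomerId INTEGER)")
conn.executemany("INSERT INTO Customers VALUES (?, ?)",
                 [(1, "Joe"), (2, "Henry"), (3, "Sam"), (4, "Max")])
conn.executemany("INSERT INTO Orders VALUES (?, ?)", [(1, 3), (2, 1)])

# Formulation 1: anti-join via NOT EXISTS
not_exists = conn.execute("""
    SELECT c.Name FROM Customers c
    WHERE NOT EXISTS (SELECT * FROM Orders o WHERE o.CustomerId = c.Id)""").fetchall()

# Formulation 2: LEFT JOIN, keep only rows with no match on the right
left_join = conn.execute("""
    SELECT c.Name FROM Customers c
    LEFT JOIN Orders o ON o.CustomerId = c.Id
    WHERE o.Id IS NULL""").fetchall()

print(not_exists)  # [('Henry',), ('Max',)]
```

Both return Henry and Max, the customers with no rows in Orders.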
Used to combine rows from two or more tables. Types of joins include:
SELECT column_name(s) FROM table1 a INNER JOIN table2 b ON a.column_name=b.column_name;
SELECT column_name(s) FROM table1 a LEFT JOIN table2 b ON a.column_name=b.column_name;
SELECT column_name(s) FROM table1 a RIGHT JOIN table2 b ON a.column_name=b.column_name;
SELECT column_name(s) FROM table1 a FULL OUTER JOIN table2 b ON a.column_name=b.column_name;
SELECT column_name(s) FROM table1 UNION SELECT column_name(s) FROM table2
SELECT column_name(s) FROM table1 UNION ALL SELECT column_name(s) FROM table2
To get rows in table1 with no match in table2 (a left join excluding the intersection, on a.my_field = b.my_field):
SELECT column_name(s) FROM table1 a LEFT JOIN table2 b ON a.my_key = b.my_key WHERE b.my_key IS NULL
To get rows in table2 with no match in table1 (a right join excluding the intersection):
SELECT column_name(s) FROM table1 a RIGHT JOIN table2 b ON a.my_key = b.my_key WHERE a.my_key IS NULL
Say a table has columns (P_ID, LastName, SSN).
Add a primary key on one column (P_ID) as the constraint: ALTER TABLE MyTable ADD PRIMARY KEY (P_ID)
Add a composite primary key on (P_ID, LastName) as the constraint: ALTER TABLE MyTable ADD PRIMARY KEY (P_ID, LastName)
Drop the primary key: ALTER TABLE MyTable DROP PRIMARY KEY
Add a foreign key: ALTER TABLE MyTable ADD FOREIGN KEY (P_ID) REFERENCES OtherTable(P_ID)
Add a named foreign key on (P_ID) as the constraint: ALTER TABLE MyTable ADD CONSTRAINT my_fk FOREIGN KEY (P_ID) REFERENCES OtherTable(P_ID)
Drop the foreign key: ALTER TABLE MyTable DROP FOREIGN KEY my_fk
(A primary key cannot contain a NULL field.) If more than one key value applies to a given column, the Key column displays the one with the highest priority, in the order of PRI, UNI, and MUL.
There's also a Non_unique column, meaning 0 if the index cannot contain duplicates and 1 if it can.
DROP DATABASE MyDatabase deletes everything in the database.
DROP TABLE MyTable deletes the table.
TRUNCATE TABLE MyTable removes all rows but keeps the table.
ALTER TABLE MyTable MODIFY COLUMN MyColumn <datatype> changes a column's datatype.
ALTER TABLE MyTable DROP COLUMN MyColumn drops a column.
INSERT INTO MyDatabase.MyTable SET ThisField = ThisData
INSERT INTO MyDatabase.MyTable VALUES ('field1', 'field2', 'field3')
With the ON DUPLICATE KEY UPDATE clause, if an inserted row would cause a duplicate value in a UNIQUE index or PRIMARY KEY, MySQL runs an UPDATE of the old row instead.
An INSERT statement increases any auto-increment values, but UPDATE does not.
INSERT INTO t1 (a,b,c) VALUES (1,2,3)
ON DUPLICATE KEY UPDATE c=c+1;
UPDATE MyDatabase.MyTable SET ThisField = ThisData
DELETE FROM MyDatabase.MyTable # Careful, with no criteria this deletes every row in the table!
Write a SQL query to delete all duplicate email entries in a table named Person, keeping only unique emails based on its smallest Id.
+----+------------------+
| Id | Email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
| 3 | john@example.com |
+----+------------------+
Id is the primary key column for this table. For example, after running your query, the above Person table should have the following rows:
+----+------------------+
| Id | Email |
+----+------------------+
| 1 | john@example.com |
| 2 | bob@example.com |
+----+------------------+
# Write your MySQL query statement below
DELETE FROM Person USING Person, Person AS PersonCopy
WHERE Person.Id > PersonCopy.Id AND Person.Email = PersonCopy.Email;
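The USING self-join is MySQL-specific syntax, but the result can be checked with an equivalent NOT IN formulation (keep the smallest Id per email) using sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Person (Id INTEGER PRIMARY KEY, Email TEXT)")
conn.executemany("INSERT INTO Person VALUES (?, ?)",
                 [(1, "john@example.com"), (2, "bob@example.com"),
                  (3, "john@example.com")])

# Equivalent to the MySQL self-join DELETE: keep the smallest Id per email
conn.execute("""
    DELETE FROM Person
    WHERE Id NOT IN (SELECT MIN(Id) FROM Person GROUP BY Email)""")

rows = conn.execute("SELECT Id, Email FROM Person ORDER BY Id").fetchall()
print(rows)  # [(1, 'john@example.com'), (2, 'bob@example.com')]
```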
IF(2>1, 'OK', 'NOT OK')
CASE
WHEN 2 > 1
THEN 'OK'
WHEN 1 = 1
THEN 'YEP'
ELSE
'NOT OK'
END
You can combine all of the above into something like this (Note: From SQL Server):
SELECT IVRName,
SUM(CASE WHEN Disconnection in ('Answer') THEN 1 END) AS Answered,
SUM(CASE WHEN Disconnection in ('Abandon') THEN 1 END) AS Abandoned
FROM LifeNetDW.dbo.QueueMetrics
WHERE CallTime > '2015-1-31' AND IVRName IS NOT NULL
GROUP BY IVRName
An index is a copy of selected columns from a table that can be searched efficiently (with a link to the complete row of data it was copied from).
CREATE INDEX MyIndex on MyTable (colToIndex)
CREATE INDEX MyIndex on MyTable(colToIndex1, colToIndex2, colToIndex3)
ALTER TABLE MyTable DROP INDEX MyIndex
Indexes speed up the retrieval of data from the table (the index is used to select exactly the corresponding rows without scanning the whole table), so searches are faster. Primary key and unique key indexes also help avoid duplicate row data.
Anytime you have a write query (e.g. INSERT, UPDATE, DELETE), the indexes must be updated so indexes slow down write queries.
A B-Tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-Tree is a generalization of a binary search tree in that a node can have more than two children. B-Trees are often used in relational databases.
Cardinality is the uniqueness of data values in a particular column of a table. The lower the cardinality, the more duplicated elements in a column. For example, a bit column has a cardinality of 2 (very low) while a date column might have a cardinality of, say, 100000. If every value is unique, you have maximum cardinality; at minimum cardinality, all values are the same.
Cardinality is a property that affects the ability to cluster, sort, and search data: higher cardinality means an index on that column is more selective. Generally you want to list the columns in a composite index from highest cardinality to lowest.
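Cardinality is just a distinct count, so you can estimate it for any column sample in a few lines. A sketch with made-up rows (MySQL keeps its own estimate, visible via SHOW INDEX):

```python
rows = [
    {"is_active": 1, "state": "MA", "birth": "1735-10-30"},
    {"is_active": 0, "state": "MA", "birth": "1767-07-11"},
    {"is_active": 1, "state": "VT", "birth": "1829-10-05"},
    {"is_active": 1, "state": "PA", "birth": "1791-04-23"},
]

def cardinality(rows, col):
    """Number of distinct values in a column; higher means more selective."""
    return len({r[col] for r in rows})

# Order composite index columns from highest to lowest cardinality
cols = sorted(rows[0], key=lambda c: cardinality(rows, c), reverse=True)
print(cols)  # ['birth', 'state', 'is_active']
```

Here birth (4 distinct values) would lead the composite index, and the near-boolean is_active column would come last.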
You can update the cardinality on a table with the following command:
ANALYZE TABLE my_table;
+-----------------------+---------+------------+------------+
| Table | Op | Msg_type | Msg_text |
|-----------------------+---------+------------+------------|
| my_table | analyze | status | OK |
+-----------------------+---------+------------+------------+
SHOW INDEX FROM my_table;
Before we get into lock contention, let’s bring up how binary logging works. The binary log contains ‘events’ that describe database changes (e.g. table create, updates to table data). This log is used on the master server to send events to slave servers.
Data recovery operations also use the binary log. Statements like SELECT and SHOW aren’t included in the binary log.
To enable the binary log, start the server with --log-bin[=base_name], where the default base_name is the name of the host machine followed by -bin.
mysql> show binary logs;
+---------------------+-------------+
| Log_name | File_size |
|---------------------+-------------|
| my-db-bin.000825 | 1073742262 |
| my-db-bin.000826 | 1073742059 |
Depending on your my.cnf
file, you can see where your log-bins directory is.
mysql> show binlog events limit 10;
Here you can see what’s going on with the binlog events. Are there too many inserts, updates, etc.
This is on the ‘user’ table with a PK on user_id
user_id(pk) | name | status
1 | Ronald | 0
100 | Messi | 0
10000000 | Raul | 9
Read about binary logging first, then read below about lock contention. An example of lock contention:
Query 1: DELETE FROM user WHERE status=9 (takes a long time to scan and delete)
Query 2: UPDATE user SET status=9 WHERE user_id=100
Why did Query 2 get blocked by Query 1? Think about the scenario if Query 1 did not block Query 2: the UPDATE on user_id=100 commits first, so the binary log records the statements in commit order:
UPDATE user SET status=9 WHERE user_id=100;
DELETE FROM user WHERE status=9;
When a slave replays the log in that order, the row with user_id=100 will not exist on the slave, but it will still exist on the master. So basically, the table user needs to be locked, because any following operations on that same table would cause data inconsistencies for binary logging (and thus issues with, say, data replication to slaves).
To solve this we can use next-key locking, where we lock not just the ‘modified’ rows, but also ‘scanned’ rows. The disadvantage is low concurrency.
If the status column is indexed, then Queries 1 and 2 can run in parallel (to improve concurrency).
To view deadlocked transactions, you can run:
SHOW ENGINE INNODB STATUS \G
An example might look like this:
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 320171 page no 48814 n bits 248 index `PRIMARY` of table `my_db`.`my_table` trx id 126544661030 lock_mode X locks rec but not gap waiting
*** (2) TRANSACTION:
TRANSACTION 126544669705, ACTIVE 8 sec fetching rows, thread declared inside InnoDB 2630
mysql tables in use 2, locked 2
3964 lock struct(s), heap size 472616, 34781 row lock(s)
MySQL thread id 10893466142, OS thread handle 0x7ed5f70c3700, query id 829917141049 10.10.0.10 idx System lock
UPDATE my_db.my_table
SET ready_to_delete_bit = 1
WHERE my_field = 121 AND start_datetime > NOW()
AND (another_field = 13 or this_field = 'a')
ORDER BY that_field ASC
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 320171 page no 48814 n bits 248 index `PRIMARY` of table `my_db`.`my_table` trx id 126544669705 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 320171 page no 3035 n bits 200 index `PRIMARY` of table `my_db`.`my_table` trx id 126544669705 lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (2)
------------
Indexes and keys play an important role in deadlocks, often involving gap locks.
The idea is that InnoDB has two types of standard row-level locks: shared (S) locks and exclusive (X) locks.
InnoDB also supports multiple granularity locking (with table-level intention locks, IS and IX), which allows the coexistence of row locks and table locks.
Table-level lock type compatibility is summarized in the following matrix.
X IX S IS
X Conflict Conflict Conflict Conflict
IX Conflict Compatible Conflict Compatible
S Conflict Conflict Compatible Compatible
IS Conflict Compatible Compatible Compatible
A gap lock is a lock on a gap between index records, or a lock on the gap before the first or after the last index record.
An example of this could be: SELECT c1 FROM t WHERE c1 BETWEEN 10 AND 20 FOR UPDATE;
This prevents other transactions from inserting a value of 15 into column t.c1, whether or not there was already such a value in the column, because the gaps between all existing values in the range are locked.
A gap might span a single index value, multiple index values, or even be empty.
Gap locks are part of the tradeoff between performance and concurrency, and are used in some transaction isolation levels and not others.
See an example of MySQL deadlocks with concurrent inserts here:
http://thushw.blogspot.com/2010/11/mysql-deadlocks-with-concurrent-inserts.html
Show engine InnoDB status is also good for checking the number of row operations you have going on:
mysql> SHOW ENGINE INNODB STATUS\G
Main thread process no. 38101, id 139815377716992, state: sleeping
Number of rows inserted 334154636, updated 547122709, deleted 315490449, read 6071813255519
33.11 inserts/s, 106.77 updates/s, 89.77 deletes/s, 950883.78 reads/s
Another way you can do this is by running the below SELECT, then waiting the time difference and doing your calculation (e.g. run, then wait 5 minutes, then do your calculation):
SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM information_schema.global_status
WHERE VARIABLE_NAME IN ('COM_SELECT', 'COM_INSERT', 'COM_UPDATE', 'COM_DELETE');
+-----------------+------------------+
| VARIABLE_NAME | VARIABLE_VALUE |
|-----------------+------------------|
| COM_DELETE | 153108 |
| COM_INSERT | 5667103 |
| COM_SELECT | 130384472 |
| COM_UPDATE | 1978116 |
+-----------------+------------------+
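The calculation itself is just a delta over the sampling interval. A sketch with hypothetical counter values sampled 5 minutes (300 seconds) apart:

```python
# Hypothetical COM_* counter snapshots from information_schema.global_status
sample_1 = {"COM_SELECT": 130_384_472, "COM_INSERT": 5_667_103}
sample_2 = {"COM_SELECT": 130_684_472, "COM_INSERT": 5_670_103}  # 300 s later
interval_seconds = 300

# Queries per second for each statement type over the interval
rates = {name: (sample_2[name] - sample_1[name]) / interval_seconds
         for name in sample_1}
print(rates)  # {'COM_SELECT': 1000.0, 'COM_INSERT': 10.0}
```

The counters are cumulative since server start, so a single snapshot tells you little; only the difference between two snapshots gives a rate.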
One way you can optimize: instead of having every statement run as its own transaction, you can group statements together with transactions (START TRANSACTION ... COMMIT).
A next-key lock is a combination of a record lock on the index record and a gap lock on the gap before the index record.
user_id(pk) | name | status
1 | Ronald | 0
2 | John | 1
10000000 | Raul | 9
If you need to move data across servers, you might have to do some extra work. If you're on the database server, you have access to OUTFILE and INFILE, so you can dump data out to, or load data in from, say, a csv file or a .sql file.
You can use mysqldump on the command line like this (log in, get data from a database and table with a where clause; note there is no space between -p and the password):
mysqldump -t -u myusername -pmypassword -h myhostname --port 3306 mydatabasename mytablename --where="status='Sure'"
You can also run mysql on the command line like so (run a sql query, replace tabs with ',', then dump to a csv file):
mysql -u myusername -pmypassword -h myhostname mydatabasename -e "SELECT * FROM some_table" | tr '\t' , > myData.csv
Some of these commands assume you're on the same server as the database server, which you probably won't be. Use LOAD DATA LOCAL INFILE instead if you want to upload a file to a remote server:
mysql -u myusername -pmypassword -h myhostname --database mydatabasename -e "LOAD DATA LOCAL INFILE '/home/will/myData.csv' INTO TABLE sometable LINES TERMINATED BY '\n' IGNORE 1 LINES"
Sometimes it helps to dump some data from another server into the server you need, then do a quick join and update.
update some_database.some_table a
join some_database.some_other_table b on a.id = b.id
set a.some_status=0;
Say you have a database ‘idx’ and want to see how much space each table is, run:
select TABLE_NAME as `table`, round((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS `Size (MB)`
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'idx'
ORDER BY (DATA_LENGTH + INDEX_LENGTH)
DESC;
The idea behind partitioning is to use multiple tables instead of one table. You can divide a table into many tables so that you can have old data in one sub table and new data in another table. The database is then able to optimize queries where you ask for new data knowing that they are in the second table. You can also define how the data is partitioned.
CREATE TABLE employees (
id INT NOT NULL,
fname VARCHAR(30),
lname VARCHAR(30),
hired DATE NOT NULL DEFAULT '1970-01-01',
separated DATE NOT NULL DEFAULT '9999-12-31',
job_code INT,
store_id INT
)
PARTITION BY RANGE ( YEAR(separated) ) (
PARTITION p0 VALUES LESS THAN (1991),
PARTITION p1 VALUES LESS THAN (1996),
PARTITION p2 VALUES LESS THAN (2001),
PARTITION p3 VALUES LESS THAN MAXVALUE
);
You can drop old data (pruning) with:
ALTER TABLE employees DROP PARTITION p0;
You can also speed up a query like this, since all the data will be in partition p2:
SELECT COUNT(*)
FROM employees
WHERE separated BETWEEN '2000-01-01' AND '2000-12-31'
GROUP BY store_id;
MySQL has a sys schema, which is a collection of views and procedures for MySQL admins to get insight on how a MySQL database is used: https://github.com/mysql/mysql-sys
Depending on your version (e.g. 5.6, 5.7), download the correct file (e.g. sys_56.sql) and run it as root.
cd /tmp/mysql-sys/
mysql -u root -p < ./sys_56.sql
If you're on your database server and you don't have Git, you can generate a single file using generate_sql_file.sh; you should then see it in your 'gen' directory.
mysql> use sys;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Database changed
mysql> desc host_summary;
+------------------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------------------+---------------+------+-----+---------+-------+
| host | varchar(60) | YES | | NULL | |
| statements | decimal(64,0) | YES | | NULL | |
| statement_latency | text | YES | | NULL | |
| statement_avg_latency | text | YES | | NULL | |
| table_scans | decimal(65,0) | YES | | NULL | |
| file_ios | decimal(64,0) | YES | | NULL | |
| file_io_latency | text | YES | | NULL | |
| current_connections | decimal(41,0) | YES | | NULL | |
| total_connections | decimal(41,0) | YES | | NULL | |
| unique_users | bigint(21) | NO | | 0 | |
| current_memory | text | YES | | NULL | |
| total_memory_allocated | text | YES | | NULL | |
+------------------------+---------------+------+-----+---------+-------+
12 rows in set (0.00 sec)
You can find logs in /var/log/mysql. There's a breakdown of slow.log, error.log, etc.
MySQL has a global Query Cache that is shared among all the sessions. It caches the select query along with the result set. The query has to be EXACT (no whitespace difference) in order for the cache to be used correctly. Basically, if the query is in the cache, it’ll search there first.
Remember that there are tradeoffs: faster reads but slower writes. Anytime an update happens, the cache is invalidated and must be rebuilt.
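The behavior can be sketched as a dictionary keyed by the exact query string, cleared on any write. This is an illustration only: the real query cache invalidates per table rather than wholesale, and was removed entirely in MySQL 8.0.

```python
class ToyQueryCache:
    """Illustration of exact-match query caching with write invalidation."""
    def __init__(self, run_query):
        self.run_query = run_query   # fallback that actually hits the database
        self.cache = {}
        self.hits = 0

    def select(self, sql):
        if sql in self.cache:        # exact string match, whitespace and all
            self.hits += 1
            return self.cache[sql]
        result = self.run_query(sql)
        self.cache[sql] = result
        return result

    def write(self, sql):
        self.cache.clear()           # any update invalidates the cached results
        return self.run_query(sql)

cache = ToyQueryCache(run_query=lambda sql: "rows-for:" + sql)
cache.select("SELECT * FROM iris")
cache.select("SELECT * FROM iris")    # cache hit
cache.select("SELECT  * FROM iris")   # extra space: a miss, not the same string
cache.write("UPDATE iris SET sepallength = 5.0")
cache.select("SELECT * FROM iris")    # miss again: the write cleared the cache
print(cache.hits)  # 1
```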
# check that query cache is enabled
show variables like '%have_query_cache%';
+------------------+---------+
| Variable_name | Value |
|------------------+---------|
| have_query_cache | YES |
+------------------+---------+
show status like 'qcache%';
+-------------------------+---------+
| Variable_name | Value |
|-------------------------+---------|
| Qcache_free_blocks | 0 |
| Qcache_free_memory | 0 |
| Qcache_hits | 0 |
| Qcache_inserts | 0 |
| Qcache_lowmem_prunes | 0 |
| Qcache_not_cached | 0 |
| Qcache_queries_in_cache | 0 |
| Qcache_total_blocks | 0 |
+-------------------------+---------+
show variables like '%query_cache_size%';
+------------------+---------+
| Variable_name | Value |
|------------------+---------|
| query_cache_size | 0 |
+------------------+---------+
1 row in set
You can see the binlog with:
SHOW BINARY LOGS;
...
| mysql-bin-changelog.510021 | 49349349 |
+----------------------------+-----------+
When you replicate a database to another database, you can do a few quick checks to make sure things are running right:
mysql>SHOW MASTER STATUS;
mysql>SHOW SLAVE STATUS;
Look for the field Seconds_Behind_Master. If this value is NULL, replication is broken; otherwise it's an integer of how many seconds the replica is behind.
If Seconds_Behind_Master is NULL, you'll probably see Last_SQL_Error with some error message.
Fix the error (e.g. say the disk is full) and then restart the slave server with:
mysql>START SLAVE;
Sometimes you’ll need to skip a query or two to make slave run again, so set this global variable to skip that query before starting slave:
SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
START SLAVE;
MySQL replication works with 3 threads. These 3 main threads exist for each source/replica connection.
A source that has multiple replicas creates one binary log dump thread for each currently connected replica, and each replica has its own replication I/O and SQL threads.
A replica uses two threads to separate reading updates from the source and executing them into independent tasks.
The source creates a thread to send the binary log contents to a replica when the replica connects.
This thread can be identified in the output of SHOW PROCESSLIST
on the source as the Binlog Dump
thread.
The binary log dump thread acquires a lock on the source’s binary log for reading each event that is to be sent to the replica. As soon as the event has been read, the lock is released, even before the event is sent to the replica.
When a START SLAVE statement is issued on a replica server, the replica creates an I/O thread, which connects to the source and asks it to send the updates recorded in its binary log.
The replication I/O thread reads the updates that the source’s Binlog Dump
thread sends and copies them to local files
that comprise the replica’s relay log.
You can STOP SLAVE IO_THREAD and START SLAVE IO_THREAD.
The replica creates an SQL thread to read the relay log that is written by the replication I/O thread and executes the transactions contained in it. Usually this is the issue with lag.
You can STOP SLAVE SQL_THREAD and START SLAVE SQL_THREAD.
You can set the slave_parallel_workers system variable to a value greater than 0 (the default is 0). When this system variable is set, the replica creates the specified number of worker threads to apply transactions, plus a coordinator thread to manage them. With this setup, transactions that fail can be retried.
You can tell whether it's the I/O thread or the SQL thread that is behind: if SHOW MASTER STATUS shows a file (e.g. mysql-bin-changelog.066552) and the replica's SHOW SLAVE STATUS shows an earlier file (e.g. mysql-bin-changelog.066548), the IO_THREAD is behind.
Replication can take place at the SQL level (statement), at the row level, or mixed (set via binlog_format).
SHOW MASTER STATUS\G;
*************************** 1. row **************************
File: laforge-bin.011
Position: 566920603
Binlog_do_db:
Binlog_ignore_db:
Binlog_do_db means to log updates only for certain databases.
Binlog_ignore_db means to log updates while ignoring certain databases.
The File and Position fields show the offset in the current binary log.
GRANT REPLICATION SLAVE ON *.* TO 'rpl_user'@'slave-host' IDENTIFIED BY 'rpl_pass';
CHANGE MASTER TO MASTER_HOST='master-host', MASTER_USER='rpl_user',
MASTER_PASSWORD='rpl_pass';
START SLAVE;
SHOW SLAVE STATUS\G;
The information_schema allows you to get information about all tables and views within a database.
The columns that this view returns are: