William Liu

MySQL


Summary

SQL (Structured Query Language) is the standard language for communicating with relational database management systems. There’s a few variations including Microsoft’s SQL Server, MySQL, PostgreSQL, SQLite. Below instructions are with MySQL. We’ll setup a MySQL server, then use a client to connect to the server and query for data.

Installation

Mac Install

On a mac, use homebrew to install mysql using $brew install mysql to install the MySQL as a server. For a GUI client, you can download MySQL Workbench, which will allow you to visually see the data with a GUI (as opposed to only command line).

Linux (Ubuntu) Install

For Ubuntu, use your package manager to install sudo apt-get install mysql-server and your workbench sudo apt-get install mysql-workbench-community

Then setup with sudo /usr/bin/mysql_secure_installation, where you can setup root password and other configurations.

MySQL Config

You can find the mysql config under /etc/my.cnf. It may be in another location, but it’ll show you useful info.

Terminal Client

Instead of the regular ‘mysql’ terminal client, I actually like mycli: https://github.com/dbcli/mycli/

For the MySQL Prompt in mycli, I changed it to:

# MySQL prompt
# \t - Product type (Percona, MySQL, Mariadb)
# \u - Username
# \h - Hostname of the server
# \d - Database name
# \n - Newline
#prompt = '\t \u@\h:\d> '
prompt ="(\u@\h) [\d]> "

Commands

Useful MySQL commands (Start, Stop Server)

$systemctl status mysql  # check status of server
$systemctl start mysql  # start the server
$systemctl stop mysql   # stop the server
$systemctl help mysql  # get help commands

Run a text file (containing sql scripts) from command line

$mysql < myscript.sql

Switching to mysql shell

$mysql  # enters into the shell from bash
$mysql -u root -p  # enters into the shell as root user and prompts for
password
mysql>  # This is what you'll see when in shell

MySQL Commands

Create, Drop, and Use a Database

mysql> DROP DATABASE IF EXISTS my_database_name;
mysql> CREATE DATABASE my_database_name;
mysql> USE my_database_name;

Show list of users

See what Users you have available:

SELECT User FROM mysql.user;

To show the list of users, we want to look at the mysql database.

mysql> use mysql;
mysql> show tables;

We’re interested in the user table on the mysql database.

mysql> describe mysql.user;

For older versions of MySQL (below 5.7), you might want to look at these fields: mysql> select host, user, password from mysql.user;

For more recent version of MySQL (5.7+), you might want to look at these fields: mysql> select host, user, authentication_string from mysql.user;

Create user

To create a User, you can run:

CREATE USER 'custom'@'localhost' IDENTIFIED BY 'password';

You can then log in, say with: mycli --user will --password mypassword

Update user password

To update the user password:

mysql> update user set authentication_string=password('1111') where user = 'root';

Grant privileges

To grant all privileges to a user on host ‘localhost’ with a database called ‘db_test’ do:

GRANT ALL ON db_test.* to 'tester'@'localhost';

To update privileges

FLUSH PRIVILEGES;

Status of current database, user

mysql> status   # Shows current database, current user
--------------
mysql  Ver 14.14 Distrib 5.7.18, for Linux (x86_64) using  EditLine wrapper

Connection id:		10
Current database:
Current user:		root@localhost
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server version:		5.7.18-0ubuntu0.16.04.1 (Ubuntu)
Protocol version:	10
Connection:		Localhost via UNIX socket
Server characterset:	latin1
Db     characterset:	latin1
Client characterset:	utf8
Conn.  characterset:	utf8
UNIX socket:		/var/run/mysqld/mysqld.sock
Uptime:			13 min 24 sec

Threads: 1  Questions: 10  Slow queries: 0  Opens: 115  Flush tables: 1  Open tables: 34  Queries per second avg: 0.012

Show databases

List all the available databases

mysql> show databases;
+--------------------+
| Database           |
+--------------------+
| information_schema |
| mysql              |
| performance_schema |
| sys                |
| willdb             |
+--------------------+

Show tables

List all the available tables in a database

mysql> use willdb;
mysql> show tables;

mysql> show tables;
+------------------+
| Tables_in_willdb |
+------------------+
| iris             |
+------------------+

describe tables

See the table structure

mysql> describe iris;
+-------------+-------------+------+-----+---------+-------+
| Field       | Type        | Null | Key | Default | Extra |
+-------------+-------------+------+-----+---------+-------+
| sepallength | float       | YES  |     | NULL    |       |
| sepalwidth  | float       | YES  |     | NULL    |       |
| petallength | float       | YES  |     | NULL    |       |
| petalwidth  | float       | YES  |     | NULL    |       |
| class       | varchar(30) | YES  |     | NULL    |       |
+-------------+-------------+------+-----+---------+-------+
5 rows in set (0.00 sec)

see indexes on a table

Indexes in general help speed up database reads (e.g. search) and slows down database writes. If you index a column, you create ordering (e.g. say alphabetically) through an index. Instead of searching through all n rows, you might have a binary search w/ log-n index entries.

List all the indexes on a table

mysql> show index from iris;

See below on topics of ‘lock contention’, ‘deadlock’ caused by indexes.

You get some of the following fields:

show triggers

Show database triggers:

mysql> show triggers like '%something%';

Explain

explain is often used with complex queries, optimizing queries, and indexes. Put explain in front of a query that has ‘select’, ‘update’, ‘delete’, ‘insert’, ‘replace’, ‘update’

mysql> explain select * from my_db.my_table;
+------+---------------+-----------------+--------+-----------------+--------+-----------+--------+--------+---------+
| id   | select_type   | table           | type   | possible_keys   | key    | key_len   | ref    | rows   | Extra   |
|------+---------------+-----------------+--------+-----------------+--------+-----------+--------+--------+---------|
| 1    | SIMPLE        | my_table        | ALL    | <null>          | <null> | <null>    | <null> | 343848 |         |
+------+---------------+-----------------+--------+-----------------+--------+-----------+--------+--------+---------+

Here’s an example that uses an index:

explain select property_id, mls_id from my_db.my_table;
+------+---------------+-----------------+--------+-----------------+---------+-----------+--------+--------+-------------+
| id   | select_type   | table           | type   | possible_keys   | key     | key_len   | ref    | rows   | Extra       |
|------+---------------+-----------------+--------+-----------------+---------+-----------+--------+--------+-------------|
| 1    | SIMPLE        | my_table        | index  | <null>          | ix_cpid | 5         | <null> | 343848 | Using index |
+------+---------------+-----------------+--------+-----------------+---------+-----------+--------+--------+-------------+

Here’s an example with a join:

explain select count(*), t1.loc_city from idx.mlslv4_master t1 join
	    enterprise.community t2 where t1.loc_city = t2.name and t2.owner_id=44 group by t1.loc_city;
+------+---------------+---------+--------+-----------------+-------------+-----------+--------------------+--------+---------------------------------+
| id   | select_type   | table   | type   | possible_keys   | key| key_len   | ref    | rows               | Extra  |                                 |
|------+---------------+---------+--------+-----------------+-------------+-----------+--------------------+--------+---------------------------------|
| 1    | SIMPLE        | t2      | ref    | company_id      | company_id 	 | 4      | const              | 179    | Using temporary; Using filesort |
| 1    | SIMPLE        | t1      | ref    | ix_loc_city     | ix_loc_city 	 | 33	  | enterprise.t2.name | 10     | Using where; Using index        |
+------+---------------+---------+--------+-----------------+-------------+-----------+--------------------+--------+---------------------------------+

Show Processes

You can see currently running processes a few different ways:

show full processlist

SELECT * FROM INFORMATION_SCHEMA.PROCESSLIST

Loading Data

Say we want to load the iris dataset. We do a curl to get the dataset, then run the below sql query.

curl http://mlr.cs.umass.edu/ml/machine-learning-databases/iris/iris.data > iris.csv

|sepallength|sepalwidth|petallength|petalwidth|class      |
|-----------|----------|-----------|----------|-----------|
|5.1        |3.5       |1.4        |0.2       |Iris-setosa|
|4.9        |3         |1.4        |0.2       |Iris-setosa|
|4.7        |3.2       |1.3        |0.2       |Iris-setosa|

LOAD DATA LOCAL

#EDIT THIS PATH:
INFILE "/Users/williamliu/Desktop/iris.csv"

INTO TABLE sampdb.iris  #db sampdb, table iris

FIELDS
    TERMINATED BY ','
    #ENCLOSED BY '"'

LINES
   TERMINATED BY '\n'
#IGNORE 1 LINES;

Create Table

create table iris(
sepallength float,
sepalwidth float,
petallength float,
petalwidth float,
class nvarchar(30)
);

Datetimes

In MySQL 5.5, you could only have one TIMESTAMP column in a table that has the current timestamp as a the default value for initializing the column.

From MySQL 5.6.5 and on, the above restriction is lifted so any timestamp column definition can have any combination of default current_timestamp and on update current_timestamp clauses. E.g.

alter table my_table add column created_datetime timestamp default now(),
					 add column modified_datetime timestamp default '0000-00-00 00:00:00' ON UPDATE now();

This looks like:

show columns from my_table like '%datetime%';
+-------------------+-----------+--------+-------+---------------------+-----------------------------+
| Field             | Type      | Null   | Key   | Default             | Extra                       |
|-------------------+-----------+--------+-------+---------------------+-----------------------------|
| created_datetime  | timestamp | NO     |       | CURRENT_TIMESTAMP   |                             |
| modified_datetime | timestamp | NO     |       | 0000-00-00 00:00:00 | on update CURRENT_TIMESTAMP |
+-------------------+-----------+--------+-------+---------------------+-----------------------------+
2 rows in set

Queries

Basic Format of SQL Queries

Assuming data (say mytable from mydatabase) looks like:

# last_name, first_name, suffix, city, state, birth, death
Adams, John, , Braintree, MA, 1735-10-30, 1826-07-04
Adams, John Quincy, , Braintree, MA, 1767-07-11, 1848-02-23
Arthur, Chester A., , Fairfield, VT, 1829-10-05, 1886-11-18
Buchanan, James, , Mercersburg, PA, 1791-04-23, 1868-06-01
Bush, George H.W., , Milton, MA, 1924-06-12,

We can do the following query:

SELECT last_name, first_name, state
FROM mydatabase.mytable;

ROW NUMBER

To get a returning count of what the row number is, do this. Note that you can set variables and return them in the results.

SET @n=0;
SELECT @n := @n+1 AS 'row_num', sepallength FROM sampdb.iris;

COUNT

SELECT COUNT(*) FROM sampdb.president
SELECT COUNT(first_name), COUNT(death) FROM sampdb.president  # Returns 42 and 38
SELECT COUNT(DISTINCT state, first_name) FROM sampdb.president

Other Functions (AVG, SUM, MIN, MAX)

SELECT AVG(death) FROM sampdb.president
SELECT SUM(age) FROM sampdb.president
SELECT MIN(age) FROM sampdb.president
SELECT MAX(age) FROM sampdb.president

Functions (AS, DATEDIFF, CONCAT, etc.)

SELECT ... score * 2 AS doubled_score
SELECT ... DATEDIFF(death, birth)
SELECT ... CONCAT(first_name, ' ', last_name)
SELECT ... UNIQUE(last_name, middle_name, first_name)

NULL Values and Functions

WHERE

SELECT COUNT(first_name) FROM sampdb.president WHERE last_name="Adams"

GROUP BY

SELECT state, COUNT(*) FROM sampdb.president GROUP BY state ORDER BY state

GROUP BY using WHERE and HAVING

SELECT state, COUNT(*) FROM sampdb.president
   WHERE birth <= '1900-01-01'
   GROUP BY state
   HAVING COUNT(*) >=2
   ORDER BY COUNT(*);

Sample Output:

state, count(*)
   MA, 2
   VT, 2
   NC, 2
   NY, 4
   OH, 7
   VA, 8

CAST

You’ll often need to CAST, e.g. from a String to a Datetime

CAST('2009-01-01' AS DATETIME)

DATES and TIMES

Often you’ll need datetime functions. Check out the reference here:

https://dev.mysql.com/doc/refman/8.0/en/date-and-time-functions.html

CURDATE()

Get the Current Date

SELECT CURDATE();
+------------+
| CURDATE()  |
+------------+
| 2019-01-26 |
+------------+

INTERVAL

SELECT CURDATE() - INTERVAL 7 DAY;
+----------------------------+
| CURDATE() - INTERVAL 7 DAY |
+----------------------------+
| 2019-01-19                 |
+----------------------------+

DATE_SUB

mysql> SELECT DATE_SUB(NOW(), INTERVAL 30 DAY) AS 30DaysAgo;
+---------------------+
| 30DaysAgo           |
+---------------------+
| 2018-12-27 14:09:38 |
+---------------------+

STR_TO_DATE

You can select a string into a date or datetime.

SELECT whatever
  FROM yourtable
 WHERE STR_TO_DATE(yourdatefield, '%m/%d/%Y') > CURDATE() - INTERVAL 7 DAYS

CASCADE

Create ‘buildings’ table

CREATE TABLE buildings (
    building_no INT PRIMARY KEY AUTO_INCREMENT,
    building_name VARCHAR(255) NOT NULL,
    address VARCHAR(255) NOT NULL
);

CREATE TABLE rooms (
    room_no INT PRIMARY KEY AUTO_INCREMENT,
    room_name VARCHAR(255) NOT NULL,
    building_no INT NOT NULL,
    FOREIGN KEY (building_no)
        REFERENCES buildings (building_no)
        ON DELETE CASCADE
);

Notice that we add the ON DELETE CASCADE clause at the end of the foreign key constraint definition.

Subqueries

EXISTS

Exists Example

Suppose that a website contains two tables, the Customers table and the Orders table. Write a SQL query to find all customers who never order anything.

Table: Customers.

+----+-------+
| Id | Name  |
+----+-------+
| 1  | Joe   |
| 2  | Henry |
| 3  | Sam   |
| 4  | Max   |
+----+-------+

Table: Orders.

+----+------------+
| Id | CustomerId |
+----+------------+
| 1  | 3          |
| 2  | 1          |
+----+------------+
Using the above tables as example, return the following:

+-----------+
| Customers |
+-----------+
| Henry     |
| Max       |
+-----------+


SELECT c.Name AS Customers FROM Customers c
LEFT JOIN Orders o ON o.CustomerId = c.Id
WHERE NOT EXISTS (SELECT * FROM Orders o WHERE o.CustomerId = c.Id);

JOINS

Used to combine rows from two or more tables. Types of joins include:

Primary Key (PK) and Foreign Key (FK)

PRI, UNI, MUL

If more than one key value applies to a given column, the key displays the one with the highest priority in the order of PRI, UNI, and MUL.

There’s a column with Non_unique, meaning that 0 if the index cannot contain duplicates and 1 if it can.

DROP and TRUNCATE

ALTER

INSERT

INSERT ON DUPLICATE KEY UPDATE

You can specify the ON DUPLICATE KEY UPDATE clause when a row is about to be inserted, but that would then cause a duplicate value in a UNIQUE index or PRIMARY KEY. Instead, it’ll run an UPDATE of the old row.

An INSERT statement increases any auto-increment values, but UPDATE does not.

INSERT INTO t1 (a,b,c) VALUES (1,2,3)
  ON DUPLICATE KEY UPDATE c=c+1;

UPDATE

UPDATE MyDatabase.MyTable SET ThisField = ThisData

DELETE

DELETE FROM MyDatabase.MyTable  # Careful, no criteria deletes the entire database!

Example Delete Duplicates

Write a SQL query to delete all duplicate email entries in a table named Person, keeping only unique emails based on its smallest Id.

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
| 3  | john@example.com |
+----+------------------+

Id is the primary key column for this table. For example, after running your query, the above Person table should have the following rows:

+----+------------------+
| Id | Email            |
+----+------------------+
| 1  | john@example.com |
| 2  | bob@example.com  |
+----+------------------+

# Write your MySQL query statement below
DELETE FROM Person USING Person, Person AS PersonCopy
WHERE Person.Id > PersonCopy.Id AND Person.Email = PersonCopy.Email;

IF

IF(2>1, 'OK', 'NOT)

CASE

CASE
    WHEN 2 > 1
        THEN 'OK'
    WHEN 1==1
        THEN 'YEP'
    ELSE
        'NOT OK'
END

You can combine all of the above into something like this (Note: From SQL Server):

SELECT IVRName,
SUM(CASE WHEN Disconnection in ('Answer') THEN 1 END) AS Answered,
SUM(CASE WHEN Disconnection in ('Abandon') THEN 1 END) AS Abandoned
FROM LifeNetDW.dbo.QueueMetrics
WHERE CallTime > '2015-1-31' AND IVRName IS NOT NULL
GROUP BY IVRName

Indexing

An index is a copy of selected columns from a table that can be searched efficiently (with a link to the complete row of data it was copied from).

Advantages of Indexes

Indexes speed up the retrieval of data form the table (by using the index to select eact corresponding rows without scanning the whole table). So searches are faster. If you use primary key index and unique key indexes, it helps to avoid duplicate row data.

Disadvantaes of Indexes

Anytime you have a write query (e.g. INSERT, UPDATE, DELETE), the indexes must be updated so indexes slow down write queries.

B-Trees

A B-Tree is a self-balancing tree data structure that maintains sorted data and allows searches, sequential access, insertions, and deletions in logarithmic time. The B-Tree is a generalization of a binary search tree in that a node can have more than two children. B-Trees are often used in relational databases.

Cardinality

Cardinality is the uniqueness of data values in a particular column of a table. The lower the cardinality number, the more duplicated elements in a column. For example, a bit will have a cardinality of 2 (very low) while a date might have a cardinality of say 100000. If every value is unique, then you have maximum cardiniality. If you have the minimum cardinality, then all values are the same.

Cardinality is a property that affects the ability to cluster, sort and search data. Higher cardinality means:

Generally you want to list values in a composite index from highest cardiniality to lowest.

You can update the cardinality on a table with the following command:

ANALYZE TABLE my_table;
+-----------------------+---------+------------+------------+
| Table                 | Op      | Msg_type   | Msg_text   |
|-----------------------+---------+------------+------------|
| my_table              | analyze | status     | OK         |
+-----------------------+---------+------------+------------+

SHOW INDEX FROM my_table;

Binary Logging

Before we get into lock contention, let’s bring up how binary logging works. The binary log contains ‘events’ that describe database changes (e.g. table create, updates to table data). This log is used on the master server to send events to slave servers.

Data recovery operations also use the binary log. Statements like SELECT and SHOW aren’t included in the binary log. To enable the binary log, start the server with --log-bin[=base_name] where the default base_name is the name of the host machine followed by -bin.

To see if you have any binary logs, you can run:

mysql> show binary logs;

+---------------------+-------------+
| Log_name            | File_size   |
|---------------------+-------------|
| my-db-bin.000825 | 1073742262  |
| my-db-bin.000826 | 1073742059  |

Depending on your my.cnf file, you can see where your log-bins directory is.

Show Binary Log Events

mysql> show binlog events limit 10;

Here you can see what’s going on with the binlog events. Are there too many inserts, updates, etc.

Lock Contention

This is on the ‘user’ table with a PK on user_id

user_id(pk) | name   | status
1           | Ronald | 0
100         | Messi  | 0
10000000    | Raul   | 9

Read about binary logging first, then read below about lock contention. An example of lock contention is where:

Why did Query 2 get blocked by Query 1? Let’s think about the scenario if Query 1 does not block Query 2:

So basically, the table user needs to be locked because any following operations on that same table will cause data inconsistencies for binary logging (thus issues w/ say data replication to slaves).

To solve this we can use next-key locking, where we lock not just the ‘modified’ rows, but also ‘scanned’ rows. The disadvantage is low concurrency.

If column is indexed

If the status column is indexed, then Queries 1 and 2 can run in parallel (to improve concurrency).

View Deadlocks

To view deadlocked transactions, you can run:

SHOW ENGINE INNODB STATUS \G

An example might look like this:

*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 320171 page no 48814 n bits 248 index `PRIMARY` of table `my_db`.`my_table` trx id 126544661030 lock_mode X locks rec but not gap waiting
*** (2) TRANSACTION:
TRANSACTION 126544669705, ACTIVE 8 sec fetching rows, thread declared inside InnoDB 2630
mysql tables in use 2, locked 2
3964 lock struct(s), heap size 472616, 34781 row lock(s)
MySQL thread id 10893466142, OS thread handle 0x7ed5f70c3700, query id 829917141049 10.10.0.10 idx System lock
UPDATE my_table.my_db
        SET ready_to_delete_bit = 1
        WHERE my_field = 121 AND start_datetime > NOW()
        AND (another_field = 13 or this_field = 'a')
        ORDER BY that_field ASC
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 320171 page no 48814 n bits 248 index `PRIMARY` of table `my_db`.`my_table` trx id 126544669705 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 320171 page no 3035 n bits 200 index `PRIMARY` of table `my_db`.`my_table` trx id 126544669705 lock_mode X locks rec but not gap waiting
*** WE ROLL BACK TRANSACTION (2)
------------

Deadlocks

Indexes and keys play an important role in deadlocks, often caused by a technique known as a gap lock.

The idea is that InnoDB has two types of standard row-level locks:

InnoDB also supports multiple granularity locking, which allows the coexistence of row locks and table locks.

Table-level lock type compatibility is summarized in the following matrix.

	X			IX			S			IS
X	Conflict	Conflict	Conflict	Conflict
IX	Conflict	Compatible	Conflict	Compatible
S	Conflict	Conflict	Compatible	Compatible
IS	Conflict	Compatible	Compatible	Compatible

Deadlock (gap lock) caused by indexes

A gap lock is a lock on a gap between index records, or a lock on the gap before the first or after the last index record.

An example of this could be: SELECT c1 FROM t where c1 BETWEEN 10 and 20 FOR UPDATE; that will prevent other transactions from inserting a value of 15 into column t.c whether or not there was already such a value in the column because the gaps between all existing values in the range are locked.

A gap might span a single index value, multiple index values, or even be empty.

Gap locks are part of the tradeoff between performance and concurrency, and are used in some transaction isolation levels and not others.

See an example of MySQL deadlocks with concurrent inserts here:

http://thushw.blogspot.com/2010/11/mysql-deadlocks-with-concurrent-inserts.html

Show Innodb Status

Show InnoDB Status is also good for checking the number of row operations you have going on:

mysql>SHOW INNODB STATUS
Main thread process no. 38101, id 139815377716992, state: sleeping
Number of rows inserted 334154636, updated 547122709, deleted 315490449, read 6071813255519
33.11 inserts/s, 106.77 updates/s, 89.77 deletes/s, 950883.78 reads/s

Another way you can do this is by running the below SELECT, then waiting the time difference and doing your calculation (e.g. run, then wait 5 minutes, then do your calculation):

SELECT VARIABLE_NAME, VARIABLE_VALUE
FROM information_schema.global_status
WHERE VARIABLE_NAME IN ('COM_SELECT', 'COM_INSERT', 'COM_UPDATE', 'COM_DELETE');
+-----------------+------------------+
| VARIABLE_NAME   | VARIABLE_VALUE   |
|-----------------+------------------|
| COM_DELETE      | 153108           |
| COM_INSERT      | 5667103          |
| COM_SELECT      | 130384472        |
| COM_UPDATE      | 1978116          |
+-----------------+------------------+

Transactions

One way you can optimize is by instead of having every transaction run as a single query, you can group transactions together with Transactions.

Next-key Locks

A next-key lock is a combination of a record lock on the index record and a gap lock on the gap before the index record.

Example Table

user_id(pk) | name   | status
1           | Ronald | 0
2           | John   | 1
10000000    | Raul   | 9

Covering index and range scan / LIMIT

Covering index and long text/blob

Sorting, indexing and query execution plans

Other

Schemas

Normalization (Nth normal form)

Moving Files

If you need to move data across servers, you might have to do some extra work. If you’re on the database server, you have access to OUTFILE and INFILE, so you can dump data out or into say a csv file or a .sql file.

mysqldump

You can use mysqldump on the command line like this (login, get data from a database and table with a where clause)

mysqldump -t -u myusername -p mypassword -h myhostname --port 3306 mydatabasename mytablename --where="status='Sure'"

You can also run mysql on the command line like so (run a sql query, replace tabs with ‘,’, then dump to a csv file

mysql -u myusername -p mypassword -h myhostname mydatabasename mytablename -e "SELECT * FROM some_table" | tr '\t' , > myData.csv

Some of these commands assume you’re on the same server as the database server, which you probably won’t be. Use LOCAL INFILE instead if you want to upload a file to a remote server

mysql -u myusername -p mypassword -h myhostname --database mydatabasename -e "LOAD DATA LOCAL INFILE '/home/will/myData.csv' INTO TABLE sometable LINES TERMINATED BY '\n' IGNORE 1 LINES"

Sometimes it helps to dump some data from another server into the server you need, then do a quick join and update.

update some_database.some_table a
	join some_database.some_other_table b on a.id = b.id
set a.some_status=0

Check File Size of Tables

Say you have a database ‘idx’ and want to see how much space each table is, run:

select TABLE_NAME as `table`, round((DATA_LENGTH + INDEX_LENGTH) / 1024 / 1024) AS `Size (MB)`
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'idx'
ORDER BY (DATA_LENGTH + INDEX_LENGTH)
DESC;

Partitions

The idea behind partitioning is to use multiple tables instead of one table. You can divide a table into many tables so that you can have old data in one sub table and new data in another table. The database is then able to optimize queries where you ask for new data knowing that they are in the second table. You can also define how the data is partitioned.

CREATE TABLE employees (
	id INT NOT NULL,
	fname VARCHAR(30),
	lname VARCHAR(30),
	hired DATE NOT NULL DEFAULT '1970-01-01',
	separated DATE NOT NULL DEFAULT '9999-12-31',
	job_code INT,
	store_id INT
)
PARTITION BY RANGE ( YEAR(separated) ) (
	PARTITION p0 VALUES LESS THAN (1991),
	PARTITION p1 VALUES LESS THAN (1996),
	PARTITION p2 VALUES LESS THAN (2001),
	PARTITION p3 VALUES LESS THAN MAXVALUE
);

You can drop old data (pruning) with:

ALTER TABLE employees DROP PARTITION p0;

You can also speed up a query like this since all the data will be in partition p2:

SELECT COUNT(*)
FROM employees
WHERE separated BETWEEN '2000-01-01' AND '2000-12-31'
GROUP BY store_id;

Sys Schema

MySQL has a sys schema, which is a collection of views and procedures for MySQL admins to get insight on how a MySQL Database is used. https://github.com/mysql/mysql-sys

Install

Depending on your version (e.g. 5.6, 5.7), download the correct file (e.g. sys_56.sql) and run as root.

cd /tmp/mysql-sys/
mysql -u root -p < ./sys_56.sql

If you’re on your database and you don’t have Git, you can generate a single file using generate_sql_file.sh and you should then see it in your ‘gen’ directory.

mysql> use sys;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql> desc host_summary;
+------------------------+---------------+------+-----+---------+-------+
| Field                  | Type          | Null | Key | Default | Extra |
+------------------------+---------------+------+-----+---------+-------+
| host                   | varchar(60)   | YES  |     | NULL    |       |
| statements             | decimal(64,0) | YES  |     | NULL    |       |
| statement_latency      | text          | YES  |     | NULL    |       |
| statement_avg_latency  | text          | YES  |     | NULL    |       |
| table_scans            | decimal(65,0) | YES  |     | NULL    |       |
| file_ios               | decimal(64,0) | YES  |     | NULL    |       |
| file_io_latency        | text          | YES  |     | NULL    |       |
| current_connections    | decimal(41,0) | YES  |     | NULL    |       |
| total_connections      | decimal(41,0) | YES  |     | NULL    |       |
| unique_users           | bigint(21)    | NO   |     | 0       |       |
| current_memory         | text          | YES  |     | NULL    |       |
| total_memory_allocated | text          | YES  |     | NULL    |       |
+------------------------+---------------+------+-----+---------+-------+
12 rows in set (0.00 sec)

Logging

You can find logs in /var/log/mysql

There’s a break down of slow.log, error.log, etc.

Caching

MySQL has a global Query Cache that is shared among all the sessions. It caches the select query along with the result set. The query has to be EXACT (no whitespace difference) in order for the cache to be used correctly. Basically, if the query is in the cache, it’ll search there first.

Remember that there’s tradeoffs; faster reads but slower writes. Anytime an update happens, the cache is invalidated and it needs to reupdate the cache.

# check that query cache is enabled
show variables like '%have_query_cache%';
+------------------+---------+
| Variable_name    | Value   |
|------------------+---------|
| have_query_cache | YES     |
+------------------+---------+

show status like 'qcache%';
+-------------------------+---------+
| Variable_name           | Value   |
|-------------------------+---------|
| Qcache_free_blocks      | 0       |
| Qcache_free_memory      | 0       |
| Qcache_hits             | 0       |
| Qcache_inserts          | 0       |
| Qcache_lowmem_prunes    | 0       |
| Qcache_not_cached       | 0       |
| Qcache_queries_in_cache | 0       |
| Qcache_total_blocks     | 0       |
+-------------------------+---------+

show variables like '%query_cache_size%';
+------------------+---------+
| Variable_name    | Value   |
|------------------+---------|
| query_cache_size | 0       |
+------------------+---------+
1 row in set

DB Replication

You can see the binlog with:

SHOW BINARY LOGS LIMIT 100;

...
| mysql-bin-changelog.510021 |  49349349 |
+----------------------------+-----------+

When you replicate a database to another database, you can do a few quick checks to make sure things are running right:

mysql>SHOW MASTER STATUS;
mysql>SHOW SLAVE STATUS;

Look for the field Seconds_Behind_Master. If this value is NULL, then replication is broken, otherwise it’s an integer with how many seconds its behind.

If there is a Seconds_Behind_Master as NULL, then you’ll probably see `Last_Sql_Error’ with some error message. Fix the error (e.g. say disk is full) and then restart the slave server with:

mysql>START SLAVE;

Sometimes you’ll need to skip a query or two to make slave run again, so set this global variable to skip that query before starting slave:

SET GLOBAL SQL_SLAVE_SKIP_COUNTER=1;
START SLAVE;

Replication Threads

MySQL replication works with 3 threads. These 3 main threads exist for each source/replica connection.

A source that has multiple replicas creates one binary log dump thread for each currently connected replica, and each replica has its own replication I/O and SQL threads.

A replica uses two threads to separate reading updates from the source and executing them into independent tasks.

Binlog Dump Thread

The source creates a thread to send the binary log contents to a replica when the replica connects. This thread can be identified in the output of SHOW PROCESSLIST on the source as the Binlog Dump thread.

The binary log dump thread acquires a lock on the source’s binary log for reading each event that is to be sent to the replica. As soon as the event has been read, the lock is released, even before the event is sent to the replica.

Replication I/O Thread

When a START SLAVE statement is issued on a replica server, the replica creates an I/O thread, which connects to the source and asks it to send the updates recorded in its binary threads.

The replication I/O thread reads the updates that the source’s Binlog Dump thread sends and copies them to local files that comprise the replica’s relay log.

You can STOP SLAVE IO_THREAD and START SLAVE_IO_THREAD

Replication SQL Thread

The replica creates an SQL thread to read the relay log that is written by the replication I/O thread and executes the transactions contained in it. Usually this is the issue with lag.

You can STOP SLAVE SQL_THREAD and START SLAVE_SQL_THREAD

Replication Parallelization

You can set the slave_parallel_workers system variable to a value greater than 0 (the default). When this system variable is set, the replica creates the specified number of worker threads to apply transactions, plus a coordinator thread to manage them. With this setup, transactions that fail can be retried.

Troubleshooting Replication

You can tell if it’s the I/O thread or SQL thread by:

Statement-Based vs Row-Based Replication

Replication can take place on the SQL level (statement), on the row level, or a mixed (for binlog_format).

Replication Coordinates

SHOW MASTER STATUS\G;

*************************** 1. row **************************
                File: laforge-bin.011
            Position: 566920603
        Binlog_do_db:
    Binlog_ignore_db:

Binlog_do_db means to log updates that were done with only certain databases Binlog_ignore_db means to log updates by ignoring certain databases

The File and Position field shows the offset in the current binary log.

Setting up a Slave DB

    GRANT REPLICATION SLAVE ON *.* TO 'rpl_user@slave-host' IDENTIFIED BY 'rpl_pass';

    CHANGE MASTER TO MASTER_HOST='master-host', MASTER_USER='rpl_user',
    MASTER_PASSWORD='rpl_pass';

    START SLAVE;

    SHOW SLAVE STATUS\G;

Information_Schema

The information_schema allows you to get information about all tables and views within a database. The columns that this view returns are: