NOTES:Schwartz, Baron et al (2008) High Performance MySQL (2nd Edition) O'Reilly. Sebastopol, CA
Using Code Examples
Examples are maintained on the site http://www.highperfmysql.com/ and will be updated from time to time.
Authors appreciate but do not require attribution. An attribution usually includes the title, author, publisher & ISBN. For example: "High Performance MySQL: Optimization, Backups, Replication, and More, Second Edition, by Baron Schwartz et al. Copyright 2008 O'Reilly Media, Inc., 9780596101718"
Acknowledgements for the Second Edition
A list of people who contributed directly: Tobias Asplund, Igor Babaev, Pascal Borghino, Roland Bouman, Ronald Bradforf, Mark Callaghan, Jeremy Cole, Britt Crawford and the HiveDB project, Vasil Dimov, Harrison Fisk, Florian Haas, Dmitri Joukovski and Zmanda (thanks for the diagram explaining the LVM snapshots), Alan Kasindorf, Sheeri Kritzer Cabral, Marko Makela, Giuseppe Maxia, Paul McCullagh, B. Keith Murphy, Dhiren Patel, Sergey Petrunia, Alexander Rubin, Paul Tuckfield, Heikki Tuuri and Michael "Monty" Widenius.
MySQL's architecture is very different from that of other database servers, and makes it useful for a wide range of purposes. MySQL is not perfect, but it is flexible enough to work well in very demanding environments, such as web applications. At the same time, MySQL can power embedded applications, data warehouses, content indexing and delivery software, highly available redundant systems, online transaction processing (OLTP), and much more.
MySQL can be configured to run well on a wide range of hardware, and it supports a variety of data types. However, MySQL's most unusual and important feature is its storage-engine architecture, whose desing separates query processing and other tasks from data storage and retrieval. In MySQL 5.1, you can even load storage engines as run-time plug-ins. This separationm of concerns lets you choose, on a per-table basis, how your data is stored and what performance, features adn other characteristics you want.
MySQL's Logical Architecture
A good mental picture of how MySQL's components work together will help you understand the server. The topmost layer contains the services that aren't unique to MySQL. They're services most network-based client/server tools or servers need: connection handling, authentication, security, and so forth.
Much of MySQL's brains are on the second tier, including the code for query parsing, analysis, optimization, caching, and all the built-in functions (e.g. dates, times, math and encryption). Any functionality provided across storage engines lives at this level: stored procedures, triggers, and views, for example.
The third layer contains the storage engines. They are responsible for storing and retrieving all data stored "in" MySQL. Like the various filesystems available for GNU/Linux, each storage engine has its benefits and drawbacks. The server communicates with them through the storage engine API. This interface hides differences between storage engines and makes them largely transparent at the query layer. The API contains a couple of dozen low level functions that perform operations such as "begin a transaction" or "fetch the row that has this primary key." The storage engines don't parse SQL or communicate with each other; they simply respond to requests from the server.
Each client gets its own thread within the server process. The connection's queries execute within that single thread, which in turn resides on one core CPU. The server caches threads, so they don't need to be created and destroyed for each new connection.
When clients (applications) connect to MySQL servere, the server needs to authenticate them. Authentication is based on username, originating host and password.
Optimization and Execution
MySQL parses queries to create an internal structure (the parse tree), and then applies a variety of optimizations. These may include rewriting the query, determining the order in which it will read tables, choosing which indexes to use and so on.
You can pass hints to the optimizer through special keywords in the query, affecting its decision making process. You can also ask the server to explain various aspects of the optimization. This lets you know what decisions the server is making and gives you a reference point for reworking queries, schemas, and setting to make everything run as efficiently as possible.
The optimizer does not really care what storage engine a particular table uses, but the storage engine does affect how the server optimizes the query. The optimizer asks the storage about some of its capabilities and the cost of certain operations, and for statistics on the table data.
Before even parsing the query, though, the server consults the query cache, which can store only SELECT statements, along with their result sets. If anyone issues a query that's identical to one already in the cache, the server doesn't need to parse, optimize or execute the query at all- it can simply pass back the stored result set.
Any time moe than one query needs to change data at the same time, the problem of concurrency control arises. For our purposes in this chapter, MySQL has to do this at two levels: the server level and the storage engine level. Concurrency control is a big topic to which a large body of theoretical literature is devoted, but this book isn't about theory or even about MySQL internals.
You can't examine the more advanced features of databases for very long before transactions enter the mix. A transaction is a group of SQL queries that are treated atomically, as a single unit of work. eg --
- START TRANSACTION;
- SELECT balance FROM checking WHERE customer_id = 10233276;
- UPDATE checking SET balance - 200.00 WHERE customer_id = 10233276;
- UPDATE savings SET balance = balance + 200.00 WHERE customer_id = 10233276;
Transactions aren't enough unless the system passes the ACID test. ACID stands for Atomicity, Consistency, Isolation, and Durability. These are tightly related criteria that a well-behaved transaction processing system must meet.
Atomicity - A transaction must function as a single indivisible unit of work so that the entire transaction is either applied or rolled back. When transactions are atomic, there is no such thing as a partially completed transaction: its all or nothing.
Consistency - The database should always move from one consistent state to the next. In our example, consistency ensures that a crash between lines 3 and 5 doesn't result in $200 disappearing from the checking account. Because the transaction is never committed, none of the transactions changs is ever reflected in the database.
Isolation - The result of a transaction are usually invisible to other transactions until the transaction is complete. This ensures that if a bank account summary runs after line 3 but before line 4 in our example, it will still see the $200 in the checking account. When we discuss isolation levels, you'll understand why we said usually invisible.
Durability - Once committed, a transaction's changes are permanent. This means the changes must be recorded such that data wont' be lost in a system crash. Durability is a slightly fuzzy concept, however, because there are actually many levels. Some durability strategies provide a stronger safety guarantee than others, and nothing is ever %100 durable. We discuss what durability really means in MySQL in later chapters, especially in "InnoDB I/O Tuning" on page 283.
ACID transactions ensure that banks don't lose your money. It is generally extremely difficult or impossible to do this with application logic. An ACID-compliant database server has to do all sorts of complicated things you might not realize to provide ACID guarantees.
A database server with ACID transactions also generally requires more CPU power, memory, and disk space than one without them. This is where MySQL's storage engine architecture works to your advantage. You can decide whether your application needs transactions. If you don't really need them, you might be able to get higher performance with a non-transactional storage engine for some kinds of queries. You might be able to use LOCK TABLES to give the level of protection you need without transactions. Its all up to you.
MySQL's Storage Engines
MySQL stores each database (also called a schema) as a subdirectory of its data directory in the underlying filesystem. When you create a table, MySQL stores the table definition in a .frm file with the same name as the table. Thus, wen you create a table named MyTable, MySQL stores the dable definition in MyTable.frm. Because MySQL uses the filesystem to store database names and table definitions, case sensitivity depends on the platfomr. On a Windows MySQL instance, table and database names are case insensitive; on Unix-like systems, they are case sensitive. Each storage engine stores the table's data and indexes differently, but the server itself handles the table definition.
To determine what storage engine a particular table uses, use the SHOW TABLE STATUS command. For example, to examine the user table in the mysql database, execute the following:
mysql> SHOW TABLE STATUS LIKE 'user' \G
Selecting the Right Engine
When designing MySQL based applications, you should decide which storage engine to use for storing your data. If you don't think about this during the design phase, you will likely face complications later in the process. You might find that the default engine doesn't provide a feature you need, such as transactions, or maybe the mix of read and write queries your application generates will require more granular locking than MyISAM's table locks.
Because you can choose storage engines on a table by table basis, you'll need a clear idea of how each table will be used and the data it will store. It helps to have a good understanding of the application as a whole and its potential for growth.
It is not necessarily a good idea to use different storage engines for different tables. If you can get away with it, it will usually make your life a lot easier if you choose one storage engine for all your tables.
Although many factors can affect your decision about which storage engine(s) to use, it usually boils down to a few primary considerations. Here are the main elements you should take into account:
Keeping multiple storage engines increases the complexity of backups and server tuning.
- Crash recovery
- Special featurse
- mysql> ALTER TABLE mytable ENGINE = Falcon;
CREATE and SELECT
- mysql> CREATE TABLE innodb_table LIKE myisam_table;
- mysql> ALTER TABLE innodb_table ENGINE=InnoDB;
- mysql> INSERT INTO innodb_table SELECT * FROM myisam_table;
- mysql> START TRANSACTION;
- mysql> INSERT INTO innodb_table SELECT * FROM myisam_table
- mysql> COMMIT;