MySQL Administrator's Bible- P10

Chia sẻ: Thanh Cong | Ngày: | Loại File: PDF | Số trang:50

Thêm vào BST

Báo xấu

117
lượt xem 34
download

Download Vui lòng tải xuống để xem tài liệu đầy đủ

MySQL Administrator's Bible- P10: Với tập trung đặc biệt vào việc phát hành lớn tiếp theo của MySQL, nguồn tài nguyên này cung cấp một khuôn khổ vững chắc cho bất cứ ai mới vào MySQL hoặc chuyển từ một nền tảng cơ sở dữ liệu, cũng như các quản trị MySQL kinh nghiệm. Các bộ đôi tác giả cao hồ sơ cung cấp bảo hiểm thiết yếu của các nguyên tắc cơ bản của phạm vi bảo hiểm cơ sở dữ liệu quản lý, bao gồm cả cách tiếp cận độc đáo MySQL của các tính năng cơ sở...

Chủ đề:

Bình luận(0) Đăng nhập để gửi bình luận!

Lưu

Nội dung Text: MySQL Administrator's Bible- P10

Storage Engines 11 Once this is done you can SELECT from the FEDERATED table and see the rows in the remote table. Limitations of FEDERATED tables Although FEDERATED tables can be extremely useful they do have some limitations. FED- ERATED tables are non-transactional in nature regardless of the support that the remote table might have for transactions. In addition, indexes are not always handled the same for the FEDERATED table as they are for the remote table. FEDERATED tables do not support index preﬁxes on CHAR, VARCHAR, TEXT, or BLOB columns. There are instances where indexes are not used as expected. Though it is a fairly minor issue, FEDERATED tables do not support the use of ALTER TABLE. To change the table deﬁnition of a FEDERATED table you must drop the table and re-create it. Even with these limitations the FEDERATED engine can be very useful when remote data access is needed. NDB storage engine The NDB storage engine is used to build a MySQL Cluster. MySQL Cluster is a shared-nothing, highly available system. MySQL Cluster is designed to avoid a single point of failure. A cluster will contain one or more SQL nodes (traditional MySQL Servers), one or more data nodes, and one or more management servers. For more information about MySQL Cluster, see Chapter 22. Feature summary: ■ Built to provide high availability with redundant sets of data for node failure scenarios. ■ Transactional support. ■ No foreign key support. ■ Row-level locking. ■ Operates either entirely in memory or supports on-disk storage of data columns. Indexes must always be stored in memory. Beginning with MySQL Server version 5.1.24 the standard binaries built by Sun Microsystems will not include the NDB storage engine. Though it is still available with the source code (for now), the code for the NDB storage engine will not be updated to keep pace with changes in the NDB storage engine. If you need to use the NDB storage engine, use the MySQL Cluster server instead of MySQL Server. Archive storage engine The Archive storage engine is a good option for those needing to store data long term or perform data auditing on application operations. For quite some time MySQL has offered an option of using compressed MyISAM tables. There are performance beneﬁts to using compressed 417
Part III Core MySQL Administration MyISAM tables for read operations, but there are several problems with MyISAM compressed tables: ■ Compressing (or packing) a MyISAM table requires that the database be taken ofﬂine while the table is compressed. This is done through the use of the of the myisampack utility from the shell prompt. ■ Compressed MyISAM tables are read-only. You cannot INSERT, DELETE, or UPDATE on a compressed table. Archive tables are, in several ways, an improvement over these older MyISAM packed tables. The designers of the Archive storage engine were deﬁnitely looking at what was good about compressed MyISAM tables and improving on the areas of weaknesses of compressed MyISAM tables. Feature summary: ■ Non-transactional. ■ Archive tables allow INSERT operations on the table. However, Archive tables do not allow UPDATE or DELETE operations. For data auditing purposes this is ideal because auditing requirements specify that once data has been created it cannot be changed in any manner. ■ Very good data compression factor. The compression factor on an Archive table is higher than a packed MyISAM table. ■ For reading data, much like the InnoDB engine, the Archive engine uses a snapshot read. This ensures that read operations do not block write operations. ■ The Archive engine uses row-level locking. ■ The Archive storage engine supports only one index. However, in tests, the Archive stor- age engine performs better with read queries than MyISAM packed tables, even when the packed tables have indexes the Archive tables do not. Archive tables use up to four ﬁles during operation. All ﬁles begin with a ﬁlename of the table name. The table deﬁnition ﬁle has a sufﬁx of .frm. The data ﬁle has a sufﬁx of .ARZ. The ﬁle with metadata information for the table (if present) has a sufﬁx of .ARM. During table optimiza- tion operations it is also possible that there is a ﬁle with a sufﬁx of .ARN. These ﬁles will all be located in the directory of the database in which the tables are located. The following shows a modiﬁed version of the rental table from the sakila database. Changes were made because Archive tables do not support foreign keys and the rental table had more than one index: mysql> CREATE TABLE `rental_archive` ( -> `rental_id` int(11) NOT NULL AUTO_INCREMENT, -> `rental_date` datetime NOT NULL, -> `inventory_id` mediumint(8) unsigned NOT NULL, -> `customer_id` smallint(5) unsigned NOT NULL, 418
Storage Engines 11 -> `return_date` datetime DEFAULT NULL, -> `staff_id` tinyint(3) unsigned NOT NULL, -> `last_update` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP, -> PRIMARY KEY (`rental_id`) -> ) ENGINE=Archive DEFAULT CHARSET=utf8; Query OK, 0 rows affected (0.01 sec) This creates an archival version of the table. Now to demonstrate the ability to perform INSERT statements but not UPDATE or DELETE statements on an Archive table: mysql> INSERT INTO rental_archive -> (rental_date, inventory_id, customer_id, -> return_date, staff_id) values (NOW(),’1’,’23’,NOW(),’1’); Query OK, 1 row affected (0.00 sec) mysql> SELECT rental_id, rental_date, inventory_id, customer_id -> return_date, staff_id, last_update -> FROM rental_archive\G *************************** 1. row *************************** rental_id: 1 rental_date: 2009-01-16 16:44:41 inventory_id: 1 customer_id: 23 return_date: 2009-01-16 16:44:41 staff_id: 1 last_update: 2009-01-16 16:44:41 1 row in set (0.00 sec) mysql> UPDATE rental SET rental_date=NOW() WHERE rental_id=’1’; ERROR 1031 (HY000): Table storage engine for ’rental_archive’ doesn’t have this option mysql> DELETE FROM rental_archive WHERE rental_id=’1’; ERROR 1031 (HY000): Table storage engine for ’rental_archive’ doesn’t have this option In a nutshell, what is demonstrated here, along with the data compression capabilities, are the two primary reasons for using Archive tables. Blackhole storage engine The Blackhole storage engine does not actually store data in tables as with other storage engines. This might seem to be very counter-intuitive. It is a storage engine that does not store data. But there are uses for this setup. A master server may have an extremely high-write table whose data is only ever used on a slave. For example, session information including where a user clicked may be used in reporting queries, but are never queried on the master server. In this case, the table on the master can be created with ENGINE=BLACKHOLE and the table on the slave can be modiﬁed to use any storage 419
Part III Core MySQL Administration engine. In this way, data updates on the master happen instantaneously, because INSERT, UPDATE, and DELETE statements return immediately as successful (so long as they are valid DML statements according to the table schema). This is a frequently used method of having a high-write table cause almost no extra I/O on the master (the binary logs are still written to). Something quite common with large setups is the use of multiple slaves from a master server. If you have too many slaves this can cause load issues on the master server. You can implement a relay slave that acts as an intermediary to reduce this load. We discuss this in detail in Chapter 16. This relay slave can be conﬁgured using the Blackhole storage engine for all tables being replicated. Even though the Blackhole tables do not store any data changes, the binary logs are still written if logging is enabled. The relay slave operates much more efﬁciently because of the minimization of I/O activity. CSV storage engine The CSV (Comma Separated Value) storage engine is an engine of a decidedly different nature. A CSV data ﬁle is simply a text ﬁle that can be manipulated with a simple text editor or command-line text tools if needed. One of the primary uses for CSV is for data exchange and fast importing. Feature summary: ■ Plain text data ﬁle in CSV format ■ Can have an instantaneous import time ■ Easily imported into programs such as Microsoft Excel ■ Table-level locking ■ No foreign key support ■ Non-transactional ■ Trivial backups and restores (copy the ﬁles) ■ Does not support indexing or partitions Three ﬁles are created with any CSV table: an .frm ﬁle, which contains the table format; a .CSM ﬁle, which contains metadata; and the .CSV ﬁle, which contains the data. Each of these ﬁles has a preﬁx that consists of the table name. Here is a brief example from a snippet of a data ﬁle of a three-ﬁeld CSV table: "1","Overstreet","Joe" "2","Beal", "Sally" "3","Murphy","Ashton" "4","McGhee", "Sam" 420
Storage Engines 11 Want an instantaneous import time? Just have the data that you want to be imported stored in the CSV format in an export ﬁle. Create a new CSV table with the desired ﬁelds and then just copy the export ﬁle to the directory of the database. Then move the ﬁle to table_name.CSV. Issue a FLUSH TABLES command and the data is instantly available. Working with Storage Engines A number of commands are used speciﬁcally to work with any storage engine. These commands are extremely useful in the day-to-day work of a system administrator. CREATE TABLE The CREATE TABLE statement is used to create a database table. You can specify the storage engine that you want to use for the table by using the ENGINE clause or the server will use the default storage engine. For example, to create an InnoDB table: CREATE TABLE innodb_example ( id INTEGER UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT, name VARCHAR(20) NOT NULL ) ENGINE = InnoDB; ALTER TABLE The ALTER TABLE command is used to modify previously created tables. The full syntax for the command is covered in Chapter 4. If you need to change a table type from one storage engine to another the ALTER TABLE command makes this simple. If you executed the previous CREATE TABLE without specifying an ENGINE clause it would create a MyISAM table by default: CREATE TABLE innodb_example ( id INTEGER UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT, name VARCHAR(20) NOT NULL ); After you create the table you realize it needed to be an InnoDB table. To change this you sim- ply run the ALTER TABLE command: mysql> ALTER TABLE innodb_example ENGINE=InnoDB; That is all there is to it! 421
Part III Core MySQL Administration DROP TABLE The DROP TABLE command is used to drop a table and all its contents. Be careful, because this action cannot be undone. If it is executed inside a transaction the DROP TABLE command will implicitly commit the transaction before performing the actual DROP TABLE command. mysql> DROP TABLE innodb_example; Summary The amazing variety of storage engines available for MySQL Server can be a great beneﬁt to the database administrator. It allows you to choose the storage engine that best ﬁts the needs of your application. You even have the choice of having more than one storage engine at the same time in your database, and having a table on a slave server have a different storage engine than the same table on the master server. This range of options brings complexity. There are typically many options to conﬁgure each storage engine. A beginning administrator should concentrate on understanding the MyISAM and InnoDB storage engines because these are going to be the ones you work with most often in a typical MySQL Server environment. Some of the topics that were taught in this chapter include: ■ Pluggable storage engine architecture ■ Storage engine plugins ■ Available storage engines: ■ MyISAM / Merge ■ InnoDB ■ MEMORY ■ Maria ■ Falcon ■ PBXT ■ FEDERATED ■ NDB ■ Archive ■ Blackhole ■ CSV ■ Commands used to work with storage engines 422
Caching with MySQL O ne of the ways of making queries respond faster is to implement caching. A cache stores frequently used information in a place IN THIS CHAPTER where it can be accessed faster. For example, a web browser Implementing Cache Tables like Firefox uses a cache to store the text, images, and other objects from recently visited websites on your hard drive. When you visit a page you Working with the Query Cache have recently been to, the text and images do not have to be downloaded a second time. Another way objects on web pages are cached is through a Utilizing Memcached caching proxy server such as Squid (www.squid-cache.org). A caching proxy server is a proxy between the Internet and a set of machines. It eliminates the redundancy of each machine having a private cache. Furthermore, a cache of web objects from sites that all the machines have recently visited are stored, so a machine can beneﬁt from a proxy cache even if it has never visited a particular web page before. Figure 12-1 shows a simpliﬁed diagram of a network of desktops that utilize a caching proxy server for web pages. When users visit a web page their computer will ﬁrst check the web page cache in their browser as described previously. If the page is not stored locally, the next step is to check the Squid cache. If the web page is not stored in the Squid cache, the web page is downloaded from the actual website — storing objects in the Squid and local browser cache along the way. This may sound complicated, but there is very little overhead from check- ing these caches. When a cache can be used it is much faster than downloading content over the Internet. If you have a hundred people behind a caching proxy, there are going to be many cached web objects that can be accessed by someone else. Everyone behind the web caching proxy beneﬁts from the entire cache. Not only is the user experience faster, but because 423
Part III Core MySQL Administration there is less need to go to the Internet to download a web page, less bandwidth is used. The drawback is sometimes a web page has been updated and a cache has older information. FIGURE 12-1 Internet Caching Proxy Server Like caching in any other application, caching with MySQL is designed to return data from a query quickly. There are three common ways to cache data with MySQL: creating manual cache tables, using internal caching in mysqld, and using the memcached distributed caching system. Implementing Cache Tables Developers and database administrators spend lots of time optimizing queries. However, one aspect of query optimization that is often overlooked is that a simple data access is much faster than a query with calculations, sorting, and aggregations. A cache table is a regular table in a database that stores the results of one or more queries for faster retrieval. Common query results that could be cached in a table are counts, ratings, and summaries. For example, instead of calculating the number of total site visitors every time someone visits a particular web page, you can set an event in the Event Scheduler to calculate the total number of visitors every 30 minutes and store that information in a visitor count table. Then the slower query to calculate count is run once every 30 minutes, and every single time the web page is visited it does a simple query of the visitor count table. For more information on the Event Scheduler and how to create events, see Chapter 7. As an example, assume it takes 4 seconds to calculate the total visitor count and a quarter of a second to return the results from the visitor count cache table. If there are 100 visitors to a site in 30 minutes, it would take 4 seconds per visitor * 100 visitors = 400 seconds of query processing time to run the query for every visitor. However, if you ran the query once every 30 minutes, it would take 4 seconds for the query to run once + 1/4 second to retrieve data from the cache table * 100 visitors to the site = 29 seconds of query processing time. The beneﬁt is obvious. 424
Caching with MySQL 12 The tradeoff is that the total count may not be 100% accurate. However, it is likely good enough — visitor count is usually used to show how many people have visited the site, and the count still adds value even if it is off by a few percentage points. The decision of whether or not the number is good enough has to be made through a joint effort between the database administrator and the business to determine if this is a reasonable tradeoff. There is one other tradeoff of using the Event Scheduler — the query is run every 30 minutes, regardless of whether or not the information is used. For example, if nobody visits the website for an hour, there is no reason to recalculate the visitor count. If this is an issue you need to overcome, you will have to coordinate with application developers as well to create querying code that uses the following logic: ■ Check the age of the data in the cache table ■ If the data age is 30 minutes or less, use the data in the cache table ■ If the data age is greater than 30 minutes, re-run the query, storing the results and new time in the cache table It is preferable to schedule an event in the Event Scheduler, but for resource-heavy queries that are run on data that does not change very often, it may be necessary to create custom applica- tion code instead. Going back to our example using the Event Scheduler — after the decision to store total visitor count every 30 minutes is made, there is a problem. The business has changed its requirements, and tells you that the count needs to be 100% accurate; it is unacceptable to have the query take 4 seconds, and there is no money in the budget for faster hardware. This issue can be solved by storing the total visitor count for a time period in the past and then adding the count of visitors since the last calculation. For example, at midnight daily, calculate the total visitor count for the previous day. This count will never change once the day is over, so the count for an entire day never needs to be recalculated. The running total of visitors can be calculated as well, and then a 100% accurate count can be done at any time by calculating the total visitor count so far today and adding it to the running total calculated once per day. Computing the count for one day is much less resource-intensive than running full count every time someone visits a page. Not only will this allow for accurate counts of total visitors, but it will allow for easy computa- tion of number of visitors in a given data range — in our example, the number of visitors in a week can be calculated by adding the seven daily totals of visitors. The previous method did not easily allow for this type of data manipulation. Here is an example of how to create such a system: CREATE TABLE visitors_today ( today INT UNSIGNED NOT NULL DEFAULT 0, vcount BIGINT UNSIGNED NOT NULL DEFAULT 0 ) ENGINE=InnoDB; 425
Part III Core MySQL Administration CREATE TABLE visitors_stored ( vcount BIGINT UNSIGNED NOT NULL DEFAULT 0, vcount_date DATE NOT NULL PRIMARY KEY ) ENGINE=MyISAM; The visitors_stored table holds a historic record of the count of visitors per day. There can only be one entry per day. The visitors_today table holds the current count of visitors so far today (in the today ﬁeld) and the total count of visitors up to today (in the vcount ﬁeld). With each new visitor, update the visitors_today.count ﬁeld: UPDATE visitors_today SET today = today + 1; To retrieve the current total visitor count: SELECT today + vcount FROM visitors_today; Every day at midnight, schedule an event that: ■ Inserts a new row with yesterday’s date and count in the visitors_stored table ■ Calculates and stores the total running count in visitors_today.vcount ■ Resets the vistors_today.today ﬁeld to zero Here is a sample event that updates the current and historic visitor count tables: CREATE EVENT update_vcount ON SCHEDULE EVERY 1 DAY STARTS ’2009-01-01 00:00:00’ DO BEGIN INSERT INTO visitors_stored (vcount, vcount_date) SELECT today, CURRENT_DATE() - INTERVAL 1 DAY FROM visitors_today; UPDATE visitors_today set today=0, vcount=( SELECT SUM(vcount) as vcount FROM visitors_stored); END Do not forget to enable the Event Scheduler with SET GLOBAL event_scheduler=’ON’, and recall from Chapter 7 that you will need to change the DELIMITER to deﬁne an event with multiple statement events. Also, for the system to function it must be initialized; at minimum you must populate the visitors_today table. If today already had 37 visitors and the total number of visitors at the start of the day was 1,638,492, the following would populate the table properly: INSERT INTO visitors_today (today, vcount) VALUES (37,1638492); For a new system, simply use INSERT INTO visitors_today (today, vcount) VALUES (0,0);. 426
Caching with MySQL 12 These two tables achieve the business goals of allowing for accurate counts and faster count- ing without buying new hardware. For more ways to optimize this and other queries, see Chapter 18. As a ﬁnal example before moving on, another way to keep a counter accurate is through the use of a trigger. If you need to keep an accurate count of the number of rows in a table and it is resource-intensive to run the SELECT COUNT(*) query (as is the case on InnoDB tables), just have a trigger update a count statistic on every INSERT or DELETE from the table. Keep in mind that using triggers brings its own overhead and might not be ideal for your situation. As well, it is useful for simple calculations such as a running counter, but there is not an equivalent useful solution for more complex calculations such as ranking. System-intensive queries, whose results can be slightly inaccurate, are often perfect candidates for cache tables. Some of these queries may be required to use a cache table because the calcula- tions use so many resources! Working with the Query Cache Internally, mysqld can cache the result sets of SQL statements. Similar to the concept of a local cache of a web page, mysqld can compare a query to the queries stored in the query cache. If the query is stored in the query cache, the result set is retrieved without having to execute the query again. If the query is not stored in the query cache, the query is executed and the result set can be stored in the query cache so the next time the query is called the result set will be there. By default, the query cache is not enabled. Using the query cache adds more overhead — memory is used to store the query cache, and it takes processing time to check the query cache. If the query cache is frequently checked and matches are rarely found, the additional processing time required may hurt performance instead of helping it. Make sure to thoroughly test the query cache before implementing it. In some cases, turning the query cache on will produce amazing performance improvements. However, it is a common mistake for database administrators new to MySQL to believe that the query cache will solve most performance issues. In reality, only queries with certain characteris- tics will beneﬁt from using the query cache. This section discusses how the query cache works, to better provide you with information to help you decide when the query cache is appropriate for you. What gets stored in the query cache? The MySQL query cache is simple in operation. It only caches SELECT statements and their corresponding result sets, and only for deterministic SELECT statements. Deterministic SELECT statements are statements that always produce the same results no matter what, given the same 427
Part III Core MySQL Administration data set. For example, SELECT COUNT(*) FROM tbl is always the same given the same data set, but SELECT NOW() is not always the same. The query cache stores a rudimentary hash of the query along with its result set. This means that the query being compared must be byte-for-byte identical to the cached query — the queries themselves are not compared; their hashes are compared. This means that in order to match, the queries must match with regard to case sensitivity and whitespace. Consider the following two queries: SELECT customer_id,name from sakila.customer; SELECT customer_id,name FROM sakila.customer; Both queries executed from the command line will return identical result sets. However, from the perspective of the query cache, both queries are not identical. With the ﬁrst query from is not capitalized and with the second query FROM is capitalized. From the perspective of the query cache these are not the same query. As mentioned previously, different spacing in the queries will also result in no match being found. In addition to spacing and capitalization, queries are viewed as different by the query cache if they use different databases, protocol versions, or character sets. Furthermore, not all SELECT statements will be cached. SELECT statements with the following properties are not cached: ■ Use of non-deterministic functions ■ Most subqueries ■ Use of user-deﬁned functions ■ Use of temporary tables ■ SELECT statements in stored functions, triggers, views, and events ■ Using LOCK IN SHARE MODE or FOR UPDATE ■ Result sets larger than query_cache_limit (1 Mb by default; see the section in this chapter on query cache conﬁguration) ■ Using SQL_NO_CACHE Just because a query can be cached, however, does not mean it should be cached. If a query is being executed frequently and does not have a result set that exceeds the query_cache_limit, it may be a good candidate for caching. However, another factor to consider is how many times a cached query is used before it is removed from the query cache. Query cache invalidation is when a query is removed from the query cache because its result set may have changed. When a table is modiﬁed by either DDL or DML (either schema changes or data changes), all of the queries in the query cache referencing that table are invalidated and removed. This query cache invalidation is not very granular — it invalidates query cache results based on tables, not rows. A query may be removed from the query cache even though its result set has not changed, because a table it references has changed. 428
Caching with MySQL 12 Query cache pruning is when a query is removed from the query cache because the query cache cannot ﬁnd a free block in which to store a new query. The query cache prunes older entries using a least recently used (LRU) algorithm. The overhead of query removal must be considered when determining if a query is a good candidate for caching. Queries that reference tables that are frequently updated and queries that are pruned before being used several times are not good candidates for the query cache. In addition, if there are many queries that can be cached, enabling the query cache for all queries is not a good idea — we have seen many database administrators turn on the query cache, and query performance gets worse, because every single query is being cached, and there are so many queries being cached that the savings of using a cache are never recognized. When using the query cache, each SELECT statement has overhead. First, a hash of the query is calculated, and the query cache is checked for a match. If the query does not match, the query proceeds as usual. If the query can be cached, the result set is stored in the query cache when the query is completed. All queries that do not have a match in the query cache will use more resources than if the query cache was turned off. However, DML such as UPDATE, INSERT, and DELETE and DDL such as ALTER TABLE also have extra overhead even though these statements are never cached. When a table is changed, the cache is checked in order to remove any queries that are invalidated. If you realize you do not want to store every query in the query cache, you can use the server variable query_cache_type. It has three possible options: OFF, ON, and DEMAND. Setting the query_cache_type to OFF turns off the query cache. If you do not want to cache every result set, you have two options: ■ Cache all queries, except for a few chosen queries. Setting the query_cache_type to ON and setting a query_cache_size larger than 0 enables the query cache for SELECT statements as speciﬁed by the rules above. This means the query cache will cache all possible queries, and you can choose certain queries not to cache by using SELECT SQL_NO_CACHE. ■ Do not cache any queries, except for a few chosen queries. Setting the query_cache_ type to DEMAND and setting a query_cache_size larger than 0 enables the query cache only for SELECT statements with the SQL_CACHE clause. In other words, the query cache does not cache any queries except for the ones you choose, which will use SELECT SQL_CACHE. Query cache memory usage and tuning The query cache for the server is stored completely in memory, and does not persist across mysqld restarts. Aside from about 40k of memory used by the query cache, the entire pool of memory is initialized as a single contiguous chunk of memory. During query cache operation this chunk of memory is divided up into smaller blocks. These blocks are used to store the result sets of the cached queries. Each block must be at least the size speciﬁed by query_cache_min_res_unit, which defaults to 4 Kb. For result sets smaller than 4 Kb, it still uses 4 Kb of memory. 429
Part III Core MySQL Administration In addition to these blocks containing the result sets of the queries there are two hash tables. One hash table stores query hashes. The second hash table contains the list of database tables that each cached query references. When a table is modiﬁed, this second hash table is consulted to determine which queries should be removed due to query cache invalidation. Seven system variables are used to control the conﬁguration and memory usage of the query cache. The values of these can be seen by using SHOW GLOBAL VARIABLES statement or in the GLOBAL_VARIABLES system view in the INFORMATION_SCHEMA (see Chapter 21 for more information about INFORMATION_SCHEMA). Table 12-1 lists these variables and a brief description of each one. TABLE 12-1 Query Cache System Variables Variable Name Description query_alloc_block_size Speciﬁes the size of memory blocks allocated by the query cache during query processing. The default is 8 Kb. query_cache_limit Result sets larger than this are not cached. The default is 1 Mb. query_cache_min_res_unit Speciﬁes the minimum size for blocks in the query cache. The default value is 4 Kb. query_cache_size The total memory allocated by mysqld for the query cache. The default is 0, which disables the query cache. This can be changed dynamically — changes do not require a mysqld restart. query_cache_type Determines the mode of operation of the query cache. Options are OFF, ON, and DEMAND. The default is ON. query_cache_wlock_invalidate If set to TRUE, queries referencing MyISAM tables are invalidated when a write lock for that table is obtained, even if none of the data is changed when the write lock is released. The default is FALSE. query_cache_prealloc_size Speciﬁes the size of the buffer used for query parsing by the cache. Defaults to 8 Kb. There are eight status variables relating to the query cache. These variables are used for man- aging and tuning the query cache. The values of these can be seen by using the SHOW GLOBAL STATUS statement or in the GLOBAL_STATUS system view in the INFORMATION_SCHEMA (see Chapter 21 for more information about INFORMATION_SCHEMA). 430
Caching with MySQL 12 Table 12-2 lists each query cache status variable with a brief description. TABLE 12-2 Query Cache Status Variables Counter Name Description Qcache_free_blocks Number of memory blocks free in the query cache Qcache_free_memory Total bytes of memory free in the query cache Qcache_hits Number of times a query matched the query cache Qcache_inserts Number of times a query and result set were inserted into the query cache Qcache_lowmem_prunes Number of times a query was removed due to query pruning Qcache_not_cached Number of queries that could not be cached Qcache_queries_in_cache Number of queries currently stored in the query cache Qcache_total_blocks Total number of memory blocks in cache To see how useful your query cache is: ■ Determine the percentage of SELECT queries that are getting their result sets from the query cache. This rate, known as the query cache hit ratio, is calculated by taking the number of query cache hits (matches) and dividing it by the total number of hits plus the number of non-hits. Com_select is the status variable that holds the number of executed SELECT queries that go through the full execution plan. To calculate the query cache hit ratio: Qcache_hits / (Qcache_hits + Com_select) For example, if Qcache_hits is 15,593 and Com_select is 10,193: 15,593 / (15,593 + 10,193) = .60470 = 60.47% Depending on your application, this can be a good hit ratio or not. If your application has many different SELECT statements, or if many statements cannot be cached or are not being cached on purpose (that is, using SELECT SQL_NO_CACHE), the hit ratio will never be high. To increase your hit ratio, consider caching more queries (see the sidebar on caching more queries). 431
Part III Core MySQL Administration ■ Calculate the percentage of SELECT queries that are placed in the query cache. This rate is known as the insert ratio and, like the hit ratio, is calculated using the Com_select status variable: Qcache_inserts / Com_select * 100 If this percentage is high, it means that most of your SELECT queries are being put into the query cache. If Qcache_lowmem_prunes is also low, it means that even though most of your SELECT queries are being put into the query cache, they are not causing your query cache to ﬁll up, thus pruning queries from the query cache. However, this may not necessarily indicate a well-tuned query cache, because there could be a lot of query cache invalidation happening. Unfortunately there is no parameter to see exactly how much query cache invalidation is happening. You may be able to estimate how many queries are being removed by watching how the Qcache_queries_in_cache variable changes over time. ■ Calculate the percentage of free memory, in bytes and blocks, left in the query cache: Qcache_free_memory / query_cache_size * 100 Qcache_free_blocks / Qcache_total_blocks * 100 If the percentage of free memory is high, either lower the query_cache_size so that you are not wasting memory by allocating it to the query cache, or cache more queries (see the sidebar on caching more queries). If the percentage of free memory is low and Qcache_queries_in_cache is also low, it means that in addition not having a lot of memory for more queries, there are very few queries in the query cache. This may indicate that: ■ The query_cache_size needs to bigger ■ The query_cache_limit needs to be smaller ■ The query cache is heavily fragmented. See the next section on query cache fragmen- tation for more information. Caching More Queries f you determine that your query cache is not being utilized enough, you may need to try caching I more queries. To do this, you can: 1. Use fewer SELECT SQL_NO_CACHE statements if your query_cache_type is ON. 2. Use more SELECT SQL_CACHE statements if your query_cache_type is DEMAND. continued 432
Caching with MySQL 12 continued 3. Qcache_not_cached is the number of queries that could not be cached. If this is large, increasing the query_cache_limit parameter enables queries with larger result sets to be stored in the query cache. 4. Qcache_lowmem_prunes is the number of times a query has had to be pruned due to not enough memory. If Qcache_free_blocks and Qcache_free_memory are also high, you likely have fragmentation (see the next section on query cache fragmentation). If the free memory in the query cache is low, increasing the query_cache_size parameter will allow more queries to be cached in the query cache. Query cache fragmentation Fragmentation of the query cache is analogous to fragmentation of hard drives. With a typical desktop computer, after you utilize it for a while your hard drive gets fragmented with ﬁles scat- tered all over the hard drive. The fragmentation is caused by ﬁles being deleted that leave holes in the layout of ﬁles on your hard drive. Though the operating system will try and ﬁll in these holes with other ﬁles, they will almost never be an exact ﬁt. Over time you end up with hun- dreds or thousands of these holes and they consume extra hard drive space and slow down the system. To resolve the problem the hard drive must be defragmented. This involves moving the ﬁles around to remove the gaps and make everything as compact as possible. With a query cache there are two causes of fragmentation. The ﬁrst cause is that the result sets do not always ﬁt exactly into a block of memory. The query cache program has to estimate the size of the result set it will be storing in the cache and allocate a block of memory for storage of the result set. The estimation is necessary because it receives the result set row by row instead of in one large chunk. Once the query cache is done storing the result set it will trim the allocated block of memory down in size so that there is just enough memory for the size of the result set. This leaves a gap between the blocks of memory. The second cause of query cache fragmentation is the invalidation of queries stored in the cache. When a query is invalidated it is deleted from the cache. As with hard drive fragmentation, a hole is created when the deletion occurs. Often both trimming of blocks and invalidation of queries leads to blocks of memory that are too small to store query results. The best way to handle query cache fragmentation is to minimize the amount of fragmentation. One way to minimize fragmentation is to set the block size of the query cache to the average result set size of your queries. To determine the average result size: (query_cache_size-Qcache_free_memory)/Qcache_queries_in_cache Unfortunately this will not help for every application. Situations where this might not be helpful include when the application has a combination of both large and small queries. 433
Part III Core MySQL Administration Another way to minimize fragmentation is to minimize the query cache invalidation and prun- ing. The more queries removed, the more fragmentation can occur. Often, it is not possible to change the frequency of UPDATE, INSERT, and DELETE statements on a table. However, query cache invalidation may be minimized by splitting large tables into smaller tables by using MERGE tables (see Chapter 11) or partitioning (see Chapter 15). In situations where fragmentation is likely and difﬁcult to minimize, it may be best to choose which queries to cache by setting the query_cache_type to DEMAND and using the SQL_CACHE hint to select the queries you want cached. The previous section on query cache memory usage and tuning showed how to determine if the query cache is fragmented. Executing the FLUSH QUERY CACHE command will defragment the cache. FLUSH QUERY CACHE does not ﬂush the cache; it compacts the cache by lock- ing the query cache and re-ordering it so there are no gaps between blocks of memory. With larger caches the re-ordering can be a signiﬁcant period of time. At the time of this writing, we recommend the maximum size of query_cache_size to be 256 Mb, so the defragmentation process is not too long. If, instead of compacting the query cache, it is necessary to empty the query cache use the RESET QUERY CACHE command. Be careful using the FLUSH QUERY CACHE command. DML and DDL use the query cache to invalidate queries after the table changes are made, so these queries will not complete until the lock on the query cache is released. In addition, any SELECT queries checking the query cache must also wait for the lock on the query cache to be released. If the query_cache_type is ON, this means that all SELECT queries except for those specifying SQL_NO_CACHE depend on the query cache. Performing a FLUSH QUERY CACHE may block many or all queries, so be careful when defragmenting! Utilizing memcached The memcached caching system is a distributed, in-memory cache system that speeds up web applications by minimizing the amount of database queries needed. It was originally developed by Danga Interactive for the LiveJournal website. The ofﬁcial website for memcached is at http://danga.com/memcached/. The basic principle of memcached is that a number of memcached daemons work toge- ther to provide data caching. memcached is distributed because many daemons can work together to cache data — there is no need for one centralized cache. It is an in-memory cache, which means that when a daemon is restarted the cache data it held is lost. As with the query cache in mysqld, memcached stores a hash of a query along with its result set. The application code must be changed to ﬁrst retrieve the result set corresponding to the hash of the query — if nothing is returned, there was no match in the memcached cache, and the application code should proceed to query the database as normal. The application code has the power to store the query and the new result set, if desired. 434
Caching with MySQL 12 Installing memcached is not difﬁcult for those with system administration experience. mem- cached runs on Unix-based platforms. A third-party Windows port of memcached is available at http://jehiah.cz/projects/memcached-win32. The only prerequisite memcached requires is the libevent library. The Frequently Asked Questions page for memcached has a lot of good information for a memcached beginner, including where to download memcached and how to install it: http://code.google.com/p/memcached/wiki/FAQ Conﬁguration of memcached is not difﬁcult. In fact, it has just a handful of command-line options. Table 12-3 lists the memcached options and what they do. TABLE 12-3 memcached Command-Line Options Option Description -d Runs in daemon mode (in the background). Without this option, memcached runs in the foreground. -l Listens on speciﬁc IP address. By default, memcached listens on all IP addresses conﬁgured on the server. -s Listens via the speciﬁed Unix socket ﬁle only. -m Amount of memory to use (in megabytes). -p Listens on speciﬁed port (11211 by default). -u User who runs the daemon. One consideration is the amount of memory you can allocate to each memcached daemon. If you are running memcached on a 32-bit machine, each process can only address approximately 2.4 Gb of memory. However, because memcached is designed to be distributed, it is very easy to run multiple instances on 32-bit machines. In fact, to maximize the cache memcached pro- vides, run as many memcached daemons as possible — using dedicated memcached machines with one or more instances, and putting memcached instances on the same machines as their web servers. Both of these practices are recommended and encouraged. Web servers are particularly well-suited to share resources with memcached processes. For typi- cal dynamic content, web servers use much more CPU than memory, and memcached processes use much more memory than CPU. 435
Part III Core MySQL Administration No authentication or security measures are built into memcached. Never expose memcached ports to the Internet. Once memcached is conﬁgured and running, it is time to begin utilizing the cache. You do this via application code. In each place that your application queries the database, add in a check of the cache in memcached ﬁrst. If memcached returns nothing, the application must then go to the database, get the result set, and store it in memcached. A number of programming APIs are available that interface with memcached. Table 12-4 lists some of the more popular ones and their locations. TABLE 12-4 memcached Programming API Libraries API Language API Name Download Location C libmemcachedapr_ http://tangent.org/552/libmemcached memcache .html http://www.outoforder.cc/ projects/libs/apr_memcache/ C# memcacheddotnetEnyim https://sourceforge.net/projects/ Memcachedbeitmemcached memcacheddotnet/ http://www.codeplex.com/ EnyimMemcached/ http://code.google.com/p/ beitmemcached/ Java spymemcached http://code.google.com/p/ spymemcached/ Lua memcached.lua http://luamemcached.luaforge.net/ Perl Cache::Memcached http://search.cpan.org/dist/ Cache::Memcached:: Cache-Memcached/ Fast http://search.cpan.org/dist/ Cache-Memcached-Fast/ PHP memcache http://pecl.php.net/package/ memcache Python python-memcached ftp://ftp.tummy.com/pub/ python-memcached/ MySQL UDFs Memcached Functions http://tangent.org/586/Memcached_ for MySQL Functions_for_MySQL.html Though we do not have the space to provide a complete programming guide, we will show a simple example to show you what it is like to work with memcached. 436