I've been working for some weeks now on optimizing a client's MySQL databases. I consider myself an apprentice MySQL DBA (i.e. there's still much room for me to improve :P ), and I've found the process difficult. Although there is a whole lot of information about what to do and how -- some blog posts out there are absolutely invaluable -- the settings are not always straightforward to understand: they often have multiple impacts, and to modify them correctly you need to understand what they do and how they change things under the hood in MySQL.
After bumping up table_cache and tmp_table_size / max_heap_table_size, I was left with one weirdness: the table cache was reported as nearly empty (good), but there were a lot of query cache prunes (bad). In that situation the query cache actually adds latency in most cases (in other words, it slows your database down tremendously).
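For reference, the pruning activity can be read straight from the Qcache_* status counters (these are the standard MySQL status variables for the query cache); a steadily climbing Qcache_lowmem_prunes is the symptom I'm describing:

```sql
-- All query cache counters at once:
SHOW GLOBAL STATUS LIKE 'Qcache%';

-- The two I watch for this problem:
--   Qcache_lowmem_prunes : cached results evicted because the cache ran out of room
--   Qcache_free_blocks   : high relative to Qcache_total_blocks means fragmentation
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Qcache_lowmem_prunes', 'Qcache_free_blocks', 'Qcache_total_blocks');
```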
Reducing query_cache_min_res_unit for less cache fragmentation
I've found this blog post, which talks about experimenting with decreasing the cache block size from its default value of 4K to something closer to the average query result size:
I redid the calculations from that blog post and got approximately the same average result size (around 2.2K), so I intend to apply the same two-step modification to the value of query_cache_min_res_unit.
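In case it helps someone, here is how that calculation can be reproduced (the formula is the usual approximation; your numbers will obviously differ):

```sql
-- Average size of a cached result, approximately:
--   (query_cache_size - Qcache_free_memory) / Qcache_queries_in_cache
SHOW VARIABLES LIKE 'query_cache_size';
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Qcache_free_memory', 'Qcache_queries_in_cache');

-- With an average result around 2.2K, the first step is to halve the block size.
-- query_cache_min_res_unit is a dynamic variable, so it can be changed at runtime:
SET GLOBAL query_cache_min_res_unit = 2048;
```

Remember to also put the new value in my.cnf so it survives a server restart.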
To go further, I installed the MySQL Cacti templates from http://code.google.com/p/mysql-cacti-templates/ in order to get some real follow-up on performance. I set up a bunch of graphs for general-purpose values, and I added the three following graphs: Query cache, Query cache memory and Temporary objects. With the data those graphs provide, I think I'll be able to make a much more informed decision about the query cache.
Is the query cache useful at all?
I've also found this interesting page questioning the usefulness of the query cache in most cases (read the comments, people bring up a bunch of good points):
To sum it all up:
- The query cache adds overhead to every request: MySQL has to check whether the query is already cached, and if it isn't, it will try to store the result in the cache once it's computed (if possible).
- The query cache can be useful to save a lot of processing time if used wisely: for queries repeated often on tables that seldom change.
- In most cases, you won't be able to use the cache appropriately, so the query cache just gets in your way.
If you're not able to use the query cache in a way that actually saves your server processing time, it's generally considered better to turn it off altogether.
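For completeness, turning it off means something like this in my.cnf (setting the size to 0 as well, so no memory stays allocated for a cache that isn't used):

```ini
# under the [mysqld] section of my.cnf
query_cache_type = 0
query_cache_size = 0
```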
I'll have to watch the percentage of cache hits over the next two weeks or so to judge whether the cache is useful.
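The hit percentage I'll be watching comes from these counters (Com_select counts the SELECTs that actually had to execute, i.e. the misses):

```sql
-- hit rate = Qcache_hits / (Qcache_hits + Com_select)
-- Comparing Qcache_inserts with Qcache_lowmem_prunes also shows
-- how quickly cached entries get churned out.
SHOW GLOBAL STATUS
WHERE Variable_name IN ('Qcache_hits', 'Com_select', 'Qcache_inserts', 'Qcache_lowmem_prunes');
```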
actually not so much / not all the time...
By definition, a cache is a temporary store kept in memory to make accessing a data set faster.
Most of the time, you'll see caches used to prevent a process from repeatedly reading data from disk. That has been the most popular use case, since accessing a disk is very slow whereas accessing RAM is fast.
From what I could understand about the query cache, the use case it covers is actually different: it is a datastore that keeps the results of SQL queries readily accessible. It differs in that the tables accessed by a query may already be loaded into RAM. What MySQL is trying to save here is all of the processing: joining tables and subqueries, computing function calls, and filtering out results.
But such a cache is potentially no longer consistent once one of the tables touched by the query gets updated, so the cached entries need to be dropped on UPDATE or INSERT operations. Hence the point about the feature being useful only for "read-often, write-seldom" tables.
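To illustrate (with a hypothetical customers table): any write to a table throws away every cached result that references it, no matter how unrelated the rows are:

```sql
SELECT name FROM customers WHERE id = 42;  -- first run: miss, result stored (Qcache_inserts++)
SELECT name FROM customers WHERE id = 42;  -- identical text: served from the cache (Qcache_hits++)

INSERT INTO customers (name) VALUES ('someone new');
-- ^ invalidates ALL cached results touching customers, not just matching rows

SELECT name FROM customers WHERE id = 42;  -- miss again, has to execute
```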
Heh, it's interesting to see that you could get it to be useful.
Actually, the query_cache is probably most useful on servers hosting a limited number of sites, where you know you have the right use case.
When writing the post, I was trying to fine-tune the query_cache on a server doing shared hosting. So there's a whole lot of "who-knows-what" in there, and it's practically impossible to identify how the database is used.
Also, since there is a large number of databases on that server, optimizations such as the query_cache are not really feasible: I suspect everyone would be fighting for a little space in the cache all the time, constantly pushing someone else's cached queries out.
I haven't removed the query_cache from the shared hosting database I mentioned yet. Before doing so, I'll try to figure out from the graphs what improvements I should expect.