Ten essential tools for MySQL admins
Free and open source tools to monitor and extend your MySQL performance
By Daniel Nichter | InfoWorld | Published: 14:08, 01 August 2011
"Data drift" is a significant problem for dynamic MySQL environments. This problem, wherein slave data becomes out of sync with the master, is often caused by writing data to a slave or executing certain non-deterministic queries on the master. What's worse is that the data differences may go unnoticed until they become crippling.
Enter mk-table-checksum, a tool that performs the complex, sensitive calculations necessary to verify the data in two or more tables is identical.
mk-table-checksum works with both standalone servers and servers in a replication hierarchy, where the tool's greatest value is easily seen. Verifying table data between a master and a slave must account for replication consistency. Because changes to the master are replicating to slaves with some amount of time delay ("lag"), simply reading data from the servers is an unreliable way to verify consistency, given that the data is constantly changing and incomplete until fully replicated.
Locking tables and waiting for all data to replicate would allow consistent reads, but to do so would mean effectively halting the servers. mk-table-checksum allows you to perform non-blocking, consistent checksums of master and slave data. For technical details on how this is accomplished, see the tool's documentation.
Apart from replication consistency, there are other problems with verifying data. Table size is one of them. The MySQL command CHECKSUM TABLE is sufficient for small tables, but large tables require "chunking" to avoid long locks or overloading CPU or memory resources with checksum calculations.
Chunking solves a second problem, the need for regular data consistency checks. While data drift can be a one time occurrence, often it is recurring. mk-table-checksum is designed to continuously check tables, vetting certain chunks one run and other chunks the next run until eventually the whole table has been checked. The ongoing nature of this process helps ensure that recurring drift is corrected.
Problems have a way of waiting until you're not looking or at home sleeping to occur, and diagnosing them after the fact is sometimes impossible without data about the state of MySQL and the server at the time of the problem. The natural inclination is to write your own script to wait for or detect a problem and then start logging extra data because, after all, no one knows your system better than you. The problem is, you know your system when it's working, and if you knew the kinds of problems the system would have, you would simply fix them rather than try to capture and analyse them.
Thankfully, those who specialise in knowing when MySQL is not working, and in fixing the problems, have written a duo of tools called stalk and collect. The first tool waits for certain conditions to become true before running an instance of the second tool. That seems trivial, but these tools are made efficient by certain details addressed.
Firstly, stalk runs collect in configurable intervals, keeping you from logging too much redundant data, which can obfuscate postproblem analysis. Secondly, collect gathers not only the standard information that MySQL can report about itself but a lot more data that you might not have thought to include: lsof, strace, tcpdump and so on. Thus, if you end up having to consult with a professional who specialises in fixing MySQL problems, you will have all the data that they need.
stalk and collect are configurable, so they can be used for almost any problem. The one requirement is a definable condition to establish a trigger for stalk. If multiple conditions signal the problem, then you may also need to consult with a professional for a more extensive review of your MySQL environment because problems can appear in MySQL even though the underlying cause is elsewhere.
stalk and collect can be used proactively, too. For example, if you know that there should never be more than 50 active MySQL connections at a time, then you could proactively monitor this stalk, making these tools helpful both for problems that you know and problems that you have not yet seen.
Maintainer: Baron Schwartz
You don't always want to wait for something to go wrong before addressing a problem, and dashboards provide an essential way for you to monitor your MySQL environment for potential problems before they arise.
There are many free and commercial monitoring applications for MySQL, some MySQL-specific and others generic with MySQL plugins or templates. mycheckpoint is notable because it is free, open source, MySQL-specific and full featured.
mycheckpoint can be configured to monitor both MySQL and server metrics, like InnoDB buffer pool flushes, temporary tables created, operating system load, memory usage and more. If you don't like charts, mycheckpoint can also generate human-readable reports.
As with stalk, alert conditions can be defined with email notifications, but no secondary tool like collect will be run to log additional troubleshooting data. Another useful feature is mycheckpoint's ability to monitor MySQL variables to detect changes that can lead to problems, or signal that someone has modified MySQL when they shouldn't have.
Monitoring MySQL isn't just for data centres or large deployments. Even if you have a single MySQL server, monitoring is essential. As with your vehicle, there's a lot to know about the system while it's running to help you foresee or avoid malfunctions. mycheckpoint is one solution among many worth trying.
Maintainer: Shlomi Noach
More info: http://code.openark.org/forge/mycheckpoint
Queries against partitioned or sharded data sets can be accelerated dramatically using shard-query, which parallelises certain queries behind the scenes. Queries that use the following constructs can benefit from shard-query's parallel execution:
- Subqueries in the FROM clause
- UNION and UNION ALL
Aggregate functions SUM, COUNT, MIN and MAX can be used with those constructs, too.
shard-query is not a standalone tool, it requires other programs like Gearman, and it's relatively complex to set up. But if your data is partitioned and your queries use any of the constructs listed above, then the benefits are worth the effort.
Download: (svn checkout) http://code.google.com/p/shard-query/source/checkout
Maintainer: Justin Swanhart
More info: http://code.google.com/p/shard-query/
As tables become larger, queries against them can become slower. Many factors influence response times, but if you have optimised everything else and the only remaining suspect is a very large table, then archiving rows from that table can restore fast query response times.
Unless the table is unimportant, you should not brazenly delete rows. Archiving requires finesse to ensure that data is not lost, that the table isn't excessively locked, and that the archiving process does not overload MySQL or the sever. The goal is an archiving process that is reliable and unnoticeable except for the beneficial effect of reducing query times. mk-archiver achieves all this.
mk-archiver has two fundamental requirements, the first of which is that archivable rows must be identifiable. For example, if the table has a date column and you know that only the last N years of data are needed, then rows with dates older than N years ago can be archived. Moreover, a unique index must exist to help mk-archiver identify archivable rows without scanning the entire table. Scanning a large table is costly, so an index and specific SELECT statements are used to avoid table scans.
In practice, mk-archiver automatically handles the technical details. All you have to do is tell it what table to archive, how to identify archivable rows, and where to archive those rows. These rows can be purged, copied to another table or written to a dump file for future restoration if needed. Once you're comfortable with the tool, there are many options to fine-tune the archiving process. Also, mk-archiver is pluggable, so it can be used to solve complex archiving needs without patching the code.
When was the last time you audited the security of your MySQL servers? You're not alone if "never" is the answer. There are many companies that provide security audits, but unless nothing ever changes after those audits, then the security of your MySQL environment should be checked regularly.
External threats are one obvious reason to enforce MySQL security, but internal threats like current or former employees are often more dangerous because they are (or were) trusted. Security is also important for enforcing privacy, preventing accidental access (for example, logging into the production server instead of the development server) or enabling third party programs to interact with your systems.
For those looking to increase the security of their deployments, oak-security-audit is a worthwhile, free and open source tool that performs basic MySQL security audits. It doesn't require any setup. Just run it against your MySQL servers, and it prints a report with risks and recommendations about accounts, account privileges, passwords and some general best practices, like disabling network access.
oak-security-audit focuses just on MySQL security, so it's not a replacement for a full system security audit by a human, but it's a great first line of defence that is easy to use. You could run it weekly with cron and have the reports emailed to you.