Skip navigation.
Home
The She-BA

“kill” Oddness

So, at midnight I got a call from customer service saying our site was slow. I narrowed it down to one of our auxiliary databases, that seems to have gotten wedged just about midnight. Normal queries that took less than 4 seconds started taking longer and longer, moving up to 5 seconds and past 30 seconds in the span of a minute or so.

In the moment, I thought killing off all the queries would be a good move. My kill script, which consists of:


for i in `/usr/bin/mysql -u user -pPass -e 'show full processlist' | grep appuser | cut -f1`
do
mysql -u user -pPass -e "kill $i"
done

This will attempt to kill any mysql connection owned by the appuser. I used it a few times, and it didn't work. So I used a trick I learned when we bring our site down -- sometimes there are straggling connections to mysql, so what I do is change the app user's password by direct manipulation of the mysql.user table and flush privileges.

Within 10 seconds, all the connections from the appuser were gone, and when I put the correct password back and flushed privileges, everything came back normal. Queries started taking their usual amount of time.

Why is it that queries that refused to be killed by "kill", and yet changing the password for the user they were running as killed them off? Some were running more than 45 seconds, in various states of "Sending data" and "closing tables". Nothing was running for much longer than 60 seconds, so it doesn't seem like there was a big query that was wedging things.

Oh, and what happened at midnight to cause this? No clue, the only thing we run at that time is a PURGE LOGS FROM MASTER, which we do every hour, as we fill up a 1.1G binary log file every 20 minutes or so. This database holds a particularly heavy write application and also runs reports, so we optimize the tables every week (wednesday at 2 am). I've put the optimization to daily, as when I ran it manually this morning it took about 20 seconds.

Anyone have an idea about why changing the password worked so quickly when kill did not?

[...] Sheeri Kritzer, the

[...] Sheeri Kritzer, the MySQL She-BA, has a quick question: does anyone have any ways they create an API for their stored routines (functions and procedures)? She also has some concerns about how “kill” behaves with MySQL processes. [...]

David, I appreciate the

David,

I appreciate the advice -- there are very few queries that run, they just run frequently. They have been running for months without issue, and in fact have been running since the incident without a problem. The queries that were running for 10 seconds or more were actually read queries, and their explain plans didn't show anything odd. As well, it wasn't any one particular query.

I completely agree that changing the password shouldn't do anything to running connections, and that killing should have worked. I suspect that having no new connections allowed things to clear up enough, and that in that 10 seconds every query finished, and no new ones came about (which wouldn't happen with a kill statement, because new ones would still be coming in).

The queries are 3rd party software, and having a developer change them is a very long process, which sadly management would rather upgrade the hardware (which it does need, it's running an older OS, etc) because it's "easier".

This seems pretty strange,

This seems pretty strange, I'd love to do a test, but I suspect that changing the password doesn't do anything to current running MySQL Processes. In fact, in my testing it doesn't affect running connections. However, because no new connections are being made it can drastically reduce contention especially during events like FLUSH's. However, it doesn't seem like PURGE of the logs should cause that issue.

I'm also not certain why the script you'd written to kill processes wouldn't succeed In general, I try to avoid killing any write-oriented queries as it seems as though they can create contention issues of their own when rolling back. I guess my only other question would be: does the user with which you were executing the KILL's have rights to kill the processes?

It seems as though your server was in a state of contention, whether it be for locks or I/O. It's usually desirable to take any of the queries that were running for 10+ seconds and analyze their query plan. There may be opportunities for these queries to be more efficient and less contentious.

Well, when we tell the load

Well, when we tell the load balancer "stop serving connections from the world to the webservers" and most of the webserver connections to the db stop, but not all, I could shut down the db, or I could keep the db's uptime intact by changing the password. That way any straggling machines (say, a reporting server not behind the load balancer) can't actually reach the db anyway, because they get permission denied.

I didn't know about that

I didn't know about that trick. Bizarre.