Smart code
So, the other day I was asked by a developer to come up with a table to tally votes. Basically, there are 6 choices, and customers can vote once per day. I asked if there were any other constraints, and there were none. I specifically asked if they wanted once per calendar date, or 'it has to be 24 hours since the last vote'; they wanted 'once per calendar date'. And updating the current day's vote is not allowed. Once the vote is in, it cannot be changed.
So I came up with a simple table:
CREATE TABLE `ManOfMonth` (
`uid` int(10) unsigned NOT NULL default '0',
`voteDate` date NOT NULL default '0000-00-00',
`uidVoteFor` int(10) unsigned NOT NULL default '0',
PRIMARY KEY (`uid`,`voteDate`),
KEY `countVotes` (`uidVoteFor`)
)
There's no need for a timestamp, and you can select now() into the voteDate part, and MySQL truncates the data for you (we're using 4.1.12).
The smart part of this is that the primary key is the constraint on the database -- that is, a userid cannot vote more than once per day. So I told the developer, "When you're writing the code, don't bother running a select to see if they've already voted; just do the insert query, and check if there ends up being an error for a duplicate key."
Using the MySQL error messages in this way prevents unnecessary selects. This kind of coding should be common practice; I have a feeling it is not, particularly since I had to explain to the developer why the select was unnecessary. Then again, the developer also had wanted a table with uidVoteFor and the tally as 2 columns. This would be a smaller table, sure, and getting the result would be easy, but then there's still the accounting for whether or not the person had voted that day. The index on uidVoteFor helps make the tally query faster.

The She-BA
Hi blackflags, REPLACE INTO
Hi blackflags,
REPLACE INTO would mean that a person could change their vote. And the end of the first paragraph specifically states "Once the vote is in, it cannot be changed." Otherwise, yes, we would have used REPLACE INTO.
Hi guys. Why don't you
Hi guys.
Why don't you mention the very powerful "REPLACE INTO" MySQL specific instruction that does all the job...
REPLACE INTO ManOfMonth SET uid=1,voteDate=now(),uidVoteFor=2;
This will not trigger any error and mysql_affected_rows() will return 1 if the row was just inserted, or 2 if the existing row has been deleted and inserted again.
Okay, Sheeri, the user
Okay, Sheeri, the user experience is an argument that would really count (but see below). It's just funny that you now have to give it and not D Rickard. ;-) As I in my job usually have to deal with automatically generated or parsed data from text files, that doesn't apply there.
But then of course this SELECT is actually another thing: You use it before you present the voting page. That's fine. But after that for the actual vote operation everything said above still holds. You have to check for multiple votes from the same person again, because if it's a web based application somebody could fake the POST data from your form (and there's always somebody how tries to do that, for sure). You can still do that with a single UPDATE instead of an additional SELECT before the UPDATE there. So you're still right with what you told the developer.
Thanx for the support, Beat!
Thanx for the support, Beat! When talking about it yesterday, we actually realized that this was a bad idea from the user experience as well as deviating from the standard (thou shalt not raise an error unless it's really an error).
That is, no matter whether a customer had voted or not, they get a screen that seems to allow them to vote. Then when they vote, if they've already voted, they get an error. It's misleading to give them a poll screen if they cannot vote, so we went with a SELECT instead of using an UPDATE and throwing an error.
I don't want to start a
I don't want to start a flame war on this, as it's probably a matter of taste, too. ;-)
But hey, if somebody wants to vote multiple times, isn't this an error condition?
And it's just my philosophy that I like to enforce data integrity as closely connected to my actual data as it's only possible. That's the same philosophy as in OOP where you bundle data and the operations done on it.
Then you get an additional advantage: The single INSERT is by definition atomic, with an additional SELECT however you have to use transactions or do a LOCK TABLE (if you use MyISAM) to ensure that there's no other conflicting INSERT between your SELECTs and INSERTs (I don't even want to start talking about the overhead consequences of that).
I don't know too much on the actual overhead that's involved in throwing an exception from JDBC, as I'm more the Perl than the Java guy. But I hope it won't be too bad, as exceptions are a pretty common thing to use in Java as I understand it.
For the stored procedures: Yes, it's of course possible to return a value from a stored procedure in MySQL and then to SELECT that afterwards. You just end up with another query and its overhead.
But of course it's okay to use your approach as well: I just wanted to give you some arguments for Sheeri's. And I think there are quite a few if you look at this thread, and they are not too bad, are they?
It's elegant to cause an
It's elegant to cause an error?
If you're using Java/JDBC to connect to MySQL, isn't your solution going to throw an exception (in Java) when the PK is violated by your insert? If that is the case, I would still have to disagree with your approach.
I don't have a lot of experience with MySQL stored procedures, but if you took the sp approach, couldn't you set a variable to indicate whether you did the insert or not, then select that variable as the return value/resultset/out parameter?
I know there is more code when doing it the way I proposed, but it seems that intentionally causing an error is not the best solution.
I fully agree with Sheeri
I fully agree with Sheeri here: That's a very elegant solution that really should be common practice. Why do something yourself that the database already does for you?
You even save one query to the database that would otherwise generate additional overhead as well, especially if you have separate application and database servers. This of course is especially important in a typical high-load MySQL setup with a master and some slaves, as all the INSERT statements (and for data integrity the SELECT to check, too) will have to be executed on the single master. You'll be happy if you can save a query there!
And (at least in the programming languages I use) I don't have to deal with this necessarily in an exception: I can just use the return value of the INSERT call in an if statement, as I would have to check for the outcome of the SELECT anyway.
Of course you can encapsulate the query in a stored procedure, but then I personally would just do it the same way as Sheeri did and the error would just be generated inside the stored procedure (because it's not yet possible to raise an error manually inside a stored routine with MySQL, although there exist some workarounds recently posted to planet mysql and the forums).
A response to D.
A response to D. Rickard:
The error is not generated in the programming language -- the error is MySQL generated. All you have is the overhead associated with checking out what (if any) was the MySQL error.
A stored procedure would have to be able to somehow give feedback on whether or not there was actually an insert. I don't believe stored procedures can return that kind of information; even if they could, you'd have to read the output, which is no less overhead than reading the MySQL error, if it was generated.
[...] So, in a previous post
[...] So, in a previous post I talked about smart code. Today I put myself square into a discussion about ENUM and SET. ENUM is an enumerated list of values; similar to a pull-down menu, where the only values allowed in that field are the ones defined, with the option of also having a null column. [...]
I disagree with your
I disagree with your approach for coding the insert. Depending on the programming language being used, there can be a lot of overhead for generating an error/exception. It's advisable to avoid errors when possible, and in this case it's just a simple SELECT on a connection that would be opened anyway in your approach that can prevent the error. Alternatively, a stored procedure could be used that first checks for the existence of the vote for the given user on the given day, and then only inserts if it doesn't exist.
Your good
Your good