Sunday, April 13, 2008

Save Windows XP

Maybe I'm just out of touch, but I just now heard about this petition from InfoWorld to save Windows XP.

http://weblog.infoworld.com/save-xp/
I have four PCs in active use in my house and all run Windows XP. While I could switch to Vista on the newest machine, and possibly the second-newest machine, the other two simply can't run Vista. I may replace the newest machine in the next year (and there's a cascade of machines when that happens), but I want the same OS (and version of Office) on all the machines, for a lot of different reasons, including network driver issues for my printer and networked disks, as well as the burden of troubleshooting problems on two different operating systems. If I can't get XP on the new machine, it means, essentially, that I have to buy three new machines instead of just one.

I'm not anti-Vista, but I will say that if Vista was smaller, faster, had fewer problems, was more compatible with Windows XP, and worked on more hardware, this wouldn't even be an issue.

Saturday, April 5, 2008

Real World Bottlenecks

Yesterday, I wrote about the problem at the Democratic caucuses and asked the question of what would be a good algorithm for tabulating a lot of votes quickly with a relative lack of technology.

First, an update on the parameters. 320 people voting for 19 delegates -- 9 men plus 9 women, plus 1 extra delegate of either gender (the candidate with the highest remaining vote count). Plus 9 alternates, in a similar manner. This is means we needed to tally more than 6,000 individual votes, split into two halves, for men and women. Voting was by candidate number.

What happened is that teams of people entered the ballots into very simple spreadsheets -- basically, just the candidate numbers. This was good because it was both simple and parallelizable. Like most developers, I type numbers very quickly. I was paired with another software guy (one person has to read, one has to tally) and we breezed through 85 ballots in 30 minutes (935 entries, an average of about an entry every 2 seconds). But the fact that we did 85 ballots is indicative of a problem -- there weren't enough laptops and teams, and other teams weren't as fast. Still, all the data entry was done in under an hour.

Then, something inexplicable happened. I don't actually know what. Given all the data in spreadsheets, it should be a simple matter to pull it all into one spreadsheet, perform a series of COUNTIF formulas for each of the potential candidate numbers and then sort the results by total. Even if that spreadsheet hadn't been created in advance, this is like a 5 minute operation. Instead, it took more than an hour to take all of the tallies into results. The people doing the totaling vanished into some other room, so I don't know what happened. Beforehand, I heard something about Microsoft Access being used, but I don't know why it would have been. Access isn't the best application to use when what you want to do is count data, especially when equivalent data has been entered in multiple columns.

Net result: it took more than two hours for the results of the vote to be known.

Everybody was well intended, but I think there were a number of factors contributing to the a less-than-desirable end result:

  • Unreasonable restrictions (initial statements that computers couldn't be used, or that no more than one computer could be used).
  • A single plan, for a situation that didn't happen (200 candidates), rather than multiple plans for different situations. The plan was inflexible.
  • Not enough parallelism.
  • A final step that was overly complicated.
And it's those last two that are the biggest problems -- no matter how much we optimize the system, if we have a bottleneck somewhere, all our optimizations will be for naught.

Of course, it would be much simpler if Washington used a primary, with systems that are already in-place to count votes, but that's a whole 'nother story.

Friday, April 4, 2008

An Exercise in Democracy

It's a crazy election year. Results actually matter in states besides Ohio and New Hampshire. In Washington, there are caucuses and then there are more caucuses. And, for Democrats, there's also a primary that doesn't count for anything.

Tomorrow (April 5th), we have the Legislative District caucuses to elect delegates for the next level. The delegates to the district caucus were chosen at the precinct caucuses. Since it's so exciting this year with two great candidates still in the running, there's an interesting problem -- almost everybody wants to be a delegate at the county convention, which is the next level (and presumably beyond that). In the caucus that I'll be in, there will be more than 300 delegates and the early estimate is that 200 of those delegates will be candidates for the county convention. We'll elect about 20 delegates, half of whom must be men and half of whom must be women. The vote is complicated. Roughly, here are the requirements:

  • If we're electing 10 delegates, each person votes for 10 delegates.
  • On the first ballot, we narrow the field down to 30 candidates (3x the number of delegates to be elected). I would guess that this was added this year because of the large number of interested candidates.
  • On the second ballot, we elect the actual delegates from the reduced field.
  • The ballots must be secret. No raised hands.
  • The ballots must be traceable. If there there is a dispute, (yeah, this seems to go against the secret ballot aspect).
  • Voting is by numbers, not names. Each candidates writes down the appropriate numbers for the candidates they wish to vote for. Not sure this is an actual requirement, but it's a given.
  • Each person can only vote for a given candidate once. For example, you can't vote for yourself 10 times.
  • Everybody votes for the same number of men and women. You can't use all your votes for just men or just women.
  • No complex technology -- there's not enough prep time to create anything, and no time for training people.
  • There is money available, if necessary, for printing, copying, etc.
The obvious problem is time. The gating factor is the number of voters and votes, not the number of candidates. Let's suppose you have 3,000 votes to count and it only takes 1 second to count each vote. That is 3,000 seconds or 50 minutes. For each ballot. That's a lot of waiting after the ballot to find out who won.

What algorithm would you propose?

At this point (Friday night), I know what's planned for tomorrow. It isn't what I suggested (because I was initially told that there computers couldn't be used, and the plan does involve some computers), but I think the plan is reasonable. There are risks, however. After the caucus, I'll report back on what actually happens and how it goes.