Sunday, October 5, 2008

30 Days to Launch

I'm shipping a brand new consumer web service in 30 days. I better get started.

Flashback

A little over a month ago, I started building Puzzazz, a new puzzle site that I launched last Monday. For those who haven't been following along, I built it from start to finish, in my spare time, in less than a month -- using a language, a framework, and a platform that I'd never used before. It was fun and challenging. In what spare time remained, I started thinking about taking on a much bigger challenge, and I'm starting that challenge today, full time, with a plan to ship in 30 days.
 
Today, you can find "software as a service" -- applications supplied as services on the internet -- all over the place, from Google to Microsoft to thousands of small companies. But I'm not new to building consumer software as a service. I built my first consumer web service using AJAX in 1997, before most of the world had heard of software services and before the name AJAX was even coined. Sadly, that product was way before its time in many ways and the company didn't make it. One of the things I worked on at Microsoft was the bCentral Communications Center service. And, my last two companies, DreamBox Learning and Sampa, are also consumer web service companies, for educational software and family web sites, respectively.

Fast Forward

This time, I'm going to build a consumer web service that I've wanted to build for a long time. I'll be giving more details soon, but, for now, I'll just say that it's a new take on a simple idea. Like almost every other idea you can come up with, you could say it's been done before. But I have some new ideas. What I hope to do is combine the right set of features and usability so that I end up with, as Goldilocks would say, something that's just right.

30 days is either an eternity or no time at all, depending on how you look at it. I have enough time to build something that's pretty useful. And I have little enough time that I won't be able to throw in the kitchen sink.

Starting today, I'm going to try to blog about what's going on every day.

And Then What?

Well, I'll ship it. And I'll see what happens. I hope you find it useful.

After that, I'll be available for consulting gigs or maybe even the right "real job".

Monday, September 29, 2008

Launching Puzzazz

A month ago, I attended a Hackathon for Google App Engine at Google in Kirkland. I came away pretty impressed. During the Hackathon, I started a brand new project, Puzzazz. I literally started from scratch. I had to learn Python, Django, App Engine, and a few other things. I built a puzzle tracking system, a mini social network, and a bunch of other things. I even wrote some new Django middleware in the process that I'll probably release as open source.

After the Hackathon, I stole time in the evenings and on weekends to finish the project, launching Puzzazz this morning at 12:01 AM Pacific Time.


I'm still impressed with App Engine. Sure, there are some limitations, some of them significant. Some exist because it's a preview release; some are fundamental design decisions tied to the highly scalable architecture. Others have written plenty about these limitations, so I don't see the point of rehashing them here. The upshot is that App Engine isn't for everyone or every app. But for those that fit the model, it's a powerful, game-changing tool.

I like App Engine so much that I'm going to use it for a very intensive project that I'll be starting in a few days. I'm not ready to share the details quite yet, but I will be blogging about it every day once I get started.

If you're wondering what exactly Puzzazz is, here's a brief summary from the About page:
Puzzazz has a simple mission: to give people who like puzzles a quick, fun diversion every day. Whether you're a casual puzzler or a serious puzzle addict, Puzzazz can fill the bill.

As a bonus, Puzzazz users can invite their friends to join in for some friendly competition. If you're a Twitter user, you can even have Puzzazz automatically tweet for you whenever you solve a puzzle.

If you like puzzles and, particularly, word puzzles, check it out. If you have any feedback, I'm happy to hear it. More details can be found in this post on my thisTangent blog.

Tuesday, September 16, 2008

The Six Hour Startup

On Sunday, I joined a bunch of people at a bar in Seattle to build a new web site from start to finish in six hours -- the Six Hour Startup.

One premise of the Six Hour Startup (or at least this one) is that the project is decided on at the beginning of the six hours. Before I went, knowing very little and having just a link to a previous result, LunchLuck, I thought about what might make a good project. One thing I concluded was that data was an issue -- a good quick project needs a minimal amount of data.

After some discussion at the start, we settled on creating a web site to help people do the "one hundred push up challenge." The idea is that you spend six weeks doing push ups three times a week, working up to doing one hundred in a row. On the surface, it seemed pretty simple. You sign up, the site tracks your progress, maybe draws a graph or shows how you're doing relative to your friends.

Unfortunately, the data turned out to be an Achilles' heel. The data itself was easy enough to get. But the data imposed all sorts of requirements on the system that were more complicated than we thought. You see, the hundred push up challenge isn't simply about tracking how many push ups you're doing. You're trying to build up strength, so each day that you exercise, you're supposed to do specific sets of push ups -- between five and nine sets, with each set being a different number of push ups. And which set of sets you're supposed to be doing depends on how well you did last week.
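
To make the scoping problem concrete, here's a sketch of the data shape the challenge actually implies. The function, the plan table, and all the numbers are hypothetical, not what we built; the point is that a workout is a prescribed list of sets keyed by last week's performance, not a single count.

```python
# Hypothetical sketch: a workout is a list of set sizes, chosen by
# week, day, and a "level" derived from last week's performance.

def prescribed_sets(week, day, level):
    """Return the 5-9 set sizes to do. The table is invented for
    illustration; the real challenge has its own numbers."""
    plan = {
        # (week, day, level): set sizes
        (1, 1, 1): [2, 3, 2, 2, 3],
        (1, 1, 2): [6, 6, 4, 4, 5],
        (1, 1, 3): [10, 12, 7, 7, 9],
    }
    return plan[(week, day, level)]

# Tracking progress means recording every set, not just a total:
log = {("alice", 1, 1): [2, 3, 2, 2, 3]}  # (user, week, day) -> sets done
```

Even this toy version needs a plan table, per-set logging, and a rule for picking the level -- none of which was in the elevator pitch.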

Three hours in, we were still learning this. The schema, the user interface plan, pretty much everything, was based on assumptions that were wrong. That meant compromises. Had we all known the true scope of the requirements, we probably would have picked a different project. After all, six hours isn't a lot of time.

At the end of the six hours, I, and a number of others, didn't get to see working code (and it's not live now), so, in some sense, we failed. And the compromises meant that we failed in another way. Compare that with my experience at the Google App Engine Hackathon where, by myself, I built a working web site from scratch in about the same time, using technologies and a language I'd never used before. What was the biggest difference? Good scoping.

The moral of the story: What seemed like a very simple thing turned out to be much more complicated than we thought. The "elevator pitch" for the project was a lot simpler than the actual requirements. It seems like that's the case a lot of the time, even when we have a lot more than six hours.

So, was this a negative experience? Not at all. Not everything we do can be a success. It served as a good reminder on scoping, the people were interesting, and I learned some things. I'd do it again.

Sunday, August 31, 2008

Memcache in Reusable Code

If you're writing code for Google App Engine, they highly recommend using Memcache. Here's the deal: when they're measuring what they're going to charge you for, they include CPU and bandwidth. Memory is free. Using memcache to avoid database accesses is a very easy way to increase the number of pages you can serve within your quota. This is especially true if you have complex queries that need to be done for every requested page. Of course, outside of App Engine, memcache is useful for the same reason that Google recommends it.
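
The pattern is plain cache-aside. On App Engine the real calls are memcache.get() and memcache.set() from google.appengine.api; in this sketch a plain dict stands in for memcache so it runs anywhere, and slow_query() is a hypothetical stand-in for an expensive datastore query.

```python
# Cache-aside sketch: check the cache before doing expensive work.
cache = {}  # stand-in for memcache

def slow_query(key):
    # Pretend this is a complex datastore query that burns CPU quota.
    return "result-for-%s" % key

def get_cached(key):
    value = cache.get(key)
    if value is None:         # cache miss: do the work once...
        value = slow_query(key)
        cache[key] = value    # ...then serve later hits from memory
    return value
```

Every hit after the first serves the page from free memory instead of metered CPU.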

Memcache has a very simple API: it takes a single key for each cached object. In an application that uses memcache in different places, this means you must make sure that keys are globally unique. In my App Engine application, I've created a module that I'm planning to release so that others can use it in their applications (more on that later). But how do I ensure that my keys are unique and won't collide with keys in any application the module gets used in?

The solution: I adopted a very simple key naming convention throughout my application, which I also used in the module. I searched and could find no standard key naming convention for memcache, so I hereby propose this convention as a standard.

All memcache keys consist of a prefix followed by a context-specific unique value. The prefix depends on what's being cached, as follows:

Case             Format                        Unique value
Model instance   Model.field=unique            Field value
Class instance   ClassName.attribute=unique    Attribute value
Class data       ClassName.meaning:unique      Meaning-specific value

Notice the use of = for instances and : for class data. Why is this needed? Well, suppose that inside a class you decide to cache some data that isn't an instance of the class, such as something that takes a long time to calculate. Now, suppose somebody else decides to cache instances of your class. You might have a key conflict. By reserving : for data cached within a class definition, such name conflicts are avoided.

If you're caching an instance of a model or a non-model class, use =. If you're caching something else within a class, use :.

If what you're caching has a single, obvious key, then you can omit the ".field" portion of the key without problems (for example, Model=key). In practice, I think most uses can use this short form, but I'm defining the longer forms now to avoid confusion where they are needed.
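
A couple of small helpers make the convention hard to get wrong. The function names here are my own invention; the key formats are the convention itself.

```python
# Build memcache keys following the proposed naming convention.

def instance_key(class_name, value, field=None):
    """Instance key: 'Class.field=value', or the short form
    'Class=value' when the class has a single obvious key."""
    if field:
        return "%s.%s=%s" % (class_name, field, value)
    return "%s=%s" % (class_name, value)

def class_data_key(class_name, meaning, value):
    """Key for other data cached within a class: 'Class.meaning:value'."""
    return "%s.%s:%s" % (class_name, meaning, value)
```

For example, instance_key("User", "kim", field="nickname") gives User.nickname=kim, while class_data_key("Puzzle", "stats", "2008-08") gives Puzzle.stats:2008-08.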

I also added this to the App Engine Cookbook.

Saturday, August 30, 2008

Invent Your Own Gadget

The folks over at Bug Labs just sponsored a contest at and after the latest Gnomedex. The question was: what gadget would you make with the BUG platform, a hardware platform of interchangeable modules?


I haven't read a lot about the BUG platform, but I like the idea. Simple, interchangeable modules with (I hope) simple protocols that allow you to quickly prototype a wide variety of hardware gadgets. The current modules provide GPS, motion sensor, accelerometer, camera, and display capabilities. More are on the way (and, in the contest, they said you could postulate future modules).

I posted three gadget ideas -- one that was pretty silly and two that I actually like. If I win the contest, I think I might try to build the last one. Here they are:

Ultimate Remote Control
Where is that remote control anyway? With the Ultimate Remote Control, you don't have to worry about it -- just wave at your TV. The Motion sensor detects when you're waving and activates the remote control. From there, simple hand signals read and recognized through the camera control the TV. Raise or lower your flat hand to change the volume. Hold out your palm to mute. Move your fist to the right or left to change channels or fast forward and rewind on your DVD. Form numbers with both hands to enter a channel code, or use sign language letters to pick a channel by name. Status and confirmations are shown on the LCD. Because there's no physical remote control, everybody in the room can use it any time.
Why I Want It: We have lots of remote controls and they always seem to be lost. If they're not lost, they get fought over.

Help Me Drive
Help Me Drive uses the new radio module to tune in to broadcast traffic information and combines that with GPS information to alert me of traffic conditions that I care about -- and only those -- via the new text-to-speech module. I don't need to know about an accident 30 miles away unless I'm heading in that direction. Help Me Drive uses a combination of GPS tracking and recording of prior routes to predict which possible traffic issues are a concern. For example, if I'm on 520 heading West between Redmond and 405, and a significant percentage of the time that I'm on that road I continue past 405 into downtown Seattle, then traffic between me and downtown Seattle is highly relevant. As a bonus, Help Me Drive downloads construction information via WiFi whenever I'm near a preferred or public WiFi network. Plus, by recording traffic information over time and correlating it with my driving patterns (it knows my speed at every point on my route), it can better predict how bad traffic will get on my regular routes. I can also specify traffic that I'm always interested in, or that I'm always interested in when I leave a particular destination like work or home. And I can specify that I'm always interested in traffic between my current location and home or work, depending on the time of day.
Why I Want It: I hate getting stuck in traffic when I could have taken a different route that wasn't congested.

Slow Window
There's a famous science fiction story by Bob Shaw, The Light of Other Days, that introduces the concept of Slow Glass -- glass through which light moves very slowly, possibly taking decades to go from one side to the other. It's an awesome story and highly recommended. My slow window wouldn't be quite so slow, but the camera constantly records and plays back on the screen at a specified delay -- possibly 15 seconds, possibly an hour. 24 hours happens to be a really interesting delay. See what happened yesterday. An optional accelerometer can cause the time dilation to change on-the-fly as you move it around, a pretty cool effect giving you that "special relativity" feel. And, an optional motion sensor can make the slow window delay variable -- it's always showing interesting stuff, looping as necessary.
Why I Want It: Concepts from SF have come to life in so many ways already. I've always wanted slow glass.
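
The delayed-playback idea reduces to a simple buffer: frames go in one end and come out the other a fixed delay later. Here's a minimal sketch, with plain values standing in for camera frames and the class name my own invention:

```python
from collections import deque

class SlowWindow:
    """Emit each frame `delay` ticks after it was captured."""

    def __init__(self, delay):
        self.delay = delay
        self.buf = deque()

    def tick(self, frame):
        """Record the new frame; return the frame from `delay` ticks
        ago, or None while the buffer is still filling."""
        self.buf.append(frame)
        if len(self.buf) > self.delay:
            return self.buf.popleft()
        return None
```

With a 15-second delay at 30 frames per second, delay would be 450; a 24-hour window just needs a much bigger buffer (or a disk).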

Thursday, August 28, 2008

I'm Impressed with Google App Engine

I had a great opportunity today to attend a Google "Hackathon" led by a few members of the Google App Engine team. I decided to go after attending a talk by Mano Marks at StartPad on Tuesday. In short, App Engine is a server infrastructure and SDK for building and deploying highly-scalable web applications. The goal is that App Engine apps are easy to build, maintain, and scale, and are cheap to run. App Engine competes with Amazon's EC2, S3, and SimpleDB, as well as lots of more complex solutions. Before Tuesday, I'd read some articles on App Engine and I'd signed up for an account, but I really knew little about it. But, the talk was pretty compelling, so I decided to jump on the opportunity to attend the Hackathon and learn more.

 
Walking away, I'm convinced that Google is on to something.

Let me set the stage. Google App Engine supplies a Python framework for developing web apps and all development must be in Python. Before this morning, I had never written a line of Python code. I hadn't even read any before Tuesday. App Engine provides the Django framework for web pages. All I knew about Django was its name. Other than the talk I'd been to on Tuesday, I knew nothing about the App Engine SDK. But Mano had certainly made it look easy.

I didn't go in completely unprepared -- I'm not crazy, after all. Last night, I spent an hour and a half reading the first few chapters of Dive Into Python. I also bought an O'Reilly Python Pocket Reference that I have yet to open. And, I installed the SDK, though I forgot to install Eclipse with PyDev (the preferred editor for Windows), so I had to do that this morning.

I have a little web app I've been meaning to write for a few years, so I thought that would be the perfect thing to try to create in App Engine. I thought maybe I could actually do the whole thing in a day. That turned out to be aiming a little high. I got hung up on stupid Python bugs, some App Engine gotchas that seem obvious now that I know them, and I hadn't counted on the time it took to download, install, and set up Eclipse. If I had already known Python, I think those things wouldn't have bitten me and/or I would have worked through them quicker. Maybe I still couldn't have done my whole app in a day, but I would have come a lot closer.

But -- and this is the amazing thing -- I did get the core part working, and I demoed it to the group at the end. My site actually could have gone live. Sure, it's missing some critical features (like keeping track of users) and it's pretty ugly right now, but it worked. Zero to sixty in eight hours. Having App Engine team members around to ask questions certainly helped. And the demos at the end showed me just how much some of the people who already knew what they were doing were able to accomplish in just a few hours.

App Engine is not perfect. It's still a work in progress and there are a few silly things. I got JavaScript errors on some admin pages on their live site. And debugging Python this way is a pain. I don't know what the state of the art on Python debuggers is yet, but I'm guessing it's not anywhere near what I'd like to see (my comparison is the Visual Studio debugger, which rocks). And, certainly, it's not appropriate for everything. Services like Sampa can't be built on it. And, it's not going to work if you have very particular database needs. But, it'll work for a lot of things. And it seems the team is interested in listening and it looks like they've already made changes and enhancements in response to feedback.

We've always talked about leveraging existing code, not repeating ourselves, and sticking to our own areas of expertise. And it's always been hard. Google is really delivering a way for us to do just that in a new and exciting way.

If you're interested in finding out when my app goes live, go to www.puzzazz.com and sign up for the mailing list.

Wednesday, May 21, 2008

Why Is Mozy So Slow?

I have good backup habits. At least I do now.

I hate to admit it, but it took losing not one, but two, hard disks (one to a hardware failure, the other to a particularly virulent virus with a rootkit) and some critical data to get me there. But I'm all better now. Really.

I have a USB hard disk that I do full backups to and I use Mozy (the paid version) for offsite backups of critical stuff. I used to use Carbonite but felt it was slow and it lacked configuration options that I wanted. And Mozy works great for my wife, whose total backup is less than 2GB. But, for me, Mozy slows my machine down to a crawl whenever a backup starts.

Now, I know I have a big backup. It's currently 157,070 files for a total of 84GB. And that isn't even everything. I have hundreds of gigabytes of photos that aren't in my Mozy backup set. Most of what I have backed up isn't changing on a regular basis.

How slow is it?

Here are some examples.

Opening the Mozy Configuration window (just opening it!) takes 20 minutes. Note that you have to open the Configuration window to change any settings or get status information on previous backups.

The backup itself takes hours to days. A small backup, 290 files totaling 75MB, took 4 hours. Almost all of that time was spent scanning my hard disk and compressing files for backup; the actual transfer took only 3 minutes.

A backup of 1000 digital photos totaling 5GB took more than 4 days (that's with the machine on, nonstop). Of that time, 47 hours were prep and only 8 hours were transfer. What was it doing for 47 hours?!

To make matters worse, any interruption of the backup (like hibernating my machine or losing a network connection) appears to restart the whole thing.

Where's the slowdown?

Let's look at the transfer speed first. 5GB over 8 hours is only 625MB per hour, which is about 10% of my upstream capacity. So, it's obviously not network constrained.
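
A quick sanity check of that arithmetic, assuming decimal megabytes:

```python
# 5 GB transferred in 8 hours: what rate is that?
mb = 5 * 1000                     # 5 GB in megabytes
hours = 8
mb_per_hour = mb / hours          # 625 MB/hour
mbps = mb_per_hour * 8 / 3600.0   # about 1.39 megabits per second
print(mb_per_hour, round(mbps, 2))
```

Under 1.5 megabits per second of actual transfer is a trickle for any broadband uplink.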

Could it be CPU or disk constrained? It's hard to separate the two without a deeper analysis that I can't do easily, but I have a good way to look at the overall picture. For my USB backups, I use SyncBackSE from 2BrightSparks, which I like very much -- unlike most so-called sync products, SyncBackSE actually knows how to do a true sync operation, including dealing with conflicts. And it's great for backups. A backup of almost 3,000 files totaling 6GB took just 34 minutes from start to finish, including 16 minutes scanning 460,000 files (3x the number in my Mozy backup set) and 18 minutes copying.

Mozy does seem to be doing way too much I/O. Looking in Task Manager, it looks like SyncBackSE reads and writes very little more (maybe 10% more) than the data being backed up. That's about what I would expect. But Mozy's two processes (mozybackup.exe and mozystat.exe) read more than 15 bytes for every byte being backed up and write about 2 bytes for every byte being backed up, and that's excluding the network I/O. And it also looks like they are writing tiny packets to their server (an average of 8 bytes per Other I/O request). This certainly doesn't seem right to me.

Except for the backup that took 4 days, all of the Mozy backups occurred in the middle of the night while I wasn't using the machine, so there was no contention on the machine. Mozy does have an option for how much CPU it will consume. But it didn't seem to make much difference when I told Mozy it could have as much CPU as it wanted, either in how fast the backup happened or in the performance of my machine (it makes it crawl either way). Remember, it takes 20 minutes of waiting for the Configuration window to open in order to change that option. To be fair, SyncBackSE also slows the machine down dramatically when it's running, but over a much shorter period of time.

I'm not expecting Mozy to be as fast as a local backup, but it's more than 100 times slower and there doesn't appear to be any reason for it. Mozy's scanning is 600 times(!) slower than SyncBackSE when it ought to be just as fast. I would guess, from what I can see, that Mozy stores the backup set database remotely instead of locally (why?), but that still shouldn't account for a 600x difference. Maybe Mozy needs to buy 2BrightSparks and use whatever it is they've done.

What's the answer?

I wish this blog post ended with an answer instead of a question, but I don't know the answer. Maybe the people at Mozy do. Maybe they'll enlighten us.

What I do know is that I have to look elsewhere. A full backup for me is already hundreds of gigabytes and growing fast. I use a 16GB card in my camera and, yes, I fill it sometimes. Add to that the huge, high-res Photoshop files that I create when I'm working on one of my montages.

I'm planning to look at FolderShare, BeInSync, and LogMeIn Backup. I have to say I don't really like the idea of paying a subscription fee to back up to my own hard disk, especially since I'd also have to find a remote server to host it. Suggestions welcome.