Bloglines Web Services
Tomorrow morning around 5am Pacific Time, a press release titled “New Bloglines Web Services Selected by FeedDemon, NetNewsWire and Blogbot to Eliminate RSS Bandwidth Bottleneck” will go out, but I’m so excited I’m going to blog about it now. So what is this? First, continuing a tradition we started with the notifiers, we’re augmenting the data that you can pull out of Bloglines programmatically. We’re calling the new functions the Bloglines Web Services, and we’ve launched a whole new part of the web site to document them. The new functionality lets a program pull Bloglines subscription data as well as blog entries, using the OPML and RSS formats. So what does this mean? If you’re a desktop aggregator developer, you can use Bloglines to provide a syncing capability for your users. And you don’t have to worry about supporting the different RSS and Atom formats (and various imperfect feeds), because Bloglines normalizes all data. If you’re a publisher thinking about entering the world of RSS, you don’t have to worry about thousands of desktop aggregators pummeling your servers into oblivion. With the Bloglines Web Services, Bloglines acts as a feed cache, insulating content providers from bandwidth problems. What makes this announcement extra special, of course, is that the leading desktop aggregators are announcing support for the Bloglines Web Services. FeedDemon has a beta version available now with support built in, and NetNewsWire and BlogBot will be launching new versions soon. We’ve been working on this for a while now, and I’ve gotta say that Nick, Brent and Dru are great people to work with. I’ll have more to say later, but I wanted to be “First Post” with the news.
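To give a feel for what an aggregator does with subscription data delivered as OPML, here is a minimal Python sketch. It does not call any Bloglines endpoint: the sample document, the feed URLs and the function name `list_subscriptions` are all made up for illustration, and the real service's output may be laid out differently.

```python
import xml.etree.ElementTree as ET

# Hypothetical OPML subscription list; folders are outlines without an xmlUrl.
SAMPLE_OPML = """<opml version="1.0">
  <head><title>Example subscriptions</title></head>
  <body>
    <outline title="Tech">
      <outline title="Example Blog" type="rss"
               xmlUrl="http://example.com/index.rdf"
               htmlUrl="http://example.com/"/>
    </outline>
    <outline title="Another Feed" type="rss"
             xmlUrl="http://example.org/feed.xml"/>
  </body>
</opml>"""


def list_subscriptions(opml_text):
    """Return (title, feed_url) pairs for every outline that points at a feed."""
    root = ET.fromstring(opml_text)
    subscriptions = []
    # iter() walks nested outlines too, so feeds inside folders are included.
    for outline in root.iter("outline"):
        feed_url = outline.get("xmlUrl")
        if feed_url:
            title = outline.get("title") or outline.get("text") or feed_url
            subscriptions.append((title, feed_url))
    return subscriptions


if __name__ == "__main__":
    for title, url in list_subscriptions(SAMPLE_OPML):
        print(title, url)
```

A desktop client could run something like this against the list it downloads and compare the result with its local subscriptions to stay in sync.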
Seagate Has A Problem
At Bloglines, we have three classes of machines in our cluster. We’ve got web boxes, which are pretty lightweight. We’ve got storage class machines, which as you can guess have big drives and medium speed processors. And we have database class machines, which have fast processors, fast disk, and lots of ECC memory. Fast disk, in general, means some form of SCSI. The database machines use Ultra SCSI drives, specifically Seagate Cheetah Ultra320s in a RAID configuration. Unfortunately, we’ve experienced something like a 40% failure rate on these drives. Because of the RAID arrays, this hasn’t resulted in any loss of data or downtime, but it’s still unacceptable. The drives have a 5 year warranty, so we’ve been shipping them back to Seagate. In return, we receive ‘repaired’ drives from Seagate. Recently, one of those repaired drives failed within one minute of being installed in a machine. My suspicion is that part of the problem is that Seagate isn’t doing much to actually fix the drives that are sent back for repair. Speaking of which, when sending a drive back to Seagate for replacement, you can call them up and ask for the ‘advance replacement option’. This means that they send out a ‘new’ drive before they receive your old drive, which speeds up the replacement process. Before today, we were able to get a customer support rep on the phone directly and specify the advance replacement option immediately. But apparently Seagate is now outsourcing their first-tier customer support, so when you call them up, they ask for your details and then say someone will be in touch within 24 hours. Which, if you call on a Friday, probably means Monday. We’ll never purchase Seagate Ultra SCSI drives again. The risk is too high.
Bloglines Updates
We pushed out a couple of cool new features last night on Bloglines. First is ‘Keep New’, which lets you mark individual blog entries as unread. The second is ‘Related Feeds’, which is a list of feeds that are similar to the feed you’re reading. This complements the Bloglines Recommendations, which are personalized for each user. Also, there’s a great article on us in the San Jose Mercury News today (Yahoo link because the Merc changes URLs and puts things behind registration after a day).
Foo Camp 2004
I just got back from Foo Camp up in Sebastopol. I had a blast, and want to thank Tim O’Reilly and Marc Hedlund for the invite. The intellectual firepower there was amazing, and everyone was really friendly and open. Here are some pictures of the weekend by Mark Frauenfelder. I always love meeting Bloglines users and getting their feedback, and I was pleased to find that many of the attendees were indeed already Bloglines users. Unlike many conferences, we actually got stuff done. Two things, in fact. First is the Feed Mesh idea, which is basically a federation of large RSS companies sharing blog update pings. The idea is to reduce the need for someone like Bloglines to poll a given feed on a regular basis. If we can reliably get notifications of when a given feed has been updated, we won’t need to poll it constantly. This will reduce the server and bandwidth load on feed providers. The second idea that we came up with is the Vary ETag proposal. When an aggregator requests a feed, generally either the entire feed (usually 15 items) is returned, or nothing is returned (if the feed hasn’t changed). Typically only one item has been added or updated since the last request, so returning all 15 is a waste of bandwidth. This proposal introduces a change that servers can make so that they serve up only the new or updated items. And here’s the most important part of this proposal: the servers can do this with no changes required on the client side. If Blogger were to implement this, for example, they’d see an immediate savings in bandwidth. Same with any of the other large blog hosting sites. There have been other proposals, but they all require changes on the client side, which I consider to be a non-starter. Later, there was a session on designing the next-generation feed reader. Of course, feed readers haven’t been around long enough to start talking about a ‘next generation’, but several good ideas came out of this session as well. The session ended up being a bit Bloglines-specific, but I’m certainly not complaining. :) Finally, this was the first time I’ve slept in a cubicle since probably 1998. But this time it was for fun and not for work. And this time I had a comfortable inflatable bed. What a fantastic weekend.
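The Vary ETag idea described above can be sketched roughly like this in Python. This is only an illustration under my own assumptions, not the text of the proposal: it assumes the server stamps each item with a hypothetical update counter (`seq`) and encodes the highest counter it has served into the ETag, so that the ordinary `If-None-Match` header an aggregator already sends tells the server what that client has seen. The names `Item`, `make_etag`, `parse_etag` and `serve_feed` are invented for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    seq: int      # hypothetical counter, bumped whenever an item is added or updated
    title: str


def make_etag(items):
    """Assumption: the ETag simply encodes the newest sequence number served."""
    newest = max((item.seq for item in items), default=0)
    return f'"seq-{newest}"'


def parse_etag(etag):
    """Return the sequence number inside an ETag we issued, or None."""
    if etag and etag.startswith('"seq-') and etag.endswith('"'):
        try:
            return int(etag[5:-1])
        except ValueError:
            return None
    return None


def serve_feed(items, if_none_match=None):
    """Return (status, etag, items_to_send) for one feed request."""
    current = make_etag(items)
    last_seen = parse_etag(if_none_match)
    if last_seen is None:
        # Unknown or missing ETag: fall back to the full feed.
        return 200, current, list(items)
    fresh = [item for item in items if item.seq > last_seen]
    if not fresh:
        return 304, current, []       # nothing new since the client's last fetch
    return 200, current, fresh        # only the new or updated items


if __name__ == "__main__":
    feed = [Item(1, "first"), Item(2, "second"), Item(3, "third")]
    print(serve_feed(feed))                # full feed for a first-time client
    print(serve_feed(feed, '"seq-2"'))     # just the third item
    print(serve_feed(feed, '"seq-3"'))     # 304, nothing to send
```

The client here does nothing beyond echoing back the ETag it was given, which is what conditional-GET-aware aggregators do already.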
RSS Bandwidth Issues
The other day, Scoble announced on his blog that MSDN was having problems keeping up with the bandwidth demands of RSS aggregators. Well, if Microsoft can’t handle it, then it’s definitely a problem, right? Many people have chimed in about solutions, mostly involving existing HTTP standards and reducing the size of the feeds served. These are all good ideas. I don’t have time to get into a lot of the technical stuff right now, but one really good recommendation comes from FeedDemon’s Nick Bradbury. I’ve recommended the exact same thing to a few people. Another resource that’s being developed is Sam Ruby’s HTTP Best Practices. He’s applying his usual exceptional thoroughness to documenting the issues (I will always sing Sam’s praises, as his and Mark Pilgrim’s work on syndication test cases has been a tremendous asset to the community). I’ll have to leave it there for now, but these resources should help sites that are having issues dealing with the load from aggregators.
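As one illustration of the “existing HTTP standards” angle, here is a minimal Python sketch of a server answering a feed request with a conditional GET check plus gzip compression. It is not necessarily what the posts mentioned above recommend; the function names `etag_for` and `respond` are made up, headers are treated as a plain case-sensitive dict for simplicity, and hashing the body is just one easy way to produce an ETag.

```python
import gzip
import hashlib


def etag_for(body: bytes) -> str:
    """Derive a quoted ETag from the feed body (one simple choice among many)."""
    return '"' + hashlib.sha1(body).hexdigest() + '"'


def respond(body: bytes, request_headers: dict):
    """Return (status, response_headers, payload) for a feed request."""
    etag = etag_for(body)
    # Conditional GET: the client already has this exact version.
    if request_headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    headers = {"ETag": etag}
    payload = body
    # Compress only when the client says it can handle gzip.
    if "gzip" in request_headers.get("Accept-Encoding", ""):
        payload = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return 200, headers, payload


if __name__ == "__main__":
    feed = b"<rss><channel><item>hello</item></channel></rss>" * 50
    status, headers, payload = respond(feed, {"Accept-Encoding": "gzip"})
    print(status, len(feed), len(payload))
    print(respond(feed, {"If-None-Match": headers["ETag"]})[0])
```

An aggregator that remembers the ETag and sends it back costs the server a few hundred bytes per poll instead of the whole feed.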