Mark Fletcher, founder of both Bloglines (just acquired by Ask Jeeves) and ONElist (now Yahoo Groups), presented a fascinating talk on what it takes to launch a startup in the online services market, notably how to design and build systems that are reliable, scale well and stay on budget. Fletcher's talk was a useful complement to Marc Hedlund's earlier session on VC Funding for Geeks. Fletcher outlined a 'garage philosophy' for creating startups, built on a number of core principles:
- Passion for the idea
- Utilisation of cheap technologies
- Simplicity
- Releasing early & releasing often
- Involving users – the best features come from user requests
- Have fun, be passionate and enjoy the work
- Moonlighting – limiting risks by continuing to work a ‘day job’
- Obtain funding from friends & family before VCs
- Begin with free services to lessen pressure
- Offer a developer API to encourage innovation
- Hire a lawyer
- Find good help – notably a sysadmin
- Outsource tasks using freelance resources such as eLance
Fletcher characterised registration-based websites as having two core infrastructures: front-end systems such as web servers and mail services (anything that talks directly to a user), and back-end systems for user data, other databases and storage. His software recommendations included:
- DJB (Daniel J. Bernstein) software – qmail, djbdns & daemontools
- Clearsilver – a language-neutral HTML templating system for segmenting presentation and application logic
- Berkeley DB
- Linux
- Apache
- C, C++, Bash & Python
- Skip list data structures
- Avoid NFS
- Avoid table-level locking in MySQL
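Skip lists are a less familiar item in that list, so here is a minimal sketch of one, written for illustration rather than taken from the talk. It implements a sorted set of integers with expected O(log n) search and insert: each node gets a randomly chosen "tower" height, and searches descend from the sparsest level to the densest. The level cap (`MAX_LEVEL`) and promotion probability (`P`) are assumed values, not anything Fletcher specified.

```python
import random
from typing import List, Optional


class _Node:
    """A skip list node: a key plus one forward pointer per level."""

    def __init__(self, key: int, level: int) -> None:
        self.key = key
        self.forward: List[Optional["_Node"]] = [None] * (level + 1)


class SkipList:
    """Minimal sorted set backed by a skip list (illustrative sketch)."""

    MAX_LEVEL = 16  # assumed cap on tower height
    P = 0.5         # assumed probability of promoting a node one level up

    def __init__(self, seed: Optional[int] = None) -> None:
        self._rng = random.Random(seed)
        self.level = 0
        # Head sentinel: its own key is never compared, only its forward pointers.
        self.head = _Node(0, self.MAX_LEVEL)

    def _random_level(self) -> int:
        lvl = 0
        while self._rng.random() < self.P and lvl < self.MAX_LEVEL:
            lvl += 1
        return lvl

    def search(self, key: int) -> bool:
        node = self.head
        # Walk right while the next key is smaller, then drop down a level.
        for i in range(self.level, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
        candidate = node.forward[0]
        return candidate is not None and candidate.key == key

    def insert(self, key: int) -> None:
        # update[i] records the last node visited on level i (head by default).
        update: List[_Node] = [self.head] * (self.MAX_LEVEL + 1)
        node = self.head
        for i in range(self.level, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        nxt = node.forward[0]
        if nxt is not None and nxt.key == key:
            return  # key already present; a set stores it once
        lvl = self._random_level()
        if lvl > self.level:
            self.level = lvl  # new top levels start from the head sentinel
        new = _Node(key, lvl)
        # Splice the new node in after update[i] on every level of its tower.
        for i in range(lvl + 1):
            new.forward[i] = update[i].forward[i]
            update[i].forward[i] = new

    def __iter__(self):
        node = self.head.forward[0]
        while node is not None:
            yield node.key
            node = node.forward[0]
```

For example, inserting 5, 1, 9, 3, 7 and 3 and then iterating yields the keys in sorted order with the duplicate dropped. Part of the appeal of skip lists is that they stay simple (no rebalancing rotations as in balanced trees), which fits the talk's emphasis on simplicity.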
In terms of hardware choices, Fletcher recommends:
- Dedicated servers rather than buying or hosting your own equipment
- Design for cheap hardware
- eBay!
- APC power distribution units for remote power cycling
- HP ProCurve networking appliances
- Avoid Seagate Ultra-SCSI drives
- A good phone for SSH remote system administration
To illustrate these architectural choices, Fletcher cited some of the design decisions made in the development of Bloglines:
- Bloglines RSS news feeds are copied to each of its ten web servers rather than being served to each requesting client from a single central source. Copying files can outperform client-server requests in many cases.
- The number of subscribers to Bloglines is currently computed in a single pass through all user records and stored, rather than calculated on the fly. This improves performance at the cost of moving from real-time to periodic reporting, which is more than adequate for current needs.
- The Bloglines desktop notifier experiences 1-200 hits per second, so data is held in memory rather than retrieved from disc.
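The counting and in-memory decisions above can be sketched together: a batch job makes one pass over all user records, and the resulting totals are held in memory so that each request is a dictionary lookup rather than a database query or disc read. This is a minimal illustration, not Bloglines code; the record layout (a `subscriptions` list of feed URLs per user), the per-feed breakdown and the `SubscriberCountCache` class are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Hypothetical record layout: one dict per user listing subscribed feed URLs.
UserRecord = Dict[str, List[str]]


def count_subscribers(records: Iterable[UserRecord]) -> Counter:
    """Single pass over all user records, tallying subscribers per feed."""
    counts: Counter = Counter()
    for record in records:
        # set() ensures a user is counted at most once per feed.
        counts.update(set(record["subscriptions"]))
    return counts


class SubscriberCountCache:
    """Precomputed counts kept in memory; refreshed periodically, not per request."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()
        self.total_users = 0

    def refresh(self, records: List[UserRecord]) -> None:
        # Intended to run on a schedule (e.g. from cron); readers between
        # refreshes see the last snapshot, which is the periodic trade-off.
        self._counts = count_subscribers(records)
        self.total_users = len(records)

    def get(self, feed_url: str) -> int:
        # In-memory lookup; Counter returns 0 for feeds with no subscribers.
        return self._counts[feed_url]
```

The design choice mirrors the one described in the talk: counts may be slightly stale between refreshes, but each read is cheap, which matters when a client such as a desktop notifier polls many times per second.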
When choosing storage options, Fletcher contrasted relational databases with file-based storage, and also discussed the use of RAID storage versus redundant servers. ONElist used arrays of RAID drives for its storage infrastructure, whereas Bloglines uses Linux software RAID-1.
On systems administration, Fletcher gave less detail, but recommended:
- Utilise DNS round-robin load balancing for web servers
- Employ hot backups for offline processing
- Worry about cooling in co-located data centers
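DNS round-robin works by publishing several address records for one hostname and rotating their order between responses; since many clients simply use the first address returned, successive clients land on different web servers. The toy model below simulates that rotation in Python. In practice it is configured as multiple A records in the zone rather than written as code, and the `RoundRobinResolver` class and the 10.0.0.x addresses are hypothetical.

```python
from collections import deque
from typing import List


class RoundRobinResolver:
    """Toy model of DNS round-robin: each answer rotates the record order by one."""

    def __init__(self, addresses: List[str]) -> None:
        self._records = deque(addresses)

    def answer(self) -> List[str]:
        # Return every address, but in a different order on each query.
        result = list(self._records)
        self._records.rotate(-1)  # the next query starts with the following server
        return result


resolver = RoundRobinResolver(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
# A client that takes the first record of each answer cycles through the servers.
chosen = [resolver.answer()[0] for _ in range(4)]
print(chosen)  # ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.1']
```

The appeal is that it needs no load-balancing hardware. The trade-off is that plain round-robin has no health checking, and resolvers may cache answers, so traffic is spread only approximately evenly and a failed server keeps receiving requests until its record is removed.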
Finally, Fletcher urged entrepreneurs and innovators to avoid making stupid bets, citing his own bet to shave his head if ONElist was ever sold!