The Data Dump – Fun With Graphs & Charts

The Data Dump has two strict rules:

  • the data must make a larger point about the Internet and its users, not just about the source company
  • since data visualization is at least as important as data collection, it’s gotta look good.

State of the Blogosphere – March 2006, David Sifry

  • 30m blogs tracked, doubling in size every 5.5 months, consistent doubling for the previous 36 months.
  • 100’000 blogs created each day, roughly one every second.
  • 50% of bloggers still blogging after 3 months, 10% of blogs updated weekly, 9% are spam.
  • 60% of pings are from known spam sources – Technorati registers them as splogs.
  • 1.2m posts/day, 50’000 posts/hour.
  • Frequency should be measured in megahertz!
  • Blogging has brought a friction-free publication mechanism – trade magazines are being displaced by blogs.
  • 41% Japanese, 28% English, Chinese 14%, Spanish 3% – Japanese growth has come in the last four months.
  • 50% of blog posts use tags or categories.
  • 81m+ tagged posts, with 400’000 more each day.
  • It’s about exposing community and adding context.
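The rates quoted above are easy to sanity-check with a little arithmetic. Here is a rough sketch that uses only the figures from the talk; the projection helper assumes the 5.5-month doubling period simply continues, which is an extrapolation for illustration and not a claim made in the talk.

```python
# Back-of-envelope checks on the figures quoted in the notes.
SECONDS_PER_DAY = 24 * 60 * 60

blogs_tracked = 30_000_000
blogs_per_day = 100_000
posts_per_day = 1_200_000

blogs_per_second = blogs_per_day / SECONDS_PER_DAY   # about 1.16
posts_per_hour = posts_per_day / 24                  # 50,000
posts_per_second = posts_per_day / SECONDS_PER_DAY   # about 13.9


def projected_blogs(current, months_ahead, doubling_months=5.5):
    """Extrapolate blog count, ASSUMING the doubling period stays constant."""
    return current * 2 ** (months_ahead / doubling_months)


print(f"{blogs_per_second:.2f} new blogs per second")
print(f"{posts_per_hour:,.0f} posts per hour")
print(f"{posts_per_second:.1f} posts per second")
print(f"{projected_blogs(blogs_tracked, 11):,.0f} blogs after 11 more months")
```

So 100’000 new blogs a day works out to slightly more than one per second, and 1.2m posts a day matches the 50’000 posts/hour figure exactly.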

Feeds, Eric Lunt

  • Feedburner measures feed traffic for 200’000 feeds.
  • ManifestDigital visualisation of feed activity as drops of water on a pond: colour indicates media type, and splash radius indicates the peak subscription figure.

Gauntlet Systems, Adam Messinger

  • Next-generation continuous source control system… no more broken builds or smoke tests… comprehensive analytics.
  • Do the few most productive developers account for the majority of code?
  • Does open source enable a long tail for porting and internationalisation?
  • Example – Two people do 80% of coding, the rest make changes… the distribution falls away exponentially.
  • Lucene – Three developers do most of the work.
  • Hibernate – Development is evenly distributed in this commercial project.
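The "do a few developers write most of the code?" question boils down to measuring how concentrated the commits are. A minimal sketch follows, assuming you have one author name per commit. The names and counts are invented for illustration, and this is not Gauntlet's data or its actual method.

```python
from collections import Counter


def top_k_share(commits_by_author, k):
    """Fraction of all commits made by the k most active authors."""
    total = sum(commits_by_author.values())
    if total == 0:
        return 0.0
    top = sorted(commits_by_author.values(), reverse=True)[:k]
    return sum(top) / total


# Hypothetical commit log: one author name per commit (made-up data).
commit_log = ["ann"] * 50 + ["bob"] * 30 + ["cat"] * 8 + ["dan"] * 7 + ["eve"] * 5
counts = Counter(commit_log)

print(f"Top 2 authors: {top_k_share(counts, 2):.0%} of commits")
```

A share near 80% for the top two authors would match the "two people do 80% of coding" example, while an evenly distributed project would give a top-2 share close to 2 divided by the number of authors.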

Windows Live, The Year In Review

  • Queries from Live Virtual Earth and Live Search were rendered as a cloud of keywords.
  • ‘The diversity of activities is mind-boggling’…

O’Reilly Radar, Roger Magoulas

  • Normalisation, how to spot emerging trends and weak signals.
  • How do you compare things with vastly different scales?
  • Correlating AJAX pages and AJAX jobs showed that SF, NYC and Boston were the key hiring regions and that hiring trends were as seasonal as for other languages.
  • Google book searches for O’Reilly titles are driven by searches originating in India (70%).

Jonas Goldstein

  • Founded the Attention Trust and a clickstream recorder that streams data to a personal online vault.
  • Stamen-produced statistical visualisations of browsing activity.

August Capital, David Hornick

  • Previous six months of Hornick’s email.
  • Email = Mtgs + Schmooze.
  • 17’779 emails received.
  • Thanksgiving and Chanukah were the only dips!
  • Holidays seem to be the only dips, even when normalised across all VCs.
  • 90 references to Cabo, 159 Hawaii, 198 Wine.
  • Most common email – titled ‘Introduction’ (979 times).
