The Data Dump – Fun With Graphs & Charts
The Data Dump has two strict rules:
- the data must make a larger point about the Internet and its users, not just about the source company
- since data visualization is at least as important as data collection, it’s gotta look good.
State of the Blogosphere – March 2006, David Sifry
- 30m blogs tracked, doubling every 5.5 months; the doubling has been consistent for the previous 36 months.
- 100’000 blogs created each day, roughly one every second.
- 50% of bloggers still blogging after 3 months, 10% of blogs updated weekly, 9% are spam.
- 60% of pings are from known spam sources – Technorati registers them as splogs.
- 1.2m posts/day, 50’000 posts/hour.
- Frequency should be measured in megahertz!
- Blogging has brought a friction-free publication mechanism – trade magazines are being displaced by blogs.
- 41% Japanese, 28% English, 14% Chinese, 3% Spanish – Japanese growth has come in the last four months.
- 50% of blog posts use tags or categories.
- 81m+ tagged posts, with 400’000 more each day.
- It’s about exposing community and adding context.
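The rates quoted above can be sanity-checked with a little arithmetic. The following is a minimal sketch, not anything shown in the talk: the function names are invented for illustration, and the projection assumes the 5.5-month doubling period simply continues.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(per_day):
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


def per_hour(per_day):
    """Convert a daily count into an average per-hour rate."""
    return per_day / 24


def projected(count, months, doubling_months):
    """Project a count forward, assuming steady doubling every `doubling_months`."""
    return count * 2 ** (months / doubling_months)


# Figures taken from the notes above.
blogs_per_second = per_second(100_000)          # a little over one per second
posts_per_hour = per_hour(1_200_000)            # 50'000 per hour
blogs_next_period = projected(30_000_000, 5.5, 5.5)  # one doubling: 60m

print(round(blogs_per_second, 2), posts_per_hour, blogs_next_period)
```

At 100’000 new blogs a day the average works out to slightly more than one per second, and 1.2m posts a day matches the 50’000 posts/hour figure exactly.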
Feeds, Eric Lunt
- Feedburner measures feed traffic for 200’000 feeds.
- ManifestDigital visualisation of feed activity as drops of water on a pond: colour indicates media type, and splash radius shows the peak subscription figure.
- Next-generation continuous source control system… no more broken builds or smoke tests… comprehensive analytics.
- Do the few most productive developers account for the majority of code?
- Does open source enable a long tail for porting and internationalisation?
- Example – Two people do 80% of coding, the rest make changes…the distribution flows exponentially.
- Lucene – Three developers do most of the work.
- Hibernate – Development is evenly distributed in this commercial project.
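The question of whether a few developers account for most of the code comes down to measuring how concentrated the contributions are. A minimal sketch of that measurement follows; the commit counts and author names are invented for illustration (they are not Lucene or Hibernate data), and commit count is only one possible proxy for "coding".

```python
def top_share(commits_by_author, k):
    """Return the fraction of all commits made by the k most active authors."""
    counts = sorted(commits_by_author.values(), reverse=True)
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[:k]) / total


# Hypothetical project: two heavy committers and a tail of occasional ones.
example = {
    "alice": 500,
    "bob": 300,
    "carol": 80,
    "dave": 60,
    "erin": 40,
    "frank": 20,
}

print(top_share(example, 2))  # 0.8, i.e. two people account for 80% of commits
```

An evenly distributed project would give a `top_share` close to `k` divided by the number of authors, while a project dominated by a small core would give a value near 1.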
Windows Live, The Year In Review
- Queries from Live Virtual Earth and Live Search were rendered as a cloud of keywords.
- ‘The diversity of activities is mind-boggling’…
- Normalisation, how to spot emerging trends and weak signals.
- How do you compare things with vastly different scales?
- Correlating AJAX pages and AJAX jobs showed that SF, NYC and Boston were the key hiring regions, and that hiring trends were as seasonal as they are for other languages.
- Google book searches for O’Reilly titles are driven by searches originating in India (70%).
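One common way to compare series with vastly different scales is to rebase each one to an index of 100 at its first value, so that only relative change is plotted. The notes do not say which normalisation was used in the talk, so this is only a sketch of one option; the function name and the sample figures are invented.

```python
def rebase_to_100(series):
    """Rescale a series so its first value is 100 and the rest are relative to it."""
    if not series:
        return []
    base = series[0]
    if base == 0:
        raise ValueError("cannot rebase a series that starts at zero")
    return [value / base * 100 for value in series]


# Hypothetical monthly counts on very different scales.
ajax_pages = [1000, 1500, 2000]
ajax_jobs = [10, 15, 20]

print(rebase_to_100(ajax_pages))  # [100.0, 150.0, 200.0]
print(rebase_to_100(ajax_jobs))   # [100.0, 150.0, 200.0]
```

After rebasing, the two series overlap exactly, which makes it visible that they grew at the same rate even though one is a hundred times larger than the other.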
Root.net, Jonas Goldstein
- Comes after the founding of the Attention Trust: a clickstream recorder that streams data to a personal online vault.
- Stamen-produced statistical visualisations of browsing activity.
August Capital, David Hornick
- Previous six months of Hornick’s email.
- Email = Mtgs + Schmooze.
- 17’779 emails received.
- Thanksgiving and Chanukah were the only dips!
- Holidays seem to be the only dips, even when normalised across all VCs.
- 90 references to Cabo, 159 Hawaii, 198 Wine.
- Most common email – titled ‘Introduction’ (979 times).