Feb 27, 2007
As a follow-up to the first visualizations we made of user activity on Digg (posted to the Digg blog in 2006), we've widened the scope of our visualizations to show an entire day's worth of digging activity on the site in greater detail. The resulting images, made by Tom Carden, illustrate some general patterns and make one controversial story immediately visible; more about this below.
Every digg from yesterday, February 26th, 2007, is shown in the following sets of scatter-plots.
Diggs by Time
The time of day that a digg occurred is represented horizontally, midnight to midnight, Pacific Standard Time. The vertical axis indicates story ID, with the newest at the top, and the oldest at the bottom.
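For readers who want to reproduce this layout, here is a minimal sketch of the axis mapping described above. It is not the code used to make these images; the function name, the pixel dimensions, and the idea of passing the time as seconds since midnight are all assumptions made for illustration.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def digg_to_pixel(seconds_since_midnight, story_id,
                  min_story_id, max_story_id,
                  width=1001, height=1001):
    """Map one digg to an (x, y) pixel position.

    x runs left to right from midnight to midnight.
    y runs top to bottom from the newest story ID to the oldest.
    All parameter names are hypothetical.
    """
    x = seconds_since_midnight / SECONDS_PER_DAY * (width - 1)
    # Guard against a zero span when only one story is plotted.
    span = max(max_story_id - min_story_id, 1)
    # Newest story (largest ID) lands at y = 0, the top of the image.
    y = (max_story_id - story_id) / span * (height - 1)
    return round(x), round(y)
```

With this mapping a digg at noon on the oldest plotted story lands halfway across the bottom edge, and a digg at midnight on the newest story lands in the top-left corner.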
The first scatter-plot shows the activity across all Digg stories for one day. Diggs for promoted stories extend back far into the past, with previously-popular stories continuing to receive attention days or weeks after their initial promotion. The vast majority of activity is clearly focused on the newest stories, though, so it's hard to read much from this view.
All Diggs, Zoomed In a Bit:
The images below are zoomed in to show the most recent 75,000 and 10,000 stories, and to make things clearer each digg is colored according to the user ID. Red marks people who've been on the site for a while, blue marks people who've signed up recently, and the colors in between move through the rainbow to indicate how long they've been Digg users.
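One way to get a red-to-blue rainbow like this is to sweep the hue of an HSV color as the user ID grows. The sketch below is only an illustration under that assumption; `user_color` and `max_user_id` are hypothetical names, and the exact palette used in the images may differ.

```python
import colorsys


def user_color(user_id, max_user_id):
    """Map a user ID to an (r, g, b) tuple of 0-255 integers.

    Assumes IDs are handed out in signup order, so a low ID means a
    long-time user (red) and a high ID means a recent signup (blue).
    """
    newness = user_id / max_user_id if max_user_id else 0.0
    # Hue 0 is red and hue 2/3 is blue; values in between pass
    # through yellow, green, and cyan.
    hue = newness * (2 / 3)
    r, g, b = colorsys.hsv_to_rgb(hue, 1.0, 1.0)
    return tuple(round(c * 255) for c in (r, g, b))
```

Stopping the sweep at blue rather than going all the way around the color wheel keeps the oldest and newest users from ending up the same shade of red.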
Things to note:
- The smooth arc across the top is the advancing front of new story submissions throughout the day, and its slope shows that submissions are slow during the early morning hours, rapid throughout the day, and slow again in the evening.
- The vertical streaks of color indicate particularly busy users, digging large numbers of stories in sequence over short periods of time. There are also smaller streaks near the top which correspond to a more common pattern: users digging large numbers of stories in reverse chronological order starting with the latest submission. This pattern would match rote digging of the upcoming pages, top to bottom, page after page.
- Finally, the horizontal bands represent extremely popular stories, with the left end of the band matching the front page promotion time. These stories are like meteor trails, hitting hard and then slowly fading to the right as the day progresses and new ones take their place on the front page.
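The vertical streaks can also be picked out of the raw data rather than by eye. The sketch below looks for one user digging progressively older stories in quick succession. The `(timestamp, user_id, story_id)` tuple layout and the `max_gap` and `min_length` thresholds are assumptions chosen for illustration, not values taken from Digg.

```python
from collections import defaultdict


def find_descending_streaks(diggs, max_gap=30, min_length=5):
    """Find runs of rapid diggs on ever-older stories by one user.

    diggs: iterable of (timestamp_seconds, user_id, story_id).
    Returns a list of (user_id, [(timestamp, story_id), ...]) runs.
    """
    by_user = defaultdict(list)
    for ts, user_id, story_id in diggs:
        by_user[user_id].append((ts, story_id))

    streaks = []
    for user_id, events in by_user.items():
        events.sort()  # chronological order for this user
        run = [events[0]]
        for prev, cur in zip(events, events[1:]):
            quick = cur[0] - prev[0] <= max_gap
            older_story = cur[1] < prev[1]
            if quick and older_story:
                run.append(cur)
            else:
                if len(run) >= min_length:
                    streaks.append((user_id, run))
                run = [cur]
        if len(run) >= min_length:
            streaks.append((user_id, run))
    return streaks
```

A run of falling story IDs at short intervals is what working down the upcoming pages from the top would produce, so loosening or tightening the two thresholds changes how much of that behavior gets flagged.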
Promoted Stories Only:
In the graph below, only diggs for already-promoted stories are included, clearly showing the massive snowball effect that front page promotion has on these stories. The yellow color of these stories is a result of the mashing together of all of the different colors on top of one another. This is a good sign; it shows that popular stories are almost universally dugg by a large cross-section of users. A horizontal stripe that's all one color (there's a blue one indicated below, which means it was mainly dugg by new users to the site) is definitely unusual.
That Thin Blue Line:
If we take a closer look at the blue story marked with an arrow above, the new-user bias becomes even more pronounced. The story in question, "BBC Reported Building 7 Had Collapsed 20 Minutes Before It Fell", was submitted right about noon PST, and immediately started getting a ton of traffic. The image below shows all the stories which were submitted right around the same time and also made it to the front page of Digg. Titles are colored according to how long the person who submitted the story has been a Digg user. Almost every digg on the story is bright blue; a smattering of older users is visible, but its color profile is markedly different from the yellowish tint that most other promoted stories have in the image above.
Joining After the Fact?
Stripping out every story except that one, we can start to focus in on some interesting details about the story, specifically: how long have the people who're digging this story been members of Digg? If we keep the horizontal axis (time of a digg) as it is, and the color of each dot (length of membership in Digg) as is, but change the vertical axis to the time after the story was submitted that a person joined the site, a telling picture emerges:
In the first 12 hours of the story being on Digg, 164 of the 829 diggs on the story (just under 20%) came from people who had joined Digg after the story was submitted, in a pretty regular pattern. It looks like the post, which contained "Click here to add your own comment and counter the debunkers," brought a large number of people to the site specifically to digg this story.
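The count above is simple to compute once each digg is paired with the registration time of the user who made it. Here is a rough sketch, assuming that pairing is already available as a list of datetimes; the function names are hypothetical.

```python
from datetime import datetime


def late_joiner_share(join_times, submitted_at):
    """Count diggs made by users who registered after the story was submitted.

    join_times: one registration datetime per digg on the story.
    Returns (late_count, total_count, fraction_late).
    """
    total = len(join_times)
    late = sum(1 for joined in join_times if joined > submitted_at)
    fraction = late / total if total else 0.0
    return late, total, fraction


def join_offset_hours(joined, submitted_at):
    """Vertical-axis value: hours between story submission and signup.

    Positive means the user joined after the story was submitted.
    """
    return (joined - submitted_at).total_seconds() / 3600
```

For the figures quoted above, 164 / 829 is about 0.198, which is the "just under 20%" in the text.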
The story was quite controversial, with numerous people in the comments claiming that Digg had censored the story. The visualization above shows that a large number of the diggs the story received came from very new users without high Digg reputation scores, giving it lower overall reliability than the vast majority of stories (check the graph) which are promoted because of the activities of a wide range of users.
Non-Promoted Stories Only:
In the last scatter plot of this kind, only diggs for upcoming stories are included. The scarring effect (the vertical streaks left when users digg one story after another in sequential order) seems to take place almost exclusively with very recently submitted stories, and seems to be about evenly distributed between long-time and new members of Digg.
Diggs by User: User ID
The last scatter-plots look at two different ways of representing the activity of the Digg userbase. In the images below, users are arranged vertically by user ID, with the newest at the top. The time of day that a digg occurred in Pacific time is arranged horizontally, midnight to midnight.
The first pattern that's immediately obvious is that there's not much bias towards the top or bottom - Digg's earliest users still account for a substantial portion of the activity, along with the new accounts. On the horizontal axis, activity is very roughly centered around the middle of the graph, which would of course fit with the sleeping patterns of a largely U.S. userbase.
Diggs by User: User Registration Date
The chart below is more interesting, showing the same data on the horizontal axis, but replacing user ID with user registration date vertically. The oldest users (late 2004) are at the bottom, the newest at the top.
There are two striking features of this graph when compared to the previous one. First, the Digg v3 launch from Summer 2006 brought a lot of new users to the site, many of whom are still quite active. Second, there's a sharp increase in the number of user registrations near Autumn 2005, a time when Digg first started generating a lot of attention. The roughly uniform texture throughout late 2005 and all of 2006 shows that Digg is keeping its early users interested, and the v3 spike shows that the new users attracted at that time have also found a reason to stick around.