Cascade is the latest project by the NYTimes R&D department, a tool that allows precise analysis of the structures that underlie sharing activity on the web. The project was initiated by Mark Hansen, working with Jer Thorp and Jake Porway (Data Scientist at the Times), and the team spent the last 6 months building the tool to understand how information propagates through the social media space. While initially applied to New York Times stories and information, the tool and its underlying logic may be applied to any publisher or brand interested in understanding how its messages are shared.
The app is primarily an exploratory tool, Jer explains. NYTimes publishes more than 6,000 pieces of content every month, and the team can now analyse every sharing event involving this content using Cascade. Jer describes the basic app workflow:
- A ‘Story Mode’ which shows a set of stories, and their associated event cascades. These stories can be requested via keyword search, section search, or a variety of ‘interestingness’ metrics. This view has some low-level visualizations of activity over time which allow us to focus in on event cascades which might be particularly interesting.
- A ‘Cascade Mode’ which allows us to view the event cascades. The cascades build over time – one of the things we’ve been most interested in with this tool has been the time-based analysis. Rather than seeing static views of the social graph, we can actually see the sharing networks unfold over time. This mode has three distinct views in which each cascade can be examined:
1) A ‘side view’ which shows all of the events over time, and uses the Y axis to indicate degrees of separation from the originating event
2) A ‘radar view’ which views the system from overhead and lets users identify ‘threads’ of conversation
3) A 3D ‘tree view’ which combines views 1 and 2
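To make the ‘side view’ more concrete, here is a minimal Python sketch of how degrees of separation from the originating event could be computed for a cascade. It is not the team's code: the `(event_id, parent_id, timestamp)` tuple layout and the function names are assumptions made purely for illustration.

```python
from collections import defaultdict, deque


def degrees_of_separation(events):
    """Map each event id to its distance from the originating event.

    `events` is an iterable of (event_id, parent_id, timestamp) tuples.
    This layout is hypothetical; the originating event has parent_id None.
    """
    children = defaultdict(list)
    roots = []
    for event_id, parent_id, _timestamp in events:
        if parent_id is None:
            roots.append(event_id)
        else:
            children[parent_id].append(event_id)

    # Breadth-first walk from the originating event(s), counting hops.
    depth = {}
    queue = deque((root, 0) for root in roots)
    while queue:
        event_id, hops = queue.popleft()
        depth[event_id] = hops
        for child in children[event_id]:
            queue.append((child, hops + 1))
    return depth


def side_view_points(events):
    """Return (timestamp, degrees_of_separation, event_id) sorted by time.

    X is time and Y is the number of hops from the originating event,
    mirroring the description of the side view.
    """
    events = list(events)
    depth = degrees_of_separation(events)
    return sorted(
        (timestamp, depth[event_id], event_id)
        for event_id, _parent_id, timestamp in events
        if event_id in depth  # skip events not connected to a root
    )
```

Given a small cascade such as `[("a", None, 0), ("b", "a", 5), ("c", "a", 7), ("d", "b", 12)]`, event `d` sits two degrees away from the originating event `a`, so it would be drawn two steps up the Y axis at time 12.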
The tool is built in Processing, with a lot of help from Andres Colubri’s GLGraphics library and toxiclibs. It runs on any machine, but is staged on a 5-screen video wall. This ‘exhibition’ app runs in an automatic mode, in which it explores the terrain of available data and wanders through the various presentation modes. The wall can also be controlled by a fairly simple custom iPhone app, which sends OSC commands to the display system. The team considered using touch or gestural input to control the display, but in the end the iPhone app gave them the control they wanted while letting them use the interface at some distance from the screens.
All of the data is stored in a Mongo database, which they access through a Python API. They also used R quite a lot during the exploratory phases. The largest cascades they are currently loading have about 25,000 events. These are all rendered in 3D at full framerate (60fps) across 5 screens (6400×720) by a single machine. Jer suspects the system could handle trees of up to 50,000 events (all thanks to Andres & GLGraphics). The data that the team are currently using is a 2-week sample from July/August, but Jer says they will be moving to a near real-time data feed very soon.
The current implementation looks at the sharing of NYTimes content over Twitter, but Jer explains that Cascade is in fact a system that could be used to model any kind of sharing activity. They’re already looking at implementing it for other Times properties (boston.com, etc.) and will be testing it out on other sharing systems over the coming months.
If you would like to know more about the project, make sure you also check out the coverage of Project Cascade from Nieman Journalism Lab. Of course, there is also the project page at NYTLabs.
Jer will be presenting his latest work, including Cascade, at Resonate, a new digital arts festival taking place later this year in Belgrade. Other confirmed speakers are available here, and you can sign up to the newsletter for more info, available soon. You can also follow on Twitter or join the group on Facebook.