Classic Posts

Can you believe Overview has existed for more than five years already? Here are some classic posts from our archives. (If you want to learn how to use Overview, see the help.)

Doing journalism

What did private security contractors do in Iraq? — The very first story done with Overview, an analysis of 4,500 pages of “escalation of force reports” for the Associated Press. See also how it was done.

Some other completed stories.

The different kinds of document driven stories — Sometimes you want to search, sometimes you want to categorize and count, sometimes you want to remove the junk.

The document mining Pulitzers. There were plenty of document-driven stories in the 2014 Pulitzers.


What do journalists do with documents? A talk (video) and a paper that report on 15 different stories done with Overview, plus uses of other NLP techniques in journalism.

Algorithms are not Enough: lessons bringing computer science to journalism. What we learned applying NLP techniques to journalism, with lessons useful for anyone designing software.

VIDEO: Text analysis in transparency — A talk at Sunlight Labs, 2013. Old but good.

How Overview can organize thousands of documents for a reporter. A detailed explanation of how the Topic Tree works.

What is xkcd about? Text mining a web comic — A comparison of Overview’s clustering vs. LDA topic modeling.

Who will bring AI to those who cannot pay? Some reflections on the barriers to applying advanced technology in journalism.

The development of Overview

VIDEO: What the Overview Project does — A presentation from a conference in Berlin, 2014.

Overview: The Design, Adoption, and Analysis of a Visual Document Mining Tool For Investigative Journalists. A paper accepted to IEEE InfoVis 2014 that describes the evolution of the system.

VIDEO: Document mining with the Overview prototype — NICAR conference, March 2012. The prototype was so awkward, but people did good stories with it anyway.

How Overview turns documents into pictures — A discussion of the old prototype’s scatterplot and topic tree visualizations. These visualizations are no longer used, but this post has some interesting details on interpreting these types of text visualizations.

VIDEO: Investigating thousands (or millions) of documents by clustering — Demonstration of document clustering methods at NICAR conference, February 2011.

A full-text visualization of the Iraq war logs — December 2010.  This is where it all began, with a hacked-together proof of concept based on document clustering.