Overview helps you make sense of big, disorganized sets of documents. It's a visualization and analysis tool designed for document collections ranging from dozens to millions of pages of material.

Overview includes built-in OCR, a sophisticated search engine, word clouds, entity detection, and topic-based document clustering. It has rich tagging and metadata support and handles many input and export formats. If you need custom analysis, you can write your own plugins using the API.

It is open source: you can use the public server at overviewdocs.com or run it on your own computer. Overview Services Inc. provides paid support, custom feature development, and enterprise licensing. Contact us for details.

Overview is designed specifically for text documents where the interesting content is in narrative form, that is, plain English (or other languages) as opposed to a table of numbers. It was built for investigative journalists, but it's also used by lawyers, researchers, and analysts. It has been used to analyze emails, declassified document dumps, material from WikiLeaks releases, social media posts, online comments, and more.

For more about the different ways to use Overview, see our post on the different types of document-driven stories.

Overview began at The Associated Press, supported by the John S. and James L. Knight Foundation as part of its Knight News Challenge. It has received further support from Google Ideas, Columbia University, and the ACLU.

Research and design work began in November 2010, progressing from a proof of concept to a working prototype and then to an easy-to-use web application. See the FAQ for more.