How to Use Overview to Explore A Document Set

It takes just a few minutes to start exploring your documents in Overview. Overview depends on DocumentCloud to store, OCR, and publish documents, so you will need a DocumentCloud account (here’s how to get one).

1. Batch upload your documents to a DocumentCloud project
Log in to your DocumentCloud account. Create a project to store all of your files, using the “New Project” button. Then select “New Documents.” Now here’s the trick to batch uploads: when the file dialog box opens, you can select all of the documents in a folder simultaneously by clicking on the first, then shift-clicking on the last (or pressing Control-A on Windows, or Command-A on Mac). You can keep the documents private if you like.

2. Log in to Overview and import your project
Go to overviewproject.org and log in, or create an account. Select “import your project from DocumentCloud” and enter your DocumentCloud username and password when prompted. Your DocumentCloud projects will appear. Select the project that you want to explore, and get a coffee while Overview imports and analyzes it.

3. Explore the documents in the tree view

Overview’s main screen is divided into four parts: the topic tree, the tag list, the document list, and the document viewer.

The topic tree view displays your documents sorted into the topics and sub-topics that Overview has automatically created for your documents. The big node at the top contains all documents. It splits into several smaller nodes below, each of which contains documents on similar topics. The nodes are different sizes, because sometimes Overview finds many documents on a similar topic, while in other cases a document is so unique that Overview puts it into a node all by itself.

You can pan the tree left and right by dragging with the mouse, or moving the scroll bar. You can zoom into the tree by using the mouse wheel, two fingers on the trackpad, or dragging one end of the scroll bar. Nodes which have a small ⊕ in the center can be expanded to show children, while ⊖ hides children.

Each node is labelled by the top keywords from the documents in that node. These words tell you the topic of the node. The children of a node contain, collectively, all of the documents in the parent, but broken down into more specialized topics.
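To make the parent and child relationship concrete, here is a minimal sketch in Python. It is not Overview’s actual implementation: the Node class, the children_cover_parent function, the keywords, and the document ids are all made up for illustration. It only shows the property described above, that a node’s children together hold exactly the documents of the parent.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Node:
    """Hypothetical stand-in for one node of the topic tree."""
    keywords: List[str]                      # top keywords shown as the node label
    doc_ids: Set[int]                        # documents contained in this node
    children: List["Node"] = field(default_factory=list)


def children_cover_parent(node: Node) -> bool:
    """Return True if every non-leaf node's children hold exactly its documents."""
    if not node.children:
        return True  # a leaf has nothing to check
    combined: Set[int] = set()
    for child in node.children:
        combined |= child.doc_ids
    if combined != node.doc_ids:
        return False
    return all(children_cover_parent(child) for child in node.children)


# Made-up example: five documents split into three more specialized topics.
root = Node(
    keywords=["budget", "meeting", "report"],
    doc_ids={1, 2, 3, 4, 5},
    children=[
        Node(keywords=["budget", "invoice"], doc_ids={1, 2, 3}),
        Node(keywords=["meeting", "agenda"], doc_ids={4}),
        Node(keywords=["report", "draft"], doc_ids={5}),  # a one-document node
    ],
)

print(children_cover_parent(root))
```

The single-document child mirrors the case where a document is so unusual that it ends up in a node by itself.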

When you select a node, the documents in it appear in the document list. Each document in the list is represented by a list of keywords specific to that document. Clicking on a document on the list loads it in the document viewer.

4. Tag interesting documents
As you explore the topic tree, you’ll run across individual documents or entire nodes you want to remember. Enter a descriptive tag in the “new tag” field and press “tag.” The currently selected documents will be tagged, and a little tag color swatch will appear next to them in the document list.

Once you’ve created a tag, you can add the currently selected documents to it at any time by pressing the + button that appears when your mouse is over the tag name. (To tag an entire node at once, select the node and then press the +/- button.) Or press – to remove the tag.

Clicking on a tag name selects that tag, highlighting the tagged documents in the tree and loading them into the document list.

5. Work your way through the tree
When you have a lot of documents, it pays to be systematic. We recommend working your way through the nodes in the tree from left to right — biggest topics to smallest topics. Select a node, then view a few of the documents in it to see if you understand what they have in common. If there seems to be more than one important topic in the documents in that node, try opening up the child nodes instead, until you find a node where all of the documents are more or less the same. Then, tag that node with a descriptive label.

As you proceed, you may find documents in the same topic in different nodes. Overview doesn’t know what story you are working on, so it can’t always guess how the documents should be arranged. You can apply a tag to any combination of nodes and documents to create a set that is meaningful to you.
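One way to think about this is that a tag is simply a named set of documents, and tagging a node adds every document in that node to the set. The Python sketch below illustrates the idea under that assumption. The names tags, tag_documents, and untag_documents are hypothetical, as are the tag name and document ids, and nothing here reflects how Overview stores tags internally.

```python
from collections import defaultdict

# Hypothetical model: tag name -> set of document ids carrying that tag.
tags = defaultdict(set)


def tag_documents(tag, doc_ids):
    """Add documents (a whole node's worth, or just one) to a tag."""
    tags[tag] |= set(doc_ids)


def untag_documents(tag, doc_ids):
    """Remove documents from a tag, like pressing the minus button."""
    tags[tag] -= set(doc_ids)


# Made-up document ids for two nodes that sit in different parts of the tree.
node_a = {1, 2, 3}
node_b = {7, 8}

tag_documents("budget", node_a)   # tag an entire node
tag_documents("budget", node_b)   # tag a second, unrelated node
tag_documents("budget", [12])     # tag one stray document
untag_documents("budget", [2])    # drop a document that did not belong

print(sorted(tags["budget"]))  # [1, 3, 7, 8, 12]
```

Because the tag is a set, it does not matter where in the tree a document lives, which is why a tag can cut across nodes.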

You may also discover that the documents in a node are irrelevant to your story, in which case you can tag them with “read” and simply move on. Part of the power of Overview is being able to decide not to look at an entire topic.

When you’ve finished this process, you’ll have a neatly categorized tree, and a set of tags corresponding to all the interesting topics in your documents.

6. Ask for help!
Questions? Bugs? Something you’d like to see in a future version? Contact us!

How I Used Overview to Report on 8,000 Police Department Emails

Guest post by Jarrel Wade, Tulsa World. Originally at PBS IdeaLab

In May, I published a story which described how the Tulsa Police Department in Oklahoma purchased millions of dollars of under-powered and under-tested computer hardware, resulting in a multitude of problems.

Emails showed complaints from the field in which officers were unable to get basic police information about dangerous calls when they were en route to scenes, or network dead spots around town that officers were completely avoiding.

But leading into April, I had no idea how I was going to read all these emails by myself.

Three weeks away from receiving the documents, I called my city records official for an update and was told my request had expanded several times over. I would be receiving about 8,000 emails from the city of Tulsa based on a server keyword search regarding a technology purchase for the city’s police department. By far the largest open records request I’ve ever made, it took a four-month city legal review and would end up being its own line-item on the police department’s budget, the chief later told me.

Searching the Internet and IRE website for help on reviewing thousands of emails, I came across DocumentCloud and Overview. The Overview developers had just made a pre-beta version available for testing. I had been prepared to spend months of my spare time reading email after email, opening PDF after PDF as long as I could hold out on my editors without writing a story. Overview was the perfect find for a tech-savvy reporter — installing a staircase to the top of the mountain.

The Next Step
After some difficulties of cleaning the documents (the emails came in Outlook format, which became a pain to convert to clean PDFs), the next step was figuring out how to make Overview work for my documents. My first impression after loading the emails was, “OK, now what?”

I found that Overview works differently for every document set. For emails, I think Overview works best with a completely random selection — say all emails for a department in a given month. Lots of emails would be meaningless, spam or pictures of cats, so Overview can be used to easily dismiss the majority. Given a set of emails based on a keyword search, the problem is more difficult because most of the emails will be at least somewhat relevant.

In this case, Overview was most useful as an organizational tool. I could look at an email, make a note, and easily have it grouped with other similar emails through tagging.

I started with a branch of Overview’s document tree and started clicking, glancing, noting and tagging. Right off the bat, I found that Overview had grouped together all of the similarly formatted “service desk” requests. There were hundreds if not thousands of those, so I was able to tag them by the dozens without a second thought, while focusing on the more meaty emails.

The next thing Overview did for me was to generally group email chains together. Much of my document set was taken up with emails that were replies or mass messages that were duplicates of other emails. Those were easy because I could find the most complete version, annotate it, and write off the rest.

Several hundred tags into my documents, I realized that simply tagging an email into a single sub-group was not good enough. Overview allows each document to have several different tags, I found out. Until then, I had been tagging emails by the sender’s department — city legal, city IT, police IT, purchasing, police administration, etc. From that point forward, I also tagged each email according to whether or not it was sent from one of my main players, and a separate tag for whether it was “important.” That allowed me to look later at all my quotable or crucial emails.

A good strategy for tagging your documents is important. I recommend having a tag for important, crucial, quotable or relevant documents — whatever you wish to call it.
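The payoff of giving each document several tags is that you can later combine them. The short Python sketch below shows the idea with invented data. The doc_tags dictionary, the with_all_tags helper, and the document ids are hypothetical, and the tag names are only loosely based on the ones described above. It is an illustration of the strategy, not a feature of Overview.

```python
# Hypothetical data: document id -> the set of tags applied to that email.
doc_tags = {
    101: {"police IT", "important"},
    102: {"purchasing"},
    103: {"city legal", "main player", "important"},
    104: {"police IT", "main player"},
}


def with_all_tags(doc_tags, *wanted):
    """Return the ids of documents that carry every one of the wanted tags."""
    wanted_set = set(wanted)
    return sorted(
        doc_id for doc_id, applied in doc_tags.items() if wanted_set <= applied
    )


print(with_all_tags(doc_tags, "important"))                 # [101, 103]
print(with_all_tags(doc_tags, "police IT", "main player"))  # [104]
```

A single catch-all tag such as “important” acts as the shortlist, while the department tags let you narrow that shortlist further.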

The End Game
Once all the emails were tagged, the important ones were annotated, and I felt like I had a good “overview” of the document set, it was time for the end game. This is where Overview developers are hard at work — Overview already facilitates digesting and reviewing thousands of emails, but how does it handle the one email that’s different from the rest, because it’s the only one discussing officers accepting payments for travel from the vendor of an item you spent millions of public dollars on?


This email shares only two people with the other emails and contains almost none of my keywords, but it became crucial to my story.

Overview is not far away from being the one-stop, mass-document-review source, but it’s not intended to do all the work for the reporter, I found (and its developers will agree, I’m sure). I still had to go through all of the small branches in the document tree, looking for the unique, unmatched emails that Overview couldn’t pair with other documents, in case I had overlooked something.

Despite the final effort, Overview was still crucial to my end game, as I was able to review and find documents far more easily than if I were to search for a given email through a keyword search. All I had to do to review my work in Overview was select my “Important” tag and scan through the few hundred emails that I deemed important.

Another interesting part here was that I would remember emails that I had deemed irrelevant at first read, but now seemed relevant because of another supplemental email. I could easily find the original email by pulling up the tag and looking through that bank of emails in minutes. Keyword searches for one email out of 8,000 just don’t compare to the organization Overview provides.

In the end, I’m guessing it would have taken four reporters, splitting up emails into stacks of a few thousand, to do the work I did in two weeks. Furthermore, they’d have to do it full-time and compare notes at the end. Overview, with the help of DocumentCloud, allowed me to have all my documents annotated in one place. Just as valuable, it allowed me to save my work, move to another story on my beat, and then start up again without losing momentum.

Finally, the work the Overview developers did to add to the program was impressive and very helpful. Every bit of feedback I gave led to immediate changes, which tells me the Overview team needs more feedback from a wider audience. It’s a wide-open program with tons of potential, in addition to many basic features that are practical now to any level of reporter with the gall to request thousands of documents.