Guest post by Jarrel Wade, Tulsa World. Originally at PBS IdeaLab
In May, I published a story describing how the Tulsa Police Department in Oklahoma purchased millions of dollars' worth of under-powered and under-tested computer hardware, resulting in a multitude of problems.
Emails showed complaints from the field: officers were unable to get basic police information about dangerous calls while en route to scenes, and there were network dead spots around town that officers were avoiding completely.
But leading into April, I had no idea how I was going to read all these emails by myself.
Three weeks away from receiving the documents, I called my city records official for an update and was told my request had expanded several times over. I would be receiving about 8,000 emails from the city of Tulsa based on a server keyword search regarding a technology purchase for the city’s police department. By far the largest open records request I’ve ever made, it took a four-month city legal review and would end up being its own line-item on the police department’s budget, the chief later told me.
Searching the Internet and the IRE website for help reviewing thousands of emails, I came across DocumentCloud and Overview. The Overview developers had just made a pre-beta version available for testing. I had been prepared to spend months of my spare time reading email after email and opening PDF after PDF, for as long as I could hold out on my editors without writing a story. Overview was the perfect find for a tech-savvy reporter: it was like having a staircase installed to the top of the mountain.
The Next Step
After some difficulty cleaning the documents (the emails came in Outlook format, which was a pain to convert to clean PDFs), the next step was figuring out how to make Overview work for my documents. My first impression after loading the emails was, “OK, now what?”
I found that Overview works differently for every document set. For emails, I think Overview works best with a completely random selection — say all emails for a department in a given month. Lots of emails would be meaningless, spam or pictures of cats, so Overview can be used to easily dismiss the majority. Given a set of emails based on a keyword search, the problem is more difficult because most of the emails will be at least somewhat relevant.
In this case, Overview was most useful as an organizational tool. I could look at an email, make a note, and easily have it grouped with other similar emails through tagging.
I started with a branch of Overview’s document tree and started clicking, glancing, noting and tagging. Right off the bat, I found that Overview had grouped together all of the similarly formatted “service desk” requests. There were hundreds if not thousands of those, so I was able to tag them by the dozens without a second thought and focus on the meatier emails.
The next thing Overview did for me was to generally group email chains together. Much of my document set was taken up with emails that were replies or mass messages that were duplicates of other emails. Those were easy because I could find the most complete version, annotate it, and write off the rest.
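For readers who want to try the same triage outside Overview, here is a minimal sketch of the idea of keeping the most complete copy of each email chain. It is not how Overview groups documents. It assumes each email is a plain dictionary with hypothetical `subject` and `body` keys, treats emails whose subjects match after stripping "RE:" and "FW:" prefixes as one chain, and uses the longest body as a rough stand-in for "most complete."

```python
import re
from collections import defaultdict

# Assumption: reply/forward prefixes look like "RE:", "FW:" or "Fwd:", possibly repeated.
_PREFIX = re.compile(r"^\s*((re|fw|fwd)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes and lowercase, so a chain shares one key."""
    return _PREFIX.sub("", subject).strip().lower()


def most_complete_per_chain(emails):
    """Group emails by normalized subject and keep the longest body in each group.

    `emails` is a list of dicts with hypothetical 'subject' and 'body' keys.
    Returns a dict mapping normalized subject -> the chosen email.
    """
    chains = defaultdict(list)
    for email in emails:
        chains[normalize_subject(email["subject"])].append(email)
    # Longest body is only a heuristic for the most complete copy of the chain.
    return {
        subject: max(messages, key=lambda m: len(m["body"]))
        for subject, messages in chains.items()
    }
```

Subject matching will miss chains whose subject line was edited along the way, so the results of a sketch like this would still need a human skim.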
Several hundred tags into my documents, I realized that simply tagging an email into a single sub-group was not good enough. Overview allows each document to have several different tags, I found out. Until then, I had been tagging emails by the sender’s department: city legal, city IT, police IT, purchasing, police administration, etc. From that point forward, I also tagged each email according to whether it was sent by one of my main players, and added a separate tag for whether it was “important.” That allowed me to look later at all my quotable or crucial emails.
A good strategy for tagging your documents is important. I recommend having a tag for important, crucial, quotable or relevant documents — whatever you wish to call it.
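The multi-tag scheme described above can be sketched as a small index that maps each tag to a set of document IDs. This is an illustration only, not Overview's data model; the class name `TagIndex`, its methods and the email IDs are all hypothetical.

```python
from collections import defaultdict


class TagIndex:
    """Hypothetical in-memory index: each document can carry several tags."""

    def __init__(self):
        self._docs_by_tag = defaultdict(set)

    def tag(self, doc_id, *tags):
        """Attach one or more tags to a document."""
        for name in tags:
            self._docs_by_tag[name].add(doc_id)

    def with_all(self, *tags):
        """Return the IDs of documents that carry every one of the given tags."""
        if not tags:
            return set()
        sets = [self._docs_by_tag.get(name, set()) for name in tags]
        return set.intersection(*sets)


index = TagIndex()
index.tag("email-001", "police IT", "main player", "important")
index.tag("email-002", "city legal")
index.tag("email-003", "police IT", "important")

# Everything worth re-reading before writing:
important = index.with_all("important")
# Narrow further by combining tags:
important_main = index.with_all("important", "main player")
```

Keeping department, sender and importance as separate tags means any combination can be pulled up later without re-reading the whole set.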
The End Game
Once all the emails were tagged, the important ones were annotated, and I felt like I had a good “overview” of the document set, it was time for the end game. This is where Overview developers are hard at work — Overview already facilitates digesting and reviewing thousands of emails, but how does it handle the one email that’s different from the rest, because it’s the only one discussing officers accepting payments for travel from the vendor of an item you spent millions of public dollars on?
That email involves only two people who appear in any of the other emails and contains almost none of my keywords, but it became crucial to my story.
Overview is not far from being the one-stop source for mass document review, but it’s not intended to do all the work for the reporter, I found (and its developers will agree, I’m sure). I still had to go through all of the small branches in the document tree, looking for the unique, unmatched emails that Overview couldn’t pair with other documents, in case I had overlooked something.
Despite that final effort, Overview was still crucial to my end game because I was able to review and find documents far more easily than if I had hunted for a given email through a keyword search. All I had to do to review my work in Overview was select my “Important” tag and scan through the few hundred emails that I had deemed important.
Another interesting part here was that I would remember emails that I had deemed irrelevant at first read, but that now seemed relevant because of another, supplemental email. I could easily find the original email by pulling up the tag and looking through that bank of emails in minutes. Keyword searches for one email out of 8,000 just don’t compare to the organization Overview provides.
In the end, I’m guessing it would have taken four reporters, splitting up the emails into stacks of a few thousand, to do the work I did in two weeks. Furthermore, they’d have to do it full-time and compare notes at the end. Overview, with the help of DocumentCloud, allowed me to have all my documents annotated in one place. Just as valuable, it allowed me to save my work, move to another story on my beat, and then start up again without losing momentum.
Finally, the work the Overview developers did to add to this program was impressive and very helpful. Every bit of feedback I gave led to immediate changes, which tells me the Overview team needs more feedback from a wider audience. It’s a wide-open program with tons of potential, in addition to many basic features that are practical now for any level of reporter with the gall to request thousands of documents.