Getting Started with the Overview Prototype

Note: these instructions apply to the original prototype version. We highly recommend the new web version at this time — no installation required!

You can be up and running with the Overview prototype, browsing through the sample document sets, in just a few minutes.

Getting ready
First you will need Git to download the program and sample files. If you’re not used to Git, this might be a bit of a pain now, as opposed to just a straightforward download. But because Overview is a prototype under active development, we’re constantly fixing bugs and adding features, and Git lets you download and install new versions in one easy step. Git can be downloaded here for Windows and here for Mac.

You will also need Ruby to run Overview. For Windows, you can download an installer here. Be sure to download the latest version (Ruby 1.9.3), because Ruby 1.8.7 won’t work. On the Mac, you’re in luck: Ruby comes pre-installed.

Overview also needs Java. Windows comes with Java preinstalled these days. Some versions of OS X come with Java too, and the ones that don’t will ask if you want to install Java the first time you try to run Overview. Say yes. However, there have been some reports that Overview won’t run with the Java version that comes with OS X 10.5, so 10.6 or above is recommended.

If you’re on Linux, you’re probably already comfortable with standard development tools, so I’ll just say that you need to get Git, Ruby 1.9.3, and Java going. You will also need to replace visualization/lib/swt.jar with the appropriate version for your operating system.

Installing the prototype
Now that you have Git installed, you can download the Overview program files by entering the following command in Terminal (Mac) or the command prompt (Windows):

 git clone https://github.com/overview/overview-prototype.git

Congratulations! You now have Overview on your computer. You’ll probably want some sample data files to get started.

Loading the sample files
Before you load your own documents, you probably want to get one of the sample files loaded. Get them from GitHub:

 git clone https://github.com/overview/overview-sample-files.git

You now have three sample datasets: 1,500 press releases from nj-senator-menendez, 4,500 OCR’d pages of iraq-contractor-incidents, and caracas-cables, about 7,000 Wikileaks cables which originate from or mention the city of Caracas. Each of these datasets is a single CSV input file containing all the text of all of the documents, plus another file containing some tags we created earlier (which you can view with the “load” button in the interface).

Viewing a dataset is currently a two-step process. First, you have to do some natural language preprocessing. This takes a few minutes, but only has to be done once. Then you can start the GUI. Starting from the directory where you ran the git clone commands above, you can load up Senator Menendez’s press releases like so:

Windows:

  cd overview-sample-files
  ..\overview-prototype\preprocess.bat nj-senator-menendez
  ..\overview-prototype\overview.bat nj-senator-menendez

Mac/Linux:

  cd overview-sample-files
  ../overview-prototype/preprocess.sh nj-senator-menendez
  ../overview-prototype/overview.sh nj-senator-menendez

Or substitute iraq-contractor-incidents or caracas-cables if you’d like to view those document sets. Again, you only need to run the preprocess script once; you can start right up with the overview script every time thereafter.

Using your own documents
There are three ways you can use your own documents: you can visualize PDF or TXT files on your local machine, upload the documents to DocumentCloud, or import the documents from a CSV file.

Suppose you have a directory called documents-dir full of PDF and/or TXT files. You can load it into Overview like this:

Windows:

  overview-prototype\loadpdf.bat documents-dir mydocs
  overview-prototype\overview.bat mydocs

Mac/Linux:

  overview-prototype/loadpdf.sh documents-dir mydocs
  overview-prototype/overview.sh mydocs

Overview will scan documents-dir and all subdirectories for PDF and TXT documents and generate all the files that it needs to run the visualization. This only needs to be done once. Thereafter, you can start the visualization with the overview command whenever you like.

If you would like to upload that directory to DocumentCloud first, so that you can annotate, share, and eventually publish some or all of the documents, do this:

Windows:

  overview-prototype\dcupload.bat documents-dir mylogin@myorg.org mypassword mydocs
  overview-prototype\preprocess.bat mydocs
  overview-prototype\overview.bat mydocs

Mac/Linux:

  overview-prototype/dcupload.sh documents-dir mylogin@myorg.org mypassword mydocs
  overview-prototype/preprocess.sh mydocs
  overview-prototype/overview.sh mydocs

You have to replace “mylogin” and “mypassword” with your DocumentCloud username and password, of course. Overview will scan all subfolders of documents-dir for PDF and TXT files and upload them to DocumentCloud, simultaneously creating an input CSV file for the preprocess script that links to the newly uploaded documents. Again, you only need to run dcupload and preprocess once.

If you have documents or text in some other format and you’re handy with CSV files, you can also create your own input files for Overview directly. The format is quite simple, documented here.

More!
You should also check out the video introduction to Overview. We are putting together more training materials, as well as building a few tools to help get your documents into the right format, such as a script that will read all the text or HTML files in a directory and glue them together into the CSV format that Overview likes. We’re also always fixing bugs. To get the very latest version (for instance, after we tweet about a bug fix), just open up a command prompt in the overview-prototype directory and type

 git pull

Finally, this is just a prototype. Our next step will be to integrate Overview with DocumentCloud, and make it run right in the browser.

Next steps for development, and a job posting

With our recent analysis of Iraq security contractor documents, the Overview prototype has been used for its first real story. But our prototype is just that: a proof-of-concept tool, built as quickly as possible to validate certain algorithms and approaches. The next step is to create a solid architecture for future work. We need to make this technology web-deployable, scalable, and integrated with DocumentCloud.

If you haven’t already, take a look at our writeup of how we used the Overview prototype for our Iraq security contractors work. We started with documents posted to DocumentCloud, then downloaded the original PDF files for processing with a series of Ruby scripts. After processing, we used the prototype visualization interface, written in Java, to find topics and tag documents in bulk according to their subject. We’d like to streamline this whole process, so that Overview works like this:

  • Upload raw material to DocumentCloud.
  • Select documents for exploration in Overview, by using the DocumentCloud project and search functions.
  • Launch Overview, directly in the browser. Use the visualization tools to explore the set, create subject tags, and apply them to the documents.
  • Export Overview’s tags back into native DocumentCloud tags and annotations.

In short, we want to tightly integrate Overview’s semantic visualization with DocumentCloud’s storage, search, viewing, annotation, and management tools. This means that Overview has to have a web front end, which means the interface needs to be JavaScript, not Java. We also suspect that for performance reasons, the visualizations will need to be rendered in WebGL. On the back end, Ruby is just too slow for natural language processing. For example, our common bigram detection code (which helps Overview discover frequently used two-word phrases) takes several minutes to build an intermediate table with hundreds of thousands of elements for the 4,500 pages of the Iraq contractor set. We’d like the Overview architecture to scale to millions of pages — a thousand times larger, which would take days with the current algorithm. So the server-side processing needs to be implemented in a higher-performance language, such as Java.
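
As a concrete illustration of the bigram step, here is a minimal Ruby sketch of frequent-bigram counting. The function and its details are ours, for illustration only; the prototype’s actual preprocessing code is different.

```ruby
# Illustrative sketch of common-bigram detection: count every adjacent
# word pair across all documents and keep the most frequent pairs.
# This is not Overview's actual preprocessing code.
def common_bigrams(docs, top_n = 10)
  counts = Hash.new(0)
  docs.each do |text|
    words = text.downcase.scan(/[a-z]+/)
    words.each_cons(2) { |a, b| counts[[a, b]] += 1 }
  end
  counts.sort_by { |_, n| -n }.first(top_n)
end
```

Even this simple version builds a hash with one entry per distinct word pair, which is exactly the kind of large intermediate table that becomes painful at millions of pages.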

The good news is that Overview uses one of the same basic data structures as search engines, a TF-IDF weighted index. DocumentCloud uses the popular Solr search platform, so integration with DocumentCloud will also pave the way for integration with any application which is based on Solr. That’s a lot of possible applications.
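
For readers unfamiliar with the term, a TF-IDF weighted index scores each word in each document by how often it appears there, discounted by how many documents contain it at all. A toy Ruby sketch of the weighting (ours, for illustration; Solr’s implementation is far more elaborate):

```ruby
# Toy TF-IDF weighting: term frequency times inverse document frequency.
# Words that appear in every document get weight zero; words that are
# frequent in one document but rare overall get high weights.
def tf_idf(docs)
  n = docs.length
  tokenized = docs.map { |d| d.downcase.scan(/[a-z]+/) }
  df = Hash.new(0)                                   # document frequency
  tokenized.each { |words| words.uniq.each { |w| df[w] += 1 } }
  tokenized.map do |words|
    tf = Hash.new(0)                                 # term frequency
    words.each { |w| tf[w] += 1 }
    tf.map { |w, f| [w, f * Math.log(n.to_f / df[w])] }.to_h
  end
end
```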

Given all of the above, this is the current planned order of development tasks, which we think could be accomplished in about a year by a competent engineer: rewrite the prototype with a Java back end and JavaScript/WebGL UI; integrate the user experience with DocumentCloud’s tagging system; then integrate the back end with Solr’s index data structures and APIs. As we go, we’ll collect feedback from our growing tester and user community and decide what to build next — there is a wide range of problems we could address.

We’re hiring two engineers on a full-time basis to accomplish this, perhaps one person who’s more inclined to the user interface, and one who is more into the back end processing. We’re looking for:

  • Solid Java or JavaScript engineering experience, preferably 3-5 years of work on large applications.
  • Familiarity with open source development projects.
  • Experience in computer graphics, visualization, natural language processing, or distributed systems a plus.

This is a contract position. We’d prefer if you worked with us out of the AP offices in New York, but we’ll consider remote contributors. Please contact jstray@ap.org if interested.

Using Overview to analyze 4500 pages of documents on security contractors in Iraq

This post describes how we used a prototype of the Overview software to explore 4,500 pages of incident reports concerning the actions of private security contractors working for the U.S. State Department during the Iraq war. This was the core of the reporting work for our previous post, where we reported the results of that analysis.

The promise of a document set like this is that it will give us some idea of the broader picture, beyond the handful of really egregious incidents that have made headlines. To do this, in some way we have to take into account most or all of the documents, not just the small number that might match a particular keyword search.  But at one page per minute, eight hours per day, it would take about 10 days for one person to read all of these documents — to say nothing of taking notes or doing any sort of followup. This is exactly the sort of problem that Overview would like to solve.

The reporting was a multi-stage process:

  • Splitting the massive PDFs into individual documents and extracting the text
  • Exploration and subject tagging with the Overview prototype
  • Random sampling to estimate the frequency of certain types of events
  • Followup and comparison with other sources

Splitting the PDFs
We began with documents posted to DocumentCloud — 4,500 pages worth of declassified, redacted incident reports and supporting investigation records from the Bureau of Diplomatic Security. The raw material is in six huge PDF files, each covering a six-month range, and nearly a thousand pages long.

Overview visualizes the content of a set of “documents,” but there are hundreds of separate incident reports, emails, investigation summaries, and so on inside each of these large files. This problem of splitting an endless stack of paper into sensible pieces for analysis is a very common challenge in document set work, and there aren’t yet good tools. We tackled the problem using a set of custom scripts, but we believe many of the techniques will generalize to other cases.

The first step is extracting the text from each page. DocumentCloud already does text recognition (OCR) on every document uploaded, and the PDF files it gives you to download have the text embedded in them. We used DocumentCloud’s convenient docsplit utility to pull out the text of each page into a separate file, like so:

  docsplit text --pages all -o textpages january-june-2005.pdf

This produces a series of files named january-june-2005_1.txt, january-june-2005_2.txt, etc. inside the textpages directory. This recovered text is a mess, because these documents are just about the worst possible case for OCR: many are forms with a complex layout, and the pages have been photocopied multiple times, redacted, scribbled on, stamped and smudged. But large blocks of text come through pretty well, and this command extracts what text there is into one file per page.

The next step is combining pages into their original multi-page documents. We don’t yet have a general solution, but we were able to get good results with a small script that detects cover pages, and splits off a new document whenever it finds one. For example, many of the reports begin with a summary page that looks like this:


Our script detects this cover page by looking for “SENSITIVE BUT UNCLASSIFIED,” “BUREAU OF DIPLOMATIC SECURITY” and “Spot Report” on three different lines. Unfortunately, OCR errors mean that we can’t just use the normal string search operations, as we tend to get strings like “SENSITIZV BUT UNCLASSIEIED” and “BUR JUDF DIPLOJ>>TIC XECDRITY.” Also, these are reports typed by humans and don’t have a completely uniform format. The “Spot Report” line in particular occasionally says something completely different. So, we search for each string with a fuzzy matching algorithm, and require only two out of these three strings to match.

We found about 10 types of cover pages in the document set, each of which required a different set of strings and matching thresholds. But with this technique, we were able to automatically divide the pages into 666 distinct documents, most of which contain material concerning a single incident. It’s not perfect — sometimes cover pages are not detected correctly, or are entirely missing — but it’s good enough for our purposes.
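
Here is a sketch of that two-of-three fuzzy test in Ruby, using plain Levenshtein edit distance. The marker strings and the three-error threshold are illustrative; our actual scripts used different strings and thresholds for each cover-page type.

```ruby
# Levenshtein edit distance with a rolling one-row table.
def edit_distance(a, b)
  d = (0..b.length).to_a
  a.each_char.with_index do |ca, i|
    prev = d[0]
    d[0] = i + 1
    b.each_char.with_index do |cb, j|
      cur = d[j + 1]
      d[j + 1] = [d[j] + 1, d[j + 1] + 1, prev + (ca == cb ? 0 : 1)].min
      prev = cur
    end
  end
  d[b.length]
end

# A page counts as a cover page if at least two of the marker strings
# appear somewhere on it, allowing a few OCR errors in each.
def cover_page?(lines, markers, max_errors = 3)
  hits = markers.count do |m|
    lines.any? { |l| edit_distance(l, m) <= max_errors }
  end
  hits >= 2
end
```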

The pre-processing script writes the concatenated text for each extracted document into one big CSV file, one document per row. It also writes out the number of pages for that document, and a document URL formed by adding the page number to the end of a DocumentCloud URL. If you can get your document set into this sort of CSV input format, you can explore it with the Overview prototype.
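
A hedged reconstruction of that output step in Ruby. The column names and the page-anchor URL scheme here are our illustrative assumptions, not the prototype’s exact CSV layout:

```ruby
require "csv"

# Write one row per extracted document: its concatenated page text, its
# page count, and a DocumentCloud URL pointing at its first page.
# Column names and the "#p<page>" anchor are illustrative assumptions.
def write_document_csv(path, documents, base_url)
  CSV.open(path, "w") do |csv|
    csv << ["text", "pages", "url"]
    documents.each do |doc|
      csv << [doc[:pages].join(" "),
              doc[:pages].length,
              "#{base_url}#p#{doc[:first_page]}"]
    end
  end
end
```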

Exploring the documents with Overview

The Overview prototype comes in two parts: a set of Ruby scripts that do the natural language processing, and a document set exploration GUI that runs as a desktop Java app. Starting from iraq-contractor-incidents.csv, we run the preprocessing and launch the app like this:

  ./preprocess.sh iraq-contractor-incidents
  ./overview.sh iraq-contractor-incidents

Overview has advanced quite a bit since the proof-of-concept visualization work last year, and we now have a prototype tool set with a document set exploration GUI that looks like this:

Top right is the “items plot,” which is an expanded version of the prototype “topic maps” that we demonstrated in our earlier work visualizing the War Logs. Each document is a dot, and similar documents cluster together. The positions of the dots are abstract and don’t correspond to geography or time. Rather, the computer tries to put documents about similar topics close together, producing clusters. It determines “topic” by analyzing which words appear in the text, and how often.
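
The similarity behind that clustering is typically measured as cosine similarity between per-document word vectors. A toy Ruby version over raw word counts (Overview weights the vectors by TF-IDF, but the geometry is the same):

```ruby
# Toy document-similarity sketch: cosine similarity over word-count
# vectors. Similar word usage yields a score near 1; documents with no
# words in common score 0.
def word_counts(text)
  counts = Hash.new(0)
  text.downcase.scan(/[a-z]+/).each { |w| counts[w] += 1 }
  counts
end

def cosine_similarity(a, b)
  va, vb = word_counts(a), word_counts(b)
  dot = va.sum { |w, n| n * vb[w] }
  norm = ->(v) { Math.sqrt(v.values.sum { |n| n * n }) }
  denom = norm.call(va) * norm.call(vb)
  denom.zero? ? 0.0 : dot / denom
end
```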

Top left is the “topic tree”, our new visualization of the same documents. It’s based on the same similarity metric as the Items Plot, but here the documents are divided into clusters and sub-clusters.

The computer can see that categories of documents exist, but it doesn’t know what to call them. Nor do the algorithmically-generated categories necessarily correspond to the way a journalist might want to organize them. You could plausibly group incidents by date, location, type of event, actors, number of casualties, equipment involved, or many other ways.

For that reason, Overview provides a tagging interface (center) so that the user can name topics and group them in whatever way makes sense. The computer-generated categories serve as a starting point for analysis, a scaffold for the journalist’s exploration and story-specific categorization. In this image, the orange “aircraft” tag is selected, and the selected documents appear in the topic tree, the items plot, and as a list of individual documents. The first of these aircraft-related documents is visible in the document window, served up by DocumentCloud.

Random sampling
It took about 12 hours to explore the topic tree, assign tags and create a categorization that we felt suited the story. The general content of the document set was clear pretty quickly. At some point, there’s no way around a reporter reading a lot of documents, and Overview is really just a structured way to choose which documents to read. It’s a shortcut, because after you look at a few documents in a cluster and discover that they’re all more or less the same type of incident, you usually don’t really need to read the rest.

This process produces an intuitive sense of the contents of a document set. It’s key to finding the story, but it doesn’t provide any basis for making claims about how often certain types of events occurred, or whether incidents of one type really differed from incidents of another type. For example, we found that the incidents mostly involved contractors shooting at cars that got too close to diplomatic motorcades. But what does “mostly” mean? Is it a majority of the incidents? Do we need to look more closely at the other material, or does this cover 90 percent of what happened?

In principle, to answer this type of general question you’d need to read every single document, keeping a count of how many involved “aggressive vehicles,” as they are called in the reports. Dividing that count by the total number of documents gives the percentage. Reading every document is impractical, but there’s an excellent shortcut: random sampling.

Random sampling is like polling: ask a few people, and substitute their results for the whole population. The randomization ensures that you don’t end up polling an unrepresentative group. For example, if all of the sample documents we choose to look at come from a pile which contains many more “aggressive vehicle” incidents than average, obviously our percentages will be skewed. For this reason, Overview includes a button that chooses a random document from among those currently selected. If you first select all documents, this is a random sample drawn from the entire set.

We used a random sample of 50 out of the 666 documents to establish the factual basis of the following statements in our report:

  • The majority of incidents, about 65 percent, involve a contractor team assigned to protect a U.S. motorcade firing into an “aggressive” or “threatening” vehicle.
  • There is no record of followup investigations in an estimated 95 percent of the reports.
  • About 45 percent of the reports describe events happening outside of Baghdad.
  • Our analysis found that only about 2 percent of the 2007 motorcades in Iraq resulted in a shooting.

Each of these is a statement about a proportion of something, and the sampling gives us numerical estimates for each. Along with their associated sampling errors, these figures are strong evidence that the statements above are factually correct. (The relevant sampling error formula is for “proportion estimation from a random sample without replacement,” and gives a standard error of about ±5% for our sample size.)
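
That error formula, sketched in Ruby (the function name is ours):

```ruby
# Standard error of a proportion estimated from a simple random sample
# of size n drawn without replacement from a population of `total`
# documents, including the finite population correction.
def proportion_standard_error(p, n, total)
  fpc = (total - n).to_f / (total - 1) # finite population correction
  Math.sqrt(p * (1 - p) / n * fpc)
end
```

For a sample of 50 drawn from 666 documents, an observed proportion of 2 percent gives a standard error of about ±1.9 percent, which is how an estimate like 2% ± 2% arises.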

We also used sampling to estimate the number of incidents of contractor-caused injury to Iraqis that we might not have found. During the reporting process we found 14 such incidents (1,2,3,4,5,6,7,8,9,10,11,12,13,14) but keyword search is not reliable for a variety of reasons. For example it is based on the scanned text, which is very error-prone. Could we be missing another few dozen such incidents? We can say with high probability that the answer is no, because we independently estimated the number of such incidents using our sample, and found it to be 2% ±2% out of 666, or most likely somewhere between 0 and 26 documents, with an expected value of 13. So while we are almost certainly missing a few incidents, it’s very unlikely that we’re missing more than a handful.

Other sources
Documents never tell the whole story; they’re simply one source, ideally one source of many. For this story, we first consulted with AP reporter Lara Jakes, who has been covering events from Baghdad for many years, and has written about private security contractors in particular. She provided a crucial reality check to make sure we understood the complex environment that the documents referred to. We also looked at many other document sources, including the multitude of lengthy government reports that this issue has generated over the years.

We then set up a call with the Department of State. Undersecretary for Management Patrick Kennedy spent almost an hour on the phone with us, and his staff worked hard to answer our followup questions. In addition to useful background information, they provided us with the number of cases concerning security contractor misconduct that the State Department has referred to the Department of Justice: five. They also told us that there were 5,648 protected diplomatic motorcades in Iraq in 2007. These figures add crucial context to the incident counts we were able to pull out of the document set, and we do not believe that either has been previously reported.

Finally, we searched news archives and other sources, such as the Iraq Body Count database, to see if the incidents of Iraqi injury we found had been previously reported. Of the fourteen incidents, four appear to have been documented elsewhere. We believe this document and this news report refer to the same incident, as well as this and this, and we suspect also that this is the same as record d0233 in the Iraq Body Count database, while this matches record d4900. Of course, there may be other records of these events, but after this search we suspect that many of the incidents we found were previously unreported.

Next steps
This is the first major story completed using Overview, which is still in prototype form. We learned a lot doing it, and the practical requirements of reporting this story drove the development of the software in really useful ways. The code is up on GitHub, and over the next few weeks we will be releasing training materials which we hope will allow other people to use it successfully. We will also hold a training session at the NICAR conference this week. The software itself is also being continually improved. We have a lot of work to do.

Our next step is actually a complete rewrite, to give the system a web front end and integrate it with DocumentCloud. This will make it accessible to many more people, since many journalists already use DocumentCloud and a web UI means there is nothing to download and install. We’re hiring engineers to help us do this; for details on the plan, please see our job posting.

What did private security contractors do in Iraq?

The U.S. employed more private contractors in Iraq than in any previous war, at times exceeding the number of regular military personnel, and roughly 10% of them were in armed roles by the end of the war. A few high-profile incidents made headlines, such as the Blackwater shootings at Nisoor Square in September 2007, but there hasn’t yet been a comprehensive public record of these private security contractors’ actions at the height of the war. Thousands of pages of recently released material change that — and provide an ideal test case for Overview’s evolving document mining capabilities.

The documents show that mostly, these contractors fired at approaching civilian vehicles to protect U.S. motorcades from the threat of suicide bombers. The documents also show how often shots were fired, and provide a window into how State Department oversight of security contractors tightened during the war.

The documents come from a Freedom of Information request filed with the U.S. Department of State by journalist John Cook in November 2008. Cook received the paperwork in batches over the last 18 months, and posted the 4,500 pages of incident reports and supporting investigation records from the Bureau of Diplomatic Security on DocumentCloud.

The record only covers the work of State Department contractors between 2005 and 2007; the majority of U.S. contractors worked for the Department of Defense, according to a 2008 Government Accountability Office report. The State Department also has excluded some documents relating to ongoing criminal investigations or national security. Nonetheless, this is the most exhaustive record we have, and offers us the possibility of moving beyond anecdotes to broader patterns.

In addition to the document analysis, we spoke with Undersecretary for Management Patrick Kennedy, who oversees the State Department’s Bureau of Diplomatic Security.  That conversation provided context for these events. His assistant, Christina Maier, answered many of our specific questions.

For details on how we used the Overview prototype to report on these documents, including the exact methodology, see this post.

What did private security contractors do?

The documents cover about 600 incidents that involved security contractors firing a weapon in Iraq. It’s not clear exactly how the department decided whether a report was warranted. Some reports are many pages long, including witness testimony and extended investigative reports. In other cases, only a terse cover page exists. The documents mostly concern the actions of the three private contractors then working for the State Department: Blackwater, DynCorp, and Triple Canopy. A handful of incidents involve KBR, another contractor, and the U.S. Marines.

The majority of incidents, about 65 percent, involve a contractor team assigned to protect a U.S. motorcade firing into an “aggressive” or “threatening” vehicle.

A typical example, involving a detail protecting workers for the U.S. Agency for International Development in Baghdad, reads:

At approximately 0950, 11 May 05, a USAID PSD [private security detail] Team fired four rounds into the hood of a dark colored BMW taxi after the driver of the vehicle moved around a line of traffic, failed to yield to verbal and hand signals and approached the PSD vehicles while the detail was slowing for congested traffic. Upon receiving fire, the BMW slowed its approach and rolled to a stop against a bus parked on the right side of the road. The PSD exited the area and continued with their mission without further incident. There were three USAID principals onboard at the time of the incident. No friendly personnel were affected. Status of driver and hostile vehicle is unknown at this time.

The bulk of the documents report hundreds of such incidents with minor variations. The report always includes at least a brief mention of the ways that the contractors tried to stop the vehicle before shooting. Sometimes, “verbal commands” or “visual signals” are mentioned. In other cases the contractors tried flashing lights, threw water bottles, or fired flares or smoke grenades before firing.

Motorcade guards shot vehicles that approached too closely because of the threat of vehicle suicide bombers, known as “vehicle-borne improvised explosive devices” or “VBIEDs.” It’s not clear how many of the vehicles were actually a threat; there is no record of followup investigations in an estimated 95 percent of the reports. There are few details about what happened to the drivers of the vehicles that were shot at; sometimes, a report states that the driver “did not appear to be injured.” In other instances, there is no comment at all.

Most reports describe a few rounds fired into the front of the vehicle, which succeed in stopping the car. On other occasions, gunners fired into the car windows if shots to the front grille didn’t stop the car. We found a number of incidents where, after nonviolent warnings, contractors fired into windows first (1,2,3,4,5). On two of these occasions, gunners said that they “didn’t have time to shoot to disable,” which was acceptable under the policies then in force.

Some of the drivers didn’t stop the cars, and just kept going after taking bullets. One taxi took four rounds and “continued to push past the motorcade.”

We found 10 recorded Iraqi deaths, and a smaller number of injuries. In one case a bullet went through the windshield and hit the driver’s right shoulder. The team “provided first aid and turned the man over to a local national who stated that he was a doctor.” In another case, an ambulance was called and the team waited, but the driver eventually refused help and left the scene. But in general these contractors do not seem to have been equipped to deliver medical aid. After one fatal shooting, the investigator who interviewed the team noted, “Vehicle was engaged due to possible VBIED; there is no standard operating procedure for PSD teams to search vehicles render aide to [sic] in such an incident.”

The documents show that shots were also fired as the result of misunderstandings. After Marines fired on a car trying to enter the U.S. Embassy Annex through the exit lane, investigators concluded that “the local national had no apparent hostile intention and his actions were based on his misunderstanding of the new security procedures.” On another occasion a Marine fired at a vehicle driven by “a U.S. citizen employed by the U.S. Army Corps of Engineers” who “was talking on his cellular telephone and didn’t follow the Marines’ directions.” In another incident, a DynCorp team shot at an Iraqi judge after he failed to stop his car, hitting him in the leg.

The bulk of the documents concern this type of “escalation of force” against a vehicle, but a smaller number of documents report contractor responses to attacks on U.S. personnel.  A motorcade was fired upon by fighters on the roof of an abandoned five-story building.  An attack on the “Municipalities and Public Works Annex building” ultimately killed five U.S. personnel  in a helicopter crash, and was later cited by the State Department as an example of heroic behavior by a contractor. There was an attack on Baghdad’s city hall, and another at a Doura power plant. There are also several instances of Blackwater aircraft brought down by small arms fire (1,2,3).

About 45 percent of the reports describe events happening outside of Baghdad. In the provincial capital of Basra, the palace compound was repeatedly attacked by rockets. In what was described as a “suicide probe,” a man carrying a “white bag” approached the gates of the U.S. Embassy in Basra and would not stop after warnings and a flash grenade. Guards shot him. There are also a half-dozen reports of suspicious boats approaching the Embassy building from the riverside, and in one case a Triple Canopy contractor fired upon a boat after it ignored flares and warning shots.

Finally, there are a handful of reports of contractors shooting aggressive stray dogs. In one instance a Blackwater contractor killed a dog that belonged to the New York Times’ Baghdad bureau, after it fought with the contractor’s bomb-sniffing dog.

Tightening oversight

The documents show that the shootings led to greater oversight as the war progressed. In February 2005, Blackwater guards fired over 100 rounds at a car approaching their motorcade on the other side of a median, hitting the driver. The contractors initially maintained that the car’s passenger had fired into their vehicle, but investigators later found that the Blackwater guards had fired first. They also claimed that the car was on a pre-existing list of suspicious vehicles, known as the “be on the lookout” list.

Yet one of the guards later told investigators that claiming that the vehicle was on this list was “simply standard practice when reporting a shooting incident, per Blackwater management.”

The investigator’s report says that “several of the PSD individuals involved in the shooting provided false statements to the investigators,” but the head of diplomatic security in Baghdad, John Frese, decided not to discipline the contractors because it “would be deemed as lowering the morale of the entire PSD entity.”

The State Department declined a request to comment on this incident.

The investigator’s February 2005 report recommended several policy changes, including posting signs on motorcade vehicles stating “stay back 100 meters” in English and Arabic, counting the number of rounds fired after every shooting incident, and “establishing a clear and unambiguous policy regarding appropriate use of warning/disabling shots at vehicles.”

The documents include a State Department security contractor policy manual dated August 2005 with such guidelines. The manual says that shooting at approaching vehicles is authorized “if it constitutes the appropriate level of force to mitigate the threat,” and that shots can be fired into a car “to prohibit a threat from entering into an area where the protective detail would be exposed to an attack.” It also advises contractors to issue visible and verbal warnings before firing.

This policy also requires an internal investigation and written reports from all shooters and witnesses any time a firearm is discharged.

Were problems common?

Out of about 600 incidents in total, the AP found 14 in which an Iraqi was injured by contractor gunfire, including 8 deaths (1,2,3,4,5,6,7,8,9,10,11,12,13,14).

The State Department told us that there were 5,648 protected diplomatic motorcades in Iraq in 2007. Our analysis found that only about 2 percent of the 2007 motorcades in Iraq resulted in a shooting. This agrees closely with previous estimates that between 1 percent and 3 percent of the motorcades involved shootings, according to congressional testimony.

Out of all the cases where contractors used force, the State Department told us that a total of five cases have been referred to the Department of Justice for possible prosecution.

Prosecution doomed from the start

On September 17, 2007, guards working for Blackwater Worldwide shot and killed 17 Iraqi civilians in Nisoor Square, Baghdad. The incident received international media attention and spawned a congressional hearing. But the criminal case against five former Blackwater contractors was dropped after a judge ruled that government prosecutors improperly relied on statements that the State Department compelled the contractors to make.

The documents analyzed by the AP provide an important clue as to how this might have happened. There is a frequently used “sworn statement” form for contractors (like this example) which states “I further understand that neither my statements nor any information or evidence gained by reason of my statements can be used against me in a criminal proceeding.” Such statements, mandatory whenever shots were fired, suggest that contractors were effectively granted automatic immunity immediately after any incident.

Even if that were not the case, it’s not clear what laws would cover alleged crimes. Security contractors in Iraq were immune from Iraqi law until the end of 2008, while current U.S. laws may not cover the acts of overseas armed contractors not directly involved in a Department of Defense mission.

Blackwater, now known as Academi, settled a civil suit with the families of several of the Nisoor Square victims in January.

After Nisoor Square

The documents reviewed by the AP do not include the Nisoor Square shootings, which triggered major changes in contractor oversight. An expert panel convened by then-Secretary of State Condoleezza Rice recommended 18 specific policy changes. According to a subsequent GAO report, the State Department implemented most of the changes, including placing at least one government security agent in each motorcade, installing video cameras in all vehicles, and recording both radio transmissions and satellite-tracked vehicle locations.

According to the same report, the number of weapons discharges by security contractors working for both the Department of Defense and the Department of State decreased by 60 percent after the changes went into effect. However, overall military and civilian casualties also fell sharply during the same period, making it difficult to know whether the new policies themselves resulted in fewer shootings.

Undersecretary Kennedy was on the expert panel that made those policy recommendations. He noted that no official in a department-escorted convoy has ever been killed in Iraq. (There have been deaths from other causes, such as mortar attacks.)

“We try not to be draconian about it,” he said. “Could we have done the same with less use of force? I don’t know how you could validate retrospectively that the escalation wasn’t appropriate.”

Either way, the State Department will continue to use security contractors in Iraq and worldwide. Kennedy said the number of security contractors working for the department in Iraq has increased since U.S. troops left the country because the department now has additional security responsibilities, including the protection of six Iraqi military training sites.

“There are only about 1700 State Dept. special agents in the world,” he said. “We have 280 embassies. There is no way I can take 1700 special agents and about 100 officers and stretch them to do my mission without contractors.”