With our recent analysis of Iraq security contractor documents, the Overview prototype has been used for its first real story. But our prototype is just that: a proof-of-concept tool, built as quickly as possible to validate certain algorithms and approaches. The next step is to create a solid architecture for future work. We need to make this technology web-deployable, scalable, and integrated with DocumentCloud.
If you haven’t already, take a look at our writeup of how we used the Overview prototype for our Iraq security contractors work. We started with documents posted to DocumentCloud, then downloaded the original PDF files for processing with a series of Ruby scripts. After processing, we used the prototype visualization interface, written in Java, to find topics and tag documents in bulk according to their subject. We’d like to streamline this whole process, so that Overview works like this:
- Upload raw material to DocumentCloud.
- Select documents for exploration in Overview, by using the DocumentCloud project and search functions.
- Launch Overview directly in the browser. Use the visualization tools to explore the set, create subject tags, and apply them to the documents.
- Export Overview’s tags back into native DocumentCloud tags and annotations.
The good news is that Overview uses one of the same basic data structures as search engines: a TF-IDF weighted index. DocumentCloud uses the popular Solr search platform, so integration with DocumentCloud will also pave the way for integration with any application that is based on Solr. That’s a lot of possible applications.
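For readers who haven't met the term, here is a minimal sketch of what a TF-IDF weighted index looks like. It is not Overview's or Solr's actual code: the function name `build_tfidf_index`, the whitespace tokenizer, the sample documents, and the particular weighting (term frequency times the log of inverse document frequency) are all assumptions chosen for illustration, and real systems use more refined variants.

```python
import math
from collections import Counter, defaultdict


def build_tfidf_index(docs):
    """Build {term: {doc_id: weight}} from a dict of {doc_id: text}.

    Hypothetical helper for illustration only. Weight is
    (term count / document length) * log(N / document frequency).
    """
    # Count how often each term appears in each document.
    term_counts = {
        doc_id: Counter(text.lower().split()) for doc_id, text in docs.items()
    }

    # Count how many documents contain each term at least once.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    n_docs = len(docs)
    index = defaultdict(dict)
    for doc_id, counts in term_counts.items():
        total = sum(counts.values())
        for term, count in counts.items():
            tf = count / total
            idf = math.log(n_docs / doc_freq[term])
            index[term][doc_id] = tf * idf
    return dict(index)


# Made-up sample documents, not taken from the real document set.
docs = {
    "a": "convoy attacked near checkpoint",
    "b": "convoy delayed at checkpoint",
    "c": "contract renewal paperwork",
}
index = build_tfidf_index(docs)
```

Looking up a term returns the documents that contain it, each with a weight. Terms that appear in few documents (such as "attacked" here) get higher weights than terms that appear in many (such as "convoy"), which is what makes the same structure useful both for search ranking and for grouping documents by topic.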
We’re hiring two engineers on a full-time basis to accomplish this: perhaps one person who is more inclined toward the user interface, and one who is more interested in back-end processing. We’re looking for:
- Familiarity with open source development projects.
- Experience in computer graphics, visualization, natural language processing, or distributed systems a plus.
This is a contract position. We’d prefer that you work with us out of the AP offices in New York, but we’ll consider remote contributors. Please contact firstname.lastname@example.org if interested.