1. How do I get my documents into Overview?
Overview can read most file formats, including PDF, Word, and HTML. It will automatically OCR scans and create searchable PDFs (which you can download later). You can upload an entire folder at once (if you use the Chrome browser), and Overview will skip files that have already been uploaded.
You can also import a project from DocumentCloud, or upload text in bulk as a CSV file. More detailed instructions here.
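As a rough illustration of the bulk CSV route, here is a minimal Python sketch that writes one document per row. The column names `title` and `text` are assumptions made for this example, as is the helper name `build_csv`; check the detailed instructions for the columns Overview actually expects.

```python
import csv
import io


def build_csv(documents):
    """Return CSV text with one row per (title, text) pair.

    The "title" and "text" column names are assumptions for this sketch,
    not a confirmed description of Overview's import format.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["title", "text"])
    writer.writeheader()
    for title, text in documents:
        # The csv module quotes commas, quotes, and line breaks for us.
        writer.writerow({"title": title, "text": text})
    return buffer.getvalue()


if __name__ == "__main__":
    sample = [
        ("Memo 1", "Budget approved, pending review."),
        ("Memo 2", "Second line\nspans two lines."),
    ]
    with open("documents.csv", "w", newline="", encoding="utf-8") as f:
        f.write(build_csv(sample))
```

Using the `csv` module rather than joining strings by hand keeps long narrative text intact even when it contains commas or line breaks.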
2. What types of documents will Overview read?
Overview is designed for text: lots of text, the sort of narrative text that a human would read. Overview also works well on the text of social media posts.
Overview will load CSV files, but it is not designed for tables, spreadsheets, or other mainly numeric or structured data, unless there is a field that contains lots of text in ordinary human language. If you need to extract tables from PDF files, try Tabula.
Here’s the list of supported file formats.
3. How do I get my documents out of Overview?
Choose Export from the menu at upper right. You can export all documents or just those that match the current search. You can download a spreadsheet in CSV or Excel format, one document per row, or an archive of document files.
If Overview OCR'd your documents, you will download a searchable PDF. If you split your documents into pages, you will download one file per page. If you originally uploaded your documents from a CSV file, you will get one text file per document. Otherwise you will just get the original document file back.
4. How many documents can Overview handle and how long will it take?
There is currently a maximum of 2,000,000 documents per document set. We are steadily working to increase this limit. Overview can process about 1,000 documents per minute, plus the time needed to upload the documents, OCR them (about 10 additional seconds per page), or transfer them from DocumentCloud.
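To turn those figures into a rough estimate, here is a small Python sketch. It uses only the approximate rates quoted above, assumes OCR pages are handled one after another (an assumption, not something stated here), and leaves out upload and DocumentCloud transfer time, which depend on your connection. The function name `estimate_minutes` is made up for this example.

```python
MAX_DOCUMENTS = 2_000_000      # current limit per document set
DOCS_PER_MINUTE = 1000         # approximate processing rate
OCR_SECONDS_PER_PAGE = 10      # approximate extra time per scanned page


def estimate_minutes(num_documents, ocr_pages=0):
    """Rough processing time in minutes, excluding upload or transfer time."""
    if num_documents > MAX_DOCUMENTS:
        raise ValueError("document set exceeds the current 2,000,000 limit")
    processing = num_documents / DOCS_PER_MINUTE
    # Assumes pages are OCR'd sequentially; real throughput may differ.
    ocr = ocr_pages * OCR_SECONDS_PER_PAGE / 60
    return processing + ocr


if __name__ == "__main__":
    # 50,000 documents, of which 6,000 pages are scans that need OCR
    print(estimate_minutes(50_000, ocr_pages=6_000))  # 1050.0
```

In this example the OCR step accounts for 1,000 of the 1,050 minutes, so scanned pages can easily dominate the total.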
5. Who can see the documents I upload to Overview? Are they secure?
Only you can access the documents uploaded to your account, unless you share them. Overview can import your private projects from DocumentCloud.
We’re pretty serious about following industry standard security practices in our code and on our servers, but quite honestly we don’t have the resources to defend against a talented hacker or a subpoena. If either of these possibilities troubles you, feel free to run Overview on your own computer.
6. Which languages does Overview support?
Overview currently supports documents in English, Spanish, French, German, Russian, Arabic, Swedish, Dutch, and Romanian. We can add another language in a day or two if you have documents to test with. Note that Overview can read documents in different languages, but the UI is still in English (this file would need to be translated).
7. Where’s the source code?
Overview is an open source project released under the AGPL license. It was initially developed at The Associated Press under a News Challenge grant from the Knight Foundation, with further support from Google Ideas and the ACLU. The technology is available for OEM licensing and custom development. Contact us.
8. I’m interested in using Overview for my business.
Wonderful! Overview Services Inc. provides support, engineering of new features, and enterprise licenses. You can reach us at @overviewproject or by email. We’d love to hear from you, especially if you have a document set analysis problem that Overview can’t solve. I bet there’s something we can do for you.