1. Find names of places and companies
The Entities plugin will automatically find company names, place names (in multiple languages!), numbers, or just unusual words that aren’t in the dictionary. Like all plugins, it’s available under Add View.
Overview’s entity detection algorithms are designed to err on the side of inclusion: they would rather list things that aren’t entities than miss things that are. This is unlike typical NLP techniques, which often miss 50% of entities. You can click the little red X next to any item to remove junk from the list.
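To make the "include too much, then prune" idea concrete, here is a minimal Python sketch. It is not Overview’s actual algorithm: the regular expressions, the tiny `DICTIONARY`, and the function names `candidate_entities` and `remove_junk` are all assumptions made up for illustration.

```python
import re

# Tiny stand-in dictionary (hypothetical); a real one would hold many thousands of words.
DICTIONARY = {"the", "mayor", "met", "with", "in", "on", "and",
              "paid", "for", "a", "contract"}

def candidate_entities(text, dictionary=DICTIONARY):
    """Return a deliberately over-inclusive set of entity candidates."""
    candidates = set()
    # Runs of capitalized words, e.g. "Acme Corp" or "New York".
    for match in re.finditer(r"\b[A-Z][\w'-]*(?:\s+[A-Z][\w'-]*)*", text):
        candidates.add(match.group(0))
    # Numbers, with optional thousands separators and decimals.
    for match in re.finditer(r"\b\d[\d,]*(?:\.\d+)?\b", text):
        candidates.add(match.group(0))
    # Lowercase words that are not in the dictionary.
    for word in re.findall(r"\b[a-z]+\b", text):
        if word not in dictionary:
            candidates.add(word)
    return candidates

def remove_junk(candidates, rejected):
    """Mimic clicking the red X: drop the candidates the user rejected."""
    return candidates - set(rejected)

found = candidate_entities(
    "The mayor met with Acme Corp in New York and paid 1,200 for a zorbix contract"
)
print(sorted(found))  # ['1,200', 'Acme Corp', 'New York', 'The', 'zorbix']
print(sorted(remove_junk(found, ["The"])))  # ['1,200', 'Acme Corp', 'New York', 'zorbix']
```

Note that "The" sneaks in as a false positive. That is the trade-off described above: junk is cheap to delete, while a missed entity is invisible.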
2. Make scanned PDFs searchable (OCR)
Overview will automatically OCR any PDF that doesn’t seem to have any text in it, such as scanned pages, using the open-source Tesseract engine. Scanned documents will be much slower to load, but you won’t be able to search them until you OCR them somewhere, so why not let Overview do it?
If you’d like to get OCR’d files out of Overview, you can simply export the documents after Overview has loaded them. You’ll get searchable PDFs back.
3. Customize the Word Cloud
You can use the delete tool to remove the words that aren’t adding anything. When you remove words, less common words are added to fill up the space. This way you can zero in on exactly what you want to investigate.
Press the Hidden Words button to unhide words you have removed. You can also have more than one word cloud at a time, through the Add View menu.
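The way less common words move up when you hide a word can be sketched in a few lines of Python. This is only an illustration of the behavior, not Overview’s code; the `word_cloud` function and its `hidden` and `size` parameters are hypothetical.

```python
from collections import Counter

def word_cloud(words, hidden=(), size=3):
    """Return the `size` most common words, skipping any hidden ones."""
    hidden = set(hidden)
    counts = Counter(word for word in words if word not in hidden)
    return [word for word, _ in counts.most_common(size)]

words = ["said"] * 5 + ["city"] * 4 + ["budget"] * 3 + ["pizza"] * 2 + ["permit"]

print(word_cloud(words))                   # ['said', 'city', 'budget']
print(word_cloud(words, hidden={"said"}))  # ['city', 'budget', 'pizza']
```

Hiding "said" frees a slot, so "pizza" appears. Passing an empty `hidden` set again brings the original cloud back, which is the equivalent of unhiding.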
4. The all-powerful Export
You can export all documents or just the results of the current search. For example, you could download only documents with the word “pizza” in them. You can export either one file per document, or just the text (and any custom fields) as a CSV.
This means you can use Overview as a text extractor: upload random files, download a clean spreadsheet of the text. Or an OCR machine: upload random files, get searchable PDFs back.
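Once you have the exported spreadsheet, you can process the text with any tool you like. Here is a minimal Python sketch. The `title` and `text` column names and the sample rows are assumptions for illustration, so check the header row of your own export.

```python
import csv
import io

# Stand-in for an exported file; the column names here are assumptions.
SAMPLE_EXPORT = """title,text
memo1.pdf,The council approved the pizza budget.
scan2.pdf,Permit denied pending review.
"""

def texts_by_title(csv_file, title_column="title", text_column="text"):
    """Map each document's title to its extracted text."""
    return {row[title_column]: row[text_column] for row in csv.DictReader(csv_file)}

documents = texts_by_title(io.StringIO(SAMPLE_EXPORT))
matching = [title for title, text in documents.items() if "pizza" in text.lower()]
print(matching)  # ['memo1.pdf']
```

With a real export you would pass `open("export.csv", newline="", encoding="utf-8")` instead of the `io.StringIO` stand-in, where `export.csv` is whatever you named the download.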
5. Add custom data to each document
Overview now supports custom fields, or as we like to call them, document metadata.
You can add a field and set the value for all documents in a batch import.
Or you can edit the fields on one document at a time in the document viewer. If you add a field to one document, it will appear (initially blank) for all documents.
Or, if you load your documents via CSV, Overview will read in each extra column as a field.
Each field will be its own column when you export as a spreadsheet.
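If you are preparing a CSV for import, the extra columns are easy to generate with a script. The sketch below assumes the document columns are named `title` and `text`, and uses made-up `agency` and `year` fields. The `build_import_csv` helper is hypothetical, so check Overview’s CSV import documentation for the exact column names it expects.

```python
import csv
import io

# Assumed names for the standard document columns; every other column is a custom field.
STANDARD_COLUMNS = ("title", "text")

def build_import_csv(documents, field_names):
    """Write documents to CSV text, with one extra column per custom field."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(STANDARD_COLUMNS) + list(field_names))
    writer.writeheader()
    for document in documents:
        # Fields missing from a document are written as blank cells.
        writer.writerow(document)
    return buffer.getvalue()

docs = [
    {"title": "Memo 1", "text": "The council approved the budget.", "agency": "Parks", "year": "2015"},
    {"title": "Memo 2", "text": "Permit denied pending review.", "agency": "Planning"},
]
csv_text = build_import_csv(docs, ["agency", "year"])
print(csv_text.splitlines()[0])  # title,text,agency,year
```

The second document has no `year`, so its cell is left blank, much like a field that exists for all documents but has only been filled in for some.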