Add fields during import

Now you can add custom fields to all documents at once while you’re importing.

This is a bit of a niche feature. It can help track the source of each document you import.

I’ll walk you through it.

  1. First, add files as usual.
  2. Use the (new) “Fields” interface to specify fields for these documents.
  3. Now, every document you uploaded has the field values you wrote.
  4. You can specify other field values whenever you add more documents to the document set.
  5. The original documents’ fields will have the original values. The new documents’ fields will have the new values.
  6. Export the document set, and you’ll see the field values for all documents.
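
If it helps to picture the result, here is a tiny Python model of the behaviour described above. It is only an illustration, not Overview's code: the `Document` class, the `import_batch` helper, and the "Source" field with its values are all made up for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """Hypothetical stand-in for a document in a document set."""
    title: str
    fields: dict = field(default_factory=dict)


def import_batch(document_set, titles, field_values):
    """Add one batch of documents; each gets its own copy of the batch's field values."""
    for title in titles:
        document_set.append(Document(title, dict(field_values)))
    return document_set


docs = []
# First import: every document in this batch gets the same field values.
import_batch(docs, ["a.pdf", "b.pdf"], {"Source": "First batch"})
# Later import: new values apply only to the newly added documents.
import_batch(docs, ["c.pdf"], {"Source": "Second batch"})

for doc in docs:
    print(doc.title, doc.fields["Source"])
```

The earlier documents keep "First batch" and only the later one gets "Second batch", which is the pattern you should see in the exported spreadsheet.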

We hope this makes fields a little bit more useful.

Overview now does OCR!

We’ve added a new feature to Overview: Optical Character Recognition (OCR). That means you can upload scanned PDFs and Overview will automatically read the text from them.

Overview automatically runs OCR on every page that has fewer than 100 characters of searchable text. This makes uploads a lot slower, but scanned pages need OCR before you can search them anyway, and you can’t beat the convenience.
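
The per-page rule can be sketched in a few lines of Python. This is a minimal sketch, not Overview's implementation: the function names are invented here, and we assume every character (including whitespace) counts toward the 100-character threshold, which the post does not spell out.

```python
# From the post: pages with fewer than 100 characters of searchable text get OCR.
OCR_TEXT_THRESHOLD = 100


def needs_ocr(page_text: str) -> bool:
    """Return True when a page has too little searchable text (assumed: raw character count)."""
    return len(page_text) < OCR_TEXT_THRESHOLD


def pages_needing_ocr(pages):
    """Return the zero-based indexes of the pages that would be sent to OCR."""
    return [index for index, text in enumerate(pages) if needs_ocr(text)]


# Three pages: a blank scan, a page with plenty of text, and a near-empty page.
pages = ["", "x" * 250, "Page 3"]
print(pages_needing_ocr(pages))  # [0, 2]
```

Because the decision is made per page, a PDF that mixes scanned and text pages only pays the OCR cost for the scanned ones.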

Overview uses Tesseract for OCR, because Tesseract is free. Sometimes Tesseract produces more garbage characters than other OCR engines, such as the one included in Adobe Acrobat Pro. If you’ve already OCR’d your documents using another program, Overview will just read the previously created text.


How Overview handles pesky Microsoft Excel

Overview has long had a fairly important (if little-used) feature: it can export all documents into a spreadsheet. Day one, we wrote the spreadsheet in “comma-separated values” (“CSV”) format.

Then we realized that Microsoft Excel couldn’t open all CSV files.

Day two, we implemented a separate export file type, just for Microsoft Excel. Here are the differences.

Continue reading How Overview handles pesky Microsoft Excel

Import, edit, and create document metadata

Have you ever needed to extract the author or write notes for each document? Now you can, with fields.

The new “Fields” section sits in wait underneath each document. Click it, and you’ll be able to create fields and change their values.

[Image: metadata-filled-in]
The list of fields is the same for every document in a document set. Each document has its own field values.

You can create new fields directly in Overview, or import them as extra columns in your CSV (see: importing documents using a CSV file).

All your fields will appear as new columns when you export a spreadsheet:

[Image: metadata-output-csv]
You can use a spreadsheet program to filter for field values.
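
If you would rather script the filtering than open a spreadsheet program, something like the following Python sketch would do. The column names ("title", "text", "Author") and the sample rows are assumptions for illustration; check the header row of your own export for the real names.

```python
import csv
import io


def filter_rows(csv_text, field_name, wanted):
    """Return the rows of an exported CSV whose field column equals the wanted value."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row for row in reader if row.get(field_name) == wanted]


# Hypothetical export: two built-in columns plus one custom field, "Author".
sample_export = (
    "title,text,Author\n"
    "memo.pdf,Budget memo,Alice\n"
    "letter.pdf,Dear Bob,Carol\n"
    "note.pdf,Reminder,Alice\n"
)

matches = filter_rows(sample_export, "Author", "Alice")
print([row["title"] for row in matches])  # ['memo.pdf', 'note.pdf']
```

For a real export, read the file with `open(path, newline="", encoding="utf-8")` and pass it straight to `csv.DictReader`.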

The “fields” feature is new. We know there’s plenty of pizzazz to add:

  • Right now, we only support single-line text fields: no dates, numbers, geo-coordinates, or so forth. As a workaround, format your text values carefully (e.g., use YYYY-MM-DD for dates) so your spreadsheet program can grok them.
  • Overview’s search feature doesn’t examine field values.
  • You cannot create fields or write to fields using the API. (You can read field values with the API, though: they’re in document.metadata.)
  • You can only set metadata on one document at a time.
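
To make the date workaround from the first bullet concrete, here is a small Python sketch that writes and reads YYYY-MM-DD values. The helper names are ours, invented for this example; nothing here touches Overview itself.

```python
from datetime import date, datetime


def to_field_value(d: date) -> str:
    """Format a date as YYYY-MM-DD so it can be stored in a single-line text field."""
    return d.isoformat()


def from_field_value(value: str) -> date:
    """Parse a YYYY-MM-DD field value back into a date (raises ValueError if malformed)."""
    return datetime.strptime(value, "%Y-%m-%d").date()


values = [to_field_value(date(2015, 7, 13)), to_field_value(date(2014, 12, 1))]

# Plain text sorting already puts YYYY-MM-DD values in chronological order.
print(sorted(values))  # ['2014-12-01', '2015-07-13']
print(from_field_value("2015-07-13").year)  # 2015
```

The nice property of this format is that sorting the text values alphabetically is the same as sorting them by date.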

Hi! Now we’re www.overviewdocs.com

We’re changing https://www.overviewproject.org to redirect to our new domain name, https://www.overviewdocs.com.

Why the change? Two reasons:

  1. Overview isn’t just an experimental “project” any more. Overview is a go-to tool for doc-crunching.
  2. Overview isn’t a non-profit “.org”. Overview Services Inc. is a commercial company. Don’t get us wrong: we’d love a donation as much as anybody. But our business strategy is to exchange services for money.

How does this affect you, our user? Well … uh … the text in your browser’s URL field will shrink by three characters. That’s about it.

Our automatic redirects will kick in today, Monday, July 13, 2015, around noon. Don’t worry: you won’t lose any of your work, even if you’re using Overview while we switch.

Update, July 13, 2015: all done.

Overview’s Search Syntax

Overview supports phrase searches, fuzzy searches, and booleans. Here’s what you can search for in the search box and the Multi-Search plugin:

  • John Smith: All documents containing the phrase “John Smith”. All the words, in order.
  • Pizza~: All documents matching the word “Pizza” or similar words such as “Piazza” or “Pizzas”. (“~” after a single word means fuzzy search. It can find documents that contain typos.)
  • John Smith AND Alice Smith: All documents containing both the phrase “John Smith” and the phrase “Alice Smith”. (“AND” means both phrases must appear.)
  • John Smith OR Alice Smith: All documents containing either the phrase “John Smith” or the phrase “Alice Smith”, or both phrases. (“OR” means at least one of the phrases must appear.)
  • John Smith AND NOT Alice Smith: All documents containing the phrase “John Smith” and not the phrase “Alice Smith”. (“NOT” means the phrase must not appear.)
  • Alice AND NOT (Bob OR Carol): All documents containing the phrase “Alice” and neither the phrase “Bob” nor the phrase “Carol”. (Parentheses help organize complicated queries.)
  • "John and Alice Smith": All documents containing the phrase “John and Alice Smith”. (Without quotation marks, it would have been interpreted as “(John) AND (Alice Smith)”. Quotation marks tell Overview to ignore operators such as AND, OR and NOT. You can use quotation marks or apostrophes.)
  • John Smith~2: All documents matching the phrase “John Smith” or phrases with the words John and Smith at most 2 words apart, such as “John 'The Culprit' Smith”. (“~N” after a multi-word phrase means proximity search.)
  • Smith*: All documents containing a word that begins with “Smith”, such as “Smith”, “Smithy” or “Smithsonian”. (“*” after a phrase means prefix search.)
  • title:John Smith: All documents containing the phrase “John Smith” in their titles. To search only the body text instead, use body:John Smith. By default, Overview searches every field.
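
If you build queries in a script before pasting them into the search box, the quoting rule above is the easiest one to trip over. Here is a rough Python sketch of a query builder. The helper names are hypothetical, and we assume that “or” and “not” are recognized in lowercase the same way the “and” in the example above is. It also does not handle phrases that themselves contain quotation marks.

```python
import re

# Operator words that would split a phrase if left unquoted (assumed case-insensitive).
_OPERATOR = re.compile(r"\b(and|or|not)\b", re.IGNORECASE)


def quote_phrase(phrase: str) -> str:
    """Wrap a phrase in quotation marks when it contains an operator word."""
    if _OPERATOR.search(phrase):
        return f'"{phrase}"'
    return phrase


def all_of(*phrases: str) -> str:
    """Every phrase must appear."""
    return " AND ".join(quote_phrase(p) for p in phrases)


def any_of(*phrases: str) -> str:
    """At least one phrase must appear."""
    return " OR ".join(quote_phrase(p) for p in phrases)


def but_not(query: str, excluded_query: str) -> str:
    """Match query, excluding anything that matches excluded_query."""
    return f"{query} AND NOT ({excluded_query})"


print(quote_phrase("John and Alice Smith"))      # "John and Alice Smith"
print(all_of("John Smith", "Alice Smith"))       # John Smith AND Alice Smith
print(but_not("Alice", any_of("Bob", "Carol")))  # Alice AND NOT (Bob OR Carol)
```

Names such as “Anderson” are left alone, because the pattern only matches “and”, “or” and “not” as whole words.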

In the coming months, we’ll be sitting with users to see how this new query language works for them. If you have any feedback about a particular query, please use the “Talk to Us” link at the top of Overview.