This talk is about how text analysis and natural language processing are being used in journalism, open government, and transparency work generally.
The first part of the talk is a survey of existing public projects, and the algorithms behind them, including
- Churnalism detects plagiarism in the news (or press releases!)
- Many Bills automatically classifies the sections of bills, to detect pork barrel projects
- Docket Wrench analyzes the comments on proposed regulations
- NewsDiffs watches for changes in published articles
- FEC Standardizer automatically cleans campaign donor names
- MemeTracker tracks political quotes across the whole web, as they mutate
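A core idea behind several of these projects, text-reuse detectors like Churnalism in particular, is comparing documents by their overlapping word n-grams ("shingles"): copied passages produce many shared n-grams. Here is a minimal sketch of that general technique; the function names and sample strings are illustrative, not taken from any project's actual code.

```python
def shingles(text, n=4):
    """Return the set of overlapping word n-grams in `text`."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(len(words) - n + 1)}

def reuse_score(a, b, n=4):
    """Jaccard similarity of the two documents' n-gram sets:
    1.0 means identical shingle sets, 0.0 means no shared n-grams."""
    sa, sb = shingles(a, n), shingles(b, n)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical example: an article that lightly rewrites a press release
press_release = "Acme Corp today announced record quarterly profits driven by strong demand"
article = "In a statement, Acme Corp today announced record quarterly profits driven by strong demand"
print(reuse_score(press_release, article, n=3))
```

Real systems add indexing so one article can be checked against millions of candidates quickly, but the pairwise scoring idea is the same.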
And of course, there’s a brief demonstration of Overview, and a discussion of the algorithms behind it. (First time here? See how Overview works for investigative journalists)
Finally, there’s a discussion of where data-driven transparency is going: what should we work on next? How do we know we are working on the right data sets and the right tools? How can we evaluate the impact of transparency projects? The talk ends with a throwdown: the Transparency Grand Challenge!