OpenTitles is a browser addon that tracks headline changes on over forty news sites, such as nos.nl, nytimes.com and theguardian.com. The addon adds a button to the headlines on these sites which, when clicked, shows all recent changes to that article's title. OpenTitles is also available as an API and as a daily database dump that may be used for research purposes.
OpenTitles is made by Floris de Bijl.
OpenTitles relies on RSS feeds and persistent IDs to keep track of articles. Every few minutes, a scraper pulls the RSS feeds of every site and compares the titles in each feed to the titles stored in the database.
While titles and URLs may change, the ID generally persists across edits to an article. The ID in the RSS feed can therefore be used to match each feed item to its record in the database.
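The matching step could look roughly like the sketch below. It assumes each feed item exposes its persistent ID (for example the RSS guid element) and that the database keeps a title history per ID. The types FeedItem, StoredArticle and TitleChange and the function detectTitleChanges are hypothetical and not taken from the OpenTitles scraper; an in-memory Map stands in for the database.

```typescript
// Hypothetical shapes for illustration; the real scraper's schema may differ.
interface FeedItem {
  id: string;    // persistent ID from the feed, e.g. the RSS <guid>
  title: string;
  link: string;  // may change over time, so it is not used for matching
}

interface StoredArticle {
  id: string;
  titles: string[]; // title history, oldest first
}

interface TitleChange {
  id: string;
  oldTitle: string;
  newTitle: string;
}

// Compare one feed snapshot against the stored articles, keyed by persistent ID.
// Unseen IDs start a new history; a differing title is appended and reported.
function detectTitleChanges(
  items: FeedItem[],
  store: Map<string, StoredArticle>,
): TitleChange[] {
  const changes: TitleChange[] = [];
  for (const item of items) {
    const stored = store.get(item.id);
    if (stored === undefined) {
      // First sighting of this ID: record its current title.
      store.set(item.id, { id: item.id, titles: [item.title] });
      continue;
    }
    const latest = stored.titles[stored.titles.length - 1];
    if (latest !== item.title) {
      stored.titles.push(item.title);
      changes.push({ id: item.id, oldTitle: latest, newTitle: item.title });
    }
  }
  return changes;
}

// Usage: the same ID appears twice with a different title and a different URL.
const store = new Map<string, StoredArticle>();
detectTitleChanges([{ id: "a1", title: "First", link: "/x" }], store);
console.log(detectTitleChanges([{ id: "a1", title: "Second", link: "/y" }], store));
```

Keying on the ID rather than the link means a change of URL alone does not create a new article, which is the property the paragraph above relies on.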
This approach has a few limitations, most notably the refresh rate and retention of the RSS feeds. Some RSS feeds are generated on demand, which is the best-case scenario for OpenTitles. In the worst case, a feed is only refreshed once an hour, so a title that is changed again within that hour may never be picked up by the scraper.
Furthermore, RSS feeds usually contain only a few dozen articles, so on sites that publish at a high rate an article may stay in the feed for a few hours at most. Changes made to a title after its article has dropped out of the feed can't be tracked, because the scraper relies on the RSS feed to index titles.
A more robust version of OpenTitles would still use the RSS feed to discover articles, but would then visit each article directly for a set period of time to check for new titles. This is vastly more complex than the current approach and my time is very limited, so a rewrite using this technique is not on the roadmap at this point.
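To make the idea concrete, the scheduling half of such a design might look like the sketch below. This is purely hypothetical and not part of OpenTitles: the TrackedArticle type, the 48-hour tracking window and the 15-minute revisit interval are assumed values chosen for illustration.

```typescript
// Hypothetical sketch of the "revisit for a set period" idea; not part of OpenTitles.
interface TrackedArticle {
  id: string;
  url: string;
  firstSeen: number;   // ms since epoch when the article first appeared in a feed
  lastChecked: number; // ms since epoch of the most recent page visit
}

const TRACKING_WINDOW_MS = 48 * 60 * 60 * 1000; // assumed: track each article for 48 hours
const RECHECK_INTERVAL_MS = 15 * 60 * 1000;     // assumed: revisit at most every 15 minutes

// Select articles that are still inside their tracking window and due for another visit,
// regardless of whether they are still present in the RSS feed.
function articlesDueForRevisit(articles: TrackedArticle[], now: number): TrackedArticle[] {
  return articles.filter(
    (a) =>
      now - a.firstSeen <= TRACKING_WINDOW_MS &&
      now - a.lastChecked >= RECHECK_INTERVAL_MS,
  );
}
```

The scheduling itself is simple; the complexity lies in what comes after it, since fetching each page and extracting its current headline has to be done per site.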
The source code for OpenTitles is available on GitHub. The project is split into five repositories: this website, the scraper, the API server, the definition and the client (i.e. the browser addon).
All components are written in TypeScript, except for this website.
Every 24 hours (Central European Time), a new database dump is generated with mongoexport and made available at https://dump.opentitles.info/. This data is free to use for any purpose.
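A dump can be loaded with a few lines of code. The sketch below assumes the file uses mongoexport's default output, one JSON document per line (not a single array, which is what the --jsonArray flag produces), and that it has already been downloaded and decompressed. No field names are assumed: values stay in MongoDB Extended JSON, so for example an ObjectId arrives as an object with a "$oid" key. The function name parseMongoExport is hypothetical.

```typescript
// Parse newline-delimited JSON as written by mongoexport without --jsonArray.
// Blank lines (such as a trailing newline) are skipped.
function parseMongoExport(text: string): Record<string, unknown>[] {
  return text
    .split(/\r?\n/)
    .filter((line) => line.trim() !== "")
    .map((line) => JSON.parse(line) as Record<string, unknown>);
}
```

For example, `parseMongoExport(readFileSync("articles.json", "utf8"))` with `readFileSync` from `node:fs` would load a whole file at once; the file name here is a placeholder. For large dumps, reading line by line with `node:readline` avoids holding the entire file in memory.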
To be expanded
The entry point for the API is https://api.opentitles.info/v2/country
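A minimal client call to the entry point could look like the sketch below. It assumes the endpoint returns JSON, but makes no assumptions about its structure, so the body is returned as unknown. The names FetchLike and getEntryPoint are hypothetical. Passing the fetch function in keeps the sketch independent of browser typings and easy to test; the built-in fetch of browsers and Node 18+ fits the FetchLike type.

```typescript
// Minimal structural type for a fetch implementation; the built-in fetch satisfies it.
type FetchLike = (url: string) => Promise<{ ok: boolean; status: number; json(): Promise<unknown> }>;

const API_ENTRY_POINT = "https://api.opentitles.info/v2/country";

// Request the API entry point and return the parsed JSON body.
// Non-2xx responses are turned into an Error carrying the HTTP status.
async function getEntryPoint(fetchImpl: FetchLike): Promise<unknown> {
  const res = await fetchImpl(API_ENTRY_POINT);
  if (!res.ok) {
    throw new Error(`OpenTitles API returned HTTP ${res.status}`);
  }
  return res.json();
}
```

In a browser or Node 18+, `getEntryPoint(fetch).then(console.log)` prints whatever the entry point returns.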
The path to an article is as follows:
The OpenTitles API uses the following error codes:
| Error Code | Meaning |
| --- | --- |
| 404 | Not Found -- The specified resource could not be found. The response is formatted as JSON, with an error property describing the error and a lookat property containing the path to the list of resources of this type. |
| 500 | Internal Error -- An error occurred on the server side that could not be recovered from. |
For example, requesting a country that does not exist produces a body containing 'error': 'No such country'.
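Client code can take advantage of the error body described above: it reports the error property and points the user at the lookat path when one is given. The sketch below assumes only those two properties; since an Internal Error may not come with a parseable JSON body, anything that does not match falls back to the bare HTTP status. The names ApiErrorBody, isApiErrorBody and describeApiError are hypothetical.

```typescript
// Error body as described above: an "error" message and, for Not Found responses,
// a "lookat" path pointing at the list of available resources of that type.
interface ApiErrorBody {
  error: string;
  lookat?: string;
}

// Type guard for the documented error shape.
function isApiErrorBody(value: unknown): value is ApiErrorBody {
  if (typeof value !== "object" || value === null) return false;
  const v = value as Record<string, unknown>;
  return typeof v.error === "string" && (v.lookat === undefined || typeof v.lookat === "string");
}

// Turn an error response into a readable message, mentioning lookat when present.
function describeApiError(status: number, body: unknown): string {
  if (!isApiErrorBody(body)) return `HTTP ${status}`;
  return body.lookat !== undefined
    ? `HTTP ${status}: ${body.error} (see ${body.lookat})`
    : `HTTP ${status}: ${body.error}`;
}
```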