A quickstart guide to Overton’s JSON API
(Note: for large projects, or where you need a lot of data quickly, the data snapshots may be a better option.)
The data in Overton is available in a machine-readable format through our REST API, which sits between our database and the web application.
The API returns JSON and requires an API key to use. Note that API access isn’t enabled by default on accounts; please reach out if you’d like to try it.
You can check whether API access is enabled by hovering over “Export” in the grey action bar above your search results. If you see the option “Generate API call” then API access has been enabled. If the option is not there, contact support@overton.io to request API access.
To get the API call for the page you’re currently viewing in the app, click on “Generate API call.”
Best practice and return codes
Please call the API no more than once per second. Rate limiting is enforced with some leeway: briefly going faster than this by accident is fine, but if you push up against the limits too frequently your API key may be blocked automatically.
If you are rate limited, the API returns a 429 HTTP status code and empty results.
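As an illustration of the guidance above, here is a minimal Python sketch of a client that pauses before every request and waits longer whenever it sees a 429. The function name, the doubling back-off and the retry count are assumptions made for this example, not Overton recommendations. The URL you pass in would be the one produced by “Generate API call.”

```python
import time
import urllib.error
import urllib.request


def fetch_with_backoff(url, opener=urllib.request.urlopen, sleep=time.sleep,
                       min_interval=1.0, max_retries=5):
    """Fetch url, pausing before each call and backing off on HTTP 429.

    The opener and sleep arguments exist only so the function can be
    exercised without network access; the defaults use the standard library.
    """
    delay = min_interval
    for _ in range(max_retries):
        sleep(delay)  # never call more often than once per min_interval seconds
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            if err.code != 429:
                raise  # some other error: let the caller handle it
            delay *= 2  # rate limited: wait twice as long before retrying
    raise RuntimeError(f"still rate limited after {max_retries} attempts")
```

The returned bytes can be passed to json.loads. Doubling the wait is one common choice; a fixed longer pause would also keep you within the limit.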
Interpreting results
Search API results are generally broken up into three sections:
The query
The query object tells you the number of pages that can be returned for the search (note that you may have a page limit on your account).
To select a page use the &page=x parameter, where x is a valid page number.
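A small Python sketch of setting the page parameter on an existing API call follows. The helper name with_page and the example URL are hypothetical; in practice you would start from the URL given by “Generate API call.”

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(api_url, page):
    """Return api_url with its page query parameter set to the given number."""
    parts = urlsplit(api_url)
    # Drop any existing page value so the parameter is not duplicated.
    params = [(key, value)
              for key, value in parse_qsl(parts.query, keep_blank_values=True)
              if key != "page"]
    params.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(params)))


# Hypothetical URL, used only to show the shape of the result.
print(with_page("https://example.invalid/search?query=climate&page=1", 3))
```

Looping this over the page numbers reported in the query object, with a pause of at least a second between calls, walks through a full result set.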
Facets
The Facets key contains roll-up information for various fields – this is what is displayed in the left-hand sidebar on Overton’s search pages.
Results
The Results key contains the actual results of your query. The pdf_document_id is the unique key for the document: a single policy document may contain multiple PDFs, for example when there’s an executive summary or versions in different languages.
document_url is the landing page for the policy document (the web page that typically shows authors, an abstract etc.), while pdf_url is the link to the actual PDF.
For licensing reasons the API does not include full text content of PDFs: to obtain this you must use the pdf_url field and fetch and process the relevant PDF yourself. For some documents pdf_url is not present, or is the same as the document_url: these are policy documents that are only available in HTML and so data must be scraped from there.
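The rule above for telling PDF-backed documents from HTML-only ones can be sketched in Python as follows. It assumes you have already parsed the JSON response and pulled out the list of result records. The function name and the sample records are made up for illustration; only the field names pdf_document_id, document_url and pdf_url come from the API.

```python
def split_by_source(results):
    """Split result records into PDF-backed and HTML-only lists.

    A record counts as HTML-only when pdf_url is missing, empty, or
    identical to document_url.
    """
    pdf_backed, html_only = [], []
    for record in results:
        pdf_url = record.get("pdf_url")
        if pdf_url and pdf_url != record.get("document_url"):
            pdf_backed.append(record)
        else:
            html_only.append(record)
    return pdf_backed, html_only


# Made-up records, not real Overton data.
sample = [
    {"pdf_document_id": "a-1", "document_url": "https://example.invalid/a",
     "pdf_url": "https://example.invalid/a.pdf"},
    {"pdf_document_id": "b-1", "document_url": "https://example.invalid/b",
     "pdf_url": "https://example.invalid/b"},
    {"pdf_document_id": "c-1", "document_url": "https://example.invalid/c"},
]
pdfs, pages = split_by_source(sample)
print(len(pdfs), len(pages))
```

Records in the first list can be fetched from pdf_url and processed as PDFs; those in the second need their text extracted from the page at document_url.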
Please do not hotlink to PDF thumbnails – the paths may change without notice.
Topics, entities and classifications (also referred to as subject areas) are covered in more detail in a separate help article.