Note that for large projects, or where you need to retrieve a lot of data quickly, the data snapshots may be a better option.
The data in Overton is available in a machine-readable format through our REST API, which sits between our database and the web application.
The API returns JSON and requires an API key to use. Note that API access isn't enabled by default on trial accounts; please reach out if you'd like to test it out.
To see whether the API is enabled and to get your API key, check the "Use x Results" button on any search results page for a "Generate API call" option.
If you select this option, your browser will be redirected to the API call for that search.
Best practice and return codes
Please call the API no more than once a second. Rate limiting is enforced, but with some leeway: if you accidentally go faster than this for a short while that's fine, but if you push up against the limits too frequently your API key may be blocked automatically.
If you are rate limited, the API returns a 429 HTTP status code and empty results.
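As an illustration of the guidance above, here is a minimal Python sketch of a client-side throttle that keeps calls at least one second apart and waits before retrying after a 429. The names Throttle and call_with_backoff, the do_request callable and the retry delays are assumptions made for this example. They are not part of the Overton API, and the wait times are not documented values.

```python
import time


class Throttle:
    """Keeps successive calls at least `min_interval` seconds apart."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, if any

    def wait(self):
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def call_with_backoff(do_request, throttle, max_retries=5, base_delay=2.0,
                      sleep=time.sleep):
    """Call do_request() until it returns something other than 429.

    do_request is a hypothetical callable supplied by you; it must return a
    (status_code, parsed_json) tuple. The doubling delay is an assumption,
    not a value taken from the API documentation.
    """
    for attempt in range(max_retries + 1):
        throttle.wait()
        status, body = do_request()
        if status != 429:
            return status, body
        if attempt < max_retries:
            sleep(base_delay * (2 ** attempt))
    raise RuntimeError("still rate limited after retries")
```

In practice do_request would wrap whichever HTTP client you use to fetch the URL produced by "Generate API call". Passing the clock and sleep functions in as arguments is only there to make the sketch easy to test without real waiting.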
Search API results are generally broken up into three sections:
The query object tells you the number of pages that can be returned for the search (note that you may have a page limit on your account).
To select a page use the &page=x parameter, where x is a valid page number.
The Facets key contains roll-up information for various fields: this is what is displayed in the left-hand sidebar on Overton's search pages.
The Results key contains the actual results of your query. The pdf_document_id is the unique key for the document: a single policy document may contain multiple PDFs, for example when there's an executive summary or different language versions.
document_url is the landing page for the policy document (the web page that typically shows authors, an abstract etc.), while pdf_url is the link to the actual PDF.
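To page through a search with the &page=x parameter described above, one approach is to take the URL produced by "Generate API call" and rewrite its page parameter. The sketch below assumes pages are numbered from 1 and that you have already read the total number of pages from the query object; the helper names with_page and page_urls are made up for this example, and the example.invalid URL is a placeholder rather than a real endpoint.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(api_url, page):
    """Return api_url with its page parameter set to `page`."""
    parts = urlsplit(api_url)
    # Drop any existing page parameter, keep everything else in order.
    query = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
             if k != "page"]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))


def page_urls(api_url, total_pages, page_limit=None):
    """List the URLs for pages 1..total_pages, capped by an optional limit.

    page_limit stands in for any page limit on your account (assumption:
    you know this number yourself; it is not read from the API here).
    """
    last = total_pages if page_limit is None else min(total_pages, page_limit)
    return [with_page(api_url, p) for p in range(1, last + 1)]


# Placeholder URL for illustration only.
urls = page_urls("https://example.invalid/api?q=climate&page=1",
                 total_pages=5, page_limit=2)
```

Remember to keep to one request per second when fetching the resulting URLs.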
For licensing reasons the API does not include full text content of PDFs: to obtain this you must use the pdf_url field and fetch and process the relevant PDF yourself. For some documents pdf_url is not present, or is the same as the document_url: these are policy documents that are only available in HTML and so data must be scraped from there.
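The rule above for deciding where the full text lives can be sketched as a small Python function. It uses only the pdf_url and document_url fields described in this article; the function name full_text_source and the "pdf"/"html" labels are assumptions made for this example.

```python
def full_text_source(result):
    """Decide where to fetch full text for one entry from the results.

    Returns ("pdf", url) when a distinct pdf_url is present, otherwise
    ("html", document_url) for documents only available as HTML.
    """
    pdf_url = result.get("pdf_url")
    document_url = result.get("document_url")
    if pdf_url and pdf_url != document_url:
        return ("pdf", pdf_url)
    return ("html", document_url)
```

Fetching and processing the PDF or HTML page itself is left to whichever tools you prefer.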
Please do not hotlink to PDF thumbnails - the paths may change without notice.
Topics, entities and classifications (also referred to as subject areas) are covered in more detail in a separate help article.