Discover Overton’s criteria for collecting documents from an indexed source
Overton has a broad definition of a policy document:
“documents written primarily for or by policymakers”
… and collects from a wide range of different sources. But we don’t collect everything from these sources: we manually curate each one and set up filters to include only those publications that we think users of Overton will be interested in.
What we can’t / don’t collect
- Anything behind a login or paywall, or where you have to enter an email address before downloading the document
- Journal articles, conference abstracts or other works clearly in the scholarly record
- Books, unless publicly available and hosted on the organization’s website
- News articles or blog posts, if they contain mostly news or announcements – but see below for exceptions
- Reprints of commentary in other magazines or newspapers
- Statistical tables
- Primary legislation – we don’t explicitly exclude this but don’t go out of our way to track it
- Court cases or legal briefs – we don’t explicitly exclude these but don’t go out of our way to track them
- Interactive mini-websites, where there is no downloadable version of the document available
- Archived publications, if these are on a different website (e.g. the Internet Archive)
What we will sometimes collect
- Blog posts, if they are a venue for commentary and opinion on policy-related issues – this is a relatively new feature and Overton is still finding and adding blog URLs for its sources
- Publications where there is no downloadable version – in some cases Overton can work with the contents of a webpage, provided the main text is clearly identifiable through semantic markup (e.g. <article> and <header> tags)
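For publishers wondering what "clearly identifiable with semantic markup" could mean in practice, here is a minimal sketch using only Python's standard library. It is not Overton's actual extraction code; the class and function names are hypothetical, and it assumes the main text sits inside an <article> element.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Collects the text found inside <article> elements (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.article_depth = 0  # number of open <article> elements
        self.skip_depth = 0     # number of open <script>/<style> elements
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag in ("script", "style"):
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth > 0:
            self.article_depth -= 1
        elif tag in ("script", "style") and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside <article> and outside script/style.
        if self.article_depth > 0 and self.skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_main_text(html):
    """Return the text inside <article> tags, or None if none is found."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks) if parser.chunks else None
```

A page whose content lives in generic <div> elements gives a parser like this nothing to hold on to, which is one reason pages without semantic markup are harder to index.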
What we will always try to collect
- Working papers, reports, case studies, policy briefs, testimony, clinical guidelines and government documents
- Publications of interest to a policy audience that have a clear, publicly available link to a downloadable version
We’ve got some additional guidance for publishers for making your publications as visible as possible to Overton and other indexes including Google, Google Scholar and Bing.
Document types in Overton
In general we have two types of source:
- Sources that categorize their own publications by type (e.g. a think tank that has separate “Policy Brief” and “Annual Reports” sections) or whose publications are all instances of a particular type (e.g. parliamentary transcripts). These account for about 70% of the database.
- Sources whose documents are not organized by type (e.g. some city, state or federal government departments, where publications are grouped by theme rather than type). These account for the remaining 30% of the database.
When we add a source of the first kind, we manually map its categories to our own as far as possible. We can’t currently do this for sources of the second kind.
There are thousands of source-specific categories, but Overton groups the most common ones so you can filter on them inside the application.
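The mapping described above can be pictured as a simple lookup. The sketch below is an assumption-laden illustration, not Overton's implementation: the category names, the grouped type names and the field names `source_category` and `document_type` are all hypothetical.

```python
# Hypothetical mapping from one source's own categories to grouped types.
# Keys are lowercased so the lookup ignores case and surrounding spaces.
CATEGORY_MAP = {
    "policy brief": "Policy brief",
    "annual reports": "Report",
    "working papers": "Working paper",
}


def map_document_type(source_category):
    """Pair a source's own category with a grouped type, if one is mapped.

    The original category is always kept alongside the grouped type.
    Sources that do not categorize their publications pass None.
    """
    if source_category is None:
        return {"source_category": None, "document_type": None}
    key = source_category.strip().lower()
    return {
        "source_category": source_category,
        "document_type": CATEGORY_MAP.get(key),  # None when unmapped
    }
```

Keeping the original category next to the grouped one means nothing is lost when a source's label has no mapped equivalent.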
You can see these in the Document Type filter when browsing or searching policy documents.
… the original source category is preserved in the API output if required. Please let us know if we’re missing a document type you’d find useful to filter on.