Duplicate policy documents in Overton

Overton takes steps to avoid indexing duplicate policy documents (you can find more information on how we disambiguate policy documents in this article). However, there are some situations in which duplicates occur.

Why duplicates occur

There are two main reasons for duplicates in Overton:

The same document appears more than once from the same source.
When the metadata of a new policy document differs slightly from that of an existing Overton document, our code can mistakenly identify it as a unique document.

The same document appears in more than one source.
Policy documents can be authored by multiple organisations. Each organisation may host the document on its own website, so Overton recognises each copy as a separate policy document.

A policy document can also show up in multiple sources if shared by both a policy author and an aggregator. Aggregator sources index documents from many policy authors.

Why we don’t merge duplicates

When duplicates occur, we don’t actively merge them. This is because each individual policy document has its own ‘story’ with regard to its reach and visibility.

Managing duplicates

After completing a search, you can export the results in CSV or Excel format. To identify duplicates, sort the exported file by the “Title” column first and then by the “Source Name” column, so that documents with the same title appear next to each other.
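
If you prefer to check a CSV export with a script rather than sorting it by hand, the same idea can be sketched in Python. This is a minimal illustration, not an Overton tool: it assumes the export has “Title” and “Source Name” column headers as described above, and the helper names (normalise_title, find_duplicate_groups, load_export) are hypothetical. It treats titles that differ only in letter case or spacing as the same title, which is our own assumption about what counts as a likely duplicate.

```python
import csv
from collections import defaultdict


def normalise_title(title):
    """Lower-case the title and collapse runs of whitespace.

    Assumption: titles differing only in case or spacing are likely duplicates.
    """
    return " ".join(title.lower().split())


def find_duplicate_groups(rows):
    """Group rows by normalised title and keep groups with more than one row.

    Each group is sorted by "Source Name", mirroring the manual sort order.
    """
    groups = defaultdict(list)
    for row in rows:
        key = normalise_title(row["Title"])
        if not key:
            continue  # skip rows with an empty title
        groups[key].append(row)
    return {
        title: sorted(group, key=lambda r: r["Source Name"])
        for title, group in sorted(groups.items())
        if len(group) > 1
    }


def load_export(path):
    """Read an exported CSV file into a list of dicts keyed by column header."""
    # utf-8-sig tolerates a byte-order mark, which some spreadsheet tools add.
    with open(path, newline="", encoding="utf-8-sig") as handle:
        return list(csv.DictReader(handle))
```

To use it, pass the path of your own export to load_export (for example a file you saved as overton_export.csv, a hypothetical name), hand the result to find_duplicate_groups, and review each returned group. Rows in a group share a title but may still be distinct documents, so treat the output as candidates to check rather than confirmed duplicates.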

Updated on July 30, 2024
