Duplicate policy documents in Overton

While Overton takes steps to avoid indexing duplicate policy documents (you can find more information on how we disambiguate policy documents in this article), duplicates can still occur in some situations.

Why duplicates occur

There are two main reasons for duplicates in Overton:

  • The same document appears more than once from the same source
    When the metadata of a policy document differs slightly from that of an existing Overton policy document, our system treats it as a new document. This situation has many variants, and we are working on processes to reduce how often it happens.
  • The same document appears in more than one source
    Policy documents can be authored by multiple organisations, and each organisation may publish the document on its own website. Overton then picks up each copy as a separate policy document.

    A policy document can also appear in multiple sources if it is shared by both its policy author and an aggregator source. Aggregator sources index documents from many policy authors; the Publications Office of the European Union is a good example.

Why we don’t merge duplicates

When duplicates occur, we don’t actively try to merge them, because each copy provides qualitative information about the reach and visibility of the individual document.

Managing duplicates

Once you have finished a search and are satisfied with your result set, you can export it as a .csv or Excel file using the grey bar above the search results. In the export file, you can quickly identify duplicates by sorting the results first by the “Title” column and then by the “source name” column.
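If you prefer to check a .csv export with a script rather than in a spreadsheet, the sketch below mirrors the sorting described above and then groups rows that share a title. It is an illustration, not an Overton tool: the helper names are hypothetical, and the header strings "Title" and "Source name" are assumptions based on this article's wording, so check them against the header row of your own export. The title normalisation (ignoring case and extra spaces) is an added convenience; it will not catch duplicates whose titles differ in other ways.

```python
import csv
import io
from collections import defaultdict

# Assumed column headers, based on this article's wording; your export may differ.
TITLE_COL = "Title"
SOURCE_COL = "Source name"


def normalise(title):
    """Lower-case and collapse whitespace so near-identical titles compare equal."""
    return " ".join(title.split()).casefold()


def load_export(csv_text):
    """Parse the text of an exported .csv file into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def sort_rows(rows):
    """Sort by title first, then by source name, as described in the article."""
    return sorted(rows, key=lambda r: (normalise(r[TITLE_COL]), r[SOURCE_COL].casefold()))


def duplicate_groups(rows):
    """Return {normalised title: [source names]} for titles that appear more than once."""
    groups = defaultdict(list)
    for row in sort_rows(rows):
        groups[normalise(row[TITLE_COL])].append(row[SOURCE_COL])
    return {title: sources for title, sources in groups.items() if len(sources) > 1}


if __name__ == "__main__":
    # Hypothetical sample rows, for illustration only.
    sample = (
        "Title,Source name\n"
        "Climate Action Plan,Publications Office of the European Union\n"
        "Health Strategy 2030,Ministry of Health\n"
        "Climate  action plan,Ministry of Environment\n"
    )
    for title, sources in duplicate_groups(load_export(sample)).items():
        print(title, "->", sources)
```

To use it on a real export, read the file's text (for example with `open(path, encoding="utf-8").read()`) and pass it to `load_export`.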

Updated on May 16, 2024
