Documents in different languages

What languages does Overton support?

Overton is largely language agnostic

Overton is largely language-agnostic, with some caveats described below. Documents are indexed, analyzed and made available for search regardless of the language or script that they’re written in.

To make browsing easier, document titles (data in the title field) are translated into English in the web application and API.

That said, there are three main caveats.

Quality of reference extraction when citing local policy sources

Overton breaks up documents into paragraphs and analyzes each in turn, using a heuristic-based approach to decide if it contains a valid reference.

There are dozens of different heuristics, and some of them rely on matching keywords, identifying dates or otherwise spotting common referencing conventions.

These heuristics work best on Western-style references, so it’s possible that Overton will miss some references in non-Western documents if they are citing local sources (especially Chinese, Japanese and Arabic).
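
To illustrate why convention-based heuristics can miss local references, here is a minimal sketch in Python. It is not Overton's actual code: the three signals, the threshold and the function names are all assumptions made up for this example, and the real system uses many more heuristics.

```python
import re

# Hypothetical signals, loosely modelled on Western referencing conventions.
# None of these are Overton's real rules; they only illustrate the idea.
YEAR = re.compile(r"\b(19|20)\d{2}\b")                      # a four-digit year
KEYWORDS = re.compile(r"\b(et al\.|vol\.|pp\.|doi|journal|ed\.)", re.IGNORECASE)
AUTHOR_INITIALS = re.compile(r"\b[A-Z][a-z]+,\s(?:[A-Z]\.\s?)+")  # "Smith, J. A."


def reference_score(paragraph: str) -> int:
    """Count how many of the illustrative signals fire for a paragraph."""
    signals = (YEAR, KEYWORDS, AUTHOR_INITIALS)
    return sum(1 for pattern in signals if pattern.search(paragraph))


def looks_like_reference(paragraph: str, threshold: int = 2) -> bool:
    """Treat a paragraph as a reference if enough signals fire (assumed threshold)."""
    return reference_score(paragraph) >= threshold


# A Western-style reference triggers the year, keyword and author signals.
western = ("Smith, J. A. et al. (2019). Air quality and health. "
           "Journal of Public Health, vol. 12, pp. 34-56.")

# An illustrative Japanese-style citation: no Latin keywords, no four-digit
# year (an era name is used instead), no "Surname, Initials" pattern.
local = "厚生労働省『令和元年版 厚生労働白書』"

print(looks_like_reference(western))
print(looks_like_reference(local))
```

In this sketch the second citation is a perfectly valid reference, but none of the signals recognise it, which is the kind of miss described above.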

Translated references

When linking references to scholarly articles, Overton often needs at least part of the article title to confirm a match.

This can be a problem when article titles are translated. For example, if a policy document is in Japanese, its authors may also have translated the titles of any cited English-language scholarly articles into Japanese. Those Japanese titles won’t match any scholarly articles in our publications database and so are ignored.

In these cases Overton will miss matches that can’t be confirmed by other means (like a DOI or a link to the publisher website).
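
The interplay between title matching and other identifiers can be sketched as follows. This is an illustration only: the `PUBLICATIONS` dictionary, the made-up DOI, the DOI regular expression and the simple substring comparison are all assumptions for the example, not a description of how Overton's matching is implemented.

```python
import re
from typing import Optional

# A commonly used loose pattern for DOIs; an assumption for this sketch.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")

# Hypothetical stand-in for a publications database: DOI -> article title.
# The DOI and title are invented for illustration.
PUBLICATIONS = {
    "10.1234/example.5678": "Air pollution and child health",
}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", " ", text.lower()).split())


def match_reference(reference: str) -> Optional[str]:
    """Return the DOI of the matched article, or None if nothing is confirmed."""
    # 1. A DOI in the reference confirms the match whatever the title language.
    found = DOI_PATTERN.search(reference)
    if found:
        doi = found.group(0).rstrip(".,;)")
        if doi in PUBLICATIONS:
            return doi
    # 2. Otherwise fall back to looking for a known title in the reference text.
    ref_text = normalize(reference)
    for doi, title in PUBLICATIONS.items():
        if normalize(title) in ref_text:
            return doi
    return None


english = "Smith J. (2019) Air pollution and child health. Example Journal."
translated_with_doi = ("Smith J. (2019) 大気汚染と子どもの健康. "
                       "https://doi.org/10.1234/example.5678")
translated_no_doi = "Smith J. (2019) 大気汚染と子どもの健康."

print(match_reference(english))
print(match_reference(translated_with_doi))
print(match_reference(translated_no_doi))
```

Only the last reference goes unmatched: its title has been translated and there is no DOI or publisher link to fall back on.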

Topics and classification

Overton uses machine learning techniques to identify the key topics in each document and to assign each document to broad categories (“Health”, “Education”, “Crime, Law and Justice” etc.).

The algorithms we use support these languages:

  • English
  • Chinese
  • Dutch
  • French
  • German
  • Italian
  • Japanese
  • Polish
  • Portuguese
  • Russian
  • Spanish
  • Swedish
  • Finnish
  • Danish
  • Norwegian
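
If you are filtering documents by language before relying on topics or categories, a small lookup like the one below may help. It is a sketch under assumptions: the ISO 639-1 codes are our mapping of the list above, and the function name is hypothetical rather than part of the Overton API.

```python
# ISO 639-1 codes for the languages listed above (mapping assumed for this sketch).
TOPIC_LANGUAGES = {
    "en": "English", "zh": "Chinese", "nl": "Dutch", "fr": "French",
    "de": "German", "it": "Italian", "ja": "Japanese", "pl": "Polish",
    "pt": "Portuguese", "ru": "Russian", "es": "Spanish", "sv": "Swedish",
    "fi": "Finnish", "da": "Danish", "no": "Norwegian",
}


def supports_topic_classification(language_code: str) -> bool:
    """Check a language tag such as 'ja' or 'pt-BR' against the list above."""
    # Keep only the primary subtag, so 'pt-BR' and 'pt_BR' are treated as 'pt'.
    primary = language_code.replace("_", "-").split("-")[0].lower()
    return primary in TOPIC_LANGUAGES


print(supports_topic_classification("pt-BR"))
print(supports_topic_classification("ar"))
```
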
Updated on February 7, 2024
