How does Overton find citation contexts?

When Overton finds a reference to a scholarly output in a policy document we try to show you where in the text it is being used, like so:

Image showing how Overton displays references to papers found in policy documents

We do something very similar for citations to other policy documents.

We call these citation contexts.

Citation contexts in more detail

The citation itself is the bit at the bottom of the image above:

Example of a citation in a policy document

That’s the part that we’ve used to figure out which paper is being cited: in this case, one by Wiysonge et al. in Human Vaccines & Immunotherapeutics. We tell you where in the document we’ve found it (here, page 5) so that you can check the PDF for yourself.

In this case the citation is part of a bibliography section at the end of the document and it’s numbered – we can see that it’s item 18.

The citation contexts are where that citation is used in the main body of the policy document:

Example citation contexts

In this case the citing policy document uses numbers in square brackets to refer to specific items in the bibliography. Overton understands that [18] means item 18, that [18,19] means items 18 and 19, [18 – 20] means items 18, 19 and 20 and so on.
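As a rough illustration of that expansion, here is a minimal Python sketch. It is not Overton's actual code: the function name `expand_marker` and the decision to return an empty list for anything that isn't purely numeric are assumptions made for the example.

```python
import re

# Hypothetical helper, for illustration only (not Overton's implementation).
# A range is two numbers joined by a hyphen, en dash (U+2013) or em dash (U+2014).
_RANGE = re.compile(r"(\d+)\s*[-\u2013\u2014]\s*(\d+)")


def expand_marker(marker: str) -> list[int]:
    """Expand a marker such as "[18,19]" or "[18 – 20]" into item numbers."""
    inner = marker.strip().strip("[]()")
    items: list[int] = []
    for part in inner.split(","):
        part = part.strip()
        if not part:
            continue
        match = _RANGE.fullmatch(part)
        if match:
            start, end = int(match.group(1)), int(match.group(2))
            if start > end:
                return []  # a backwards range is not treated as a citation
            items.extend(range(start, end + 1))
        elif re.fullmatch(r"\d+", part):
            items.append(int(part))
        else:
            return []  # not purely numeric, so not treated as a citation marker
    return items


print(expand_marker("[18]"))
print(expand_marker("[18,19]"))
print(expand_marker("[18 – 20]"))
```

Returning nothing for mixed content like "[see 4]" is a deliberately cautious choice in this sketch: it is safer to miss a context than to attach it to the wrong bibliography item.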

Sometimes there aren’t any citation contexts

Citation contexts are only extracted when Overton finds a clear bibliography section in a policy document.

Often policy documents lack these: the citation is the context. For example many policy briefs will link directly to scholarly articles with hyperlinks in the main body of the text rather than splitting anything out at the end.

At the other extreme a document might just be one big bibliography, for example an appendix showing which papers were considered during a literature review.

It’s also not unusual for policy documents to have bibliography sections but then not refer to them in the text, or to forgo bibliographies in favour of numbered footnotes (which the citation context finder in Overton doesn’t currently support).

Supported referencing styles

The policy world uses a variety of different referencing styles (sometimes within the same document!). Overton looks at each document and tries to figure out what the best strategy to find citation contexts might be.

If we’re not finding citation contexts in a document it could be because we don’t yet support the referencing style being used within it.

We currently support:

Superscript numbers

Despite evidence to the contrary⁶⁻⁹

We draw on previous work¹,⁵,¹² and extend it with new cutting edge techniques²

Numbers in square brackets or parentheses

Despite evidence to the contrary [6-9]

We draw on previous work (1,5,12) and extend it with new cutting edge techniques (2)

Citation shortcodes in parentheses

Despite evidence to the contrary (Smith et al., Franklin)

We draw on previous work (Linz 2002, Rosalind et al 2020) and extend it (World Bank, 2018)
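To make the shortcode examples above concrete, here is a minimal Python sketch of how a parenthetical might be split into individual shortcodes. It is an illustration under stated assumptions, not Overton's parser: the names `Shortcode` and `parse_shortcodes` are made up, years are assumed to be four digits starting 19 or 20, and a piece that is only a year is assumed to belong to the name before it, as in "(World Bank, 2018)".

```python
import re
from typing import NamedTuple, Optional


class Shortcode(NamedTuple):
    """Hypothetical container for one parsed shortcode."""
    name: str
    year: Optional[int]
    suffix: str  # letter after the year ("a", "b", ...) or ""


_YEAR_ONLY = re.compile(r"((?:19|20)\d{2})([a-z]?)")
_NAME_YEAR = re.compile(r"(.*?)\s+((?:19|20)\d{2})([a-z]?)")
_ET_AL = re.compile(r"\s+et\s+al\.?$", re.IGNORECASE)


def parse_shortcodes(parenthetical: str) -> list[Shortcode]:
    inner = parenthetical.strip().strip("()")
    pieces = [p.strip() for p in re.split(r"[;,]", inner) if p.strip()]
    result: list[Shortcode] = []
    for piece in pieces:
        year_only = _YEAR_ONLY.fullmatch(piece)
        if year_only:
            # Assumption: a bare year completes the preceding name.
            if result and result[-1].year is None:
                result[-1] = result[-1]._replace(
                    year=int(year_only.group(1)), suffix=year_only.group(2)
                )
            continue
        name_year = _NAME_YEAR.fullmatch(piece)
        if name_year:
            name = name_year.group(1)
            year: Optional[int] = int(name_year.group(2))
            suffix = name_year.group(3)
        else:
            name, year, suffix = piece, None, ""
        # Drop a trailing "et al" / "et al." so only the name remains.
        result.append(Shortcode(_ET_AL.sub("", name).strip(), year, suffix))
    return result


for example in ("(Smith et al., Franklin)",
                "(Linz 2002, Rosalind et al 2020)",
                "(World Bank, 2018)"):
    print(parse_shortcodes(example))
```

Splitting on commas is ambiguous by nature (a comma can separate two shortcodes or a name from its year), which is why the sketch needs the bare-year rule. Real documents will contain cases this simple approach gets wrong.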

Narrative cues

These rely on us finding specific phrasing, usually used in table captions, quotes and footnotes.

Despite evidence to the contrary – see Smith 2019

Figure adapted from Linz et al

We don’t currently support:

Numbers in the text without brackets or parentheses

Despite evidence to the contrary 6-9 at least 12 people insisted

We draw on previous work 1,5,12 and extend it with 2 new cutting edge techniques 2

Humans can tell which numbers in these examples are references and which aren’t because they understand the meaning and context around each number. Overton can’t, so we don’t try to find citation contexts when policy documents use this referencing style.

Matching item numbers or shortcodes to references

Once we’ve found a link between the text and an item in a bibliography we have to match them so that we can tell which scholarly output is actually being referred to.

Overton will only do this if it is certain the match is correct: it will ignore any citation contexts that are ambiguous.

This means that there are scenarios where citation contexts won’t be matched:

  • If the document is using a name shortcode (e.g. Smith et al) but there are multiple bibliography entries authored by Smith then that shortcode will be ignored.
  • If the document is using a name and year shortcode (e.g. Smith 2020) but there are multiple bibliography entries authored by Smith and published in 2020 then that shortcode will be ignored. The exception to this is when letters are used after years: Overton supports e.g. Smith 2020a, Smith 2020b.
  • More subtly: Overton first maps bibliography entries to DOIs. Afterwards, when resolving author / year shortcodes to a cited scholarly paper, the shortcodes are matched against the “gold standard” metadata for that DOI in Crossref and not against the year and author name in the reference string used in the policy document (we do this to avoid a different set of errors that come from parsing reference strings into different parts). If the policy document author used the wrong year for a paper then it may not match. If the author names match then Overton is forgiving of a year or two in either direction, but beyond that it will ignore the relevant shortcode.
  • If the citing document is a journal article it may use superscript numbers to refer to author affiliations as well as bibliography entries. Overton is quite good at spotting and removing these, but occasionally real citation contexts will be removed too.
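The ambiguity rules in the list above can be sketched in Python as follows. This is a hedged illustration rather than Overton's implementation. `BibEntry`, `resolve` and `YEAR_TOLERANCE` are hypothetical names. The tolerance of 2 is one reading of "a year or two". Trying an exact year before a nearby one is an assumption, as is storing the year letter on each entry. In practice the surname and year would come from the Crossref metadata for the entry's DOI.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class BibEntry:
    """Hypothetical bibliography entry, already mapped to metadata."""
    item: int
    surname: str
    year: int
    suffix: str = ""  # assumed: the "a"/"b" letter, where the document uses one


YEAR_TOLERANCE = 2  # assumption: "a year or two in either direction"


def resolve(name: str, year: Optional[int], suffix: str,
            entries: Sequence[BibEntry]) -> Optional[BibEntry]:
    """Return the single matching entry, or None if the match is ambiguous."""
    same_author = [e for e in entries if e.surname.casefold() == name.casefold()]
    if year is None:
        # Name-only shortcode: usable only if the author appears exactly once.
        return same_author[0] if len(same_author) == 1 else None
    exact = [e for e in same_author
             if e.year == year and (not suffix or e.suffix == suffix)]
    if exact:
        return exact[0] if len(exact) == 1 else None
    if suffix:
        return None  # a lettered shortcode must match exactly in this sketch
    # No exact year: allow a small mismatch, but only if one entry qualifies.
    near = [e for e in same_author if abs(e.year - year) <= YEAR_TOLERANCE]
    return near[0] if len(near) == 1 else None


bibliography = [
    BibEntry(1, "Smith", 2020, "a"),
    BibEntry(2, "Smith", 2020, "b"),
    BibEntry(3, "Linz", 2002),
]
print(resolve("Smith", 2020, "", bibliography))   # ambiguous, so None
print(resolve("Smith", 2020, "b", bibliography))  # resolves to item 2
print(resolve("Linz", 2003, "", bibliography))    # one year off, still item 3
```

The key design choice is that every branch returns a match only when exactly one candidate remains. That mirrors the principle above of ignoring a citation context rather than risking a wrong match.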

Seen a document where you think contexts should have been found?

Let support at Overton know and we’ll take a closer look at it for you.

Updated on February 7, 2024
