A brief description of how we find researcher names in the full text of policy documents
Overton tracks where it sees a researcher’s name in policy documents, in addition to processing any citations to other policy or scholarly outputs.
We call these “people mentions”.
Finding people mentions in a policy document is a three-stage process.
Stage one – finding institution names
First we search for the names and name variants (e.g. “University College London”, “UCL”) of all the academic research institutions that we know about.
We record the document, page and paragraph of where we saw each institution name appear.
Lots of institutions have very similar names (e.g. the University of Washington, Washington University) and most short acronyms match more than one institution (e.g. LSE matches both the London and Lahore School of Economics). Overton doesn’t try to guess which option is the right one. Instead it puts all possible matches forward to the second stage.
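As a rough illustration of stage one, the sketch below scans each paragraph for known name variants and records every candidate institution without choosing between them. The `VARIANTS` table, the institution IDs and the `find_institutions` function are hypothetical names made up for this example, not Overton's actual data model.

```python
import re

# Hypothetical lookup: name variant -> every institution ID that uses it.
VARIANTS = {
    "University College London": ["ucl"],
    "UCL": ["ucl"],
    "LSE": ["london-school-of-economics", "lahore-school-of-economics"],
    "University of Washington": ["university-of-washington"],
    "Washington University": ["washington-university"],
}


def find_institutions(doc_id, pages):
    """Scan a document for institution name variants.

    pages: list of pages, each a list of paragraph strings.
    Returns a list of (doc_id, page_no, para_no, institution_id) hits.
    Ambiguous variants yield one hit per candidate institution.
    """
    hits = []
    for page_no, paragraphs in enumerate(pages, start=1):
        for para_no, text in enumerate(paragraphs, start=1):
            for variant, inst_ids in VARIANTS.items():
                # Whole-word, case-sensitive match on the variant.
                if re.search(r"\b" + re.escape(variant) + r"\b", text):
                    for inst_id in inst_ids:
                        hits.append((doc_id, page_no, para_no, inst_id))
    return hits
```

In this sketch a paragraph mentioning “LSE” produces two hits, one for each school, and both are carried forward.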
Stage two – finding researcher names
For each institution found we then build a dictionary of possible researcher names, using affiliation metadata from the journal articles and books we have in the database (i.e. that have been cited at least once in policy).
We take some name variants into account. Specifically:
- If an author name has a full first name and initials we’ll look for a version without the initial(s) (e.g. given “Alice J Smith” we’ll also match “Alice Smith”)
- We don’t do this when we only have an initial for the author’s first name
- If an author name has a double-barreled surname we’ll look for versions with and without a hyphen between the two parts
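The variant rules above could be sketched roughly as follows. This assumes the given name and surname are available as separate fields, and it treats any two-part surname as double-barrelled, which is a simplification. The function names are hypothetical.

```python
def _is_initial(token):
    """True for a single letter, with or without a trailing full stop."""
    return len(token.rstrip(".")) == 1


def name_variants(given, surname):
    """Return the set of name strings to search for.

    given:   given-name field, e.g. "Alice J"
    surname: surname field, e.g. "Smith" or "Lloyd-Webber"
    """
    tokens = given.split()
    givens = {given}
    # A full first name followed only by initials: also try the first name
    # on its own. Skipped when the first name itself is just an initial.
    if (len(tokens) > 1 and not _is_initial(tokens[0])
            and all(_is_initial(t) for t in tokens[1:])):
        givens.add(tokens[0])

    surnames = {surname}
    parts = surname.replace("-", " ").split()
    # Assumption: a surname with exactly two parts is double-barrelled.
    if len(parts) == 2:
        surnames.add("-".join(parts))
        surnames.add(" ".join(parts))

    return {f"{g} {s}" for g in givens for s in surnames}
```

For example, `name_variants("Alice J", "Smith")` gives both “Alice J Smith” and “Alice Smith”, while `name_variants("A J", "Smith")` gives only “A J Smith”.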
We then look for any of these researcher names in the paragraph(s) where we saw the relevant institution name, and in the paragraph immediately following it.
Any pairs of an institution and a researcher from that institution that we find move on to the third stage.
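A minimal sketch of the paragraph window search might look like this. It assumes the document is a flat list of paragraphs and ignores page boundaries, which the description above doesn't specify. The function and argument names are illustrative only.

```python
import re


def find_pairs(paragraphs, institution_hits, names_by_institution):
    """Look for researcher names near each institution mention.

    paragraphs:           list of paragraph strings in document order
    institution_hits:     list of (para_index, institution_id) from stage one
    names_by_institution: dict of institution_id -> set of name strings
    Returns a set of (institution_id, name, para_index_where_name_was_found).
    """
    pairs = set()
    for para_index, inst_id in institution_hits:
        # The paragraph with the institution name, plus the one after it.
        window = paragraphs[para_index:para_index + 2]
        for offset, text in enumerate(window):
            for name in names_by_institution.get(inst_id, ()):
                if re.search(r"\b" + re.escape(name) + r"\b", text):
                    pairs.add((inst_id, name, para_index + offset))
    return pairs
```

A name that appears two or more paragraphs after the institution is not picked up in this sketch, matching the window described above.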
Overton doesn’t disambiguate between researchers with the same name (Alice A Smith, Alice B Smith) at the same institution: if we saw “Alice Smith” in a policy document we’d map it to both researchers. Generally we try to balance precision – mapping a mention to exactly one, correct person – and recall – finding everywhere that one person is mentioned. Finding names without middle initials reduces precision but improves recall: it finds many, many more mentions for most researchers.
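To make the trade-off concrete, here is a small sketch of a name index in which one matched string can point to several researchers. The researcher IDs and the `build_name_index` helper are hypothetical.

```python
from collections import defaultdict


def build_name_index(researchers):
    """Map each searchable name string to every researcher it could mean.

    researchers: iterable of (researcher_id, set_of_name_variants)
    Returns a dict of name string -> set of researcher IDs.
    """
    index = defaultdict(set)
    for researcher_id, variants in researchers:
        for variant in variants:
            index[variant].add(researcher_id)
    return index


index = build_name_index([
    ("r1", {"Alice A Smith", "Alice Smith"}),
    ("r2", {"Alice B Smith", "Alice Smith"}),
])
```

Here `index["Alice Smith"]` contains both `r1` and `r2`, so a mention of “Alice Smith” is attributed to both researchers, while `index["Alice A Smith"]` contains only `r1`.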
We currently build our dictionary from journal articles and books that have been cited at least once in policy. This means that if a researcher has been engaged in policy but never had any of their works actually cited we won’t be looking for their name, so they won’t ever have any people mentions. We’re working on other approaches that won’t have this limitation.
Stage three – sanity checking
At this point Overton has what looks like a people mention but needs to verify it. It runs the relevant paragraph through a set of heuristics to make sure:
- The matched institution name is the institution itself and not a similarly named organization – e.g. Cambridge University Press
- The paragraph where we found the person mention isn’t actually a reference of some kind
- There aren’t too many combinations of people names and affiliations in the relevant text, which might lead to false positives
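The exact heuristics aren't described here, so the following is only a guess at what checks of this kind might look like. The suffix list, the reference pattern and the threshold of five pairs are all assumptions invented for the example.

```python
import re

# Assumed: words that turn an institution name into a different organization.
LOOKALIKE_SUFFIXES = ("Press",)
# Assumed: crude signals that a paragraph is a bibliographic reference.
REFERENCE_PATTERN = re.compile(
    r"\(\d{4}\)|\bdoi:|\bet al\.|\bpp\.\s*\d", re.IGNORECASE
)
# Assumed: more name/affiliation pairs than this is treated as too risky.
MAX_PAIRS_PER_PARAGRAPH = 5


def institution_is_standalone(text, institution_name):
    """True if at least one occurrence isn't followed by a lookalike suffix."""
    for match in re.finditer(re.escape(institution_name), text):
        following = text[match.end():].lstrip()
        if not following.startswith(LOOKALIKE_SUFFIXES):
            return True
    return False


def looks_like_reference(text):
    """True if the paragraph resembles a citation or bibliography entry."""
    return REFERENCE_PATTERN.search(text) is not None


def passes_sanity_checks(text, institution_name, pair_count):
    """Apply all three checks to one candidate people mention."""
    return (
        institution_is_standalone(text, institution_name)
        and not looks_like_reference(text)
        and pair_count <= MAX_PAIRS_PER_PARAGRAPH
    )
```

With these assumed rules, a paragraph mentioning “Cambridge University Press” alone would be rejected for the institution name “Cambridge University”, as would a paragraph that looks like a reference list entry.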
Browsing and searching people mentions
Any researcher name / affiliation pairs passing stage three are saved in the database. You can find them on individual policy document pages, when searching for policy “linked to” a particular institution, and on the People tab, where there’s an option in the left-hand sidebar to see only mentions instead of both citations and mentions.
What to do if you think we’ve missed people mentions
Please contact support if you’re aware of documents where you or researchers at your institution are mentioned but we haven’t picked them up – we’re actively working on improving people mentions and the more examples of edge cases we have the better.