Describes how policy documents are linked to different SDGs
Please note: this functionality is still in beta. Not all languages are supported and the SDG classification is run periodically rather than as part of Overton’s main processing pipeline.
As well as topics and subject areas, Overton tries to map policy documents to one or more Sustainable Development Goals (SDGs), a set of 17 goals set up by the United Nations to serve as a framework for global development. The SDGs are often used as a quick way to group policy or research relating to a particular problem area, e.g. climate change, poverty or gender inequality.
Overton uses a system based on keyword matching and thresholds to determine if a policy document relates to an SDG.
As a base we took the set of key words & phrases compiled by Monash University & SDSN Australia/Pacific, which is available here.
We first extract the first two or three pages of text from each policy document. In the case of HTML files (like blog posts) we extract the first hundred words.
We then look for phrases relating to each SDG and build up a relevance score for it. Matching phrases add:
- 4 to the score if they’re seen in the document title (the title is translated into English first)
- 3 to the score if they’re seen in the document “snippet” – the abstract usually found on the document landing page
- 1 otherwise
Phrases are found via a case-insensitive match.
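As a rough illustration, the weighting above could be sketched in Python as follows. This is not Overton's actual code. The function names are hypothetical, and the sketch assumes that each phrase is counted once, at the highest-weighted place it appears, and that matching is a simple substring test; the real system may differ on both points.

```python
# Weights taken from the list above: title, snippet, anywhere else.
TITLE_WEIGHT = 4
SNIPPET_WEIGHT = 3
BODY_WEIGHT = 1


def phrase_weight(phrase, title, snippet, body):
    """Return what one phrase adds to the score, or 0 if it is absent.

    Assumption: a phrase is scored once, at its highest-weighted location.
    """
    needle = phrase.casefold()  # case-insensitive comparison
    if needle in title.casefold():
        return TITLE_WEIGHT
    if needle in snippet.casefold():
        return SNIPPET_WEIGHT
    if needle in body.casefold():
        return BODY_WEIGHT
    return 0


def relevance_score(phrases, title, snippet, body):
    """Sum the contributions of every phrase belonging to one SDG."""
    return sum(phrase_weight(p, title, snippet, body) for p in phrases)


score = relevance_score(
    ["food security", "malnourished"],
    title="Improving Food Security in Rural Areas",
    snippet="A report on malnourished children.",
    body="The full text of the report.",
)
print(score)  # 4 (title match) + 3 (snippet match) = 7
```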
Overton considers some keywords “conditional” – these only contribute to a relevance score if at least one non-conditional keyword has also been found. We decided which keywords are conditional manually as part of our curation process.
For example in SDG 2 (Zero Hunger) the Monash key words & phrases include both “malnourished” and “wasting”.
If a document contained only “wasting” in the text – and no other keywords from SDG 2 – then its relevance score for SDG 2 would be 0. On its own, “wasting” could relate to any number of other things.
But if the document also contained “malnourished” in the text it would score 2 – 1 point from “malnourished” and another point from “wasting”.
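The SDG 2 example can be reproduced with a small Python sketch. The function name and the two-keyword list are illustrative only, and the sketch scores body text alone (1 point per keyword found), so it is a simplified reading of the rule rather than the real implementation.

```python
def text_score(text, keywords, conditional):
    """Score body text for one SDG, honouring conditional keywords.

    `keywords` is every keyword for the SDG; `conditional` is the subset
    that only counts when a non-conditional keyword is also present.
    """
    lowered = text.casefold()
    found = [k for k in keywords if k.casefold() in lowered]
    has_unconditional = any(k not in conditional for k in found)
    if not has_unconditional:
        return 0  # conditional keywords alone contribute nothing
    return len(found)  # 1 point per keyword found in the text


SDG2_KEYWORDS = ["malnourished", "wasting"]  # tiny illustrative subset
SDG2_CONDITIONAL = {"wasting"}

only_wasting = text_score("Stop wasting public money.", SDG2_KEYWORDS, SDG2_CONDITIONAL)
both = text_score(
    "Malnourished children often show signs of wasting.",
    SDG2_KEYWORDS,
    SDG2_CONDITIONAL,
)
print(only_wasting, both)  # 0 2
```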
Currently all SDGs have the same score threshold of 6. If a policy document has a relevance score over this threshold then it is considered relevant to that SDG.
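A minimal sketch of the final threshold check, assuming that "over" means strictly greater than 6 and that scores are held in a dictionary keyed by SDG number (both are assumptions made for illustration):

```python
SDG_SCORE_THRESHOLD = 6  # currently the same for every SDG


def relevant_sdgs(scores, threshold=SDG_SCORE_THRESHOLD):
    """Return the SDG numbers whose relevance score is over the threshold."""
    return sorted(sdg for sdg, score in scores.items() if score > threshold)


# Hypothetical scores for one policy document.
document_scores = {2: 7, 5: 10, 13: 6}
print(relevant_sdgs(document_scores))  # [2, 5]; SDG 13 scores exactly 6, so it is excluded
```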