Describes how policy documents are linked to different SDGs
Please note: this functionality is still in beta. While we think the results are good enough to use in analyses, note that the classifier never assigns SDGs 9, 12 or 17, and we’re still experimenting with ways to improve precision and recall.
As well as topics and subject areas, Overton tries to map policy documents to one or more Sustainable Development Goals (SDGs), a set of 17 goals set up by the United Nations to serve as a framework for global development. The SDGs are often used as a quick way to group policy or research relating to a particular problem area, e.g. climate change, poverty or gender inequality.
Overton uses a machine learning system to determine if a policy document relates to an SDG.
We trained this system on the freely available OSDG dataset, a large set of text snippets that multiple human volunteers have manually associated with related SDGs. The OSDG dataset doesn’t currently contain examples for SDG 17, so we used our own supplemental dataset for that goal, but ultimately decided that it wasn’t sufficient (more on this below).
We first extract the first two or three pages of text from each policy document. In the case of HTML files (like blog posts) we extract the first hundred words. We then translate this text into English to match the OSDG training set.
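As a rough illustration of the "first hundred words" step, here is a minimal sketch. It assumes the HTML has already been reduced to plain text, and the function name `first_n_words` is hypothetical rather than part of Overton's pipeline. Page extraction and translation are not shown.

```python
def first_n_words(text: str, n: int = 100) -> str:
    """Return the first n whitespace-separated words of text."""
    # split() with no arguments collapses runs of spaces and newlines
    return " ".join(text.split()[:n])


# Usage: a made-up 250-word document is cut down to its first 100 words
document = " ".join(f"word{i}" for i in range(250))
snippet = first_n_words(document)
word_count = len(snippet.split())
```

Splitting on whitespace is the simplest possible definition of a "word"; a real pipeline may tokenize differently.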
Some SDGs are broader than others: while it’s usually clear if a document is about gender equality, it can be less clear if a document is specifically about “Life on Land”. Overton sets a score threshold for each SDG that prioritizes accuracy and, within that constraint, tries to balance precision (how often we’re correct when a classification is made) with recall (how many of the relevant documents get a classification).
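To make the trade-off concrete, the sketch below computes precision and recall for a single SDG at a given score threshold, using the standard definitions. The scores and labels are invented for illustration and are not Overton data; `precision_recall` is a hypothetical helper.

```python
def precision_recall(scores, labels, threshold):
    """Return (precision, recall) for one SDG at a given score threshold.

    scores: classifier scores, one per document
    labels: True where the document really relates to the SDG
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(p and y for p, y in zip(predicted, labels))        # correct assignments
    fp = sum(p and not y for p, y in zip(predicted, labels))    # wrong assignments
    fn = sum((not p) and y for p, y in zip(predicted, labels))  # missed documents
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Invented example: five documents scored against one SDG
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [True, True, False, True, False]

strict = precision_recall(scores, labels, 0.9)  # high threshold
loose = precision_recall(scores, labels, 0.5)   # lower threshold
```

In this toy example the stricter threshold makes every assignment correct but misses two of the three relevant documents, while the lower threshold finds more of them at the cost of one wrong assignment.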
In testing we found that we couldn’t accurately and consistently classify three of the SDGs:
- SDG 9 – Industry, Innovation and Infrastructure
- SDG 12 – Responsible Consumption and Production
- SDG 17 – Partnerships to achieve the Goals
… so Overton’s classifier never assigns these SDGs to documents.
If you’d like to read more about the OSDG project please refer to their publications, which are free to read on arXiv:
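Putting the per-SDG thresholds and the three excluded goals together, the assignment step could look something like the sketch below. The threshold values, the score format and the `assign_sdgs` function are all assumptions made for illustration; Overton's actual thresholds are not published here.

```python
# SDGs the classifier never assigns (from the list above)
EXCLUDED_SDGS = {9, 12, 17}


def assign_sdgs(scores, thresholds):
    """Return the sorted SDG numbers whose score meets that SDG's threshold.

    scores: {sdg_number: classifier score} for one document (hypothetical format)
    thresholds: {sdg_number: minimum score} (hypothetical values)
    """
    return sorted(
        sdg
        for sdg, score in scores.items()
        if sdg not in EXCLUDED_SDGS
        and sdg in thresholds
        and score >= thresholds[sdg]
    )


# Invented scores for one document
doc_scores = {5: 0.91, 9: 0.99, 13: 0.55, 15: 0.70}
sdg_thresholds = {5: 0.80, 13: 0.60, 15: 0.65}

assigned = assign_sdgs(doc_scores, sdg_thresholds)
```

Here SDG 9 is dropped despite its high score because it is excluded outright, and SDG 13 is dropped because its score falls below its threshold.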
- Pukelis, L., Puig, N. B., Skrynik, M., & Stanciauskas, V. (2020). OSDG–Open-Source Approach to Classify Text Data by UN Sustainable Development Goals (SDGs). arXiv preprint arXiv:2005.14569.
- Pukelis, L., Bautista-Puig, N., Statulevičiūtė, G., Stančiauskas, V., Dikmener, G., & Akylbekova, D. (2022). OSDG 2.0: a multilingual tool for classifying text data by UN Sustainable Development Goals (SDGs). arXiv preprint arXiv:2211.11252.