Overton’s full-text search function is powered by Elasticsearch.
The default relevance scoring algorithm used by Elasticsearch is BM25, which is a version of the TF-IDF (Term Frequency – Inverse Document Frequency) model. TF-IDF gives higher relevancy scores to words that appear often in one document but are not common in other documents.
Term frequency (TF): how often the search term appears in the field we are searching within a given document. The more times it appears, the more relevant that document is.
Inverse document frequency (IDF): how often the search term appears across all documents in the collection. The more documents that contain the term in the field we are searching, the less important that term is.
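As a rough illustration of the two components, here is a minimal Python sketch of the BM25 term-frequency and inverse-document-frequency formulas in the form Elasticsearch reports in its score explanations. The function names are hypothetical, and k1 = 1.2 and b = 0.75 are Elasticsearch's documented defaults; we assume here, without confirming it, that Overton's index uses them unchanged.

```python
import math

def bm25_idf(docs_with_term: int, total_docs: int) -> float:
    """IDF: shrinks as more documents contain the term."""
    return math.log(1 + (total_docs - docs_with_term + 0.5) / (docs_with_term + 0.5))

def bm25_tf(freq: int, field_len: int, avg_field_len: float,
            k1: float = 1.2, b: float = 0.75) -> float:
    """TF: grows with the number of occurrences in this document,
    but saturates, and is normalised by field length."""
    length_norm = 1 - b + b * field_len / avg_field_len
    return freq / (freq + k1 * length_norm)

# A term found in 1 of 10 documents is weighted more heavily
# than a term found in 9 of 10 documents.
rare = bm25_idf(docs_with_term=1, total_docs=10)
common = bm25_idf(docs_with_term=9, total_docs=10)

# Three occurrences score higher than one, all else being equal.
once = bm25_tf(freq=1, field_len=100, avg_field_len=100)
thrice = bm25_tf(freq=3, field_len=100, avg_field_len=100)
```

Because the TF component saturates, repeating a term many times in one document yields diminishing returns rather than an ever-growing score.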
Overton’s boost settings
Users of Elasticsearch can also adjust the relevancy score by using boost settings. Boosts give extra weight to matches in specific terms or fields, so where in a document a search term is found affects its score.
Our current relevancy score calculation is: _score = boost * idf * tf
Overton’s boost settings help us fine-tune our search results to what is particularly relevant for our users.
Our boost settings (these are current as of July 2024; we may tweak them from time to time):
- Boost is 1 by default
- Boost is 20 if the match is in title or translated_title
- Boost is 10 if the match is in the snippet
- Boost is 3 if the match is in the PDF title
- Boost is 5 if the match is in “other_identifiers”, e.g. the source’s internal identifier
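To show how these boosts feed into the formula _score = boost * idf * tf, here is a small Python sketch. It is illustrative only: the keys "snippet" and "pdf_title" are hypothetical names for the snippet and PDF title fields, the function name is made up, and the idf and tf values in the example are invented numbers rather than real scores.

```python
# Boost per field, taken from the list above (July 2024 values).
# "snippet" and "pdf_title" are assumed field names for illustration.
FIELD_BOOSTS = {
    "title": 20,
    "translated_title": 20,
    "snippet": 10,
    "other_identifiers": 5,
    "pdf_title": 3,
}
DEFAULT_BOOST = 1

def field_score(field: str, idf: float, tf: float) -> float:
    """Score for a match in one field: boost * idf * tf."""
    boost = FIELD_BOOSTS.get(field, DEFAULT_BOOST)
    return boost * idf * tf

# The same match (same idf and tf) scores 20 times higher in the
# title than in a field that only has the default boost.
title_score = field_score("title", idf=2.0, tf=0.5)
other_score = field_score("some_other_field", idf=2.0, tf=0.5)
```

In practice, a title match will usually outrank a match elsewhere in the document unless that other match has a much higher idf or tf.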
If you have further questions about our search or relevance scoring, please contact support@overton.io