How To Handle User Query Terms Correctly

New in version 3.5.3.

The Query Term Matching Strategy specifies how Search Terms are interpreted and translated into Elasticsearch query clauses.

Note

See how Query Processing helps to handle Natural Language Queries properly by performing additional NLP tasks before querying Elasticsearch.

Tunable Parameters

  • Searchable field and label boosting: Tune the impact of terms that match in searchable fields and labels.

  • How to match terms:

    • Phrases: Apply exact or more lenient matching? Boost phrase matches above normal term matches?

    • Sequence of Terms (Natural Language Query): How many terms have to match? Apply additional rescoring when the terms appear close together?

Default Field Boosts

Project Configuration topic.search.field-boosts

Defines the impact that searchable fields and labels have on the overall document relevancy score.

"default_value": {
    "title": 2,
    "body": 1,
    "nlp_tag__phrases": 1
}
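Elasticsearch expresses per-field boosts with the field^boost notation in queries such as multi_match. As a minimal sketch, assuming the boosts are passed through unchanged and that each field or label name matches an index field of the same name (the helper to_boosted_fields is hypothetical), the default value above could be rendered like this:

```python
# Field boosts as documented for topic.search.field-boosts.
FIELD_BOOSTS = {"title": 2, "body": 1, "nlp_tag__phrases": 1}


def to_boosted_fields(boosts: dict) -> list:
    """Render {field: boost} as Elasticsearch 'field^boost' strings.

    Hypothetical helper. Assumption: each searchable field or label maps to
    an index field of the same name; the real index mapping may differ.
    """
    return [f"{field}^{boost}" for field, boost in boosts.items()]


print(to_boosted_fields(FIELD_BOOSTS))
# ['title^2', 'body^1', 'nlp_tag__phrases^1']
```

With these values, a term found in the title contributes roughly twice as much to the score as the same term found in the body.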

Here, title and body are the default searchable fields, and nlp_tag__phrases refers to the label that contains the key phrases added by the NLP Keyphrase Tagger.

Default Term Matching Configuration

Project Configuration topic.search.query-strategy

Term Sequence Match

Example query: capital of Switzerland

"term_sequence": {
    "operator": "OR",
    "minimum_should_match": "3<75% 7<5",
    "tie_breaker": 0.5
}
operator

Default boolean logic used to interpret a term sequence in the query string if no operators are specified.

Valid values are:

  • OR (default): The example query is interpreted as capital OR of OR Switzerland. This is the preferred setting because it allows minimum_should_match conditions to be applied.

  • AND: The example query is interpreted as capital AND of AND Switzerland. This enforces strict keyword matching (every term has to match) and ignores any configured minimum_should_match conditions.

Type: string

minimum_should_match

Specifies how many of the query terms have to match the content.

The official Elasticsearch syntax allows multiple conditions separated by whitespace, where each condition only applies to queries with more terms than its threshold. For example, the setting 3<75% 7<5 means:

  • 1 to 3 tokens: all tokens have to match

  • 4 to 7 tokens: 75% of the tokens have to match (rounded down)

  • more than 7 tokens: at least 5 tokens have to match

Note: Requires operator to be set to OR.

Type: number (int) or string (multiple conditions, percentages)
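To make the conditional syntax concrete, the following sketch computes how many terms a query of a given length must match. It is a simplified model of the Elasticsearch rules described above (conditions listed with ascending thresholds, percentages rounded down, result clamped to at least 1 and at most the number of terms), not the product's own code; the function names are hypothetical.

```python
import math


def _resolve(value: str, n_terms: int) -> int:
    """Convert one value ('75%', '5', '-25%' or '-2') into a number of terms."""
    if value.endswith("%"):
        pct = int(value[:-1])
        count = math.floor(n_terms * abs(pct) / 100)  # percentages are rounded down
        return n_terms - count if pct < 0 else count  # negative: share that may be missing
    number = int(value)
    return n_terms + number if number < 0 else number  # negative: terms that may be missing


def required_matches(spec, n_terms: int) -> int:
    """Minimum number of query terms that must match, e.g. for spec '3<75% 7<5'.

    A condition 'N<value' applies only when the query has more than N terms;
    the last applicable condition wins. Without one, all terms are required.
    """
    required = n_terms
    for condition in str(spec).split():
        if "<" in condition:
            threshold, value = condition.split("<")
            if n_terms > int(threshold):
                required = _resolve(value, n_terms)
        else:
            required = _resolve(condition, n_terms)  # plain value such as '75%' or 2
    return max(1, min(n_terms, required))  # never below 1 or above the term count


def is_match(query_terms, doc_terms, operator="OR", spec="3<75% 7<5") -> bool:
    """AND requires every term; OR applies the minimum_should_match spec."""
    unique_query = set(query_terms)
    hits = len(unique_query & set(doc_terms))
    if operator == "AND":
        needed = len(unique_query)
    else:
        needed = required_matches(spec, len(unique_query))
    return hits >= needed


for n in (3, 4, 6, 8):
    print(n, required_matches("3<75% 7<5", n))
# prints one pair per line: 3 3, 4 3, 6 4, 8 5
```

For the example query capital of Switzerland (3 terms), all three terms are therefore required even with operator set to OR.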

tie_breaker

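Specifies how the scores of matches on multiple fields are combined. The field with the highest score counts in full; the scores of all other matching fields are multiplied by tie_breaker and added to it. A value of 0 uses only the best field, while 1 sums the scores of all matching fields.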

Type: number (float)
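The combination rule can be written as a one-line formula. This sketch assumes the strategy uses Elasticsearch's best_fields (dis_max) semantics for tie_breaker; the per-field scores are made-up inputs that would already include the field boosts, and combine_field_scores is a hypothetical name.

```python
def combine_field_scores(field_scores: dict, tie_breaker: float = 0.5) -> float:
    """dis_max-style combination: best field in full, others scaled by tie_breaker."""
    if not field_scores:
        return 0.0
    best = max(field_scores.values())
    others = sum(field_scores.values()) - best  # scores of all other matching fields
    return best + tie_breaker * others


# Illustrative per-field scores for one document (not real output).
scores = {"title": 4.0, "body": 2.0, "nlp_tag__phrases": 1.0}
print(combine_field_scores(scores))  # 5.5
```

Raising tie_breaker rewards documents that match in several fields at once; lowering it lets the single best field dominate.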

Phrase Match

Example query: “global warming”

"phrase": {
    "phrase_slop": 0,
    "boost": 2,
    "handle_synonyms": false,
    "include_stemmed_fields": false
}
phrase_slop

phrase_slop configures how exactly the phrase has to match (exact vs. lenient).

Example settings:

  • 0: The exact phrase has to match, with the terms in order.

    Matches "global warming" only

  • 1: Allows the terms to be one position further apart than in the query.

    Matches "global climate warming"

  • 2: Allows two positions of leeway, which is enough to transpose two terms.

    Matches "warming global"

  • 10: Allows up to 10 positions between the matching terms, in any order (proximity search).

    Matches "the warming climate has a negative effect on the global economy"

Note: With phrase_slop > 0, exact matches still score higher than lenient matches; the further apart the matching terms are, the lower the score.

Type: number (int)
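The slop a document needs can be computed by hand. The sketch below is a simplified model of Lucene's sloppy phrase matching (each term position is shifted by the term's offset in the phrase, and the distance is the spread of the shifted positions); it assumes whitespace tokenization and no stop-word removal, and phrase_slop_needed is a hypothetical name. A phrase matches when the returned distance is at most phrase_slop.

```python
from itertools import product
from typing import List, Optional


def phrase_slop_needed(doc_tokens: List[str], phrase: List[str]) -> Optional[int]:
    """Smallest slop at which `phrase` matches `doc_tokens`, or None if a term is missing."""
    positions = [[i for i, tok in enumerate(doc_tokens) if tok == term] for term in phrase]
    if any(not p for p in positions):
        return None
    best = None
    for combo in product(*positions):
        if len(set(combo)) < len(combo):
            continue  # two phrase terms cannot occupy the same position
        # Shift each position by the term's offset in the phrase.
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        distance = max(shifted) - min(shifted)  # 0 means an exact, in-order match
        if best is None or distance < best:
            best = distance
    return best


phrase = ["global", "warming"]
for text in ["global warming", "global climate warming", "warming global",
             "the warming climate has a negative effect on the global economy"]:
    print(phrase_slop_needed(text.split(), phrase), text)
```

The printed distances are 0, 1, 2 and 9, which is why the transposed phrase needs a slop of 2 and the long sentence fits within a slop of 10.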

boost

Boosts phrase matches above normal search terms. Phrases can also be created automatically from natural language queries during the query-rewriting stage (Query Processing).

Type: number (float)
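One common way to give phrase matches extra weight in Elasticsearch is a bool query whose should clauses contain both a normal term query and a boosted phrase query over the same text. The sketch below assumes the strategy is translated roughly like this; the actual generated query may differ, and term_and_phrase_query is a hypothetical name.

```python
def term_and_phrase_query(text: str, fields: list, phrase_cfg: dict) -> dict:
    """Hypothetical bool query: normal term matching plus a boosted phrase clause."""
    return {
        "bool": {
            "should": [
                # General term matching over the boosted fields.
                {"multi_match": {"query": text, "fields": fields}},
                # The same text as a phrase: matching documents earn extra score.
                {
                    "multi_match": {
                        "query": text,
                        "fields": fields,
                        "type": "phrase",
                        "slop": phrase_cfg["phrase_slop"],
                        "boost": phrase_cfg["boost"],
                    }
                },
            ]
        }
    }


query = term_and_phrase_query("global warming", ["title^2", "body^1"],
                              {"phrase_slop": 0, "boost": 2})
```

Documents that contain the words anywhere still match through the first clause; documents that contain the phrase also score on the second, boosted clause.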

handle_synonyms

Whether to apply the synonyms configured on the project to phrase matches. Defaults to false.

Type: boolean

include_stemmed_fields

Whether the phrase match also searches the stemmed versions of the title and body fields. Defaults to false, so that phrase queries return exact matches only: for example, a query for "BP" should not match "bps".

Type: boolean

Multi Word Rescoring

The general query applies the configured term_sequence settings. However, documents that contain the query terms close together, for example within a single sentence or paragraph, are not boosted by default. Rescoring can be used to improve this.

Example query: global warming effects

Rescoring applies an additional, potentially more expensive, scoring function to the top N documents ranked by the general query and then re-sorts them.

"rescore": {
    "on_term_sequences": {
        "enabled": true,
        "score_word_sequence_slop": 2,
        "score_word_sequence_items": 100,
        "score_word_query_weight": 0.7,
        "score_word_rescore_query_weight": 1.2,
        "score_word_score_mode": "total"
    }
}
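In Elasticsearch terms, these settings resemble the rescore section of a search request (window_size, query_weight, rescore_query_weight, score_mode). The sketch below assumes that mapping and that the rescore query is a sloppy match_phrase on the body field; both are assumptions, and build_rescore is a hypothetical name.

```python
from typing import Optional

# Settings copied from the configuration above.
ON_TERM_SEQUENCES = {
    "enabled": True,
    "score_word_sequence_slop": 2,
    "score_word_sequence_items": 100,
    "score_word_query_weight": 0.7,
    "score_word_rescore_query_weight": 1.2,
    "score_word_score_mode": "total",
}


def build_rescore(query_text: str, cfg: dict, field: str = "body") -> Optional[dict]:
    """Hypothetical mapping of on_term_sequences onto an Elasticsearch rescore section."""
    if not cfg.get("enabled", True):
        return None  # rescoring switched off
    return {
        "window_size": cfg["score_word_sequence_items"],  # only the top N hits are rescored
        "query": {
            "rescore_query": {
                "match_phrase": {
                    field: {"query": query_text, "slop": cfg["score_word_sequence_slop"]}
                }
            },
            "query_weight": cfg["score_word_query_weight"],
            "rescore_query_weight": cfg["score_word_rescore_query_weight"],
            "score_mode": cfg["score_word_score_mode"],
        },
    }


rescore = build_rescore("global warming effects", ON_TERM_SEQUENCES)
```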
enabled

Whether to apply rescoring on term sequences. Defaults to true.

Type: boolean

score_word_sequence_slop

How close to each other the query terms have to be for sequence scoring to affect the score. A slop of 0 means the terms have to be adjacent and in the same order; transposed terms need a slop of 2.

Type: number (int)

score_word_sequence_items

The number of top-ranked items (N) to which word sequence scoring is applied. Defaults to 100.

Type: number (int)

score_word_query_weight

Weight of the original query score when items are rescored by word sequence scoring. Defaults to 0.7.

Type: number (float)

score_word_rescore_query_weight

Weight of the rescoring query score when items are rescored by word sequence scoring. Defaults to 1.2.

Type: number (float)

score_word_score_mode

How to combine the weighted original score and the weighted rescore score in word sequence scoring. Possible values are total (sum), multiply, avg, max, and min. Defaults to total.

Type: string
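The three weighting parameters interact as follows in Elasticsearch's query rescorer: both scores are multiplied by their weights and then combined by score_mode, and a document that does not match the rescore query keeps only its weighted original score. The sketch below models that behaviour under the assumption that word sequence scoring uses it unchanged; combine_scores is a hypothetical name.

```python
from typing import Optional

# How each score_mode merges the two weighted scores.
COMBINERS = {
    "total": lambda a, b: a + b,
    "multiply": lambda a, b: a * b,
    "avg": lambda a, b: (a + b) / 2,
    "max": max,
    "min": min,
}


def combine_scores(original: float, rescore: Optional[float], mode: str = "total",
                   query_weight: float = 0.7, rescore_query_weight: float = 1.2) -> float:
    """Final score of one document inside the rescoring window."""
    weighted_original = original * query_weight
    if rescore is None:  # the document did not match the rescore query
        return weighted_original
    return COMBINERS[mode](weighted_original, rescore * rescore_query_weight)


for mode in COMBINERS:
    print(mode, round(combine_scores(2.0, 5.0, mode), 2))
```

With the defaults, a document whose terms appear close together (and therefore matches the rescore query) gains 1.2 times its rescore score, while the original score is damped to 0.7 of its value.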