
We're using the Elasticsearch completion suggester with the standard analyzer, but it seems the text is not tokenized.

e.g.

Texts: "First Example", "Second Example"

Search: "Fi" returns "First Example"

While

Search: "Ex" doesn't return any result returns "First Example"

Guy Korland

2 Answers


As the Elasticsearch documentation on the completion suggester explains (Completion Suggester):

The completion suggester is a so-called prefix suggester.

So when you send a keyword, it looks for texts that begin with that prefix.

E.g.:

Search: "Fi" => "First Example"

Search: "Sec" => "Second Example"

But if you give Elasticsearch "Ex", it returns nothing, because no text begins with "Ex".
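
To see this in isolation, a minimal suggest request against a completion field looks like the following (a sketch, assuming an index named example with a completion field named title):

POST /example/_search
{
    "suggest": {
        "text-suggestion": {
            "prefix": "Fi",
            "completion": {
                "field": "title"
            }
        }
    }
}

Swap the prefix to "Ex" and the options list comes back empty.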

You can try other suggesters instead, such as the Term Suggester.
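
For reference, a term suggester request is shaped like this (a sketch, assuming a regular text field named title; unlike the completion suggester, it suggests corrections for whole terms based on edit distance rather than completing prefixes):

POST /example/_search
{
    "suggest": {
        "my-term-suggestion": {
            "text": "exemple",
            "term": {
                "field": "title"
            }
        }
    }
}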

Trong Lam Phan

A great workaround is to tokenize the string yourself and put the tokens in a separate tokens field. You can then use two suggestions in your suggest query to search both fields.

Example:

PUT /example
{
    "mappings": {
        "doc": {
            "properties": {
                "full": {
                    "type": "completion"
                },
                "tokens": {
                    "type": "completion"
                }
            }
        }
    }
}

POST /example/doc/_bulk
{ "index":{} }
{"full": {"input": "First Example"}, "tokens": {"input": ["First", "Example"]}}
{ "index":{} }
{"full": {"input": "Second Example"}, "tokens": {"input": ["Second", "Example"]}}

POST /example/_search
{
    "suggest": {
        "full-suggestion": {
            "prefix" : "Ex", 
            "completion" : { 
                "field" : "full",
                "fuzzy": true
            }
        },
        "token-suggestion": {
            "prefix": "Ex",
            "completion" : { 
                "field" : "tokens",
                "fuzzy": true
            }
        }
    }
}

Search result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 0,
    "max_score": 0,
    "hits": []
  },
  "suggest": {
    "full-suggestion": [
      {
        "text": "Ex",
        "offset": 0,
        "length": 2,
        "options": []
      }
    ],
    "token-suggestion": [
      {
        "text": "Ex",
        "offset": 0,
        "length": 2,
        "options": [
          {
            "text": "Example",
            "_index": "example",
            "_type": "doc",
            "_id": "Ikvk62ABd4o_n4U8G5yF",
            "_score": 2,
            "_source": {
              "full": {
                "input": "First Example"
              },
              "tokens": {
                "input": [
                  "First",
                  "Example"
                ]
              }
            }
          },
          {
            "text": "Example",
            "_index": "example",
            "_type": "doc",
            "_id": "I0vk62ABd4o_n4U8G5yF",
            "_score": 2,
            "_source": {
              "full": {
                "input": "Second Example"
              },
              "tokens": {
                "input": [
                  "Second",
                  "Example"
                ]
              }
            }
          }
        ]
      }
    ]
  }
}
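
Note how, for the prefix "Ex", full-suggestion comes back with an empty options list while token-suggestion matches both documents through their indexed "Example" token.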
M.Vanderlee
  • The manual tokenization is very important. You need to generate shingles (word-grams) to keep the search from returning 0 results as soon as you begin to type the second word in a sentence. If you had a three-word input such as "First Example Code", you wouldn't be able to return any results for "Example co" without shingles of the entire phrase (see the sketch below). – Silas Hansen Apr 08 '21 at 13:45
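
Building on that comment, one way to cover mid-phrase prefixes is to also index every word-level suffix of the phrase as an input on the tokens field. The sketch below extends the bulk request above with a hypothetical three-word document; the suffix shingles are written out by hand here and would normally be generated in your indexing code:

POST /example/doc/_bulk
{ "index":{} }
{"full": {"input": "First Example Code"}, "tokens": {"input": ["First Example Code", "Example Code", "Code"]}}

With this document indexed, a prefix of "Example co" matches the "Example Code" input on the tokens field, so typing past the first word still returns a suggestion.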