The edge_ngram tokenizer is a common way to implement search-as-you-type in Elasticsearch: it lets the index return autocomplete suggestions while the user is still typing. Below, I explain how the edge_ngram tokenizer works and how to use it to build search-as-you-type functionality.
What is the edge_ngram tokenizer?
The edge_ngram tokenizer is typically applied at index time. It emits n-grams anchored to the start of each word, that is, the word's prefixes. For example, for the word 'Apple' with a minimum length (min_gram) of 1 and a maximum length (max_gram) of 5, it generates the following n-grams: ['A', 'Ap', 'App', 'Appl', 'Apple'].
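To make the expansion concrete, here is a minimal pure-Python sketch of the edge n-gram idea. The function name `edge_ngrams` is hypothetical and its parameters only mirror the `min_gram`/`max_gram` settings; this is not Elasticsearch code, and it handles a single word without any of the tokenizer's word-splitting rules.

```python
def edge_ngrams(word: str, min_gram: int = 1, max_gram: int = 5) -> list[str]:
    """Return the prefixes of `word` whose length lies in [min_gram, max_gram]."""
    if min_gram < 1 or max_gram < min_gram:
        raise ValueError("expected 1 <= min_gram <= max_gram")
    # A word shorter than max_gram simply yields fewer prefixes.
    longest = min(max_gram, len(word))
    return [word[:n] for n in range(min_gram, longest + 1)]


print(edge_ngrams("Apple"))        # ['A', 'Ap', 'App', 'Appl', 'Apple']
print(edge_ngrams("Apple", 2, 3))  # ['Ap', 'App']
```

Note that only prefixes are produced: 'pple' or 'ple' never appear, which is what distinguishes edge n-grams from plain n-grams.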
Implementation Steps:
- Define Index Settings: In Elasticsearch, you must first define an index and configure it to use the `edge_ngram` tokenizer. This requires setting up a custom analyzer in the index settings that includes the `edge_ngram` tokenizer.

  ```json
  PUT /products
  {
    "settings": {
      "analysis": {
        "analyzer": {
          "autocomplete": {
            "type": "custom",
            "tokenizer": "autocomplete",
            "filter": ["lowercase"]
          }
        },
        "tokenizer": {
          "autocomplete": {
            "type": "edge_ngram",
            "min_gram": 1,
            "max_gram": 10,
            "token_chars": ["letter", "digit"]
          }
        }
      }
    }
  }
  ```

- Map Fields to Use the Custom Analyzer: During index mapping, specify which fields should utilize this custom `autocomplete` analyzer.

  ```json
  PUT /products/_mapping
  {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "autocomplete",
        "search_analyzer": "standard"
      }
    }
  }
  ```

- Index Data: Index product data into this index. For instance, index a product named 'Apple iPhone'.

  ```json
  POST /products/_doc/1
  {
    "name": "Apple iPhone"
  }
  ```

- Implement Search Query: As users begin typing a search term, use a simple `match` query to retrieve matching records. Because the data has been processed with `edge_ngram`, partial inputs can still find relevant results.

  ```json
  GET /products/_search
  {
    "query": {
      "match": {
        "name": {
          "query": "app"
        }
      }
    }
  }
  ```
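On the client side, the search step amounts to rebuilding the same `match` body for whatever has been typed so far. The following is a rough sketch under stated assumptions: the helper `build_autocomplete_query`, the blank-input check, and the `size` cap on returned suggestions are illustrative additions, not part of the steps above, and the code only prints the request instead of sending it to a cluster.

```python
import json
from typing import Optional


def build_autocomplete_query(typed: str, field: str = "name", size: int = 5) -> Optional[dict]:
    """Return a `match` search body for the text typed so far, or None if it is blank."""
    text = typed.strip()
    if not text:
        return None  # nothing typed yet, so there is nothing to search for
    return {
        "size": size,  # assumption: limit how many suggestions come back
        "query": {"match": {field: {"query": text}}},
    }


# Simulate a user typing "app" one character at a time.
for typed in ["a", "ap", "app"]:
    print("GET /products/_search")
    print(json.dumps(build_autocomplete_query(typed)))
```

Each printed body has the same shape as the search request in the last step, with only the `query` text changing per keystroke.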
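In this example, when users type 'app', the query returns relevant products like 'Apple iPhone'. At index time the autocomplete analyzer stored the lowercased prefixes 'a', 'ap', 'app', 'appl', and 'apple' (plus the prefixes of 'iphone'). At search time the standard analyzer leaves 'app' as a single term, which matches the indexed term 'app' exactly.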
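The interplay between the index-time and search-time analyzers can be approximated in a few lines of Python. This is a simplified model, not Elasticsearch's implementation: the function names are hypothetical, word splitting is reduced to a regular expression over letters and digits, and the `standard` analyzer is approximated as "lowercased whole words".

```python
import re

WORD = re.compile(r"[^\W_]+")  # runs of letters and digits


def index_terms(text: str, min_gram: int = 1, max_gram: int = 10) -> set[str]:
    """Approximate the `autocomplete` analyzer: lowercased edge n-grams per word."""
    terms = set()
    for word in WORD.findall(text):
        for n in range(min_gram, min(max_gram, len(word)) + 1):
            terms.add(word[:n].lower())
    return terms


def search_terms(text: str) -> set[str]:
    """Approximate the `standard` search analyzer: lowercased whole words, no n-grams."""
    return {word.lower() for word in WORD.findall(text)}


def matches(query: str, document: str) -> bool:
    """Model a `match` query: a hit if any search term equals an indexed term."""
    return bool(search_terms(query) & index_terms(document))


print(sorted(index_terms("Apple iPhone")))
# ['a', 'ap', 'app', 'appl', 'apple', 'i', 'ip', 'iph', 'ipho', 'iphon', 'iphone']
print(matches("app", "Apple iPhone"))    # True: 'app' is an indexed prefix
print(matches("phone", "Apple iPhone"))  # False: 'phone' is not a prefix of any word
print(matches("refrigerators", "Refrigerators"))  # False: 13 characters exceeds max_gram 10
```

The last line illustrates a consequence of this configuration worth keeping in mind: a typed term longer than `max_gram` has no indexed counterpart, so `max_gram` should be at least as long as the longest input you expect to match.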
In summary, the edge_ngram tokenizer moves the prefix-matching work to index time, so each keystroke needs only a simple match query to return fast, up-to-date suggestions as users type.