In Elasticsearch, Near Real-Time (NRT) indexing means that a document becomes searchable shortly after it is indexed (about one second later with default settings) rather than instantly. Elasticsearch achieves this primarily through the following mechanisms:
- Lucene Library Usage: Elasticsearch is built on top of Lucene, a high-performance full-text search library. One of Lucene's key features is its inverted index structure, which enables extremely fast text search. When documents are indexed in Elasticsearch, they are first held in an in-memory indexing buffer, where they are not yet searchable. On each refresh (or earlier, if the buffer fills up), the buffer's contents are written out as a new Lucene "segment". A new segment initially lives in the filesystem cache; it is fsynced to disk only later, during a flush (a Lucene commit).
- Segment Refresh Mechanism: Segments are immutable, meaning their content cannot be modified once written. To make newly indexed documents searchable, Elasticsearch periodically executes a process called "refresh", by default once per second (controlled by the index.refresh_interval setting). During a refresh, the in-memory buffer is written to a new segment and that segment is opened for search, while previously opened segments remain available. Because a refresh does not wait for an fsync to disk, it is cheap enough to run frequently, which is what produces the Near Real-Time effect.
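- Translog (Transaction Log): To ensure durability, Elasticsearch records every indexing operation in a transaction log called the translog. An operation is appended to the translog after it has been applied to the in-memory Lucene index and before the request is acknowledged. In the event of a crash, the translog is replayed to recover operations that were indexed but not yet committed to disk. A refresh does not clear the translog; old translog data is discarded only after a flush (a Lucene commit) has safely written the segments to disk.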
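
The interplay between buffer, refresh, translog, and flush can be sketched with a small toy model. This is only an illustration under simplifying assumptions, not Elasticsearch's or Lucene's actual implementation: the class `ToyShard` and all of its method and field names are hypothetical, documents are plain strings, and "disk" is simulated by a counter of committed segments.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Hypothetical toy model of a shard's write path (not real Elasticsearch code)."""
    buffer: list = field(default_factory=list)    # indexed but not yet searchable
    segments: list = field(default_factory=list)  # immutable, searchable segments
    committed: int = 0                            # how many segments are "safely on disk"
    translog: list = field(default_factory=list)  # operations since the last flush

    def index(self, doc: str) -> None:
        self.buffer.append(doc)    # 1. apply to the in-memory buffer
        self.translog.append(doc)  # 2. record the operation for crash recovery

    def refresh(self) -> None:
        # Turn the buffer into a new immutable segment and open it for search.
        # The translog is NOT cleared here.
        if self.buffer:
            self.segments.append(tuple(self.buffer))
            self.buffer.clear()

    def flush(self) -> None:
        # Commit all segments to "disk"; only now can the translog be discarded.
        self.refresh()
        self.committed = len(self.segments)
        self.translog.clear()

    def search(self, term: str) -> list:
        # Only documents in segments are visible; the buffer is ignored.
        return [doc for seg in self.segments for doc in seg if term in doc.split()]

    def crash_and_recover(self) -> None:
        # A crash loses the buffer and every uncommitted segment.
        replay = list(self.translog)
        self.buffer.clear()
        del self.segments[self.committed:]
        self.translog.clear()
        # Recovery replays the translog on top of the last commit.
        for doc in replay:
            self.index(doc)
        self.refresh()


shard = ToyShard()
shard.index("error disk full")
print(shard.search("error"))  # not visible yet: still in the buffer
shard.refresh()
print(shard.search("error"))  # visible after refresh
shard.index("error timeout")
shard.crash_and_recover()     # nothing was flushed, so both docs come from the translog
print(shard.search("error"))
```

In this sketch, a document is invisible to `search` until `refresh` runs, and both documents survive the simulated crash only because `refresh` leaves the translog untouched; calling `flush` is the only step that empties it.
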
By combining these mechanisms, Elasticsearch makes newly indexed data searchable in near real time while protecting it against loss on a crash. This Near Real-Time indexing and search capability is one of the reasons Elasticsearch is widely used for log analysis, full-text search, and similar workloads.