May 27, 23:50

How do you migrate and rebuild index data in Elasticsearch?

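Index migration and rebuilding are unavoidable in Elasticsearch operations: whether you are changing a mapping, adjusting the shard count, swapping an analyzer, or moving data across clusters, the data in the old index has to be copied in full into a new index. Done carelessly, the result is data loss or a service interruption.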

Choosing among the three core approaches

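Approach	Use case	Downtime requirement	Data integrity
_reindex API	Same-cluster migration, mapping changes, analyzer changes	Zero downtime possible	Depends on verification
Snapshot & Restore	Cross-cluster migration, major version upgrades	Brief cutover required	High
_reindex + Pipeline	Migration that also transforms fields	Zero downtime possible	Depends on verification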

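Rule of thumb: use _reindex for structural changes within one cluster, use snapshots for cross-cluster moves or version upgrades, and add a Pipeline when the data format has to change during the migration.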

_reindex API: the first choice for same-cluster migration

Basic usage

json
POST /_reindex
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index",
    "op_type": "create"
  },
  "conflicts": "proceed"
}

Key parameters:

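op_type: "create": only creates documents that do not yet exist in the destination. A document whose _id is already present there is not overwritten; it is reported as a version conflict and the existing document is left untouched
conflicts: "proceed": counts version conflicts and keeps going instead of aborting the whole task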
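requests_per_second: a URL query parameter (for example POST /_reindex?requests_per_second=500) that throttles the task in sub-requests, in practice documents, per second so that reindex does not overwhelm the cluster. The default of -1 means no throttling. In production, start low and raise the limit with the _rethrottle API while watching cluster load; note that a value as low as 10-50 makes a default 1000-document batch take 20-100 seconds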
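To get a feel for what a given requests_per_second value means, the pause Elasticsearch inserts after each batch can be approximated as the batch size divided by the rate, minus the time the batch took to write. The sketch below is only that arithmetic; the function name and the sample write time of 0.5 seconds are assumptions for illustration.

```python
def throttle_wait_seconds(batch_size: int, requests_per_second: float,
                          write_seconds: float) -> float:
    """Approximate pause after one reindex batch under throttling."""
    if requests_per_second <= 0:
        # -1 is the "no throttling" value, so there is no pause at all.
        return 0.0
    # Time one batch is allowed to take at the requested rate.
    target_seconds = batch_size / requests_per_second
    # The write itself already used part of that budget.
    return max(0.0, target_seconds - write_seconds)


# Default batch size of 1000 documents, hypothetical 0.5 s write time.
print(throttle_wait_seconds(1000, 500, 0.5))  # 1.5
print(throttle_wait_seconds(1000, 50, 0.5))   # 19.5
```

At 50 requests per second every 1000-document batch is followed by a pause of almost 20 seconds, which is worth keeping in mind when estimating how long a throttled migration will run.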

Speeding up: parallelism with slices

On large indices a single-threaded reindex is slow. The slices parameter splits the job into sub-tasks that run in parallel:

json
POST /_reindex?slices=5&refresh
{
  "source": {
    "index": "old_index"
  },
  "dest": {
    "index": "new_index"
  }
}

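How many slices? Performance is generally best when slices equals the number of primary shards in the source index, which is what slices=auto chooses for you (one slice per shard, up to a limit). Setting it much higher only adds scheduling overhead.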
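If the slice count should follow the shard count explicitly, it can be read from the response of GET /old_index/_settings. The sketch below parses a hand-written sample of that response; the function name and the sample values are assumptions, and the shard count arrives as a string in the settings output.

```python
def slices_from_settings(settings_response: dict, index: str) -> int:
    """Return the primary shard count of `index` for use as the slices value."""
    index_settings = settings_response[index]["settings"]["index"]
    # Settings values are returned as strings, e.g. "5".
    return int(index_settings["number_of_shards"])


# Hand-written sample shaped like a GET /old_index/_settings response.
sample = {
    "old_index": {
        "settings": {
            "index": {"number_of_shards": "5", "number_of_replicas": "1"}
        }
    }
}

slices = slices_from_settings(sample, "old_index")
print(f"POST /_reindex?slices={slices}&refresh")
```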

Zero-downtime cutover: aliases

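Production services cannot be stopped, so zero downtime hinges on an alias switch. This assumes the application already reads and writes through the alias (my_alias below) rather than the concrete index name: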

json
// Step 1: create the new index (with the new mapping)
PUT /new_index
{
  "mappings": { ... }
}

// Step 2: reindex the data
POST /_reindex
{
  "source": { "index": "old_index" },
  "dest": { "index": "new_index" }
}

// Step 3: switch the alias atomically
POST /_aliases
{
  "actions": [
    { "remove": { "index": "old_index", "alias": "my_alias" } },
    { "add":    { "index": "new_index", "alias": "my_alias" } }
  ]
}

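The alias switch is atomic, so the application layer does not notice it. Do not forget the incremental data, though: _reindex copies a point-in-time view of old_index taken when the task starts, so documents written during the reindex are missing from new_index. Either pause writes until the switch, or run a catch-up pass for the changed documents before switching. A refresh (refresh=true on the _reindex request) only makes the copied documents searchable; it does not bring over writes that were missed.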
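One possible catch-up pass for writes that arrived during the reindex is a second _reindex restricted to recently changed documents. The sketch below only builds that request body. It assumes every document carries a last-modified date field, here called updated_at, which is a hypothetical name, and the timestamp is a placeholder for the moment the first pass started. Deletions made in the meantime are not captured this way.

```python
import json


def build_catchup_body(source_index: str, dest_index: str,
                       timestamp_field: str, since: str) -> dict:
    """Build a _reindex body that copies only documents changed since `since`."""
    return {
        "source": {
            "index": source_index,
            # Range query on the (assumed) last-modified field.
            "query": {"range": {timestamp_field: {"gte": since}}},
        },
        # Default op_type "index" overwrites older copies in the destination.
        "dest": {"index": dest_index},
    }


body = build_catchup_body("old_index", "new_index",
                          "updated_at", "2024-05-27T23:00:00Z")
print("POST /_reindex")
print(json.dumps(body, indent=2))
```

The second pass is usually small, so the window in which writes have to be paused before the alias switch shrinks accordingly.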

Remote-cluster reindex

A cross-cluster migration does not necessarily need a snapshot: _reindex can pull data directly from a remote cluster:

json
POST /_reindex
{
  "source": {
    "remote": {
      "host": "http://old-cluster:9200",
      "username": "user",
      "password": "pass"
    },
    "index": "old_index",
    "query": {
      "match_all": {}
    }
  },
  "dest": {
    "index": "new_index"
  }
}

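Note: remote reindex pulls data over HTTP, so network bandwidth is the bottleneck, and it does not support slices. The remote host must be allowed through reindex.remote.whitelist in elasticsearch.yml on the destination cluster, written as host:port without a scheme (for example reindex.remote.whitelist: "old-cluster:9200").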

Snapshot & Restore: cross-cluster moves and version upgrades

A snapshot preserves the complete index settings and mappings, which suits wholesale moves and major version upgrades.

Creating a repository and a snapshot

json
// Register a snapshot repository (S3 example)
PUT /_snapshot/my_backup
{
  "type": "s3",
  "settings": {
    "bucket": "my-es-backups",
    "region": "us-east-1"
  }
}

// Create a snapshot
PUT /_snapshot/my_backup/snapshot_1
{
  "indices": "old_index",
  "ignore_unavailable": true,
  "include_global_state": false
}

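include_global_state: false matters: the cluster-wide global state is left out of the snapshot, so it cannot overwrite the destination cluster's configuration on restore.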

Restoring into a new index

json
POST /_snapshot/my_backup/snapshot_1/_restore
{
  "indices": "old_index",
  "rename_pattern": "(.+)",
  "rename_replacement": "new_$1",
  "include_aliases": false
}

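rename_pattern and rename_replacement map the old index name to a new one to avoid name clashes. With the values above, old_index is restored as new_old_index.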

Version compatibility

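Snapshots are compatible one major version forward: an index created in 7.x can be restored to 8.x but not to 9.x, and a snapshot cannot be restored to a version older than the one that took it. An upgrade spanning several major versions has to go through the intermediate versions step by step, reindexing along the way, because compatibility is decided by the version in which each index was created.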

_reindex + Pipeline: transforming data during migration

When fields need restructuring as part of the migration, use an Ingest Pipeline:

json
// Define the pipeline: rename old_field to new_field, then drop deprecated_field
PUT /_ingest/pipeline/transform_pipeline
{
  "description": "Transform fields during reindex",
  "processors": [
    {
      "rename": {
        "field": "old_field",
        "target_field": "new_field"
      }
    },
    {
      "remove": {
        "field": "deprecated_field"
      }
    }
  ]
}

// Specify the pipeline during reindex
POST /_reindex
{
  "source": { "index": "old_index" },
  "dest": {
    "index": "new_index",
    "pipeline": "transform_pipeline"
  }
}

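Pipelines offer processors such as rename, remove, set, and script, which cover most field transformations. Note that rename and remove fail on documents that lack the field unless the processor sets ignore_missing: true.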
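Before running the pipeline against real data it can help to spell out what it is expected to do to a single document. The sketch below is a local approximation in plain Python, not the ingest engine itself: it handles top-level fields only and silently skips missing fields, which corresponds to ignore_missing: true. The helper names and the sample document are made up for illustration.

```python
import copy


def apply_rename(doc: dict, field: str, target_field: str) -> None:
    """Move the value of `field` to `target_field` (rename moves, it does not copy)."""
    if field in doc:
        doc[target_field] = doc.pop(field)


def apply_remove(doc: dict, field: str) -> None:
    """Drop `field` if it is present."""
    doc.pop(field, None)


def transform(source: dict) -> dict:
    """Apply the same two steps as transform_pipeline to a copy of `source`."""
    doc = copy.deepcopy(source)
    apply_rename(doc, "old_field", "new_field")
    apply_remove(doc, "deprecated_field")
    return doc


before = {"old_field": "value", "deprecated_field": 1, "title": "kept"}
after = transform(before)
print(after)
```

Expected results written down this way can then be compared with what the _ingest/pipeline/_simulate API returns for the same sample documents.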

Post-migration verification checklist

A finished migration is not automatically a correct one. None of the following checks should be skipped:

1. Document count check

json
GET /new_index/_count

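Run the same request against old_index and compare the two document counts; they must match. Refresh both indices first, because _count only sees documents that are already searchable.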
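The comparison itself is easy to script once both _count responses are at hand. The sketch below works on hand-written sample responses; the function name and the numbers are assumptions for illustration.

```python
def compare_counts(source_response: dict, dest_response: dict) -> tuple:
    """Return (match, missing) from two _count API responses."""
    source_count = source_response["count"]
    dest_count = dest_response["count"]
    # A positive difference means documents are missing from the destination.
    return source_count == dest_count, source_count - dest_count


# Hand-written samples shaped like GET /<index>/_count responses.
old_count = {"count": 120000, "_shards": {"total": 5, "successful": 5, "failed": 0}}
new_count = {"count": 119990, "_shards": {"total": 3, "successful": 3, "failed": 0}}

match, missing = compare_counts(old_count, new_count)
print(f"match={match}, missing={missing}")
```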

2. Sampled data comparison

json
GET /new_index/_search
{
  "query": { "term": { "_id": "<document ID>" } }
}

Pick a few documents at random and compare their _source field by field.
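A field-by-field comparison of two _source objects can be sketched as follows. It compares top-level fields only (nested objects are compared as whole values), and the function name and sample documents are assumptions for illustration. If the migration used a pipeline, the old document should first be passed through the expected transformation.

```python
_MISSING = object()  # sentinel for "field not present"


def diff_sources(old_source: dict, new_source: dict) -> dict:
    """Return {field: (old_value, new_value)} for every field that differs."""
    differences = {}
    for field in sorted(set(old_source) | set(new_source)):
        old_value = old_source.get(field, _MISSING)
        new_value = new_source.get(field, _MISSING)
        if old_value != new_value:
            differences[field] = (
                "<missing>" if old_value is _MISSING else old_value,
                "<missing>" if new_value is _MISSING else new_value,
            )
    return differences


old_doc = {"title": "a", "price": 10, "tags": ["x"]}
new_doc = {"title": "a", "price": "10", "tags": ["x"]}
print(diff_sources(old_doc, new_doc))
```

In this sample the only difference is price, which changed from the number 10 to the string "10", the kind of silent type drift that a count check alone would not catch.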

3. Mapping verification

json
GET /new_index/_mapping

Confirm that the new index's mapping is as expected, especially field types and analyzers.

4. Performance verification

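Run real queries against the index before and after the migration and compare response times. The new shard count and mapping can affect query performance.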

Common pitfalls

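Insufficient disk space: the old and new indices coexist during reindex, so disk usage roughly doubles. Check free disk space before migrating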
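Refresh left on: for a large reindex, set index.refresh_interval to -1 on the destination index, then restore the interval and refresh manually once the task finishes; frequent refreshes otherwise slow the copy down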
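Timeouts: a large reindex runs for a long time. Submit it with wait_for_completion=false and follow it through the Tasks API instead of holding an HTTP connection open, and raise the scroll keep-alive if batches are slow (for example scroll=5m) so the search context does not expire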
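Incompatible mapping: create the destination index with its mapping before reindexing; otherwise Elasticsearch infers field types dynamically and may get them wrong
Cross-cluster whitelist: remote reindex requires reindex.remote.whitelist to be configured on the destination cluster; without it the request is rejected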
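For the long-running case, one option is to submit the reindex with wait_for_completion=false and poll GET /_tasks/<task_id>. Progress can then be estimated by adding the created, updated, and deleted counters of the task status and dividing by total. The sketch below does this on a hand-written sample response; the function name and the numbers are assumptions for illustration.

```python
def reindex_progress(task_response: dict) -> float:
    """Estimate the completed fraction of a reindex task from a Tasks API response."""
    status = task_response["task"]["status"]
    total = status.get("total", 0)
    if total == 0:
        # Nothing to copy, or the task has not counted its documents yet.
        return 1.0 if task_response.get("completed") else 0.0
    processed = (status.get("created", 0)
                 + status.get("updated", 0)
                 + status.get("deleted", 0))
    return processed / total


# Hand-written sample shaped like a GET /_tasks/<task_id> response.
sample_task = {
    "completed": False,
    "task": {
        "action": "indices:data/write/reindex",
        "status": {"total": 1000, "created": 250, "updated": 0,
                   "deleted": 0, "version_conflicts": 0, "batches": 1},
    },
}

print(f"{reindex_progress(sample_task):.0%}")  # 25%
```

Documents skipped because of version conflicts are not part of this sum, so a task run with conflicts: "proceed" can finish with the estimate below 100%.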

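Walk through the entire procedure on a test cluster before touching production, recording how long each step takes and what resources it consumes. Data consistency is the bottom line: production incidents caused by skipped verification are all too common.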

Tags: Elasticsearch