Resolving the master_not_discovered_exception error in an Elasticsearch cluster

2024-11-08 Source: 个人技术集锦

Error description

Checking the cluster health returns the following error:

{
	"error": {
		"root_cause": [{
			"type": "master_not_discovered_exception",
			"reason": null
		}],
		"type": "master_not_discovered_exception",
		"reason": null
	},
	"status": 503
}

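I started Elasticsearch with docker commands on each of three machines. Each node was reachable over the network on its own, but the nodes could not communicate with one another, so the master election failed and no master node was discovered.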
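As a rough illustration, a script could recognize this response as follows. This is a minimal sketch that assumes the body of GET /_cluster/health has already been fetched as a string; the helper name describe_health is hypothetical and not part of any Elasticsearch client.

```python
import json


def describe_health(body: str) -> str:
    """Summarize a /_cluster/health response body (hypothetical helper)."""
    data = json.loads(body)
    error = data.get("error")
    if error is not None:
        # Error responses carry the exception type and the HTTP status code.
        return f"error: {error.get('type')} (status {data.get('status')})"
    # Successful responses carry the cluster status and the node count.
    return f"ok: status={data.get('status')}, nodes={data.get('number_of_nodes')}"
```

For the response shown above, the helper reports the exception type together with the 503 status, which makes it easy to tell a missing master apart from an ordinary yellow or red cluster.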

Troubleshooting

The Elasticsearch log shows:

java.net.NoRouteToHostException: No route to host (Host unreachable)

The firewall may be the cause:

service iptables status # check the firewall status

If it is still running, stop and disable it with:

systemctl stop firewalld
systemctl disable firewalld
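One way to confirm whether a firewall change made any difference is to test the transport port directly from each machine. The following is a minimal sketch using only the Python standard library; the helper names are hypothetical, and port 9300 is taken from the transport.tcp.port setting in this article.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", "Connection refused", and timeouts.
        return False


def check_hosts(hosts, port=9300):
    """Map each host to whether its transport port is reachable."""
    return {host: can_connect(host, port) for host in hosts}
```

Calling check_hosts(["192.168.2.90", "192.168.2.91", "192.168.2.92"]) on each machine shows which peers are reachable on port 9300. Note that a check run on the host does not necessarily reflect what the process inside the container sees.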

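After disabling the firewall the problem persisted. Here is the Elasticsearch configuration of the node at the time the error occurred: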

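# CORS settings
http.cors.enabled: true
http.cors.allow-origin: "*"
# cluster name
cluster.name: elasticsearch-cluster
# node name
node.name: node-1
# whether the node is eligible to be elected master
node.master: true
# whether the node stores data
node.data: true
# maximum number of nodes allowed to share this data path on one machine (not the cluster size)
node.max_local_storage_nodes: 3
# network address
network.host: 0.0.0.0
# HTTP port
http.port: 9200
# port for communication between nodes
transport.tcp.port: 9300
# added in ES 7.x: addresses of the master-eligible nodes, which can be elected master once the service is up
discovery.seed_hosts: ["192.168.2.90","192.168.2.91","192.168.2.92"]
# added in ES 7.x: required to elect a master when bootstrapping a new cluster
cluster.initial_master_nodes: ["node-1", "node-2","node-3"]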

The problem was finally solved after I added the following setting:

# IP address this node uses to communicate with other nodes
network.publish_host: 192.168.2.90
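Each node presumably needs its own value here, alongside its own node.name. The sketch below renders the per-node lines for all three nodes. Only node-1 on 192.168.2.90 is confirmed by the configuration above; the mapping of node-2 and node-3 to the other two addresses is an assumption, and node_overrides is a hypothetical helper.

```python
def node_overrides(node_name: str, publish_host: str) -> str:
    """Render the per-node lines of elasticsearch.yml (hypothetical helper)."""
    return (
        f"node.name: {node_name}\n"
        f"network.publish_host: {publish_host}\n"
    )


# Assumed mapping of node names to LAN addresses; only node-1 is confirmed above.
NODES = {
    "node-1": "192.168.2.90",
    "node-2": "192.168.2.91",
    "node-3": "192.168.2.92",
}

for name, host in NODES.items():
    print(node_overrides(name, host))
```

The remaining settings (cluster.name, discovery.seed_hosts, cluster.initial_master_nodes) would stay identical on all three nodes.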

Correct result

{
  "cluster_name" : "elasticsearch-cluster",
  "status" : "green",
  "timed_out" : false,
  "number_of_nodes" : 3,
  "number_of_data_nodes" : 3,
  "active_primary_shards" : 3,
  "active_shards" : 6,
  "relocating_shards" : 0,
  "initializing_shards" : 0,
  "unassigned_shards" : 0,
  "delayed_unassigned_shards" : 0,
  "number_of_pending_tasks" : 0,
  "number_of_in_flight_fetch" : 0,
  "task_max_waiting_in_queue_millis" : 0,
  "active_shards_percent_as_number" : 100.0
}

Analysis

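The network.bind_host setting controls which address the network components bind to. With network.host: 0.0.0.0, as in the configuration above, the node binds to every local address (0.0.0.0 or ::0). By default network.host sets both network.bind_host and network.publish_host to the same value, but 0.0.0.0 is not an address other nodes can connect to, so Elasticsearch has to pick one concrete address to publish. Inside a Docker container this is probably the container's internal address, which the other machines cannot route to; that would explain the NoRouteToHostException in the log. Setting network.publish_host explicitly to the machine's LAN address tells the other nodes where to reach this node.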
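To see which address a node actually publishes, the node info API can be inspected. The sketch below assumes the JSON returned by GET /_nodes/_local/transport has already been loaded into a dict and that each node entry contains transport.publish_address in host:port form; both helper names are hypothetical.

```python
def publish_host(publish_address: str) -> str:
    """Extract the host part from a value such as '172.17.0.2:9300'."""
    # Drop an optional 'hostname/' prefix, then the trailing ':port'.
    address = publish_address.split("/")[-1]
    host = address.rsplit(":", 1)[0]
    return host.strip("[]")  # IPv6 addresses are wrapped in brackets


def unexpected_publish_hosts(nodes_info: dict, expected_hosts) -> dict:
    """Return {node name: published host} for hosts not in expected_hosts."""
    found = {}
    for node in nodes_info.get("nodes", {}).values():
        host = publish_host(node["transport"]["publish_address"])
        if host not in expected_hosts:
            found[node.get("name")] = host
    return found
```

Passing the addresses from discovery.seed_hosts as expected_hosts would flag any node that advertises an address outside that list, such as a container-internal one.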
