Elastic search suddenly down


L2 Linker

Hi,

 

I am writing to ask whether anyone has experienced ES suddenly going down. The logs only returned to normal after a restart. I need ideas on what we can check to find the root cause of ES suddenly going down.

6 REPLIES

Cyber Elite

Hello @LizaRajjab

 

If by ES you mean the Elasticsearch instance in Panorama's log collector, then in my experience tracing and troubleshooting this kind of issue is mostly a job for TAC. Which PAN-OS version are you running? There have been several known issues in PAN-OS 10.1.x and 10.2.x.

 

Kind Regards

Pavel

Help the community: Like helpful comments and mark solutions.

Hi,

 

Panorama version is 10.2.9-h1.

Cyber Elite

Hello @LizaRajjab

 

Thank you for the reply.

 

You are already running what is, as of now, the latest 10.2 release, which includes the fixes for the issues found in earlier versions.

 

When it comes to searching for the root cause of the last crash, I would review the logs in the folders below:

 

less mp-log ms.log (to review any generic issues)
less mp-log (review any log file starting with "es_")
less es-log (review the log files, if any were generated)
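
If the log files are exported to a workstation (for example from a tech support file), a short script can summarise which functions report the most "Error:" lines. This is only a minimal sketch: it assumes the timestamp and "Error: function(file:line)" layout seen in the ms.log output posted in this thread, and the function name `count_errors` is made up for illustration. Many of these errors appear on every start-up, so a high count is a starting point for reading, not a diagnosis.

```python
import re
from collections import Counter

# Assumed line layout, taken from the ms.log lines in this thread:
# 2024-05-29 19:12:16.529 +0700 Error: pan_dir_create(pan_fs.c:301): failed ...
ERROR_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3} [+-]\d{4} Error: (\w+)\("
)

def count_errors(lines):
    """Count 'Error:' lines per reporting function name."""
    counts = Counter()
    for line in lines:
        match = ERROR_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample lines copied from the ms.log output in this thread.
sample = [
    "2024-05-29 19:12:16.529 +0700 Error: pan_dir_create(pan_fs.c:301): failed to create dir /tmp/pan wih error 17",
    "2024-05-29 19:12:19.462 +0700 Error: create_worker_threads(threadpool.c:27): thread pool configures with zero threads!",
    "2024-05-29 19:12:19.462 +0700 Error: create_worker_threads(threadpool.c:27): thread pool configures with zero threads!",
    "2024-05-29 19:12:19.472 +0700 MS: panorama module initialized",
]

for func, n in count_errors(sample).most_common():
    print(func, n)
```

To run it against a real export, pass the opened file instead of `sample`, e.g. `count_errors(open("ms.log", errors="replace"))`.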

 

Apart from this, I would look into the resource utilization of the Panorama / Log Collector. If you do not get anywhere, I would open a TAC ticket / Partner support ticket.

 

Kind Regards

Pavel

 

 

 

 


The ms.log output is below.

2024-05-29 19:12:16.401 +0700 File lock_reportgen deleted.
2024-05-29 19:12:16.401 +0700 ===================== MS: start ======================
2024-05-29 19:12:16.414 +0700 MS: SSL lib initialized
2024-05-29 19:12:16.414 +0700 Warning: pan_hash_init(pan_hash.c:113): nbuckets 2000 is not power of 2!
2024-05-29 19:12:16.414 +0700 Warning: pan_hash_init(pan_hash.c:113): nbuckets 2000 is not power of 2!
2024-05-29 19:12:16.414 +0700 Warning: pan_hash_init(pan_hash.c:113): nbuckets 2000 is not power of 2!
2024-05-29 19:12:16.414 +0700 MS: connection manager initialized
2024-05-29 19:12:16.416 +0700 sysd worker[0]: 7f1f26117700: starting up...
2024-05-29 19:12:16.519 +0700 Removing /tmp/.iddone in pan_cfg_remove_temporary_files
2024-05-29 19:12:16.529 +0700 Error: pan_dir_create(pan_fs.c:301): failed to create dir /tmp/pan wih error 17
2024-05-29 19:12:16.632 +0700 succeed to initialize xslt security preference
2024-05-29 19:12:16.632 +0700 sysd worker[0]: 7f1f25515700: starting up...
2024-05-29 19:12:16.632 +0700 sysd worker[1]: 7f1f25114700: starting up...
2024-05-29 19:12:16.632 +0700 sysd worker[3]: 7f1f24912700: starting up...
2024-05-29 19:12:16.632 +0700 sysd worker[2]: 7f1f24d13700: starting up...
2024-05-29 19:12:16.633 +0700 sysd worker[0]: 7f1f23d10700: starting up...
2024-05-29 19:12:16.633 +0700 Not connected to sysd yet. Sleeping for 5 second..
2024-05-29 19:12:18.417 +0700 Sysd Event: SUCCESS
2024-05-29 19:12:18.532 +0700 watching cms status change notifications...
2024-05-29 19:12:18.535 +0700 connmgr: inter-logger conn: Setting connections (017607003438), # of lc's = 1
2024-05-29 19:12:18.535 +0700 sc3_ca changed( -> e872fe75-97a7-4463-80fa-50e0c602c631)
CA CHANGE : File backup sucess ms.log.sc3cachange
2024-05-29 19:12:18.633 +0700 Sysd Event: SUCCESS
2024-05-29 19:12:18.633 +0700 connected to sysd
2024-05-29 19:12:18.633 +0700 config manager:connected to sysd
2024-05-29 19:12:18.635 +0700 Management server started. Running version 10.2.9-h1
2024-05-29 19:12:18.635 +0700 sw detail version 10.2.9
2024-05-29 19:12:18.635 +0700 pan_cfg_mgr_set_patch_version: Get patch version using swm info
2024-05-29 19:12:18.635 +0700 Warning: pan_log_proxy(pan_priv_log.c:269): Slog being proxied
2024-05-29 19:12:19.337 +0700 pan_cfg_mgr_set_patch_version: No installed patch version found
2024-05-29 19:12:19.339 +0700 <vsys> tag does not exist
2024-05-29 19:12:19.340 +0700 mgmt internal: client certificate profile commit
2024-05-29 19:12:19.340 +0700 No child nodes present under secure connection server mgmt settings, No updates needed.
2024-05-29 19:12:19.340 +0700 [secure_conn] extract secure_conn userid channel settings SERVER
2024-05-29 19:12:19.340 +0700 [secure_conn] user_id secure comm enabled for SERVER
2024-05-29 19:12:19.340 +0700 No child nodes present under secure connection client mgmt settings, No updates needed.
2024-05-29 19:12:19.340 +0700 [secure_conn] extract secure_conn userid channel settings CLIENT
2024-05-29 19:12:19.340 +0700 [secure_conn] user_id secure comm enabled for CLIENT
2024-05-29 19:12:19.340 +0700 Secure connection client info disabled
2024-05-29 19:12:19.340 +0700 Error: pan_cfg_get_system_resource_level(pan_cfg_utils.c:18981): Failed to fetch cfg.resource-level.override.memory from sysd
2024-05-29 19:12:19.340 +0700 system resource level: memory:level 3
2024-05-29 19:12:19.341 +0700 Initialized cfg mgr for management server
2024-05-29 19:12:19.443 +0700 SEATTLETIME: Time to 3: 3 secs
2024-05-29 19:12:19.443 +0700 MS: configuration manager initialized
2024-05-29 19:12:19.446 +0700 SC3: CA: 'e872fe75-97a7-4463-80fa-50e0c602c631', CC/CSR: 'da24b72c-b715-4a7f-9a67-c0aedb535c4e'
2024-05-29 19:12:19.446 +0700 SC3: initialized
2024-05-29 19:12:19.449 +0700 <vsys> tag does not exist
2024-05-29 19:12:19.458 +0700 mgmt internal: client certificate profile commit
2024-05-29 19:12:19.459 +0700 DNS_API - init dns_vsys_disabled: FALSE
2024-05-29 19:12:19.459 +0700 Constructed event manager (addr=0x55951089c500)
2024-05-29 19:12:19.462 +0700 Notifier created for management server, (addr=0x559510c22380)
2024-05-29 19:12:19.462 +0700 Warning: pan_hash_init(pan_hash.c:113): nbuckets 10000 is not power of 2!
2024-05-29 19:12:19.462 +0700 created thread pool(0x559510cea6c0, 16)
2024-05-29 19:12:19.462 +0700 Error: create_worker_threads(threadpool.c:27): thread pool configures with zero threads!
2024-05-29 19:12:19.462 +0700 created thread pool(0x559510cea770, 0)
2024-05-29 19:12:19.462 +0700 Error: create_worker_threads(threadpool.c:27): thread pool configures with zero threads!
2024-05-29 19:12:19.462 +0700 created thread pool(0x559510cea820, 0)
2024-05-29 19:12:19.462 +0700 Non-blocking thread pool created for event manager, (addr=0x559510cea6c0)
2024-05-29 19:12:19.471 +0700 CMS: keyfile=/opt/pancfg/mgmt/cms/ssl_new/server.pem ppfile=/opt/pancfg/mgmt/cms/ssl_new/server.pp
2024-05-29 19:12:19.471 +0700 InterLogger: keyfile=/opt/pancfg/mgmt/cms/ssl_new/server.pem ppfile=/opt/pancfg/mgmt/cms/ssl_new/server.pp
2024-05-29 19:12:19.472 +0700 MS: panorama module initialized
2024-05-29 19:12:19.472 +0700 MS: event manager initialized
2024-05-29 19:12:19.475 +0700 MS: server address 7f000001 port:10000
2024-05-29 19:12:19.476 +0700 Setting 127.0.0.1 as a filter
2024-05-29 19:12:19.476 +0700 set TCP_NODELAY option on socket, port 10000
2024-05-29 19:12:19.476 +0700 Error: tp_submit_srvr_fd_work(socksrvr.c:118): work(SRVR, 0x559510c2a000) submitted
2024-05-29 19:12:19.476 +0700 The max requests per client is set to 250 for server 10000 (fd=21)
2024-05-29 19:12:19.481 +0700 Secure connection setting not enabled. Using default context.
2024-05-29 19:12:19.482 +0700 SC3: CA: 'e872fe75-97a7-4463-80fa-50e0c602c631', CC/CSR: 'da24b72c-b715-4a7f-9a67-c0aedb535c4e'
2024-05-29 19:12:19.482 +0700 set TCP_NODELAY option on socket, port 3978
2024-05-29 19:12:19.482 +0700 Error: tp_submit_srvr_fd_work(socksrvr.c:118): work(SRVR, 0x559510c2d5a0) submitted
2024-05-29 19:12:19.482 +0700 The max requests per client is set to 50 for server 3978 (fd=16)
2024-05-29 19:12:19.482 +0700 SC3: CA: 'e872fe75-97a7-4463-80fa-50e0c602c631', CC/CSR: 'da24b72c-b715-4a7f-9a67-c0aedb535c4e'
2024-05-29 19:12:19.482 +0700 set TCP_NODELAY option on socket, port 28270
2024-05-29 19:12:19.482 +0700 Error: tp_submit_srvr_fd_work(socksrvr.c:118): work(SRVR, 0x559510c2d020) submitted
2024-05-29 19:12:19.482 +0700 The max requests per client is set to 50 for server 28270 (fd=17)
2024-05-29 19:12:19.908 +0700 Error: pan_evtmgr_proxy_broadcast_msg_to_srvcd(ms_evtmgr_proxy.c:562): Proxy configd: agent not connected, unable to broadcast to it
2024-05-29 19:12:19.908 +0700 Error: pan_evtmgr_proxy_broadcast_msg_to_srvcd(ms_evtmgr_proxy.c:562): Proxy reportd: agent not connected, unable to broadcast to it
2024-05-29 19:12:19.908 +0700 Error: pan_evtmgr_proxy_broadcast_msg_to_srvcd(ms_evtmgr_proxy.c:562): Proxy logd: agent not connected, unable to broadcast to it
2024-05-29 19:12:19.908 +0700 Error: pan_evtmgr_proxy_broadcast_msg_to_srvcd(ms_evtmgr_proxy.c:562): Proxy logrcvr: agent not connected, unable to broadcast to it
2024-05-29 19:12:19.908 +0700 Error: pan_evtmgr_proxy_broadcast_msg_to_srvcd(ms_evtmgr_proxy.c:562): Proxy cord: agent not connected, unable to broadcast to it
2024-05-29 19:12:19.908 +0700 Error: pan_evtmgr_proxy_broadcast_msg_to_srvcd(ms_evtmgr_proxy.c:562): Proxy esmonitor: agent not connected, unable to broadcast to it
2024-05-29 19:12:19.908 +0700 Error: pan_evtmgr_proxy_broadcast_msg_to_srvcd(ms_evtmgr_proxy.c:562): Proxy useridd: agent not connected, unable to broadcast to it
2024-05-29 19:12:19.908 +0700 Error: pan_evtmgr_proxy_broadcast_msg_to_srvcd(ms_evtmgr_proxy.c:562): Proxy distributord: agent not connected, unable to broadcast to it
2024-05-29 19:12:19.908 +0700 Error: pan_evtmgr_proxy_broadcast_msg_to_srvcd(ms_evtmgr_proxy.c:562): Proxy iotd: agent not connected, unable to broadcast to it
2024-05-29 19:12:20.257 +0700 SC3: client presented SNI: 'e872fe75-97a7-4463-80fa-50e0c602c631'
2024-05-29 19:12:20.258 +0700 SC3: CA: 'e872fe75-97a7-4463-80fa-50e0c602c631', CC/CSR: 'da24b72c-b715-4a7f-9a67-c0aedb535c4e'
2024-05-29 19:12:20.262 +0700 SC3: context initialized using SNI: e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:20.262 +0700 SC3: Server using SNI: 'e872fe75-97a7-4463-80fa-50e0c602c631'
2024-05-29 19:12:20.271 +0700 SC3: Cert-Verify (1) /CN=e872fe75-97a7-4463-80fa-50e0c602c631 :: /CN=e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:20.271 +0700 SC3: using SC3 CA cert for validation
2024-05-29 19:12:20.272 +0700 SC3: Cert-Verify (0) /CN=726ff5db-4ea3-46d2-b48c-c470687371af/OU=027007001235 :: /CN=e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:20.872 +0700 EM: Register request from distributord seq= 88
2024-05-29 19:12:20.872 +0700 Send registration response to distributord
2024-05-29 19:12:21.681 +0700 EM: Register request from esmonitor seq= 89
2024-05-29 19:12:21.681 +0700 Send registration response to esmonitor
2024-05-29 19:12:21.970 +0700 EM: Register request from iotd seq= 89
2024-05-29 19:12:21.970 +0700 Send registration response to iotd
2024-05-29 19:12:23.207 +0700 SC3: client presented SNI: 'e872fe75-97a7-4463-80fa-50e0c602c631'
2024-05-29 19:12:23.208 +0700 SC3: CA: 'e872fe75-97a7-4463-80fa-50e0c602c631', CC/CSR: 'da24b72c-b715-4a7f-9a67-c0aedb535c4e'
2024-05-29 19:12:23.213 +0700 SC3: context initialized using SNI: e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:23.213 +0700 SC3: Server using SNI: 'e872fe75-97a7-4463-80fa-50e0c602c631'
2024-05-29 19:12:23.230 +0700 SC3: Cert-Verify (1) /CN=e872fe75-97a7-4463-80fa-50e0c602c631 :: /CN=e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:23.230 +0700 SC3: using SC3 CA cert for validation
2024-05-29 19:12:23.232 +0700 SC3: Cert-Verify (0) /CN=40063227-552f-4e53-bddd-a71324d98d26/OU=013201036783 :: /CN=e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:25.391 +0700 SC3: client presented SNI: 'e872fe75-97a7-4463-80fa-50e0c602c631'
2024-05-29 19:12:25.391 +0700 SC3: CA: 'e872fe75-97a7-4463-80fa-50e0c602c631', CC/CSR: 'da24b72c-b715-4a7f-9a67-c0aedb535c4e'
2024-05-29 19:12:25.395 +0700 SC3: context initialized using SNI: e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:25.395 +0700 SC3: Server using SNI: 'e872fe75-97a7-4463-80fa-50e0c602c631'
2024-05-29 19:12:25.439 +0700 SC3: Cert-Verify (1) /CN=e872fe75-97a7-4463-80fa-50e0c602c631 :: /CN=e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:25.439 +0700 SC3: using SC3 CA cert for validation
2024-05-29 19:12:25.442 +0700 SC3: Cert-Verify (0) /CN=3b946446-8cc6-4581-a9a3-88e5e069604f/OU=007957000415119 :: /CN=e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:26.305 +0700 SC3: client presented SNI: 'e872fe75-97a7-4463-80fa-50e0c602c631'
2024-05-29 19:12:26.306 +0700 SC3: CA: 'e872fe75-97a7-4463-80fa-50e0c602c631', CC/CSR: 'da24b72c-b715-4a7f-9a67-c0aedb535c4e'
2024-05-29 19:12:26.311 +0700 SC3: context initialized using SNI: e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:26.311 +0700 SC3: Server using SNI: 'e872fe75-97a7-4463-80fa-50e0c602c631'
2024-05-29 19:12:26.355 +0700 SC3: Cert-Verify (1) /CN=e872fe75-97a7-4463-80fa-50e0c602c631 :: /CN=e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:26.355 +0700 SC3: using SC3 CA cert for validation
2024-05-29 19:12:26.357 +0700 SC3: Cert-Verify (0) /CN=5ee8edf5-5625-4ab4-bed4-823bb026c139/OU=007957000415115 :: /CN=e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:26.671 +0700 EM: Register request from cord seq= 94
2024-05-29 19:12:26.671 +0700 Send registration response to cord
2024-05-29 19:12:26.925 +0700 SC3: client presented SNI: 'e872fe75-97a7-4463-80fa-50e0c602c631'
2024-05-29 19:12:26.926 +0700 SC3: CA: 'e872fe75-97a7-4463-80fa-50e0c602c631', CC/CSR: 'da24b72c-b715-4a7f-9a67-c0aedb535c4e'
2024-05-29 19:12:26.931 +0700 SC3: context initialized using SNI: e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:26.931 +0700 SC3: Server using SNI: 'e872fe75-97a7-4463-80fa-50e0c602c631'
2024-05-29 19:12:26.974 +0700 SC3: Cert-Verify (1) /CN=e872fe75-97a7-4463-80fa-50e0c602c631 :: /CN=e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:26.974 +0700 SC3: using SC3 CA cert for validation
2024-05-29 19:12:26.976 +0700 SC3: Cert-Verify (0) /CN=b54c0dba-f71f-4472-bd68-08befa3c6406/OU=007957000415117 :: /CN=e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:27.200 +0700 SC3: client presented SNI: 'e872fe75-97a7-4463-80fa-50e0c602c631'
2024-05-29 19:12:27.201 +0700 SC3: CA: 'e872fe75-97a7-4463-80fa-50e0c602c631', CC/CSR: 'da24b72c-b715-4a7f-9a67-c0aedb535c4e'
2024-05-29 19:12:27.204 +0700 SC3: context initialized using SNI: e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:27.204 +0700 SC3: Server using SNI: 'e872fe75-97a7-4463-80fa-50e0c602c631'
2024-05-29 19:12:27.246 +0700 SC3: Cert-Verify (1) /CN=e872fe75-97a7-4463-80fa-50e0c602c631 :: /CN=e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:27.246 +0700 SC3: using SC3 CA cert for validation
2024-05-29 19:12:27.249 +0700 SC3: Cert-Verify (0) /CN=1f729e8c-9c30-47b3-a079-adaa564c881d/OU=007957000415122 :: /CN=e872fe75-97a7-4463-80fa-50e0c602c631
2024-05-29 19:12:28.768 +0700 Error: pan_shm_alloc(pan_shm_alloc.c:55): failed to open shared memory:(errno: 2) No such file or directory
2024-05-29 19:12:28.768 +0700 Error: pan_contmgr_load_content(pan_contmgr.c:1043): pan_shm_alloc(size:32) failed
2024-05-29 19:12:28.768 +0700 Error: main(pan_logquery.c:1205): Failed to access shared content
2024-05-29 19:12:28.768 +0700 Warning: main(pan_logquery.c:1235): Loading content from disk
2024-05-29 19:12:38.579 +0700 Error: logquery_client_read(pan_dlc_logquery.c:198): failed, not ready
2024-05-29 19:12:38.579 +0700 Error: pan_issue_dlc_query(pan_dlc_logquery.c:599): Failed to read response from ms
2024-05-29 19:12:38.579 +0700 Error: pan_cms_dlc_logquery(pan_dlc_logquery.c:1394): failed to issue query to ms
2024-05-29 19:12:38.706 +0700 sc3cachange logs back up success
2024-05-29 19:12:38.706 +0700 Warning: pan_log_proxy(pan_priv_log.c:269): Slog being proxied
2024-05-29 19:12:38.706 +0700 [Secure conn cfg-mgr trigger update] Sec conn config not changed, No updates needed.
2024-05-29 19:12:38.708 +0700 connmgr: inter-logger conn: Setting connections (017607003438), # of lc's = 1
2024-05-29 19:12:38.708 +0700 sc3_ca changed(e872fe75-97a7-4463-80fa-50e0c602c631 -> e872fe75-97a7-4463-80fa-50e0c602c631)
CA CHANGE : File backup sucess ms.log.sc3cachange

 

thank you

Cyber Elite

Hello @LizaRajjab

 

Thank you for the reply.

 

From these logs I can't see anything indicating the root cause of the crash. They capture the system start-up. Whatever happened before that is either earlier in the log or, for some reason, was not recorded.
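
To look at what was logged just before a start-up, one option is to search an exported copy of ms.log for the "MS: start" banner and print the lines that precede it. The following is a rough sketch under that assumption; the helper name `lines_before_restarts` and the `context` parameter are hypothetical, and the sample lines are taken from the log posted above.

```python
START_MARKER = "MS: start"  # banner text seen at the top of the posted ms.log

def lines_before_restarts(lines, context=3):
    """Return (index, preceding_lines) for every start banner in the log."""
    found = []
    for i, line in enumerate(lines):
        if START_MARKER in line:
            # Keep up to `context` lines logged immediately before the banner.
            found.append((i, lines[max(0, i - context):i]))
    return found

# Sample lines copied from the ms.log output in this thread.
sample = [
    "2024-05-29 19:12:16.401 +0700 File lock_reportgen deleted.",
    "2024-05-29 19:12:16.401 +0700 ===================== MS: start ======================",
    "2024-05-29 19:12:16.414 +0700 MS: SSL lib initialized",
]

for index, before in lines_before_restarts(sample):
    print("start banner at line", index)
    for line in before:
        print("  before:", line)
```

If nothing useful appears before the banner, the older rotated copies of the log would be the next place to look, if they exist.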

 

Kind Regards

Pavel


Hi,

 

I have opened a TAC case, and TAC replied as below:

 

Elasticsearch status is red and it has unassigned shards. The issue seems to have occurred after the PAN-OS upgrade; Elasticsearch was not restarted at that time.
2024-05-29 19:14:19.705 +0700 ELASTICSEARCH STATUS: active_primary_shards 0 active_shards 0 active_shards_percent 0
2024-05-29 19:14:19.705 +0700 ELASTICSEARCH STATUS: relocating_shards 0 initializing_shards 0 unassigned_shards 182 delayed_unassigned_shards 0 >>>>>>>>
..;
2024-06-10 16:58:31.162 +0700 ELASTICSEARCH STATUS: active_primary_shards 135 active_shards 135 active_shards_percent 68
2024-06-10 16:58:31.162 +0700 ELASTICSEARCH STATUS: relocating_shards 0 initializing_shards 32 unassigned_shards 29 delayed_unassigned_shards 0
To find out why the shards are unassigned, we need to check the ''es_stats.txt'' file from the TSF and have root access to the device. However, since the customer has already restarted Elasticsearch to resolve the issue, we are unable to check further error logs or the process status to identify the root cause.
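
The "ELASTICSEARCH STATUS" lines quoted above can be turned into numbers so that unassigned shards are spotted before a restart wipes the evidence. This is only a sketch: it assumes the exact line layout quoted by TAC, the name `parse_status` is made up for illustration, and the thread does not say which log file these lines come from.

```python
import re

# Assumed layout: "<date> <time> <tz> ELASTICSEARCH STATUS: name value name value ..."
STATUS_RE = re.compile(r"^(\S+ \S+ \S+) ELASTICSEARCH STATUS: (.*)$")

def parse_status(line):
    """Return (timestamp, {field: int}) or None if the line is not a status line."""
    match = STATUS_RE.match(line)
    if not match:
        return None
    timestamp, rest = match.groups()
    tokens = rest.split()
    fields = {}
    # Tokens alternate between a field name and its value; trailing
    # non-numeric debris such as ">>>>>>>>" is ignored.
    for name, value in zip(tokens[0::2], tokens[1::2]):
        if value.isdigit():
            fields[name] = int(value)
    return timestamp, fields

# Sample lines copied from the TAC reply in this thread.
sample = [
    "2024-05-29 19:14:19.705 +0700 ELASTICSEARCH STATUS: active_primary_shards 0 active_shards 0 active_shards_percent 0",
    "2024-05-29 19:14:19.705 +0700 ELASTICSEARCH STATUS: relocating_shards 0 initializing_shards 0 unassigned_shards 182 delayed_unassigned_shards 0 >>>>>>>>",
    "2024-06-10 16:58:31.162 +0700 ELASTICSEARCH STATUS: relocating_shards 0 initializing_shards 32 unassigned_shards 29 delayed_unassigned_shards 0",
]

for line in sample:
    timestamp, fields = parse_status(line)
    if fields.get("unassigned_shards", 0) > 0:
        print(timestamp, "unassigned shards:", fields["unassigned_shards"])
```

When such a line shows unassigned shards, collecting a tech support file before restarting Elasticsearch would preserve the data TAC asked for.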

 

So the root cause cannot be determined.

 
