Panorama Log Collector VM Cluster in 'Yellow' Status.


L2 Linker

We had some issues with licensing for one of the nodes.

We have since rectified this issue (switching to Panorama mode changed the serial number, so we re-licensed and then changed back to Log Collector mode).

On initial boot-up, Panorama reported the log collectors as connected and in config sync.

 

However, there was a message about 'inter-lc-connectivity' not working.

We rebooted both log collectors within roughly 30 seconds of each other (this has fixed this particular issue in the past).

On boot-up I could see that Panorama was now reporting "connected to all LCs in group" with no issues reported.

So Panorama looks pretty good, and from this side you would think there are no issues.

 

**Edit: checking the log collector detail shows a low figure for detailed log storage ("9 days", etc.), which doesn't marry up with the output of "show log-collector-es-indices", where some traffic indices are around 20 GB in size.

Maybe the system is still doing something in the back end, or we instead have some orphaned indices taking up space (any ideas on this are welcome).
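If anyone wants to cross-check whether those large traffic indices are stale, a rough approach (assuming the traffic indices carry a recognisable name and date in the listing, which may differ between releases) is to list them and compare their dates against the retention figure shown in the log collector detail:

    show log-collector-es-indices | match traffic

Indices dated well outside the reported retention window would point to orphaned data rather than a reporting lag.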

 

However, checking the log collectors themselves, I saw issues.

When performing "show log-collector-es-cluster health" it moved from red to yellow as it checked local logs and 'active-shards....' increased until it hit around 99% mark.

It seems to be staying on yellow though I suspect due to 4 shards in an 'unassigned state' and 'active_shards_percent....' never reaches 100%.
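For reference, the fields to watch in that output (which is essentially Elasticsearch's cluster-health summary) are 'unassigned_shards' and 'active_shards_percent_as_number'; the values below are illustrative only, not taken from this system:

    "status" : "yellow",
    "number_of_nodes" : 2,
    "active_primary_shards" : ...,
    "active_shards" : ...,
    "unassigned_shards" : 4,
    "active_shards_percent_as_number" : 99.0

A yellow status means every primary shard is assigned but one or more replica copies are not, so no log data is missing; the cluster just has reduced redundancy until the replicas are re-allocated.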

 

[screenshot attached: Paul_Stinson_0-1694584415313.png]

 

Any ideas on whether there are commands we can run, or a process to follow, to deal with unassigned shards in Palo Alto Log Collectors?

I see the Elasticsearch documentation has commands for dealing with unassigned shards, but what can we do on the Palo Alto appliances to either clear these shards or get them re-integrated?
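For what it's worth, the read-only checks available from the restricted PAN-OS CLI (all of which appear elsewhere in this thread) boil down to something like:

    show log-collector-es-cluster health
    show log-collector-es-indices
    show log-collector-es-cluster state routing_table

None of these change shard allocation; the Elasticsearch reroute/allocation APIs do not appear to be exposed in the 10.x CLI, which is why, as the accepted answer below notes, the actual fix tends to be done by TAC from the root shell.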

 

Many thanks for your advice.

I'll log a ticket with TAC if there are no ideas on this.


7 REPLIES

L2 Linker

I have since found that one of the commands, "show log-collector-es-cluster state routing_table" (shown again in a later reply), gives more info about the unassigned shards:

 

      ".pancache" : {
        "shards" : {
          "2" : [
            {
              "state" : "STARTED",
              "primary" : true,
              "node" : "Kpu5WVqMSLSQWzCSV42-dg",
              "relocating_node" : null,
              "shard" : 2,
              "index" : ".pancache",
              "allocation_id" : {
                "id" : "Z5XfmB_CRNavbcLzpT2Xlw"
              }
            },
            {
              "state" : "UNASSIGNED",
              "primary" : false,
              "node" : null,
              "relocating_node" : null,
              "shard" : 2,
              "index" : ".pancache",
              "recovery_source" : {
                "type" : "PEER"
              },
              "unassigned_info" : {
                "reason" : "CLUSTER_RECOVERED",
                "at" : "2023-09-13T05:18:17.409Z",
                "delayed" : false,
                "allocation_status" : "no_attempt"
              }
            }
          ],
          "1" : [
            {
              "state" : "STARTED",
              "primary" : true,
              "node" : "Kpu5WVqMSLSQWzCSV42-dg",
              "relocating_node" : null,
              "shard" : 1,
              "index" : ".pancache",
              "allocation_id" : {
                "id" : "cZkPb8isTiSH-H4hHo8ojw"
              }
            },
            {
              "state" : "UNASSIGNED",
              "primary" : false,
              "node" : null,
              "relocating_node" : null,
              "shard" : 1,
              "index" : ".pancache",
              "recovery_source" : {
                "type" : "PEER"
              },
              "unassigned_info" : {
                "reason" : "CLUSTER_RECOVERED",
                "at" : "2023-09-13T05:18:17.409Z",
                "delayed" : false,
                "allocation_status" : "no_attempt"
              }
            }
          ],
          "3" : [
            {
              "state" : "STARTED",
              "primary" : true,
              "node" : "Kpu5WVqMSLSQWzCSV42-dg",
              "relocating_node" : null,
              "shard" : 3,
              "index" : ".pancache",
              "allocation_id" : {
                "id" : "4BBrampGQL--XQbqQYiUCQ"
              }
            },
            {
              "state" : "UNASSIGNED",
              "primary" : false,
              "node" : null,
              "relocating_node" : null,
              "shard" : 3,
              "index" : ".pancache",
              "recovery_source" : {
                "type" : "PEER"
              },
              "unassigned_info" : {
                "reason" : "CLUSTER_RECOVERED",
                "at" : "2023-09-13T05:18:17.409Z",
                "delayed" : false,
                "allocation_status" : "no_attempt"
              }
            }
          ],
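Reading this output: each shard number has its primary copy STARTED on node Kpu5WVqMSLSQWzCSV42-dg, while the replica copy is UNASSIGNED with reason CLUSTER_RECOVERED and allocation_status "no_attempt". In Elasticsearch terms that means the replicas were simply never re-allocated after the full cluster restart (i.e. both collectors rebooting at roughly the same time). In stock Elasticsearch the next diagnostic step would be the allocation-explain API; it is not available from the PAN-OS CLI and is shown here only as an illustration of what TAC can query from the root shell, assuming the embedded Elasticsearch listens on the default local port 9200:

    # Ask Elasticsearch why a specific replica copy is not allocated (illustrative only)
    curl -s -XGET 'http://localhost:9200/_cluster/allocation/explain' \
      -H 'Content-Type: application/json' \
      -d '{ "index": ".pancache", "shard": 2, "primary": false }'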

 





L2 Linker

Also seeing that the particular index affected is '.pancache':

yellow open .pancache 4 1 278039 8221 2.7gb 2.7gb
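Assuming those columns follow Elasticsearch's _cat/indices layout (minus the uuid column), the line breaks down as:

    health=yellow  status=open  index=.pancache
    pri=4  rep=1  docs.count=278039  docs.deleted=8221
    store.size=2.7gb  pri.store.size=2.7gb

So .pancache has 4 primary shards, each configured with 1 replica. The fact that the total store size equals the primary store size is consistent with none of the replica copies being on disk, and 4 unassigned replicas is exactly what keeps the index (and therefore the whole cluster) yellow.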

 

Could anyone from Palo Alto (or anyone else that has solved this issue) advise on what to do to clear the shards, or get the system to re-integrate them, using the limited Elasticsearch commands Palo Alto exposes?

L2 Linker

I can see all this info about the index with the following command:

show log-collector-es-cluster state routing_table

".pancache" : {
"shards" : {
"2" : [
{
"state" : "STARTED",
"primary" : true,
"node" : "Kpu5WVqMSLSQWzCSV42-dg",
"relocating_node" : null,
"shard" : 2,
"index" : ".pancache",
"allocation_id" : {
"id" : "Z5XfmB_CRNavbcLzpT2Xlw"
}
},
{
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 2,
"index" : ".pancache",
"recovery_source" : {
"type" : "PEER"
},
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2023-11-13T13:34:44.153Z",
"delayed" : false,
"allocation_status" : "no_attempt"
}
}
],
"1" : [
{
"state" : "STARTED",
"primary" : true,
"node" : "Kpu5WVqMSLSQWzCSV42-dg",
"relocating_node" : null,
"shard" : 1,
"index" : ".pancache",
"allocation_id" : {
"id" : "cZkPb8isTiSH-H4hHo8ojw"
}
},
{
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 1,
"index" : ".pancache",
"recovery_source" : {
"type" : "PEER"
},
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2023-11-13T13:34:44.153Z",
"delayed" : false,
"allocation_status" : "no_attempt"
}
}
],
"3" : [
{
"state" : "STARTED",
"primary" : true,
"node" : "Kpu5WVqMSLSQWzCSV42-dg",
"relocating_node" : null,
"shard" : 3,
"index" : ".pancache",
"allocation_id" : {
"id" : "4BBrampGQL--XQbqQYiUCQ"
}
},
{
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 3,
"index" : ".pancache",
"recovery_source" : {
"type" : "PEER"
},
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2023-11-13T13:34:44.153Z",
"delayed" : false,
"allocation_status" : "no_attempt"
}
}
],
"0" : [
{
"state" : "STARTED",
"primary" : true,
"node" : "Kpu5WVqMSLSQWzCSV42-dg",
"relocating_node" : null,
"shard" : 0,
"index" : ".pancache",
"allocation_id" : {
"id" : "480BwuhvRKu4C1Wm_lmXzw"
}
},
{
"state" : "UNASSIGNED",
"primary" : false,
"node" : null,
"relocating_node" : null,
"shard" : 0,
"index" : ".pancache",
"recovery_source" : {
"type" : "PEER"
},
"unassigned_info" : {
"reason" : "CLUSTER_RECOVERED",
"at" : "2023-11-13T13:34:44.153Z",
"delayed" : false,
"allocation_status" : "no_attempt"
}
}
]
}

 

L2 Linker

From what I can tell these shards are just orphaned; the cluster has recovered and will not do anything further to clear them. Can we delete '.pancache'? If so, how?
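For clarity, '.pancache' is an index (its four replica shards are what show as unassigned), and deleting or re-configuring internal indices does not appear to be exposed from the restricted PAN-OS CLI. In generic Elasticsearch terms, the usual non-destructive fix for stuck replica shards is not to delete the index but to drop its replica count to 0 and then restore it, which discards the unassigned entries and rebuilds the copies cleanly. Something like the following, though only as an illustration of what would have to happen at the root shell, assuming the embedded Elasticsearch listens on the default local port 9200 and presumably only under TAC guidance:

    # Remove the replica setting for .pancache (clears the UNASSIGNED replica entries)
    curl -s -XPUT 'http://localhost:9200/.pancache/_settings' \
      -H 'Content-Type: application/json' \
      -d '{ "index" : { "number_of_replicas" : 0 } }'

    # Restore the replica count so fresh copies are allocated on the peer collector
    curl -s -XPUT 'http://localhost:9200/.pancache/_settings' \
      -H 'Content-Type: application/json' \
      -d '{ "index" : { "number_of_replicas" : 1 } }'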

L0 Member (accepted solution)

We see these and always have to open a ticket with TS. With them, we drop to root and run a command to re-process the unassigned shards.
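For anyone curious what that typically involves: the exact command TAC runs is not stated here, but in stock Elasticsearch the standard way to make the cluster retry allocation of unassigned shards is the reroute API, for example (illustrative only, assuming root access and the default local port 9200):

    # Trigger a fresh allocation round, retrying shards whose allocation previously failed
    curl -s -XPOST 'http://localhost:9200/_cluster/reroute?retry_failed=true'

Even an empty POST to _cluster/reroute kicks off an allocation pass, which is often enough when the allocation_status is "no_attempt" as in the output above.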

Cyber Elite

Hello @Paul_Stinson 

 

I came across this newly published KB: How to fix Elasticsearch unassigned shards in Panorama Log Collector running 11.0. This might help you with unassigned shards.

 

Kind Regards

Pavel 

Help the community: Like helpful comments and mark solutions.

L2 Linker

That would be useful if we were on version 11 (not just yet, but probably soon).

I got the original issue fixed by logging a TAC case; a root engineer was roped in to get root access and fix the ES cluster's unassigned shards.

It must be a common issue for them to have introduced new commands in 11.0 to fix this.

We just recently updated to a 10.2 release and the issue has reoccurred... 😞

The common 'shard' that appears to have issues belongs to a dot-prefixed index, so it is hidden, I gather, and appears to be some sort of internal/temporary index, I assume.

 

 

 

[screenshot attached: Paul_Stinson_0-1707351612771.png]

 

It also appears the 'unassigned' reason is 'CLUSTER_RECOVERED' with NO attempt to allocate (allocation_status 'no_attempt'), so the replicas were seemingly never re-allocated after the cluster restart?

[screenshot attached: Paul_Stinson_0-1707352116932.png]

 

So, off to log another case with Palo Alto for the same issue, all caused by a simple upgrade of Panorama and the two log collectors.
