NAT-4-DEFAULT_MAX_ENTRIES: default maximum entries value 131072 exceeded; frame dropped

Issue: – NAT-4-DEFAULT_MAX_ENTRIES: default maximum entries value 131072 exceeded; frame dropped

Logs: –

%IOSXE-4-PLATFORM: SIP1: cpp_cp: QFP:0.0 Thread:000 TS:00041531217928775544 %NAT-4-DEFAULT_MAX_ENTRIES: default maximum entries value 131072 exceeded; frame dropped

%IOSXE-4-PLATFORM: SIP1: cpp_cp: QFP:0.0 Thread:000 TS:00041531226749139224 %NAT-4-DEFAULT_MAX_ENTRIES: default maximum entries value 131072 exceeded; frame dropped

%IOSXE-4-PLATFORM: SIP1: cpp_cp: QFP:0.0 Thread:000 TS:00041531234452426888 %NAT-4-DEFAULT_MAX_ENTRIES: default maximum entries value 131072 exceeded; frame dropped

Solution: –

Increase the NAT translation entry limit:

Config t

ip nat translation max-entries <number>
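
Before and after raising the limit, it can help to confirm how often the warning is actually logged. The following is a minimal Python sketch, assuming the syslog lines have been saved as plain text; the function name `summarize_nat_limit_hits` and the sample variable are illustrative and not part of any Cisco tool.

```python
import re

# Matches the NAT max-entries warning and captures the limit that was exceeded.
NAT_LIMIT_RE = re.compile(
    r"%NAT-4-DEFAULT_MAX_ENTRIES: default maximum entries value (\d+) exceeded"
)

def summarize_nat_limit_hits(log_text):
    """Return (hit_count, set_of_limits) for NAT max-entries warnings in log_text."""
    limits = [int(m.group(1)) for m in NAT_LIMIT_RE.finditer(log_text)]
    return len(limits), set(limits)

# Two of the log lines shown above, used as sample input.
sample_nat_log = (
    "%IOSXE-4-PLATFORM: SIP1: cpp_cp: QFP:0.0 Thread:000 TS:00041531217928775544 "
    "%NAT-4-DEFAULT_MAX_ENTRIES: default maximum entries value 131072 exceeded; frame dropped\n"
    "%IOSXE-4-PLATFORM: SIP1: cpp_cp: QFP:0.0 Thread:000 TS:00041531226749139224 "
    "%NAT-4-DEFAULT_MAX_ENTRIES: default maximum entries value 131072 exceeded; frame dropped\n"
)

hits, limits = summarize_nat_limit_hits(sample_nat_log)
print(hits, limits)
```

If the hit count stops growing after the change, the new limit is probably sufficient for the current load.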

3650/3850 Output drops on interfaces

Issue details: – OutDiscards are incrementing on several ports. The example below focuses on Gi1/0/48.

Port        Align-Err     FCS-Err    Xmit-Err     Rcv-Err  UnderSize  OutDiscards

Gi1/0/1             0           0           0           0          0            0

Gi1/0/2             0           0           0           0          0            0

Gi1/0/3             0           0           0           0          0            0

Gi1/0/4             0           0           0           0          0            0

Gi1/0/5             0           0           0           0          0    711746178

Gi1/0/6             0           0           0           0          0    358279966

Gi1/0/7             0           0           0           0          0   2144859618

Gi1/0/8             0           0           0           0          0   1379875758

.

.

Gi1/0/48            0           1           0           2          0      4750857
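
On a switch with many ports, the rows with non-zero OutDiscards can be picked out with a short script. This is a sketch only, assuming the output of the error counters command has been saved as text in the seven-column layout shown above; the helper name `ports_with_out_discards` is hypothetical.

```python
def ports_with_out_discards(output):
    """Return {port: out_discards} for rows whose OutDiscards column is non-zero."""
    result = {}
    for line in output.splitlines():
        fields = line.split()
        # Data rows have a port name followed by six integer counters;
        # the header row and blank lines fail this check and are skipped.
        if len(fields) != 7 or not all(f.isdigit() for f in fields[1:]):
            continue
        out_discards = int(fields[6])
        if out_discards > 0:
            result[fields[0]] = out_discards
    return result

# A few rows copied from the output above, used as sample input.
sample_counters = """\
Port        Align-Err     FCS-Err    Xmit-Err     Rcv-Err  UnderSize  OutDiscards
Gi1/0/4             0           0           0           0          0            0
Gi1/0/5             0           0           0           0          0    711746178
Gi1/0/48            0           1           0           2          0      4750857
"""

print(ports_with_out_discards(sample_counters))
```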

Switch# show int g1/0/48

GigabitEthernet1/0/48 is up, line protocol is up (connected: TDR running)

  Hardware is Gigabit Ethernet, address is 7c0e.ce7e.efb0 (bia 7c0e.ce7e.efb0)

  Description:

  MTU 1500 bytes, BW 100000 Kbit/sec, DLY 100 usec,

     reliability 255/255, txload 5/255, rxload 25/255

  Encapsulation ARPA, loopback not set

  Keepalive set (10 sec)

  Full-duplex, 100Mb/s, media type is 10/100/1000BaseTX

  input flow-control is on, output flow-control is unsupported

  ARP type: ARPA, ARP Timeout 04:00:00

  Last input never, output 00:00:00, output hang never

  Last clearing of "show interface" counters 1d04h

  Input queue: 0/2000/0/0 (size/max/drops/flushes); Total output drops: 4750857

Logs to Capture: –

IOS XE 16.x:

show platform hardware fed switch 1 qos queue config interface gi 1/0/48

show platform hardware fed switch 1 qos queue stats interface gi 1/0/48

show platform hardware fed switch 1 qos dscp-cos counters interface gi 1/0/48

show int counters errors module 1

show int g1/0/48

IOS XE 3.x:

show platform qos queue stats GigabitEthernet 1/0/48

show platform qos queue config GigabitEthernet 1/0/48

show int g1/0/48

show int counters errors module 1

SWITCH# show platform hardware fed switch 1 qos queue config interface gi1/0/48

DATA Port:18 GPN:48 AFD:Disabled QoSMap:0 HW Queues: 144 - 151

  DrainFast:Disabled PortSoftStart:2 - 1080

----------------------------------------------------------

   DTS  Hardmax  Softmax   PortSMin  GlblSMin  PortStEnd

  ----- --------  --------  --------  --------  ---------

 0   1  5   120   2   480   6   320   0     0   4  1440

 1   1  4     0   6   720   3   480   2   180   4  1440

 2   1  4     0   5     0   5     0   0     0   4  1440

 3   1  4     0   5     0   5     0   0     0   4  1440

 4   1  4     0   5     0   5     0   0     0   4  1440

 5   1  4     0   5     0   5     0   0     0   4  1440

 6   1  4     0   5     0   5     0   0     0   4  1440

 7   1  4     0   5     0   5     0   0     0   4  1440

 Priority   Shaped/shared   weight  shaping_step

 --------   -------------   ------  ------------

 0      0     Shared            50           0

 1      0     Shared            75           0

 2      0     Shared         10000           0

 3      0     Shared         10000           0

 4      0     Shared         10000           0

 5      0     Shared         10000           0

 6      0     Shared         10000           0

 7      0     Shared         10000           0

   Weight0 Max_Th0 Min_Th0 Weigth1 Max_Th1 Min_Th1  Weight2 Max_Th2 Min_Th2

   ------- ------- ------- ------- ------- -------  ------- ------- ------

 0       0     478       0       0     534       0       0     600       0

 1       0     573       0       0     641       0       0     720       0

 2       0       0       0       0       0       0       0       0       0

 3       0       0       0       0       0       0       0       0       0

 4       0       0       0       0       0       0       0       0       0

 5       0       0       0       0       0       0       0       0       0

 6       0       0       0       0       0       0       0       0       0

 7       0       0       0       0       0       0       0       0       0

SWITCH# show platform hardware fed switch 1 qos queue stats interface gi1/0/48

DATA Port:18 Enqueue Counters

-------------------------------

Queue Buffers Enqueue-TH0 Enqueue-TH1 Enqueue-TH2

----- ------- ----------- ----------- -----------

    0       0           0  4977682796  3283396622

    1       0           0           0  8721427714

    2       0           0           0           0

    3       0           0           0           0

    4       0           0           0           0

    5       0           0           0           0

    6       0           0           0           0

    7       0           0           0           0

DATA Port:18 Drop Counters

-------------------------------

Queue Drop-TH0    Drop-TH1    Drop-TH2    SBufDrop    QebDrop

----- ----------- ----------- ----------- ----------- -----------

    0           0           0           0           0           0

    1           0           0    75741033           0           0

    2           0           0           0           0           0

    3           0           0           0           0           0

    4           0           0           0           0           0

    5           0           0           0           0           0

    6           0           0           0           0           0

    7           0           0           0           0           0
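
The drop counters above show which queue and threshold are discarding traffic. As a rough sketch, assuming the queue stats output has been saved as text in the layout shown, the non-zero drop counters can be extracted as follows; `find_queue_drops` is an illustrative name, not a Cisco utility.

```python
def find_queue_drops(stats_output):
    """Return [(queue, counter_name, value)] for non-zero counters in the Drop Counters table."""
    drops = []
    columns = None
    in_drop_section = False
    for line in stats_output.splitlines():
        fields = line.split()
        if not fields:
            continue
        if "Drop Counters" in line:
            in_drop_section = True   # everything after this belongs to the drop table
            continue
        if not in_drop_section:
            continue                 # skip the Enqueue Counters table
        if fields[0] == "Queue":
            columns = fields[1:]     # Drop-TH0, Drop-TH1, Drop-TH2, SBufDrop, QebDrop
            continue
        # Data rows are all integers: the queue number followed by one value per column.
        if columns and len(fields) == len(columns) + 1 and all(f.isdigit() for f in fields):
            queue = int(fields[0])
            for name, value in zip(columns, fields[1:]):
                if int(value) > 0:
                    drops.append((queue, name, int(value)))
    return drops

# Rows copied from the output above, used as sample input.
sample_stats = """\
DATA Port:18 Enqueue Counters
Queue Buffers Enqueue-TH0 Enqueue-TH1 Enqueue-TH2
    0       0           0  4977682796  3283396622
    1       0           0           0  8721427714
DATA Port:18 Drop Counters
Queue Drop-TH0    Drop-TH1    Drop-TH2    SBufDrop    QebDrop
    0           0           0           0           0           0
    1           0           0    75741033           0           0
"""

print(find_queue_drops(sample_stats))
```

For the sample above, the only non-zero entry is queue 1 at Drop-TH2, which matches the reading of the table by eye.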

SOLUTION:

The drops are caused by the default QoS settings: queue 1 is dropping at threshold 2 (Drop-TH2). Increase the soft buffer with the following command in global configuration mode.

IOS XE 16.x:

Config t

qos queue-softmax-multiplier 1200

IOS XE 3.x:

Refer to the Cisco document below.

https://www.cisco.com/c/en/us/support/docs/switches/catalyst-3850-series-switches/200594-Catalyst-3850-Troubleshooting-Output-dr.html

EXCESSIVE_PARITY_ERROR: EARL 0: Parity error detected in VRAM

Issue: –  EXCESSIVE_PARITY_ERROR: EARL 0: Parity error detected in VRAM

Details: –

What is parity error?

There are two types of parity errors:

– Soft parity errors: caused by an environmental disruption. These are most likely one-time errors that do not recur on the device.
– Hard parity errors: caused by a physical malfunction. These recur constantly.

For more background on parity errors, see below:
Background

What is a processor or memory parity error?

Parity checking is the storage of an extra binary digit (bit) in order to represent the parity (odd or even) of a small amount of computer data (typically one byte) while that data is stored in memory. When the data is read back, the parity value calculated from the stored data is compared to the stored parity bit. If these two values differ, this indicates a data error, and at least one bit must have been changed due to data corruption.

Within a computer system, electrical or magnetic interference from internal or external causes can cause a single bit of memory to spontaneously flip to the opposite state. This event makes the original data bits invalid and is known as a parity error.
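
The parity check described above can be illustrated with a few lines of Python. This is a conceptual sketch using even parity over a single byte; it is not how any particular Cisco ASIC or memory controller implements parity, and the function names are illustrative.

```python
def even_parity_bit(byte):
    """Return the even-parity bit for an 8-bit value (1 if the number of 1 bits is odd)."""
    return bin(byte & 0xFF).count("1") % 2

def has_parity_error(byte, stored_parity):
    """True if the parity recomputed from the data differs from the stored parity bit."""
    return even_parity_bit(byte) != stored_parity

data = 0b10110010                    # four 1 bits, so the even-parity bit is 0
parity = even_parity_bit(data)       # stored alongside the data when it is written

single_flip = data ^ 0b00000100      # one bit flipped while in memory
double_flip = data ^ 0b00000110      # two bits flipped while in memory

print(has_parity_error(data, parity))         # False: data is intact
print(has_parity_error(single_flip, parity))  # True: single-bit flip is detected
print(has_parity_error(double_flip, parity))  # False: an even number of flips goes unnoticed
```

A single parity bit only detects an odd number of flipped bits and cannot tell which bit changed, so it reports the error without being able to correct it.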

Such memory errors, if undetected, may have undetectable and inconsequential results or may cause permanent corruption of stored data or a machine crash.

There are many causes of memory parity errors, which are classified as either soft parity errors or hard parity errors.

Soft Errors

Most parity errors are caused by electrostatic or magnetic-related environmental conditions.

The majority of single-event errors in memory chips are caused by background radiation (such as neutrons from cosmic rays), electromagnetic interference (EMI), or electrostatic discharge (ESD). These events may randomly change the electrical state of one or more memory cells or may interfere with the circuitry used to read and write memory cells.

Known as soft parity errors, these events are typically transient or random and usually occur once. Soft errors can be minor or severe:

Minor soft errors that can be corrected without component reset are single event upsets (SEUs).
Severe soft errors that require a component or system reset are single event latchups (SELs).

Soft errors are not caused by hardware malfunction; they are transient and infrequent, are most likely an SEU, and are caused by an environmental disruption of the memory data.

If you encounter soft parity errors, analyze recent environmental changes that have occurred at the location of the affected system. Common sources of ESD and EMI that may cause soft parity errors include:

– Power cables and supplies
– Power distribution units
– Uninterruptible power supplies
– Lighting systems
– Power generators
– Nuclear facilities (radiation)
– Solar flares (radiation)

Hard Errors

Other parity errors are caused by a physical malfunction of the memory hardware or by the circuitry used to read and write memory cells.

Hardware manufacturers take extensive measures to prevent and test for hardware defects. However, defects are still possible; for example, if any of the memory cells used to store data bits are malformed, they may be unable to hold a charge or may be more vulnerable to environmental conditions.

Similarly, while the memory itself may be operating normally, any physical or electrical damage to the circuitry used to read and write memory cells may also cause data bits to be changed during transfer, which results in a parity error.

Known as hard parity errors, these events are typically very frequent and repeated and occur whenever the affected memory or circuitry is used. The exact frequency depends on the extent of the malfunction and how frequently the damaged equipment is used.

Remember that hard parity errors are the result of a hardware malfunction and reoccur whenever the affected component is used.

If you encounter hard parity errors, analyze physical changes that have occurred at the location of the affected system. Common sources of hardware malfunction that may lead to hard parity errors include:

– Power surges (no ground)
– ESD
– Overheating or cooling
– Incorrect or partial installation
– Component incompatibility
– Manufacturing defect

How to identify the module:

%EARL-SW2_STBY-1-EXCESSIVE_PARITY_ERROR: EARL 0: Parity error detected in VRAM  -> Standby SUP in Switch 2 (VSS)

%EARL-DFC4-1-EXCESSIVE_PARITY_ERROR: EARL 0: Parity error detected in VRAM  -> Module 4

%EARL-SW1_DFC2-1-EXCESSIVE_PARITY_ERROR: EARL 0: Parity error detected in VRAM  -> Module 2 in Switch 1
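
The mapping from the message facility to the affected module can be sketched in Python. The pattern below only covers the three forms shown above (SWn_STBY, DFCn, and SWn_DFCn); any other facility format is an unknown here and returns None. The function name `locate_module` is illustrative.

```python
import re

# Optional "SW<n>_" prefix (VSS switch number), then either STBY or DFC<module>.
EARL_RE = re.compile(
    r"EARL-(?:SW(?P<switch>\d+)_)?(?:(?P<stby>STBY)|DFC(?P<module>\d+))"
    r"-\d-EXCESSIVE_PARITY_ERROR"
)

def locate_module(log_line):
    """Return a short description of the affected module, or None if the format is not recognized."""
    m = EARL_RE.search(log_line)
    if m is None:
        return None
    part = "Standby SUP" if m.group("stby") else "Module " + m.group("module")
    if m.group("switch"):
        return part + " in Switch " + m.group("switch")
    return part

for line in (
    "%EARL-SW2_STBY-1-EXCESSIVE_PARITY_ERROR: EARL 0: Parity error detected in VRAM",
    "EARL-DFC4-1-EXCESSIVE_PARITY_ERROR: EARL 0: Parity error detected in VRAM",
    "%EARL-SW1_DFC2-1-EXCESSIVE_PARITY_ERROR: EARL 0: Parity error detected in VRAM",
):
    print(locate_module(line))
```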

Solution: – Reseat the affected module and monitor it for 48 hours. If the error recurs, replace the module.

SATCTRL-FEX-4-SOHMS_DIAG_WARN

Issue: SATCTRL-FEX108-4-SOHMS_DIAG_WARN error on Nexus 5K

Device affected

Nexus 2K

Issue details

%SATCTRL-FEX108-4-SOHMS_DIAG_WARN: FEX-108 Module 1: Runtime diag detected minor event: Correctable ECC errors <dev=0, count=1>

A single bit stored in memory is being flipped, but each time it is corrected before it can cause a problem. This may be a hardware fault with no current impact, or it may be a transient issue. If it is transient, a reload might clear it; otherwise, the FEX must be replaced. A single-bit error that is corrected every time should have no impact on the system apart from the log messages. If it worsens and affects multiple bits, however, it is no longer correctable, and the FEX module reloads.

RMA-required cases – multiple errors

Sat xx  2 01:17:53 2017@498402 (112/212/0x0): FEX-111 Module 1: Runtime diag detected minor event: Correctable ECC errors <dev=0, count=3>

Sat xx  2 06:17:53 2017@513576 (112/212/0x0): FEX-111 Module 1: Runtime diag detected minor event: Correctable ECC errors <dev=0, count=1>

This message means that a single-bit ECC (error-correcting code) correction was made on FEX memory. ECC is the memory protection in the FEX: a memory error occurred, but the hardware corrected it. The event is therefore harmless on its own, and the FEX does not reboot.

Action Plan: –

Replace the Nexus 2K (FEX) if the logs keep recurring.
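
To judge whether the messages are recurring, the correctable ECC events can be tallied per FEX from saved logs. This is a minimal sketch, assuming the log text uses the message format shown above; the function name `tally_ecc_events` is hypothetical, and what counts as "recurring" is left to the operator.

```python
import re
from collections import defaultdict

# Matches the message body common to both log formats shown above.
ECC_RE = re.compile(
    r"FEX-(?P<fex>\d+) Module (?P<module>\d+): Runtime diag detected minor event: "
    r"Correctable ECC errors <dev=(?P<dev>\d+), count=(?P<count>\d+)>"
)

def tally_ecc_events(log_text):
    """Return {fex_id: (number_of_log_events, sum_of_reported_counts)}."""
    events = defaultdict(int)
    totals = defaultdict(int)
    for m in ECC_RE.finditer(log_text):
        fex = int(m.group("fex"))
        events[fex] += 1
        totals[fex] += int(m.group("count"))
    return {fex: (events[fex], totals[fex]) for fex in events}

# Log lines copied from this section, used as sample input.
sample_fex_log = (
    "%SATCTRL-FEX108-4-SOHMS_DIAG_WARN: FEX-108 Module 1: Runtime diag detected minor event: "
    "Correctable ECC errors <dev=0, count=1>\n"
    "Sat xx  2 01:17:53 2017@498402 (112/212/0x0): FEX-111 Module 1: Runtime diag detected minor event: "
    "Correctable ECC errors <dev=0, count=3>\n"
    "Sat xx  2 06:17:53 2017@513576 (112/212/0x0): FEX-111 Module 1: Runtime diag detected minor event: "
    "Correctable ECC errors <dev=0, count=1>\n"
)

print(tally_ecc_events(sample_fex_log))
```

A FEX that shows several events over time, like FEX-111 in the sample, is the kind of case the action plan refers to.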