Issue: – EXCESSIVE_PARITY_ERROR: EARL 0: Parity error detected in VRAM
There are two types of parity errors:
– soft parity errors – caused by an environmental disruption – these are most likely one-time errors that do not reappear on the device
– hard parity errors – caused by a physical malfunction – if this is the cause, they repeat constantly
If you want to know more about parity errors, see below:
What is a processor or memory parity error?
Parity checking is the storage of an extra binary digit (bit) that represents the parity (odd or even) of a small amount of computer data (typically one byte) while that data is stored in memory. When the data is read back, the parity value calculated from the stored data is compared with the stored parity bit. If these two values differ, this indicates a data error: at least one bit must have been changed by data corruption.
Within a computer system, electrical or magnetic interference from internal or external causes can cause a single bit of memory to spontaneously flip to the opposite state. This event makes the original data bits invalid and is known as a parity error.
Such memory errors, if undetected, may have undetectable and inconsequential results or may cause permanent corruption of stored data or a machine crash.
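The parity check described above can be illustrated with a short Python sketch. It assumes even parity over a single byte; the function names `even_parity_bit` and `has_parity_error` are made up for this example and do not correspond to anything on the switch.

```python
def even_parity_bit(byte: int) -> int:
    """Return the bit that makes the total number of 1s (data + parity) even."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("expected a single byte (0-255)")
    return bin(byte).count("1") % 2


def has_parity_error(byte: int, stored_parity: int) -> bool:
    """Recompute parity on read and compare it with the stored parity bit."""
    return even_parity_bit(byte) != stored_parity


# Write: store the data byte together with its parity bit.
data = 0b10110010          # four 1 bits, so the even-parity bit is 0
parity = even_parity_bit(data)

# Read with no corruption: the recomputed parity matches.
clean_read = has_parity_error(data, parity)

# A single bit flips while the byte sits in memory: parity no longer matches.
one_flip = has_parity_error(data ^ 0b00000100, parity)

# Two flipped bits cancel out, so a single parity bit cannot detect them.
two_flips = has_parity_error(data ^ 0b00000110, parity)

print(clean_read, one_flip, two_flips)  # False True False
```

The last case shows why a parity error means "at least one bit" changed: a single parity bit only reveals an odd number of flipped bits.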
There are many causes of memory parity errors, which are classified as either soft parity errors or hard parity errors.
Most parity errors are caused by electrostatic or magnetic-related environmental conditions.
The majority of single-event errors in memory chips are caused by background radiation (such as neutrons from cosmic rays), electromagnetic interference (EMI), or electrostatic discharge (ESD). These events may randomly change the electrical state of one or more memory cells or may interfere with the circuitry used to read and write memory cells.
Known as soft parity errors, these events are typically transient or random and usually occur once. Soft errors can be minor or severe:
– Minor soft errors that can be corrected without component reset are single event upsets (SEUs).
– Severe soft errors that require a component or system reset are single event latchups (SELs).
Soft errors are not caused by hardware malfunction; they are transient and infrequent, are most likely an SEU, and are caused by an environmental disruption of the memory data.
If you encounter soft parity errors, analyze recent environmental changes that have occurred at the location of the affected system. Common sources of ESD and EMI that may cause soft parity errors include:
– Power cables and supplies
– Power distribution units
– Uninterruptible power supplies
– Lighting systems
– Power generators
– Nuclear facilities (radiation)
– Solar flares (radiation)
Other parity errors are caused by a physical malfunction of the memory hardware or by the circuitry used to read and write memory cells.
Hardware manufacturers take extensive measures to prevent and test for hardware defects. However, defects are still possible; for example, if any of the memory cells used to store data bits are malformed, they may be unable to hold a charge or may be more vulnerable to environmental conditions.
Similarly, while the memory itself may be operating normally, any physical or electrical damage to the circuitry used to read and write memory cells may also cause data bits to be changed during transfer, which results in a parity error.
Known as hard parity errors, these events are typically very frequent and repeated and occur whenever the affected memory or circuitry is used. The exact frequency depends on the extent of the malfunction and how frequently the damaged equipment is used.
Remember that hard parity errors are the result of a hardware malfunction and reoccur whenever the affected component is used.
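As a rough illustration of the soft/hard distinction, the sketch below counts parity errors per component and flags the ones that repeat. The threshold of two events is an assumption made for this example (the text above gives no number), and `classify_parity_errors` is a hypothetical helper, not a Cisco tool.

```python
from collections import Counter

# Assumed threshold: 2 or more events on the same component counts as
# "repeating". Adjust to your own judgment; the source gives no figure.
REPEAT_THRESHOLD = 2


def classify_parity_errors(components):
    """Map each component name to 'likely soft' or 'possibly hard'.

    `components` is an iterable with one entry per logged parity error,
    naming the component that reported it.
    """
    counts = Counter(components)
    return {
        name: "possibly hard" if count >= REPEAT_THRESHOLD else "likely soft"
        for name, count in counts.items()
    }


events = ["Module 4", "Module 2", "Module 4", "Module 4"]
print(classify_parity_errors(events))
# {'Module 4': 'possibly hard', 'Module 2': 'likely soft'}
```

A count alone cannot prove a hardware fault; it only tells you which component deserves a closer look.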
If you encounter hard parity errors, analyze physical changes that have occurred at the location of the affected system. Common sources of hardware malfunction that may lead to hard parity errors include:
– Power surges (no ground)
– Overheating or cooling problems
– Incorrect or partial installation
– Component incompatibility
– Manufacturing defect
How to identify the module: –
%EARL-SW2_STBY-1-EXCESSIVE_PARITY_ERROR: EARL 0: Parity error detected in VRAM --> Standby SUP in Switch 2 (VSS)
%EARL-DFC4-1-EXCESSIVE_PARITY_ERROR: EARL 0: Parity error detected in VRAM --> Module 4
%EARL-SW1_DFC2-1-EXCESSIVE_PARITY_ERROR: EARL 0: Parity error detected in VRAM --> Module 2 in Switch 1
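If you need to pick the module out of many log lines, the mapping shown in the examples can be sketched in Python. The regular expression is inferred only from the three sample messages above, so treat it as an assumption: other facility formats are not handled and simply return `None`. The name `locate_module` is hypothetical.

```python
import re

# Pattern inferred from the three sample messages only (assumption).
EARL_PARITY = re.compile(
    r"%?EARL-"
    r"(?:SW(?P<switch>\d+)_)?"                   # optional VSS switch number
    r"(?:(?P<standby>STBY)|DFC(?P<module>\d+))"  # standby SUP or DFC module
    r"-\d+-EXCESSIVE_PARITY_ERROR"
)


def locate_module(message):
    """Return a description of the affected module, or None if unrecognised."""
    match = EARL_PARITY.search(message)
    if match is None:
        return None
    if match.group("standby"):
        location = "Standby SUP"
    else:
        location = "Module " + match.group("module")
    if match.group("switch"):
        location += " in Switch " + match.group("switch")
    return location


line = "%EARL-SW1_DFC2-1-EXCESSIVE_PARITY_ERROR: EARL 0: Parity error detected in VRAM"
print(locate_module(line))  # Module 2 in Switch 1
```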
Solution: – Re-seat the affected module and monitor it for 48 hours. If the error recurs, replace the module.
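The re-seat and monitor rule can be written down as a small decision helper. This is a minimal sketch: it assumes you record the re-seat time and the timestamps of any later parity errors yourself, and it only counts errors inside the 48-hour window. `next_action` is a hypothetical name.

```python
from datetime import datetime, timedelta

MONITOR_WINDOW = timedelta(hours=48)  # monitoring period from the solution above


def next_action(reseated_at, error_times, now):
    """Decide what to do after a re-seat, given parity error timestamps."""
    deadline = reseated_at + MONITOR_WINDOW
    # Assumption: only errors logged after the re-seat and within 48 hours count.
    if any(reseated_at < t <= deadline for t in error_times):
        return "replace the module"
    if now < deadline:
        return "keep monitoring"
    return "no recurrence in 48 hours"


reseated = datetime(2024, 1, 1, 9, 0)
print(next_action(reseated, [], datetime(2024, 1, 2, 9, 0)))
# keep monitoring
print(next_action(reseated, [datetime(2024, 1, 2, 15, 30)], datetime(2024, 1, 2, 16, 0)))
# replace the module
```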