ETH40G (40GbE) Test
Purpose
This test verifies data integrity between the two 40GbE links on TPMs. It verifies that high data rate traffic can be sent in both directions between the two TPM 40GbE ports with no packet loss or errors. The test runs for a configurable duration. It can also be used to diagnose incompatible or defective 40GbE QSFP cables (DACs or AOCs), or hardware defects in the TPM's 40GbE flyover cable, which connects the FPGA to the QSFP connector.
This test automatically determines a transmission matrix based on the number of active QSFP connections to each TPM in the station. Tiles with two QSFP connections transmit in loopback, either via a switch or by directly connecting the two QSFP ports of the TPM to each other. Tiles with one QSFP connection either transmit in pairs or form a circular ring from one tile to the next and back to the start, depending on whether “pairwise” or “circular” mode is selected. Pairwise mode is the default because it does not require a network switch: adjacent tiles can be connected directly. However, it does not support an odd number of tiles. Circular mode requires a network switch because each tile communicates with two others, but it supports an odd number of tiles with a single QSFP connection.
If any TPMs are detected with no active QSFP connections, the test will be skipped. Remove these from the station to run the test without them.
If only a single TPM has an active QSFP connection, the test will be skipped and you will be advised to change your hardware configuration.
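The selection rules above can be illustrated with a minimal sketch. This is not the test's actual implementation: the function name `build_tx_pattern`, the `SkipTestError` exception, and the `(tile, port)` endpoint representation are all hypothetical. It assumes each tile has zero, one, or two active ports, and it interprets the single-TPM skip rule as "a lone tile with one connection has no partner".

```python
from typing import Dict, List, Tuple

Endpoint = Tuple[int, int]  # hypothetical: (tile index, QSFP port index)


class SkipTestError(Exception):
    """Hypothetical exception: the hardware configuration cannot be tested."""


def build_tx_pattern(
    active_ports: Dict[int, List[int]], mode: str = "pairwise"
) -> List[Tuple[Endpoint, Endpoint]]:
    """Return (source, destination) endpoint pairs for the given station.

    active_ports maps each tile index to its list of active QSFP ports
    (assumed to contain 0, 1 or 2 entries).
    """
    if mode not in ("pairwise", "circular"):
        raise ValueError(f"unknown mode: {mode!r}")
    if any(len(ports) == 0 for ports in active_ports.values()):
        raise SkipTestError("a tile has no active QSFP connection")

    dual = sorted(t for t, p in active_ports.items() if len(p) == 2)
    single = sorted(t for t, p in active_ports.items() if len(p) == 1)
    if len(single) == 1:
        # Assumed interpretation: a lone single-port tile has no partner.
        raise SkipTestError("only one tile has a single QSFP connection")

    pattern: List[Tuple[Endpoint, Endpoint]] = []

    # Two active ports: loop back between the tile's own ports, both ways.
    for tile in dual:
        a, b = active_ports[tile]
        pattern += [((tile, a), (tile, b)), ((tile, b), (tile, a))]

    if mode == "pairwise":
        if len(single) % 2:
            raise ValueError("pairwise mode needs an even number of tiles")
        # Adjacent tiles exchange traffic in both directions.
        for left, right in zip(single[0::2], single[1::2]):
            a = (left, active_ports[left][0])
            b = (right, active_ports[right][0])
            pattern += [(a, b), (b, a)]
    else:
        # Circular: each tile sends to the next one, the last to the first.
        for i, tile in enumerate(single):
            nxt = single[(i + 1) % len(single)]
            pattern.append(
                ((tile, active_ports[tile][0]), (nxt, active_ports[nxt][0]))
            )
    return pattern
```

For example, `build_tx_pattern({0: [0], 1: [0], 2: [0]}, mode="circular")` yields a three-tile ring, while the same station in pairwise mode raises `ValueError` because the tile count is odd.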
Methodology
TPMs are categorised based on the number of active QSFP connections detected.
A TX-to-RX transmission pattern is generated from this categorisation.
Data transmission from the station is stopped in preparation for the test.
For each TPM FPGA with an active QSFP, the 40GbE embedded test module is started. This generates UDP packets which are transmitted over the 40GbE network to another FPGA. Source and destination IP addresses are determined from the transmission matrix.
Each embedded test module counts the number of UDP packets it receives.
The software monitors the UDP CRC and BIP error counters continuously and stops the test if errors are detected.
When the test finishes, either due to an error or due to the duration elapsing, the software checks that the total number of packets transmitted by one FPGA matches the total number of packets received by the destination FPGA.
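The monitoring and final reconciliation steps above could look roughly like the following sketch. The callable `read_error_counters`, the counter keys, and the count dictionaries are hypothetical stand-ins for whatever the FPGA test module actually exposes. The polling interval is also an assumption.

```python
import time
from typing import Callable, Dict, Hashable, List, Tuple


def monitor_errors(
    read_error_counters: Callable[[], Dict[str, int]],
    duration_s: float,
    poll_s: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll error counters until the duration elapses.

    Returns True if the full duration passed with all counters at zero,
    or False if polling stopped early because a counter was non-zero.
    """
    deadline = time.monotonic() + duration_s
    while time.monotonic() < deadline:
        counters = read_error_counters()  # hypothetical, e.g. {"crc": 0, "bip": 0}
        if any(value != 0 for value in counters.values()):
            return False
        sleep(poll_s)
    return True


def find_count_mismatches(
    pattern: List[Tuple[Hashable, Hashable]],
    tx_counts: Dict[Hashable, int],
    rx_counts: Dict[Hashable, int],
) -> List[Tuple[Hashable, Hashable, int, int]]:
    """Compare packets sent by each source with packets received at its destination.

    Returns one (source, destination, sent, received) entry per mismatching link.
    """
    return [
        (src, dst, tx_counts[src], rx_counts[dst])
        for src, dst in pattern
        if tx_counts[src] != rx_counts[dst]
    ]
```

In this sketch the count comparison is run whether or not the monitor stopped early, mirroring the description that the check happens when the test finishes for either reason. An empty list from `find_count_mismatches` means every link delivered all of its packets.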
Typical issues
Some typical issues and how they present are shown below:
Presentation: The test fails immediately.
Issue: Check that the link has been brought up correctly. This issue has been observed with incompatible cables and with incorrectly configured network switches.
Presentation: The test reports CRC or BIP errors.
Issue: This is likely a hardware issue with the flyover cable on the TPM or an issue with the 40GbE cable. Replace the 40GbE cable and try again. AOC cables tend to work more reliably than DACs because they require less switch configuration.