Health Monitoring Test
Purpose
In a system containing 8192 boards, comprehensive health reporting will be essential to detect failures and early degradation of hardware components, facilitating the maintenance of the telescope over time. This test verifies a series of monitoring parameters for each TPM against expected ranges. The principal monitored quantities are temperature, voltage and current for all major components. The test also checks the status of the external timing reference signals, including the reference clock, pulse-per-second (PPS) and on-board oscillators, in order to detect errors such as glitches and unexpected frequency conditions. Lastly, it checks the status of the FPGA I/O interfaces and DSP chain, to ensure that communication channels are brought up successfully and operate reliably without interruption.
Methodology
Skip any checks for monitoring points not supported by current FPGA firmware.
Request health status for a single TPM.
Check all temperatures, voltages and currents are within acceptable ranges.
Check that the timing, I/O and DSP statuses are all OK and that none reports ERROR.
Test Station Beamformer DDR parity error injection and detection.
Repeat for each remaining TPM.
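The range and status checks in the steps above can be sketched in Python as follows. This is a minimal illustration only, not the actual test implementation: the monitoring point names, the limits in EXPECTED_RANGES, the flat dictionary layout of the health report, and the functions check_health and check_station are all assumptions made for the example. The parity error injection step is not covered, since it depends on firmware-specific controls.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

# Hypothetical expected ranges (min, max); real limits would come from
# the hardware specification, not from this sketch.
EXPECTED_RANGES: Dict[str, Tuple[float, float]] = {
    "temperature.board": (10.0, 70.0),
    "voltage.VIN": (11.4, 12.6),
    "current.FE0": (0.0, 3.0),
}

# Hypothetical status points that are expected to report "OK".
STATUS_POINTS: List[str] = [
    "timing.pps",
    "timing.clock",
    "io.jesd",
    "dsp.station_beamformer",
]


def check_health(health: Dict[str, Any],
                 supported: Optional[Set[str]] = None) -> List[str]:
    """Return a list of failure messages for one TPM's health report.

    `supported` names the monitoring points the firmware provides;
    None means every point is checked.
    """
    failures: List[str] = []

    def is_supported(name: str) -> bool:
        return supported is None or name in supported

    # Temperatures, voltages and currents must lie within their ranges.
    for name, (low, high) in EXPECTED_RANGES.items():
        if not is_supported(name):
            continue  # skip points the firmware does not support
        value = health.get(name)
        if value is None:
            failures.append(f"{name}: missing from report")
        elif not low <= value <= high:
            failures.append(f"{name}: {value} outside [{low}, {high}]")

    # Timing, I/O and DSP statuses must all be "OK".
    for name in STATUS_POINTS:
        if not is_supported(name):
            continue
        status = health.get(name)
        if status != "OK":
            failures.append(f"{name}: status {status!r}")

    return failures


def check_station(reports: Dict[int, Dict[str, Any]],
                  supported: Optional[Set[str]] = None) -> Dict[int, List[str]]:
    """Check every TPM in turn; return only the TPMs that have failures."""
    results: Dict[int, List[str]] = {}
    for tpm_id, health in reports.items():
        failures = check_health(health, supported)
        if failures:
            results[tpm_id] = failures
    return results
```

Collecting failure messages per TPM, rather than stopping at the first failed check, lets a single run report every out-of-range point across the station.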