
Approaching System Reliability in the AI Era
June 26 @ 12:00 pm - 1:00 pm
Ensuring hardware system reliability is increasingly critical in the evolving AI landscape, particularly within data centers. Drawing upon extensive experience leading reliability initiatives for cutting-edge hardware, this presentation will outline a general methodology for designing reliable complex AI systems. It will emphasize the necessity of a multidisciplinary approach, integrating model-based system engineering, rigorous reliability testing, and continuous system improvements, as exemplified by advancements in liquid cooling and power delivery technologies for high-performance AI processors. The talk will focus on the reliability approach needed for resilience in complex, AI-driven environments.
Speaker(s): Venkata Chivukula
Virtual: https://events.vtools.ieee.org/m/485845