
The Device Chronicle interviews ADAS and robotics software architecture expert Patrick Gögler on the challenges of managing software in safety-critical environments.
Patrick Gögler’s background is primarily in software architecture for off-highway construction machines, such as excavators and dump trucks. His research and development work focuses on turning early-stage ideas from research institutes and universities into scalable, reliable architectures for production, with a particular emphasis on safety.
Challenges in achieving the software-defined vehicle (SDV)
Patrick argues that the biggest technical reality check on the way to a truly software-defined vehicle is the need to meet hard real-time safety constraints, which set vehicles apart from smartphone-like infotainment systems. He says, “unlike cloud-native systems, ADAS functions like emergency braking require deterministic behavior within strict time limits, which cannot be guaranteed due to potential network latencies or outages.”
Additionally, the industry faces a fragmented legacy hardware stack: networks of different ECUs and controllers, each often running incompatible operating systems, holding different safety certifications, and relying on incompatible development workflows.
Balancing CI/CD with safety requirements
According to Patrick, the best way to balance rapid continuous integration/continuous deployment (CI/CD) with zero-fail safety requirements, such as those mandated by ISO 26262, is "shift left safety integration". This means building safety into the development life cycle from the beginning, for example by enforcing safety coding guidelines at the commit level.
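To make the idea of a commit-level check concrete, here is a minimal sketch that scans changed C source text for constructs that safety coding guidelines commonly restrict. The rule set and the names `BANNED_PATTERNS`, `check_source`, and `commit_allowed` are hypothetical and are not taken from ISO 26262 or from the interview; a real project would more likely run a qualified static analysis tool against its own approved guideline.

```python
import re

# Hypothetical rule set for illustration only; not an official guideline.
BANNED_PATTERNS = {
    "dynamic allocation": re.compile(r"\b(malloc|calloc|realloc|free)\s*\("),
    "goto statement": re.compile(r"\bgoto\b"),
}


def check_source(filename: str, source: str) -> list[str]:
    """Return one finding per rule violation found in the source text."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        # Naive comment stripping: drops everything after "//".
        # This would also cut string literals containing "//".
        code = line.split("//", 1)[0]
        for rule, pattern in BANNED_PATTERNS.items():
            if pattern.search(code):
                findings.append(f"{filename}:{line_no}: {rule}")
    return findings


def commit_allowed(changed_files: dict[str, str]) -> bool:
    """A commit passes only if no changed file produces a finding."""
    return all(not check_source(name, text) for name, text in changed_files.items())
```

Wired into a pre-commit hook or the first CI stage, a check like this rejects a change before it reaches any later, more expensive test layer.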
Traceability is also critical: every function must be shown to satisfy a defined requirement and to have been tested against that requirement. To manage the time and cost of manual safety testing, the process should be automated as much as possible, with automated test generation and execution linked to a specific build for traceability. Layered testing is also necessary: unit tests for functional correctness, integration tests for interface validation, hardware-in-the-loop validation, and finally simulation tests for scenarios that are difficult to reproduce in real-world operation.
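The link between requirements, tests, and builds can be sketched as a small data model. The following is an illustrative assumption, not a description of Patrick's tooling: `VerificationRecord`, `untraced_requirements`, and the identifier formats are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VerificationRecord:
    test_id: str
    requirement_id: str  # requirement this test verifies
    build_id: str        # build the test was executed against
    passed: bool


def untraced_requirements(
    requirements: set[str],
    records: list[VerificationRecord],
    build_id: str,
) -> set[str]:
    """Return requirements that have no passing test recorded for this build."""
    covered = {
        r.requirement_id
        for r in records
        if r.build_id == build_id and r.passed
    }
    return requirements - covered
```

A release gate could then refuse to ship a build for which `untraced_requirements` returns a non-empty set, since a failed test or a result from an older build does not count as evidence.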
OTA updates in safety-critical systems
Patrick shares that the primary challenge of OTA updates for safety-critical ADAS modules is maintaining safety at all times: before, during, and after the update. This requires safe and reliable rollback or fallback mechanisms that return the system to a previous, guaranteed safe state if the update fails. Because safety-critical systems rely on redundancy, the redundant components must also receive the update consistently, and synchronization must be maintained to prevent inconsistent behavior between old and new software versions.
Patrick explains that while high-performance compute units (HPCs) or embedded Linux devices can typically use A/B partitioning for rollbacks, the strategy for microcontrollers (RTOS devices) is to make a backup before the update and to return to that backup if the update does not complete successfully. For construction machines, this approach often relies on a local backup and on orchestration between controllers to cope with poor network connections.
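The backup-then-restore strategy described for microcontrollers can be sketched in a few lines. This is a simplified model under stated assumptions: `Controller`, `flash`, and `update_with_backup` are hypothetical names, the firmware is a plain byte string, and verification is a SHA-256 comparison. Real devices also have to survive power loss mid-write, which this sketch does not model.

```python
import hashlib
from typing import Callable


class Controller:
    """Toy stand-in for a microcontroller holding one firmware image."""

    def __init__(self, firmware: bytes) -> None:
        self.firmware = firmware


def digest(image: bytes) -> str:
    return hashlib.sha256(image).hexdigest()


def flash(ctrl: Controller, image: bytes) -> None:
    """Default writer: installs the image in one step."""
    ctrl.firmware = image


def update_with_backup(
    ctrl: Controller,
    new_image: bytes,
    expected_digest: str,
    write: Callable[[Controller, bytes], None] = flash,
) -> bool:
    """Install new_image; restore the previous image if anything goes wrong."""
    backup = ctrl.firmware  # 1. keep a local copy of the known-good image
    try:
        write(ctrl, new_image)  # 2. attempt the update (may fail midway)
        if digest(ctrl.firmware) != expected_digest:
            raise ValueError("installed image failed verification")
    except Exception:
        ctrl.firmware = backup  # 3. fall back to the known-good image
        return False
    return True
```

Because the backup is held locally, the rollback path does not depend on the network connection that delivered the update.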
Balancing deterministic edge processing with cloud learning
Patrick clarifies where the line is drawn: anything affecting real-time driving performance or immediate control decisions must remain deterministic and on board the vehicle. Conversely, anything related to model improvement, continuous learning, or updates that do not influence immediate vehicle control can happen in the cloud. This split is necessary because cloud environments suffer from network latencies, synchronization issues, and outages (the very conditions that the well-known fallacies of distributed computing warn against assuming away), which are unacceptable for safety functions requiring responses within milliseconds.
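Expressed as a rule, the split above might look like the following sketch. The `Placement` enum, the `VehicleFunction` type, and the example function names are hypothetical; the single boolean is a deliberate simplification of what would be a safety analysis in practice.

```python
from dataclasses import dataclass
from enum import Enum


class Placement(Enum):
    ON_BOARD = "on-board"
    CLOUD = "cloud"


@dataclass(frozen=True)
class VehicleFunction:
    name: str
    affects_immediate_control: bool  # outcome of a safety analysis, assumed given


def place(fn: VehicleFunction) -> Placement:
    """Keep anything that influences immediate control on the vehicle."""
    if fn.affects_immediate_control:
        return Placement.ON_BOARD
    return Placement.CLOUD
```

Under this rule, a hypothetical `VehicleFunction("emergency_braking", True)` stays on board, while `VehicleFunction("model_retraining", False)` may run in the cloud.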
Containerization and future-proofing the software stack
Patrick argues that containerization and software abstraction layers are crucial for decoupling software from hardware to future-proof the stack against successive generations of automotive silicon. “The goal is to make the software independent from the hardware, so that it defines *what* is done, not *where* it is done, allowing the software to run on new hardware if a processing unit is replaced. Decoupling is important because being tied to a specific hardware vendor creates risks of production stoppages if that vendor cannot deliver necessary quantities.”
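One common way to express "what, not where" in code is an abstraction layer that application logic depends on, with hardware-specific implementations behind it. The sketch below uses Python's `typing.Protocol` for this; the interface `ComputeBackend` and both vendor classes are hypothetical and stand in for whatever accelerator or SDK a real platform would wrap.

```python
from typing import Protocol


class ComputeBackend(Protocol):
    """What the application needs, independent of the silicon underneath."""

    def nearest_obstacle(self, distances_m: list[float]) -> float: ...


class VendorABackend:
    # Hypothetical implementation for one processing unit.
    def nearest_obstacle(self, distances_m: list[float]) -> float:
        return min(distances_m)


class VendorBBackend:
    # Hypothetical implementation for a replacement processing unit.
    def nearest_obstacle(self, distances_m: list[float]) -> float:
        return sorted(distances_m)[0]


def should_brake(backend: ComputeBackend, distances_m: list[float], limit_m: float) -> bool:
    """Application logic: depends only on the interface, not on the vendor."""
    return backend.nearest_obstacle(distances_m) < limit_m
```

Swapping `VendorABackend` for `VendorBBackend` leaves `should_brake` untouched, which is the property that lets the software move to new hardware when a processing unit is replaced.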
Vendor lock-in and industry dynamics
Patrick concludes with the risk of vendor lock-in for automotive OEMs, especially when they partner with AI-first hardware vendors that integrate embedded AI. OEMs would therefore prefer to keep their systems as vendor-agnostic as possible. In Germany, existing tier suppliers often control the ECU and MCU stacks, which makes it harder for the large OEMs to define new strategies than it is for companies like Tesla, which define the whole system themselves. The future seems to point toward OEMs determining their own ECU and MCU stacks, rather than having them controlled by external parties and exposed to supply chain pressures.
Contact Patrick on https://www.linkedin.com/in/patrick-goegler/