Systems-on-a-chip for extremely critical applications would use 28 percent less energy and 48 percent less chip area, while offering a hardware failure rate nine times lower, if designed with the novel Desyre architecture. This would drastically reduce hospital costs and the replacement rate of medical devices.
Three years ago, the DeSyRe (on-Demand System Reliability) project set out with the promise of enabling extremely reliable medical devices. Now the results are in, and they are even better than expected: chips designed according to the new Desyre paradigm have proven more reliable, and less power- and area-hungry, than predicted at the project's outset.
The Desyre consortium had initially promised new design techniques that would counter the rising fault rates expected in upcoming technology nodes, while also reducing the power and performance penalties that fault-tolerance measures introduce.
To reach these ambitious goals, Desyre introduced a hybrid approach to reliability that divides the system-on-chip into two areas. The first area comprises ordinary, interchangeable processing cores, which are inherently fault-prone. The second area is extremely resistant to faults and monitors the health of the cores in the first area. It ensures that each of those cores handles its assigned sub-task correctly and efficiently, and, when it diagnoses a malfunction, transfers the affected task to another idle core in the same area.
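The division of labour described above can be sketched as a small Python model. This is a hypothetical illustration, not Desyre's implementation: the `Core` and `ReliableMonitor` classes, their methods and the task names are invented here, and the sketch assumes that faults are somehow diagnosed and reported to the monitor, a step the source does not describe.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Core:
    """A fault-prone, interchangeable processing core (hypothetical model)."""
    core_id: int
    faulty: bool = False
    task: Optional[str] = None


@dataclass
class ReliableMonitor:
    """Stands in for the fault-resistant area: it tracks core health and
    reassigns work. All names are illustrative, not Desyre's actual API."""
    cores: Dict[int, Core] = field(default_factory=dict)

    def idle_cores(self) -> List[Core]:
        # Healthy cores with no task assigned, in id order for determinism.
        ordered = sorted(self.cores.values(), key=lambda c: c.core_id)
        return [c for c in ordered if not c.faulty and c.task is None]

    def assign(self, task: str) -> int:
        """Place a sub-task on the first healthy idle core."""
        idle = self.idle_cores()
        if not idle:
            raise RuntimeError("no healthy idle core available")
        idle[0].task = task
        return idle[0].core_id

    def report_fault(self, core_id: int) -> Optional[int]:
        """Mark a core as faulty and migrate its task (if any) to an idle core.
        Returns the id of the core that took over, or None if it had no task."""
        core = self.cores[core_id]
        core.faulty = True
        task, core.task = core.task, None
        if task is None:
            return None
        return self.assign(task)


monitor = ReliableMonitor({i: Core(i) for i in range(4)})
monitor.assign("filter-ecg")    # lands on core 0
monitor.assign("detect-peaks")  # lands on core 1
new_home = monitor.report_fault(0)
print(new_home, monitor.cores[new_home].task)  # 2 filter-ecg
```

Keeping the bookkeeping in a single trusted component means the ordinary cores need no redundancy of their own; the trade-off, in this sketch, is that the monitor must itself be reliable, which is the role the source assigns to the fault-resistant area.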
"In the Desyre project, we have coupled a new dynamically reconfigurable substrate together with runtime-system software support in such a manner that it can adapt on demand to various types and densities of faults, system constraints and application requirements," says Ioannis Sourdis, Associate Professor in Computer Engineering at Chalmers University of Technology, and project leader of Desyre. "We compared the Desyre architecture to prevailing reliability approaches, and Desyre scored better on all aspects. It even scored better than we planned at the start of the project, surpassing all our expectations."
Compared with a standard Triple-Modular-Redundancy (TMR) system, which runs three identical modules and accepts the output on which the majority agree, a Desyre system requires 46% less chip area and 28% less energy to achieve the same tolerance to transient faults and the same performance.
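The TMR baseline can be illustrated with a minimal majority voter in Python. This sketch shows only the voting logic, with a hypothetical `tmr_vote` function; real TMR is implemented in hardware, and it needs three copies of each module plus the voter, which is where its area and energy overhead comes from.

```python
from collections import Counter
from typing import Hashable, Sequence


def tmr_vote(outputs: Sequence[Hashable]) -> Hashable:
    """Return the majority value among three redundant module outputs.

    A single faulty module is outvoted by the two agreeing ones; if all
    three disagree there is no majority and the fault cannot be masked.
    """
    if len(outputs) != 3:
        raise ValueError("TMR expects exactly three outputs")
    value, count = Counter(outputs).most_common(1)[0]
    if count < 2:
        raise RuntimeError("no majority: more than one module disagrees")
    return value


# One module suffers a transient bit flip (0x2A -> 0x2B); the vote masks it.
print(tmr_vote([0x2A, 0x2B, 0x2A]))  # 42
```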
Compared with a time-redundant system, which runs the program twice and compares the two outcomes, Desyre executes code 14% to 32% faster.
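Time redundancy, the second baseline, can be sketched the same way. The hypothetical `run_twice_and_compare` helper below executes a computation twice and accepts the result only if both runs agree. Running every computation twice on the same hardware roughly doubles its execution time, and a mismatch only detects a fault; correcting it would require, for example, a third run.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def run_twice_and_compare(program: Callable[[], T]) -> T:
    """Execute the same computation twice and accept the result only if
    both runs agree (time redundancy on a single piece of hardware)."""
    first = program()
    second = program()
    if first != second:
        # A mismatch signals a (likely transient) fault. A real system would
        # typically re-execute or escalate; this sketch simply raises.
        raise RuntimeError("outputs differ: fault detected")
    return first


print(run_twice_and_compare(lambda: sum(range(10))))  # 45

# Simulate a transient fault that corrupts only the second run.
results = iter([45, 44])
try:
    run_twice_and_compare(lambda: next(results))
except RuntimeError as err:
    print(err)  # outputs differ: fault detected
```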
Finally, for permanent faults, the Desyre system was compared with a core-redundant system of the same area, in which every component has a backup spare that takes over if it malfunctions. Desyre reduces the failure rate due to permanent faults, measured in FIT (failures per billion device-hours), by a factor of nine.
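The FIT metric is simple arithmetic: failures observed, divided by total device-hours, scaled to one billion hours. The Python sketch below uses a hypothetical `fit_rate` helper and invented field data purely to show what a factor-of-nine reduction means; the numbers are not Desyre results.

```python
def fit_rate(failures: int, devices: int, hours_per_device: float) -> float:
    """Failures In Time: failures per one billion (1e9) device-hours."""
    device_hours = devices * hours_per_device
    return failures * 1e9 / device_hours


# Hypothetical field data, for illustration only.
baseline = fit_rate(failures=27, devices=10_000, hours_per_device=10_000)
improved = baseline / 9  # a factor-of-nine reduction, as reported above
print(baseline, improved)  # 270.0 30.0
```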
Source: Chalmers University of Technology