Loosely Synchronized Dual Processor Architecture Computer Science Essay


Modern cars are equipped with sophisticated distributed embedded systems, most of which have stringent safety requirements. The number of safety-related embedded systems responsible for the active and passive safety of the vehicle keeps increasing. The concept called drive-by-wire or x-by-wire aims at improving passenger safety by replacing mechanical components with purely electronic or electromechanical components. Research has also been carried out towards fully automated or assisted driving [4][5].

Since these electronic systems must handle highly safety-critical tasks without mechanical backup and operate in particularly harsh environments, their fault tolerance is of utmost importance. The automobile industry is also highly sensitive to cost. This report presents several hardware-based fault-tolerant architectures and scheduling methodologies that allow these architectures to be used in a cost-effective manner.

1.1. Automotive Safety Requirements

As defined in [6], safety integrity is the probability that a safety-related system performs its safety functions satisfactorily under all stated conditions for the stated period of time. ISO 26262, the automotive adaptation of IEC 61508, assigns automotive safety integrity levels (ASILs) to the electronic components in a vehicle. The ASILs range from A to D in increasing order of risk (level A corresponds to the least risk) [8].

In the coming years, safety characteristics (including fault tolerance) along with performance will be the distinguishing factors in the automotive field. Figure 1 shows how the safety requirements of electronic components in vehicles, as well as the degree of automation, have increased over the years.

ASIL D is the most stringent level defined by the safety standards; a system at this level has to tolerate all types of hardware faults.

Faults are generally classified into:

• Transient faults: These occur due to external influences such as heat, electrical noise or radiation. Radiation can cause localized ionisation events which in turn upset internal data states. A fault induced this way is also called a soft error. Soft errors significantly reduce system availability. The rate at which they occur is called the soft error rate (SER). Soft errors are a particular concern in current technologies, where device sizes keep shrinking [12][13][15].

• Intermittent faults: These are transient faults that occur from time to time.

ITI Seminar: Safety of Automotive ICs Topic: Hardware-based Fault-Tolerance for Automotive Applications Sunil P 2

• Permanent faults: These faults are always reproducible and arise from physical damage to the hardware. For example, the high level of integration in chips reduces pitch and wire width, which may lead to an open or short circuit [11].

Figure 1: Severity of failures in electronic driver assistance systems and drive by wire systems [5].

When a fault is detected, depending on the ASIL, the system has to degrade or exhibit fault tolerance according to one of the following levels:

• Fail-Operational (FO): The system can tolerate a single failure, which means it remains operational even after one failure. Highly safety-critical ASIL D systems, such as the electronic brake system in brake-by-wire and the sensors and actuators which support its functionality, must have this capability. Another example is the regenerative energy storage system in electric cars [16].

• Fail-Safe (FS): Even after a failure, the component continues to function and initiates actions so that the system reaches a safe state. A safe state for a vehicle is defined as standstill. This fault tolerance is required for systems in ASIL C and D. The mechanical backup for the electronic control system and the propulsion system in electric cars are examples of fail-safe units in a vehicle [5] [16].

• Fail-Silent (FSIL): Upon a failure, the affected component shuts down so that it cannot wrongly influence other components [5].


1.2. Analysis of Fault tolerant techniques

Fault tolerant techniques can be categorized into:

• Information redundancy: This methodology adds extra (redundant) information during data exchange. The extra information can be used for error detection and error correction. There are several forms of error detecting codes, namely parity codes (used in memory storage), duplication codes (applied in communication systems), checksums (applied in data transfer between memory elements), Berger codes (which detect all unidirectional errors) and cyclic redundancy check (CRC) codes. With error correcting codes (ECC) such as Hamming codes, the detected errors can also be corrected. Memories and buses are usually protected by ECC. A hardware checker has to be implemented to evaluate the code words. To ensure the reliability of the hardware checker itself, a self-testing mechanism also needs to be implemented which can detect internal faults without external stimulus.

• Temporal redundancy: This technique repeats a computation two or more times and compares the results, checking for discrepancies. Time redundancy reduces the additional hardware required (as is the case with spatial and information redundancy) by using the available time slots. Temporal redundancy was primarily aimed at detecting soft errors, but it can also detect permanent faults: a stuck-at fault in a bus line can be detected by sending the original data first, followed by the complement of the original data after a time interval. The disadvantage of temporal redundancy is that it cannot be applied to systems with hard real-time constraints.

• Spatial redundancy: In this technique a system has more components than actually required for its functionality. These components can be identical to the existing ones or have different functionality. Spatial redundancy itself is broadly divided into:

o Passive hardware redundancy: To achieve fault tolerance, the replicated components execute the same tasks and their results are routed to a voter or a comparator which checks the validity of the results. Depending on the number of such components still active, the system can switch between fail-operational, fail-safe and fail-silent modes.

o Active hardware redundancy: This method achieves fault tolerance by fault detection, localization and recovery. In contrast to the passive strategy, there is no attempt to prevent faults from producing errors in the system. Once an error is detected, the system tries to find the fault location and then either reconfigures itself without the faulty component (degraded mode) or activates a standby component (standby replacement).

o Hybrid redundancy: This method combines the attractive features of both passive and active redundancy, which means the system is capable of both fault masking and reconfiguration.

The disadvantage of spatial redundancy is the cost of hardware, which increases from active to hybrid redundancy [10] [17].

The first section of this report mainly discusses the spatially redundant architectures which can be applied to embedded systems in the automotive industry. The functionality, advantages and disadvantages of the architectures are discussed. The section ends with an example implementation of a fault tolerant architecture in the industry.

Fault tolerance realised with redundancy incurs cost. The next section of the report therefore focuses on methods to reduce the cost of redundancy and derive maximum throughput from the architecture. Two methods, relaxed dedication and distributed temporal redundancy, are presented, followed by a comparison and a study of the performance improvement.
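The stuck-at detection described under temporal redundancy can be illustrated with a small sketch. The bus model, the function names and the mask parameters below are assumptions made for illustration only; they are not taken from the cited sources.

```python
def send_over_bus(word, stuck_at_one_mask=0, stuck_at_zero_mask=0, width=8):
    """Model a bus whose lines may be stuck at 1 or at 0 (hypothetical fault model)."""
    full = (1 << width) - 1
    return ((word | stuck_at_one_mask) & ~stuck_at_zero_mask) & full


def transfer_with_complement_check(word, width=8, **faults):
    """Send the word, then its bitwise complement, and compare both receptions."""
    full = (1 << width) - 1
    first = send_over_bus(word, width=width, **faults)
    second = send_over_bus(~word & full, width=width, **faults)
    # On a healthy bus every line toggles, so the XOR of both receptions is all ones.
    fault_detected = (first ^ second) != full
    return first, fault_detected


if __name__ == "__main__":
    print(transfer_with_complement_check(0b10100101))
    print(transfer_with_complement_check(0b10100101, stuck_at_one_mask=0b10))
```

A stuck line carries the same value in both transfers, so its XOR bit is zero and the fault is flagged regardless of the data word. The price is the second transfer, which is the time overhead mentioned above.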

2. Fault Tolerant Multi-Core Architectures

Multi-core systems-on-chip are the new trend in embedded systems, increasing performance so that sophisticated control functions can be implemented. In the automotive domain, multi-core is attractive because redundancy can ensure the required safety standards while also bringing higher computational power. With the current rate of technology scaling, more and more cores can be integrated into a chip. However, such scaling also makes the electronic components more susceptible to soft errors caused by external disturbances. Other known problems of scaling down are variability and degradation (aging) [9].

In the next sections different multi-core architectures are discussed which can be used to effectively handle soft errors caused by transient faults.

2.1 Lock-Step Dual Processor Architecture

Figure 2 depicts the Lock-Step architecture which uses two processors: a master CPU and a checker CPU.

The master accesses the memory, fetches instructions and data, and executes them. The checker core continuously executes the instructions on the bus which are fetched by the master. The results of its execution, both addresses and data, are fed to the monitor, which compares them with those from the master. Any discrepancy indicates the presence of a fault in either of the CPUs.

The monitor, however, cannot detect bus or memory errors. The bus and memory therefore have to be protected by error detecting or correcting codes (such as parity codes or ECC).

This architecture can work as a fail-silent node which is capable of detecting a single failure (of either CPU) [1] [17] [18].
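The behaviour of the monitor can be sketched in software as follows. This is only an illustrative model under stated assumptions: real lock-step comparison happens in hardware every cycle, and the core functions `healthy_core` and `faulty_core` are hypothetical stand-ins for the master and checker CPUs.

```python
from typing import Callable, Iterable, List, Tuple

BusOutput = Tuple[int, int]  # (address, data) driven by a core for one instruction


def healthy_core(instruction: int) -> BusOutput:
    """Hypothetical core: derives an address and a data word from the instruction."""
    return instruction & 0xFF, instruction * 2


def faulty_core(instruction: int) -> BusOutput:
    """Same core, but with one data bit flipped while executing instruction 3."""
    address, data = healthy_core(instruction)
    return (address, data ^ 1) if instruction == 3 else (address, data)


def run_lock_step(program: Iterable[int],
                  master: Callable[[int], BusOutput],
                  checker: Callable[[int], BusOutput]) -> Tuple[List[BusOutput], bool]:
    """Run both cores on every instruction; go silent at the first mismatch."""
    committed: List[BusOutput] = []
    for instruction in program:
        master_out = master(instruction)
        checker_out = checker(instruction)
        if master_out != checker_out:
            # The monitor cannot tell which CPU is wrong, so nothing more is output.
            return committed, True
        committed.append(master_out)
    return committed, False
```

The sketch makes the fail-silent property visible: a mismatch stops all further output, but the node cannot continue operating because the faulty CPU is not identified.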

Figure 2: Lock-Step dual processor architecture [1].

2.2 Loosely-Synchronized Dual Processor Architecture

The Loosely-Synchronised architecture, shown in Figure 3, has two independent CPUs, each of which accesses its own memory subsystem.

Figure 3: Loosely-Synchronized dual processor architecture [1].

The Loosely-Synchronized dual processor architecture is an example of an asymmetric multiprocessing system in which synchronisation and error checking are performed via interprocessor communication by the real-time operating system (RTOS) running on both CPUs.

For the purpose of error checking, the critical tasks are duplicated in both memories. They are executed in parallel and the results are exchanged. Each RTOS then checks the consistency of the results, and a fault is concluded in case of any discrepancy. When a fault is detected, the results are not committed and each CPU runs its own self-test to find the faulty component. If the self-test identifies the faulty component, the system can continue to run in a degraded mode with the help of the healthy component. Since the bus and memory system are replicated in this architecture, information redundancy such as ECC need not be implemented.

Results must be cross-checked before they are committed. Time guardians can therefore be used to restrict CPU access to the outputs to a predefined time window. This can be either implemented in hardware or performed by the RTOS. Another technique is for each processor to add a signature to the results; the receiver accepts the data only after checking both signatures.

This architecture can be compared to the lock-step architecture on the basis of the number of critical tasks to be handled. When there are few critical tasks, the loosely synchronised architecture can use both processors independently for executing non-critical tasks, thereby increasing the throughput.

On the other hand, in the loosely synchronised architecture the critical task set has to be fully replicated in both memories, and during regular execution of critical tasks the performance is identical to that of a single processor. Here the lock-step architecture fares better in terms of performance because its self-checking is implemented in hardware (the CPU checker). The CPU master and checker run in lock-step and their results are fed into the monitor synchronously, which makes error detection faster [1] [18].
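The cross-check-before-commit flow can be sketched as below. The function name, the return labels and the decision to halt when the self-tests cannot single out one CPU are assumptions for illustration; the source only states that the system may continue in degraded mode when the faulty component is found.

```python
from typing import Callable, Optional, Tuple


def cross_check_and_commit(result_a: int,
                           result_b: int,
                           self_test_a: Callable[[], bool],
                           self_test_b: Callable[[], bool]) -> Tuple[str, Optional[object]]:
    """Commit a critical result only if both CPUs agree; otherwise try to localize."""
    if result_a == result_b:
        return "commit", result_a
    # Results differ: nothing is committed and each CPU runs its own self-test.
    a_ok, b_ok = self_test_a(), self_test_b()
    if a_ok and not b_ok:
        return "degraded", "cpu_a"  # continue on the healthy CPU A
    if b_ok and not a_ok:
        return "degraded", "cpu_b"  # continue on the healthy CPU B
    # Assumption: if the self-tests are inconclusive, the node stops producing output.
    return "halt", None
```

For example, `cross_check_and_commit(10, 11, lambda: True, lambda: False)` reports that the node continues in degraded mode on `cpu_a`.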

2.3 Triple Modular Redundant (TMR) Architecture

The TMR architecture (Figure 4) is a common form of passive hardware redundancy. The architecture has three identical processors which operate in lock-step, executing the instructions fetched from a single source (RAM/Flash). A voter implemented in hardware checks the outputs of the three CPUs; it can detect and mask a single CPU failure, which means that this architecture is fail-operational. Since the memory and bus system are not replicated, they need to be protected by error detecting or correcting codes (such as parity bits).

The system, however, depends on the reliability of the voter, since a voter failure results in the failure of the system. One method to ensure reliability is triplication of the voter: even if one of the voters fails, the system remains operational with the help of the other two.

Another difficulty faced by the voting mechanism is that in certain situations the results from the processors may not be identical even in the fault-free case. As an example, analog-to-digital conversion can produce variations in the least significant bits. The approach to this problem is to select the mid value (the value that lies between the other two) of the results, known as the mid-value select technique.
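A minimal sketch of the two voting strategies is given below, assuming integer outputs from the three CPUs. The function names are illustrative; a real voter is a hardware block, not software.

```python
from collections import Counter


def majority_vote(a: int, b: int, c: int) -> int:
    """Exact-match voter: return the value produced by at least two of three CPUs."""
    value, count = Counter([a, b, c]).most_common(1)[0]
    if count >= 2:
        return value
    raise ValueError("no two outputs agree; a single fault cannot be masked")


def mid_value_select(a: int, b: int, c: int) -> int:
    """Mid-value select: return the value lying between the other two."""
    return sorted([a, b, c])[1]


if __name__ == "__main__":
    print(majority_vote(7, 7, 9))
    # ADC-style readings that differ in the least significant bits:
    print(mid_value_select(1023, 1021, 1022))
    # One wildly wrong reading is discarded as an extreme:
    print(mid_value_select(1022, 4095, 1021))
```

With the ADC-style inputs above, `majority_vote` would raise an error although no CPU is faulty, whereas `mid_value_select` still returns a plausible value. This is the motivation for the mid-value technique.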

Figure 4: Triple modular redundant (TMR) architecture [1].

Note also that, since all the cores of a TMR system run in lock-step mode, its performance is similar to that of a single processor. Nevertheless, TMR is used in systems with high reliability requirements, where safety takes priority over cost.

One notable application is the Boeing 777 aircraft. The main flight computer (fly-by-wire) of the aircraft must be highly reliable and therefore has three identical units in TMR configuration. Each of these units has three processors, again functioning in TMR configuration. In addition, the processors used in each unit are heterogeneous [17] [19].

2.4 Dual Lock-Step Architecture

As depicted in Figure 5, the Dual Lock-Step architecture has two fail-silent channels, each implemented with the lock-step architecture. The total system acts as a fail-operational unit which can mask a single CPU fault.

Figure 5: Dual Lock-Step architecture [1].

The critical tasks need to be replicated in both memories in order to check for faults. As an advantage over the loosely synchronised model, this architecture need not perform a self-test to find the faulty component, since the critical code is duplicated and executed by both CPU masters, whose results are then verified by the respective CPU checkers.

Note that when a fault is detected, the faulty CPU master/checker channel can be masked and the other channel can still act as a fail-silent node [1].
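The channel masking described above can be sketched as follows. The representation of a silent channel as `None` and the status labels are assumptions chosen for this illustration.

```python
from typing import Optional, Tuple


def lock_step_channel(master_out: int, checker_out: int) -> Optional[int]:
    """One fail-silent channel: output the result, or None (silence) on mismatch."""
    return master_out if master_out == checker_out else None


def dual_lock_step(channel_a: Optional[int],
                   channel_b: Optional[int]) -> Tuple[Optional[int], str]:
    """Combine two fail-silent channels into one fail-operational unit."""
    if channel_a is not None and channel_b is not None:
        # Both channels healthy: one more CPU fault can still be tolerated.
        return channel_a, "fail-operational"
    survivor = channel_a if channel_a is not None else channel_b
    if survivor is not None:
        # One channel masked: the remaining channel is only a fail-silent node.
        return survivor, "fail-silent"
    return None, "silent"
```

No self-test is needed here: the channel whose checker reports a mismatch identifies itself as faulty simply by going silent.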

2.5 Implementation of Fault-tolerant architectures and Comparisons

Cost needs to be considered when implementing the multi-core architectures discussed above. Using shared memory in the Loosely-Synchronised and Dual Lock-Step architectures can save cost. To avoid or reduce the performance bottleneck resulting from two processors accessing the same memory, the memory can be split into four banks (two for code and two for data) and the bus system can be replaced with a cross-bar switch.

Figure 6 depicts a shared-memory (SM) Loosely-Synchronised architecture in which both processors share one memory subsystem.

Figure 6: SM Loosely-Synchronized dual processor architecture [1].

Here the duplication of critical code becomes a trade-off between performance and memory size. Non-duplicated code suffers from slower execution, since the cores are not synchronised (not in lock-step) and accesses to the same memory location occur at different times. Duplicated code runs faster but consumes more memory space.

Figure 7 depicts the shared-memory Dual Lock-Step architecture. The major plus point is that, since the memory subsystem is shared, the architecture provides the same fault tolerance as the TMR solution. In this case the two channels run in lock-step mode and an additional voter implemented in software carries out the result comparison.

Figure 7: SM Dual Lock-Step architecture [1].

At the same time it also offers flexibility, since the two fail-silent channels can run separately and provide double the performance. Lock-step and parallel execution are therefore two modes of operation of this architecture.

A comparison of the multi-core solutions discussed so far reveals some interesting facts:

• The Lock-Step architecture improves reliability but cannot offer any performance boost over the single processor architecture. Another disadvantage is its limited degradation possibility, which is only as a fail-silent node. The encouraging fact is that this architecture has less area (silicon) overhead than the other architectures.

• The SM Loosely-Synchronised architecture, on the other hand, offers a degraded mode of operation. In order to run in lock-step mode, the critical code has to be duplicated, which increases the memory cost and consumes considerable area. Fault diagnosis is also complicated and time-consuming, as it needs to be handled by the RTOS.

• The TMR solution requires less area than SM Dual Lock-Step and also provides single-fault masking. Its disadvantage is that the performance can only be that of the single processor architecture.

• The fail-silent channels of the SM Dual Lock-Step architecture can operate either in lock-step or independently. When operating independently, they give the performance of a dual processor architecture. When in lock-step, reliability is achieved by providing the same single-fault masking capability as TMR. This comes at the cost of a modest increase in area due to the one extra CPU compared to TMR.

The SM Dual Lock-Step solution is ideal for applications which require high computational power while maintaining the highest safety integrity level (ASIL D) described by the standards.

2.6. Multi-core Implementation in the Industry

Multi-core architectures have made their way into the automobile industry with the advent of computationally intensive applications and the reliability requirements of safety-critical systems. The AURIX family from a commercial manufacturer is one such entry to the market, promising adherence to the ISO 26262 standard [7].

Figure 8 depicts the architecture of the AURIX family of MCUs. It has three cores: two performance cores and one efficiency core. Fault tolerance is attained by a diverse lock-step core, which is implemented for one of the performance cores as well as for the efficiency core. The other performance core can run non-critical tasks which do not require fault tolerance. Multi-core devices are susceptible to common mode failures due to the clock tree, power supply or silicon substrate. A diverse lock-step core can prevent some of these common mode failures, as there is physical separation between the cores and physical damage in one core may not affect the other.

Figure 9 explains the functioning of the lock-step core. The architecture is similar to the Lock-Step architecture, having a main core, a lock-step core and a comparator which are physically separated. In addition, delays have been introduced between their executions. The delay increases the probability of detecting errors due to external influences such as voltage spikes. In this architecture the comparator is a key component and must be highly reliable [7].

Figure 8: AURIX architecture [7].

Figure 9: Lockstep CPU in Infineon AURIX [7].

3. Reducing Cost by effective scheduling techniques

The multi-core architectures discussed in the previous section aim to increase the reliability, safety and performance of the embedded systems in the vehicle.

To achieve the required fault tolerance level (fail-operational, fail-safe or fail-silent), redundancy in some form has to be used, and this redundancy can also be exploited to improve the performance of the system. Here scheduling techniques can play an important role by helping to reduce the cost of redundancy and increase the performance of the system.

The electronic control units that handle safety-critical applications typically have two sets of tasks to execute: the critical task (CT) set and the non-critical task (NCT) set. The critical set is dedicated to critical resources (CRs, also written CTRs below), while the non-critical set is dedicated to non-critical resources (NCTRs). The distinction is due to the fact that, in order to maintain fault tolerance, the CTs are scheduled on the CRs in lock-step. The NCT subsystem does not usually interfere with the critical resources, thereby ensuring the timely detection of failures.

However, this can lead to poor utilization of processing power, because the critical resources remain idle or underutilized when no critical tasks are running. This motivates an idea called on-demand redundancy, which helps to improve processor utilization and reduce cost [2].

There are two different techniques which can be employed for achieving this goal.

• Relaxed dedication [2]: This method allows NCTs to be executed on CRs, thereby increasing the throughput of NCTs.

• Distributed temporal redundancy [3]: This method relies on relaxing the lock-step requirement without hampering the reliability of the system.

3.1. Relaxed Dedication

As the name suggests, relaxed dedication relaxes the requirement that only critical tasks can be executed on critical resources. This helps to reduce the hardware required to schedule and execute the NCTs.

The critical tasks are traditionally scheduled with enough slack that a task can be re-executed if a fault is detected. For this purpose, retry slots are allotted after each CT execution so that the system can recompute the faulty operation. The retry slots are statically scheduled with the constraint that they do not cross the task deadlines [15].

Under the assumption that the transient error rate or soft error rate (SER) and the permanent fault rate due to hardware damage are low, these retry slots remain mostly unutilised. Relaxed dedication proposes to schedule NCTs during these retry slots and in the idle slots between critical tasks.

Relaxed dedication can be deployed directly in systems with lock-step execution. The only design change required is a status bit that indicates whether the comparator logic is to be used or not.

When there is a switch from an NCT to a critical task, this bit is set, indicating that execution must be in lock-step and that results are to be compared.

3.1.1. Analytical Model

To understand the performance gains of the relaxed dedication, the following analytical model can be used.

Assuming a dual modular redundancy (DMR) system with N NCTRs, the number of cycles available for NCT execution is given by

$$W_{DMR} = \sum_{i=1}^{N} f_i t$$

where $f_i$ is the clock frequency of $NCTR_i$ and $t$ is the hyperperiod.

As mentioned above, relaxed dedication uses the idle and retry slots for performance improvement. The number of cycles available for NCTs in a DMR system with M critical resource pairs is given by

$$W_{DMR+RD} = \sum_{i=1}^{M} 2 f_i t (1 - c_i) + W_{DMR}$$

Here $c_i$ is the fraction of time during which the pair $CTR_i$ is executing critical tasks, so $(1 - c_i)$ is the share of idle/retry slots that can be used for NCTs. For ease of analysis, assume that all resources run at the same clock frequency f and that each CTR pair executes CTs for the same fraction of time c. Then

$$W_{DMR+RD} = 2Mft(1-c) + Nft$$

and the ratio is

$$\frac{W_{DMR+RD}}{W_{DMR}} = \frac{2Mft(1-c) + Nft}{Nft} = 1 + \frac{2M(1-c)}{N}$$

The equation reveals that for c = 0.5, M = 1 and N = 1, a DMR architecture with relaxed dedication provides double the NCT performance of a dedicated DMR. The throughput of non-critical tasks also increases with the number of critical resource pairs [2].
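The analytical model is easy to evaluate numerically. The sketch below assumes the sums run over the N non-critical resources and the M critical resource pairs respectively; the function and parameter names are illustrative only.

```python
from typing import Sequence


def nct_cycles_dmr(nctr_freqs: Sequence[float], t: float) -> float:
    """Cycles available to NCTs on dedicated non-critical resources: sum of f_i * t."""
    return sum(f * t for f in nctr_freqs)


def nct_cycles_dmr_rd(ctr_pair_freqs: Sequence[float],
                      busy_fractions: Sequence[float],
                      nctr_freqs: Sequence[float],
                      t: float) -> float:
    """Add the idle/retry cycles of each critical pair: 2 * f_i * t * (1 - c_i)."""
    extra = sum(2 * f * t * (1 - c) for f, c in zip(ctr_pair_freqs, busy_fractions))
    return extra + nct_cycles_dmr(nctr_freqs, t)


def throughput_ratio(m: int, n: int, c: float) -> float:
    """Closed form for equal frequencies and equal busy fraction c: 1 + 2M(1-c)/N."""
    return 1 + 2 * m * (1 - c) / n


if __name__ == "__main__":
    # One critical pair, one non-critical resource, critical tasks busy half the time.
    print(throughput_ratio(1, 1, 0.5))
```

For M = 1, N = 1 and c = 0.5 the ratio evaluates to 2, matching the doubling stated in the text. The ratio tends to 1 as c approaches 1, since a fully loaded critical pair has no slots left to lend.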

3.1.2. Scheduling of critical and non critical tasks

The critical tasks are usually scheduled on critical resources as shown in Figure 10. As can be observed, the tasks have retry slots after each scheduled instance and also idle time between them.

Figure 10: Critical tasks (Solid red) and retry reservations (green) [2].

cP0 and cP1 are the critical resource pair while the ncP0 is the non critical resource.

Figure 11 shows that in dedicated architectures the non critical tasks are scheduled only in the non-critical resource.

Figure 11: Dedication limits NCT task's (blue) scheduling opportunity [2].

This dedication of non-critical tasks to non critical resource severely degrades the throughput of NCTs while the critical resource remains idle.

The relaxed dedication methodology (Figure 12) tries to schedule the NCTs in the critical resources as well.

Figure 12: Relaxed Dedication increases the throughput of NCTs [2].

In the scheduling process, the critical tasks are scheduled first using the list scheduling method. The scheduling is based on the laxity of each critical task, which is the difference between its latest/earliest finish time and its arrival time; the task with the least laxity is scheduled first. After this, the non-critical tasks can be scheduled in the retry slots as well as in the idle periods.

If a fault (transient or permanent) is detected by the hardware comparator, the non-critical tasks in the retry slots are pre-empted and the corresponding critical task is executed again in lock-step mode.

The effectiveness of relaxed dedication also depends on the task sizes (execution times) of NCTs and CTs. When the ratio of CT to NCT size (α) is greater than one, the NCTs are smaller and can easily be scheduled in the retry or idle slots. This increases the workload of the CTRs and improves performance. If α is less than one, the non-critical tasks are larger, which makes it difficult to schedule them in the empty slots of the static critical task schedule.

Nevertheless, the experiments reported in [2] show that relaxed dedication significantly increases the cycles available for NCT execution. For example, relaxed dedication provides 73% more cycles (on average) for NCTs in a DMR architecture [2].
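The effect of NCT size on slot utilization can be illustrated with a simple first-fit placement. This is a sketch under strong assumptions that are not taken from [2]: NCTs are non-preemptive, each must fit entirely into a single free slot, and slots are given as hypothetical (start, length) pairs.

```python
from typing import List, Sequence, Tuple


def fill_free_slots(free_slots: Sequence[Tuple[int, int]],
                    ncts: Sequence[Tuple[str, int]]) -> Tuple[List[Tuple[str, int]], List[str]]:
    """First-fit placement of NCTs into the idle/retry slots of a critical schedule.

    free_slots: (start_time, length) of each idle or retry slot
    ncts:       (name, execution_time) of each non-critical task
    Returns the placed tasks with their start times and the names left unplaced.
    """
    remaining = [[start, length] for start, length in free_slots]
    placed: List[Tuple[str, int]] = []
    unplaced: List[str] = []
    for name, size in ncts:
        for slot in remaining:
            if slot[1] >= size:
                placed.append((name, slot[0]))
                slot[0] += size  # the used part of the slot is no longer free
                slot[1] -= size
                break
        else:
            unplaced.append(name)  # no single slot is long enough
    return placed, unplaced


if __name__ == "__main__":
    slots = [(2, 2), (6, 3)]
    tasks = [("n1", 1), ("n2", 2), ("n3", 4)]
    print(fill_free_slots(slots, tasks))
```

In the example, the two short tasks are placed while the task of length 4 fits nowhere, although the total free time is 5. This is the α < 1 difficulty described above.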

3.2. Distributed Temporal Redundancy

Relaxed dedication improves upon the shortcomings of strict dedication by relaxing the requirement to execute NCTs only on NCTRs and scheduling them together with CTs. Distributed temporal redundancy (DTR) is another methodology which can be employed to improve performance.

Distributed temporal redundancy relaxes two requirements of the traditional lock-step architecture,

• that the critical tasks are executed in lock-step, and

• that the critical tasks execute on critical task resources only,

provided that this relaxation still satisfies the time constraints (deadlines) of the hard real-time tasks of the system.

This relaxation benefits both the performance and the fault tolerance of the system:

• When the critical tasks are released from lock-step, they can be co-scheduled with the NCTs, which ensures a better workload distribution among all cores.

• DTR can also provide a fault localization mechanism. In a dual modular redundant system, when a mismatch is detected between the results of a critical task pair, the scheduler can initiate execution of that particular critical task on a third resource. The result from the third resource can be compared with the previous results, which helps to localize the faulty component.

This implies that the system offers the reliability and fault tolerance of a TMR architecture at no additional cost (that is, without another critical resource).
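The localization step can be sketched as follows. The resource names `c0` and `c1` follow the figures of this section; the callback `run_on_third` and the `"unknown"` outcome for a three-way disagreement are assumptions of this sketch.

```python
from typing import Callable, Optional, Tuple


def localize_fault(result_c0: int,
                   result_c1: int,
                   run_on_third: Callable[[], int]) -> Tuple[Optional[int], Optional[str]]:
    """Compare the two copies; on mismatch, re-execute on a third resource.

    Returns (accepted_result, faulty_resource). The third execution is only
    requested on demand, i.e. when the first two results disagree.
    """
    if result_c0 == result_c1:
        return result_c0, None
    third = run_on_third()
    if third == result_c0:
        return third, "c1"
    if third == result_c1:
        return third, "c0"
    # Assumption: if all three results differ, the fault cannot be localized.
    return None, "unknown"
```

Because `run_on_third` is invoked only after a mismatch, the third resource stays free for non-critical work in the fault-free case, which is where the cost advantage over a permanent TMR arrangement comes from.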

3.2.1. Scheduling opportunities

In a traditional DMR system, the tasks are scheduled as shown in Figure 13.

Figure 13: Schedule in a DMR system, critical tasks (red) with retry reservations (dashed red) [3].

Here tx denotes a scheduled critical task instance and rtx the corresponding retry instance of the task.

The DTR approach applies the concept of temporal redundancy, which means that two copies of the same task are scheduled at two different times on the resources. It also relies on relaxing the lock-step, which in turn makes it possible to co-schedule NCTs and CTs. The main advantage is that the schedule can be derived based on optimization and cost considerations. Joint scheduling gives the scheduler the opportunity for better usage of time slots, sharing of peripherals and a higher workload on the CTRs. With high utilization of the CTRs, it becomes possible to save the chip area allotted to non-critical resources.

A bottleneck of the relaxed dedication approach was that larger NCTs (with longer execution times) could not use the smaller idle or retry slots in the schedule of the critical resources. The DTR approach solves this problem, since the lock-step requirement is relaxed and co-scheduling is employed.

The schedules for a system using DTR are drawn up in such a way that the relaxation of lock-step does not affect the deadlines of the critical tasks.

Figure 14 shows a schedule using DTR approach.

Figure 14: Relaxed lock step schedule [3].

The DTR approach makes it possible to schedule another set of NCTs, tripling their throughput. As seen in Figure 14, the critical instances t2 and t1,2 are scheduled on the critical resources c0 and c1 without applying lock-step. It can be seen that NCTs requiring larger execution times become schedulable on the CTRs. The retry slots rt1,1, rt1,2 and rt2 are reserved for execution on the non-critical resource nc0 in case a mismatch is reported for a critical task pair. In that event, the non-critical tasks running at that instant are pre-empted.

The implementation of DTR requires some changes in the hardware. Since the tasks are not executed in lock-step, buffers have to be introduced to store the latest executed task result. The buffer value has to be retained until all copies of the task have been executed. In the above task set, the results of tasks t1,2 and t2 have to be buffered until their copies have been executed on another resource.

3.2.2. Design considerations for implementing DTR in a fault tolerant architecture.

The DTR concept can be ported to the lock-step architectures discussed previously. In the lock-step architecture shown in Figure 15, two CPUs (c0 and c1) run the same code together, and a hardware comparator monitors both the address and data outputs before they are sent to the memory subsystem.

Figure 15: Cores executing in lock-step, data and addresses compared before being sent to the memory subsystem [3].

When DTR is applied to the system, lock-step is relaxed, so that the critical tasks execute at different times on the resources. Hence buffers are required to store the results of the earlier task copies. The amount of information to be stored can be large, which leads to a high storage requirement and increases the cost. In order to reduce the amount of stored information, the fingerprinting methodology can be used [14].

The fingerprinting method relies on a cyclic redundancy check (CRC) to compress and store the changes in architectural registers, memory values and load and store addresses. A CRC has very good error detection capability; for example, a 16-bit CRC detects an error with a probability of about 0.99998 (1 - 2^-16). Though fingerprinting takes time for error detection, the scheduling algorithm allows the calculations to complete before the task deadline.
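As a minimal sketch of the idea (not the hardware mechanism of [14]), the following Python fragment folds a stream of state updates into a single 16-bit CRC. The (address, value) update format, the 32-bit packing, the seed 0xFFFF and the sample values are assumptions made for illustration; binascii.crc_hqx is the CRC-CCITT routine of the Python standard library.

```python
import binascii
import struct


def fingerprint(updates, seed=0xFFFF):
    """Fold a sequence of (address, value) state updates into a 16-bit CRC."""
    crc = seed
    for address, value in updates:
        # Assumed encoding: each update is packed as two 32-bit little-endian words.
        crc = binascii.crc_hqx(struct.pack("<II", address, value), crc)
    return crc


# Hypothetical store traffic of one critical task executed on two resources.
copy_c0 = [(0x1000, 42), (0x1004, 7), (0x2000, 0xDEADBEEF)]
copy_c1 = list(copy_c0)
# Same task with a single flipped bit in one stored value.
faulty = [(0x1000, 42), (0x1004, 7 ^ 0x10), (0x2000, 0xDEADBEEF)]

match = fingerprint(copy_c0) == fingerprint(copy_c1)
mismatch = fingerprint(copy_c0) != fingerprint(faulty)

# Two different update streams collide with probability of about 2**-16.
detection_probability = 1 - 2 ** -16
```

Only the 16-bit fingerprint has to be buffered per task copy, rather than the full list of updates. The chance that two different update streams yield the same fingerprint is about 2^-16, which is where the figure of 0.99998 comes from.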

Figure 16 depicts the fingerprint approach adopted in a multi-core system using DTR. The fingerprints are collected from each task and buffered. When all the copies of the task have been executed, the buffered values are compared by the hardware comparator. If the comparison results in an exact match, one copy of the changes to the external state (due to the task) can be released as output. Otherwise a third execution of the task can be scheduled on another resource. Upon completion of the third execution, the previously buffered values can be compared with the latest one and the faulty component can be identified.

Figure 16: Changes to registers, memory and memory addresses are accumulated in a single CRC fingerprint; fingerprint comparison has a high probability of detecting failures independent of accumulated changes [3].
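The compare-and-retry step described above can be sketched in a few lines of Python. This is only an illustration of the decision logic, which in the real system is implemented by the hardware comparator and the scheduler; the function name resolve, the callback run_retry and the resource labels are hypothetical.

```python
def resolve(fp_c0, fp_c1, run_retry):
    """Decide the outcome for one critical task from its buffered fingerprints.

    Returns (release_output, faulty_resource).
    """
    if fp_c0 == fp_c1:
        # Exact match: one copy of the external state changes can be released.
        return True, None
    # Mismatch: a third execution is scheduled in the reserved retry slot.
    fp_retry = run_retry()
    if fp_retry == fp_c0:
        return True, "c1"   # c1 disagreed with the two matching copies
    if fp_retry == fp_c1:
        return True, "c0"   # c0 disagreed with the two matching copies
    return False, None      # no two copies agree, nothing can be released


# Fault-free case: the retry is never executed.
ok = resolve(0x1A2B, 0x1A2B, run_retry=lambda: 0x0000)
# Faulty case: c1 produced a different fingerprint, the retry agrees with c0.
located = resolve(0x1A2B, 0x3C4D, run_retry=lambda: 0x1A2B)
```

The third execution is only requested on a mismatch, which is why the retry slots can be used by NCTs in the fault-free case.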

The buffers could themselves be subject to soft errors and permanent faults, and hence their content should be protected with error-correcting codes (ECC).

NCTs are neither fingerprinted nor buffered when they execute on critical resources. Interference between NCTs and CTs can be prevented with memory protection schemes and with the help of the scheduling algorithm.

3.2.3. Performance Estimation and comparisons of results of DTR and RD

The performance of DTR is estimated with the help of simulated annealing and iterative scheduling. The simulated annealing technique permutes the assignment of NCTs to the resources and then uses iterative scheduling to schedule as many NCTs as possible. The experimental setup assumes M pairs of critical resources and N pairs of non-critical resources, a finite set of periodic CTs and an infinite set of NCTs.

The annealer builds up the assignment tree by permuting the possible assignments of NCTs to processors/resources. Figure 17 shows an assignment tree of NCTs for two resources.

Figure 17: Performance estimation assignment tree for two resources [3].

The tree has different levels; for example, level L0 implies no assignment, while L1 shows two task assignments on two different resources. Nodes 11, 16, 17 and 4 are called terminal assignments, because any assignment after them cannot be scheduled, i.e. the list scheduler would produce an invalid schedule. In other words, no more NCTs can be scheduled without compromising task deadlines. The annealer tries to find the optimal terminal assignment, the one that gives the highest resource utilization.
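The notion of a terminal assignment can be illustrated with a small Python sketch. It is a deliberately simplified stand-in for the procedure above: the tree is walked exhaustively (depth-first) instead of by simulated annealing, a plain capacity check on free cycles replaces the list scheduler, and the finite NCT list, its lengths and the free-cycle counts are hypothetical.

```python
def best_terminal(nct_lengths, free_cycles):
    """Walk the assignment tree and return (used_cycles, assignment).

    assignment[i] is the index of the resource that NCT i is placed on.
    """
    best = (0, ())

    def visit(level, remaining, assignment):
        nonlocal best
        children = []
        if level < len(nct_lengths):
            length = nct_lengths[level]
            # Resources that can still take the next NCT (stand-in for the
            # deadline check done by the list scheduler).
            children = [r for r, free in enumerate(remaining) if free >= length]
        if not children:
            # Terminal assignment: the next NCT fits nowhere (or none is left).
            used = sum(free_cycles) - sum(remaining)
            if used > best[0]:
                best = (used, assignment)
            return
        for r in children:
            nxt = remaining[:r] + (remaining[r] - length,) + remaining[r + 1:]
            visit(level + 1, nxt, assignment + (r,))

    visit(0, tuple(free_cycles), ())
    return best


# Four hypothetical NCTs and two resources with six free cycles each.
used, assignment = best_terminal([4, 3, 3, 2], [6, 6])
```

For realistic task sets the tree grows too quickly for an exhaustive walk, which is the reason a heuristic such as simulated annealing is used to search it.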

When the performance of DTR and RD is compared, relaxed dedication is limited by the relative size of critical and non-critical tasks. For example, the DMR system fared better when the critical tasks were larger than the non-critical tasks. This is because an NCT has to be small enough to fit into the retry/idle slots.

A system using DTR does not depend on the relative task lengths, since CTs and NCTs are co-scheduled. It fares better than RD in terms of consistency and the number of cycles used for NCT execution. Experiments have shown that DTR could use 93% of the theoretical cycles for NCT execution and that it outperforms RD by about 11% on average [3].

DTR can be used only when relaxing lock-step for the critical tasks does not affect their hard deadlines. The number of tasks waiting for comparison ('in-flight tasks') is a cost concern, since it determines the size of the buffers and of the comparison logic, both of which are implemented in hardware. Co-scheduling CTs and NCTs increases the complexity of the scheduling algorithm, but several algorithms are available today which can handle this complexity. For example, algorithms like the polling server and the total bandwidth server can schedule aperiodic tasks among periodic critical tasks without affecting the deadlines.
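As an illustration of the second algorithm, the total bandwidth server assigns the k-th aperiodic request, arriving at time r_k with execution time C_k, the deadline d_k = max(r_k, d_(k-1)) + C_k / U_s, where U_s is the processor share reserved for the server and d_0 = 0. Under EDF the periodic tasks keep their deadlines as long as their utilization plus U_s does not exceed 1. The Python sketch below applies this rule; the function name and the request values are hypothetical, and mapping NCTs to aperiodic requests is an assumption made for the example.

```python
def tbs_deadlines(requests, server_utilization):
    """Assign total bandwidth server deadlines.

    requests: list of (arrival_time, execution_time) in arrival order.
    """
    deadlines = []
    previous = 0.0  # d_0
    for arrival, execution in requests:
        # d_k = max(r_k, d_(k-1)) + C_k / U_s
        previous = max(arrival, previous) + execution / server_utilization
        deadlines.append(previous)
    return deadlines


# Hypothetical case: CTs use 75% of the processor, 25% is left for the server.
nct_deadlines = tbs_deadlines([(3, 1), (9, 2), (14, 1)], server_utilization=0.25)
# nct_deadlines == [7.0, 17.0, 21.0]
```

The requests are then scheduled by EDF together with the periodic CTs, using these deadlines.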

4. Conclusion

The trend of scaling down in current silicon technology makes electronic components vulnerable to permanent and soft errors. As the automotive industry uses electronics for many of its safety applications, the fault tolerance of the system is important.

This report introduces the safety requirements in the automotive sector and focuses on the available dual-core hardware architectures which can be used to ensure system reliability. There are different multi-core solutions, and a suitable architecture can be selected based on the deciding factors of cost, fault tolerance and computation power. The SM dual lock-step architecture has been found to be promising in terms of providing fault tolerance and fault localisation and delivering high computational power when needed.

A multi-core architecture that ensures fault tolerance brings new opportunities to increase computational power and reduce cost. This is important because the automotive industry is cost driven, so fault tolerance must not be an expensive option. The later part of the report focuses on the scheduling mechanisms which can be adopted to use the resources (processor cores) efficiently. The relaxed dedication method increases the workload of the critical resources and ensures a better throughput for non-critical tasks. This method works well for systems whose tasks have hard real-time deadlines. For systems with less stringent requirements, temporal redundancy can be employed. One such method, the distributed temporal redundancy technique, is an improvement over relaxed dedication. It gives better performance and more cycles for NCT execution, and with the help of the scheduling algorithm it also provides fault tolerance similar to that of a TMR architecture.