Design of High Availability Systems & Software
Duration: 2 days
Number of participants: recommended optimum 15, maximum 25
The primary goal of this course is to give participants the skills necessary to design software for real-time and embedded computer systems that must keep providing service despite the occurrence of internal and external faults. This is a very practical, results-oriented course whose knowledge and skills can be applied immediately. It examines the high-level design of embedded systems and software that are to provide their services at near-continuous availability.
High availability systems must tolerate both expected and unexpected faults. Their design is based on redundant hardware and software combined in ways that will achieve “five-nines” (99.999%) or greater availability, equivalent to less than 1 second of downtime per day. Basic hardware N-plexing and voting issues are discussed, followed by an in-depth study of a number of fault tolerance techniques, including static N-version programming and the backward error recovery techniques Checkpoint-Rollback, Process Pairs, and Recovery Blocks. The class continues with several forward error recovery techniques. Technical issues such as failover management, data replication, and software design defects are addressed in depth. Many real-world examples are presented.
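As a quick check of the arithmetic behind the “five-nines” figure, the short Python sketch below converts an availability target into a downtime budget. The function name downtime_budget and the use of a 365.25-day year are illustrative assumptions, not part of the course material.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumption: average year length


def downtime_budget(availability: float) -> tuple[float, float]:
    """Return the allowed downtime as (seconds per day, minutes per year)."""
    unavailable = 1.0 - availability  # fraction of time the service may be down
    seconds_per_day = unavailable * SECONDS_PER_DAY
    minutes_per_year = unavailable * SECONDS_PER_YEAR / 60.0
    return seconds_per_day, minutes_per_year


per_day, per_year = downtime_budget(0.99999)  # "five-nines"
print(f"{per_day:.3f} s/day, {per_year:.2f} min/year")
# prints "0.864 s/day, 5.26 min/year"
```

The same function can be called with other targets, for example 0.9999 (“four-nines”), to compare how quickly the downtime budget shrinks with each added nine.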
This is not a general course about system or software design theory. It is highly focused on the design of embedded systems and software that must make their services available at all times, with no more than about 5 minutes of downtime per year.
This course is intended for practicing software and system architects, project managers and technical consultants who are responsible for designing, structuring and implementing the software for real-time and embedded computer systems that are required to continue providing service despite the occurrence of internal and external faults.
Course participants are expected to be familiar with general embedded and real-time software design. This knowledge can be gained by attending a prerequisite embedded software design course such as “Architectural Design of Real-Time Software”.
Many (but not all) high-availability systems are also safety-critical systems, which can threaten human safety or even human life if the system fails and remains unavailable for significant periods of time. For those high-availability systems that also have safety-critical requirements, we recommend that the course “Design of Safety-Critical Systems and Software” be taken together with this course. The two courses have little overlap in content, and offer complementary approaches and perspectives. It is possible to combine these two courses into a unified three- or four-day course for presentation at customer sites, under the name “Safety Critical and High Availability Systems Masterclass”.
The course is based on lectures, discussions, design examples, and exercises.