Failure resilience in federated ecosystems for instrument science is a critical challenge. Failures disrupt experiments and can render them useless, wasting valuable instrument, network, and computing allocations and creating setbacks for scientists. Oak Ridge National Laboratory’s (ORNL’s) Interconnected Science Ecosystem (INTERSECT) Initiative offers a federated ecosystem for instrument science, enabling autonomous experiments, self-driving laboratories, smart manufacturing, and artificial intelligence (AI) driven design, discovery, and evaluation. INTERSECT does not currently offer failure resilience, yet existing INTERSECT experiments require it. One example is reliably steering an autonomous additive manufacturing (AAM) process using real-time data streamed to a simulation in the feedback loop.
Figure 1: The INTERSECT autonomous additive manufacturing experiment uses a thermomechanical simulation in a live feedback loop to control the residual stress in a printed part
This project creates a resilient INTERSECT ecosystem architecture using resilience design patterns, a resilient system of systems (SoS) architecture, and a resilient microservices architecture. Its proof-of-concept prototype implements a resilient federated ecosystem using the INTERSECT software development kit (SDK) and demonstrates resilience capabilities for the INTERSECT AAM cross-facility experiment between the Manufacturing Demonstration Facility (MDF) and the Oak Ridge Leadership Computing Facility (OLCF) Advanced Computing Ecosystem (ACE) testbed. The outcome of this project facilitates the proper development and deployment of resilience for federated ecosystems. It creates, implements, and demonstrates a consistent design methodology that allows scientists to choose the right solution for the resilience problem at hand and deploy it with ease. It enables a resilient federated ecosystem that facilitates the US Department of Energy’s (DOE’s) Integrated Research Infrastructure (IRI) vision.
Prominent Solutions
Funding Sources
- Interconnected Science Ecosystem (INTERSECT) Initiative, Laboratory Directed Research and Development, Oak Ridge National Laboratory
Participants
- Christian Engelmann (PI), Andrew Ayres, Michael Brim, Stephen DeWitt, Swen Boehm, Addi Malviya, Marshall McDonnell, and Ryan Prout — Oak Ridge National Laboratory
White Papers
- Ryan Adamson and Christian Engelmann. Cybersecurity and Privacy for Instrument-to-Edge-to-Center Scientific Computing Ecosystems. White paper accepted at the U.S. Department of Energy's ASCR Workshop on Cybersecurity and Privacy for Scientific Computing Ecosystems, November 3-5, 2021.
- Hal Finkel, Pete Beckman, Christian Engelmann, Shantenu Jha, and Jack Lange. Research Opportunities in Operating Systems for Scientific Edge Computing. White paper by the U.S. Department of Energy's ASCR Roundtable Discussions on Operating-Systems Research 2021, January 25, 2021.
- Hal Finkel, Pete Beckman, Ron Brightwell, Rudi Eigenmann, Christian Engelmann, Roberto Gioiosa, Kamil Iskra, Shantenu Jha, Jack Lange, Tapasya Patki, and Kevin Pedretti. Research Opportunities in Operating Systems for High-Performance Scientific Computing. White paper by the U.S. Department of Energy's ASCR Roundtable Discussions on Operating-Systems Research 2021, January 25, 2021.