System Reliability, Availability, and Repairability

As the Apollo Program showed, malfunctions, software glitches, mechanical and electronic failures, and accidents occurred on every mission. Apollo 11 had its 1201 and 1202 executive overflow alarms. Apollo 12 was struck by lightning during launch, which led to the SCE-to-Aux activation. Apollo 13 suffered the near-catastrophic oxygen tank explosion that cost the crew the lunar landing and nearly cost them their lives. The faulty abort switch on Apollo 14 nearly ruined the lunar landing. Apollo 15 experienced a failed parachute during its return to Earth. There were broken lunar rover fenders on two missions. And the list goes on, with many problems not publicly reported by NASA.

The Apollo spacecraft, the Saturn V, and Mission Control were extremely complex machines, built from hundreds of thousands of individual parts, and with computers both onboard and at Mission Control containing thousands of lines of software code.

One of the reasons for the success of the Apollo program was the redundancy of critical systems. During Apollo 15, there was a short circuit in the “Delta-V Thrust” switch, which opened the valves in the Service Propulsion System (SPS). The engine itself was fine, but the short circuit meant that new procedures would have to be used when operating the engine to prevent accidental ignition. The SPS had two independent valve systems, and with the proper reconfiguration of the valves and switches, the problem could be worked around. Similar redundancies and system reconfigurations enabled successful missions elsewhere, as seen for instance in the Apollo 12 SCE-to-Aux event.

But redundancy has its downside. As stated in the Apollo 15 chapter, redundancy actually lowers system reliability while increasing system availability. To reiterate, availability is a measure of the percentage of time the equipment is in an operable state, while reliability is a measure of how long an item performs its intended function without failure. A dual-redundant system consists of two identical systems performing the same function, which doubles the number of component parts. A failure on one side of the system results in the activation of the redundant side, assuring that the function continues. With twice the number of parts, however, the chance that some part will fail increases, so reliability measured at the part level actually decreases. At the same time, the redundancy increases the availability of the function.
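
The trade-off can be made concrete with a small calculation. The sketch below assumes, purely for illustration, that each unit has the same independent probability of failing during a mission and that the switchover to the backup always works. The function names and the 10 percent figure are hypothetical and do not come from Apollo data.

```python
def single_unit(p_fail):
    """One unit: any unit failure also means the function is lost."""
    return {"some_unit_fails": p_fail, "function_lost": p_fail}


def dual_redundant(p_fail):
    """Two identical units, assumed to fail independently of each other.

    Assumes switchover to the backup always succeeds (an idealization).
    """
    # At least one of the two units fails: more hardware, more failures.
    some_unit_fails = 1 - (1 - p_fail) ** 2
    # The function is lost only if both units fail.
    function_lost = p_fail ** 2
    return {"some_unit_fails": some_unit_fails, "function_lost": function_lost}


if __name__ == "__main__":
    p = 0.10  # hypothetical per-unit failure probability over the mission
    for name, result in (("single", single_unit(p)), ("dual", dual_redundant(p))):
        print(f"{name:6s} some unit fails: {result['some_unit_fails']:.2f}  "
              f"function lost: {result['function_lost']:.2f}")
```

With these assumed numbers, the chance that some unit fails and needs repair nearly doubles (0.10 to 0.19), while the chance of losing the function drops tenfold (0.10 to 0.01).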

A Mars mission with a duration of months or years will stretch the limits of technological durability. Lessons in long-term reliability and maintainability are now being learned on the International Space Station. Inherent in the design of both the ISS and a future Mars spacecraft are not only reliability and maintainability, but also repairability. In the language of logistics engineers, a Mars mission will demand an unprecedented set of requirements for mean time between failures (MTBF), mean time to repair (MTTR), and functional and service availability.
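
These quantities are tied together by the standard steady-state relationship, availability = MTBF / (MTBF + MTTR). The sketch below is a minimal illustration of that formula. The function name and the hour values are hypothetical, chosen only to show how a shorter repair time raises availability.

```python
def steady_state_availability(mtbf_hours, mttr_hours):
    """Fraction of time a repairable item is operable, in steady state."""
    if mtbf_hours <= 0 or mttr_hours < 0:
        raise ValueError("MTBF must be positive and MTTR non-negative")
    return mtbf_hours / (mtbf_hours + mttr_hours)


if __name__ == "__main__":
    mtbf = 1000.0  # hypothetical mean time between failures, in hours
    for mttr in (100.0, 10.0, 1.0):  # hypothetical mean times to repair
        a = steady_state_availability(mtbf, mttr)
        print(f"MTBF={mtbf:.0f} h  MTTR={mttr:.0f} h  availability={a:.4f}")
```

For the same assumed MTBF, cutting the repair time from 100 hours to 1 hour raises availability from about 0.909 to about 0.999, which is why repairability matters as much as reliability on a long mission.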

During the writing of this book, a pump on one of the ISS’s two external cooling loops shut down after hitting a temperature limit. The external cooling loops are systems that circulate ammonia outside the space station to keep equipment cool.

The problem started with a malfunction of a valve inside the pump, located on one of the ISS exterior trusses. ISS flight controllers shut down the malfunctioning cooling loop. The remaining loop was sufficient to regulate the temperature of critical equipment, and there was no immediate danger to the six crew members aboard at the time. NASA scheduled three spacewalks, each planned to last 6.5 hours, for astronauts aboard the ISS to replace the malfunctioning pump.

An extended mission to Mars will require that the astronauts be able to repair malfunctions aboard the spacecraft themselves. But unlike the ISS, which periodically receives supplies of consumables and spare parts, the Mars spaceship will have to carry its own supply of repair items for any emergency.
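
One common way logistics engineers size such a supply is to treat failures of a given part as a Poisson process and carry enough spares to cover the mission with a chosen confidence. The sketch below assumes a constant failure rate and immediate replacement of each failed part. The function name, mission length, and MTBF are hypothetical values for illustration, not figures from any actual mission plan.

```python
import math


def spares_needed(mission_hours, mtbf_hours, confidence):
    """Smallest spare count s such that P(failures <= s) >= confidence.

    Assumes failures follow a Poisson process with rate 1 / MTBF.
    """
    if mission_hours < 0 or mtbf_hours <= 0:
        raise ValueError("mission_hours must be >= 0 and mtbf_hours > 0")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    expected_failures = mission_hours / mtbf_hours
    term = math.exp(-expected_failures)  # P(exactly 0 failures)
    cumulative = term
    spares = 0
    while cumulative < confidence:
        spares += 1
        # Poisson recurrence: P(k) = P(k - 1) * expected_failures / k
        term *= expected_failures / spares
        cumulative += term
        if spares > 100_000:  # guard against floating-point stalls
            raise RuntimeError("spare count did not converge")
    return spares


if __name__ == "__main__":
    mission = 900 * 24   # hypothetical 900-day mission, in hours
    mtbf = 10_000.0      # hypothetical MTBF of one part type, in hours
    print(spares_needed(mission, mtbf, confidence=0.95))
```

Under these assumed values the part is expected to fail about twice during the mission, yet five spares are needed to be 95 percent sure of not running out, which illustrates how quickly a carried inventory grows.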

Apollo 13 gave NASA the experience of reconfiguring a damaged spacecraft in an emergency. A Mars mission will not have the luxury of a quick return to Earth, as Apollo 13 had. A major failure could occur at the mission’s farthest point, leaving the crew months away from Earth. A Mars-bound mission will require extraordinary logistical planning and will challenge aerospace engineers to design a spacecraft with the reliability, redundancy, and flexibility of configuration needed to survive malfunctions.

For the Mars spacecraft to carry a warehouse of spare parts for its journey would waste resources and storage space and add fuel costs. It is conceivable that a Mars mission could fabricate at least some of its spare parts and tools on the fly by using additive manufacturing, more commonly known as 3-D printing. This concept is reminiscent of the replicator technology depicted in the television series and movie franchise Star Trek. NASA has announced that it intends to send a 3-D printer to the International Space Station in 2014. NASA says that astronauts living aboard the ISS would use the 3-D printer to make spare parts and tools in zero gravity. Once the printer arrives at the ISS, it will mark the first time a 3-D printer has been used in space.

3-D printers typically use a polymer material, but some are able to use titanium and nickel-chromium materials to build stronger components. Printers that rely on powdered raw materials, as many on Earth do, will have to be modified to use feedstock in another form, because in a weightless environment powders of any kind will affect the onboard environment. Contamination of the breathable air and migration of powder into the electronics and mechanical systems onboard the ISS or a Mars-bound spacecraft could have disastrous results.