ArduSat-1 and ArduSat-X launched from the ISS. Picture credit: NASA
It’s such a great idea: students in schools and colleges inspired to become the next generation of Space engineers by building and orbiting their own satellite. But it’s not a new idea. The first amateur satellite, OSCAR 1, was actually launched in 1961 as a secondary payload, only four years after Sputnik 1. Project OSCAR continued with more launches until 1969, when the Radio Amateur Satellite Corporation (AMSAT) was formed to co-ordinate further activity. The OSCARs are primarily amateur radio communication satellites, many of them built by students in educational establishments worldwide. Three were constructed at Surrey University in the UK, starting with UoSAT-1 (OSCAR 9) in 1981, then UoSAT-2 (OSCAR 11) in 1984 and UoSAT-3 (OSCAR 14) in 1990. The educational purpose behind these satellites was highlighted when a special-purpose radio receiver with software for the BBC Microcomputer became available.
In the late 1990s the CubeSat was born when it was realised that, thanks to microelectronics, it had become possible to squeeze most of the functionality of an OSCAR satellite into a 10cm cube. Since the first one made it into Low-Earth Orbit (LEO) in 2003, many more have followed. But the failure rate has been high for two reasons: the stresses of launch can ‘break’ the CubeSat before it even reaches orbit, and the Space environment is very hazardous for delicate electronics once orbit is achieved. What can be done to increase the likelihood of success?
Resiliency and Reliability
These two terms represent different approaches taken at the design stage to achieve mission success. Resilience, as applied to an autonomous robot working in a hazardous environment where maintenance is impossible (e.g. Space, a nuclear accident site), implies an ability to recover from mission-critical breakdowns by implementing work-arounds supplied to it by remote human operators. Reliability implies fault-tolerance, with self-diagnosis and redundant circuitry preventing any mission disruption that would need operator intervention. Generally, building in resiliency is a lot cheaper than building in reliability, at least in terms of development costs.
A Resilient CubeSat
In 2014, engineers at NASA’s Goddard Space Flight Center (GSFC) were tasked with building a better CubeSat. GSFC was more used to designing for high reliability with an appropriately large budget. The mission this time was to go for improved resiliency at ‘low cost’, the only option normally available to amateur constructors. The 6U-sized CubeSat, named Dellingr, was finally put into orbit in 2017, and there began a long saga of faults, fault analyses and fixes. The paper ‘Dellingr: Reliability lessons learned from on-orbit’ is a detailed diary of the activity involved in making the satellite operational and recovering scientific data from its instruments.
Dellingr in the Test Lab with antennae deployed. Picture credit: NASA
While developing a piece of equipment that cannot be returned to the workshop when it develops a fault in service, engineers are always on the lookout for a Single Point of Failure (SPoF) in their design. In the case of a satellite or planetary rover, it refers to a single component or circuit that, should it fail, will cause the entire mission to be aborted. The electronic technology in your home is riddled with them. Consider a domestic Wi-Fi network consisting of wirelessly connected PCs, smartphones, printers and a voice-activated home automation system. The wireless router is the ‘weak link’, the SPoF. Within an individual item of equipment, there will be many SPoFs. In fact, practically every component in, say, a smartphone must work fully to specification or the whole device is, well, just so much junk. It’s annoying when something fails suddenly, but you can either get it fixed or order a new one on the Internet – provided, of course, it’s not the router that’s broken…
Beagle 2 – How SPoFs ended a dream
In 2003 a lander called Beagle 2 was launched towards the surface of Mars from the ESA orbiter Mars Express. It was not seen again until 2014 when it was spotted on the surface by a NASA orbiter:
Beagle 2 was built on a shoestring budget which just about covered the science package. When it comes to lifting space probes off planet Earth, it’s all about weight: the heavier the payload, the higher the cost. Resiliency hardware was kept to a minimum and redundancy for reliability was pretty much out of the question. Beagle 2 never really stood a chance. If you’ve watched the video, the weakness in the clam-shell-with-petals design should be obvious: every one of the solar panels had to deploy before finally exposing the antenna. Each of the hinges and their motors was a possible single point of failure. The images taken from orbit in 2014 strongly suggest that at least one panel did not deploy, so preventing any communication. Beagle 2 had got past all the obvious potential landing SPoFs, such as parachute and airbag deployment, only to fall at the last hurdle. Another weight/cost saving had been on landing telemetry: once the spacecraft had been pushed away from the orbiter, it just vanished, with no way of finding out precisely what had gone wrong. A lot of CubeSat owners have had the same experience.
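The effect of chaining several single points of failure can be put into rough numbers. The following is a minimal Python sketch that assumes independent failures; the 95% per-mechanism figure and the count of four mechanisms are purely illustrative assumptions, not Beagle 2 data, and the function names are made up for this example.

```python
def series_reliability(p_each: float, count: int) -> float:
    """Probability that `count` independent items, all of them required, all work."""
    return p_each ** count


def with_spare(p_each: float) -> float:
    """Reliability of one item backed by an identical, independent spare."""
    return 1.0 - (1.0 - p_each) ** 2


# Illustrative numbers only (assumptions, not measured data).
P_DEPLOY = 0.95   # assumed chance that a single mechanism works
N_MECHANISMS = 4  # assumed number of mechanisms that must all work

no_redundancy = series_reliability(P_DEPLOY, N_MECHANISMS)
full_redundancy = series_reliability(with_spare(P_DEPLOY), N_MECHANISMS)

print(f"single string: {no_redundancy:.3f}")
print(f"with spares:   {full_redundancy:.3f}")
```

With these assumed figures, four mechanisms in a row succeed only about 81% of the time, whereas duplicating each one lifts the chain to about 99%. The price is extra weight and cost, which is exactly the trade-off Beagle 2 could not afford.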
Lessons on Mechanical Resilience
- The biggest challenge any CubeSat faces is during the launch phase: high ‘g’ (up to ±10g) on all axes as well as rotational acceleration and periods of intense vibration at audio frequencies. Large unsupported PCBs and loose cabling with non-lockable connectors are definitely out. Fortunately, the 10cm cube format limits the PCB size, so the satellite consists of a stack of boards held together by bolts and spacers at the corners. Inter-board connection is by multi-pin plugs and sockets with no cabling. The stack is bolted into a square-section metal tube which acts as a carrier for the solar panels on the outside.
- Extreme temperatures are encountered in Space – way beyond the operational limits of semiconductor devices likely to be used in a CubeSat. In sunlight, the outside surface temperature of the solar cells can exceed +150°C; in darkness it can go as low as -200°C, depending on altitude. However, the internals of a tiny unpressurised satellite in Low-Earth Orbit should stay within a much narrower range, making the use of standard industrial temperature (-40 to +85°C) devices possible. It is important to remember that heat energy can only travel through a vacuum in the form of radiation: no convection, no conduction. Without an atmosphere inside the satellite, there is no circulating convection current to transfer heat. There will be some re-radiation internally, but this can be limited by insulation, and the only conduction to boards will be through mountings and electrical connections to solar panels, etc. Similarly, when the satellite orbits into darkness, internal heat will only gradually radiate away. In LEO, though, one orbit takes about 90 minutes with a roughly even split of sunlight exposure and darkness. The alternate periods of heating and cooling allow the satellite internals to keep within temperature limits. So long as the electronics are not generating much heat themselves, no special cooling/heating equipment should be necessary.
Lessons on Functional Resilience
This is the hard bit, and where you need to draw upon the experience of others to get a feel for the most likely failure modes and their work-arounds. While the extremes of temperature in Space are relatively easy to deal with in the context of an LEO CubeSat, Cosmic particles (traditionally, if misleadingly, called Cosmic Rays) are quite another matter. They are essentially bits of atoms that have been torn apart, possessing enormous amounts of energy. Crashing into human flesh, they can damage cells, causing mutations which can lead to cancerous growths – an obvious problem for astronauts. Collisions with electronic circuits may cause transient or lasting damage. For instance, a memory cell could just have its logic state ‘bit-flipped’ or be left stuck in one state permanently. Circuit chips can be purchased radiation-hardened, but they are very expensive: way beyond the budget of an amateur satellite builder. What can be done to make a CubeSat more recoverable in the face of random and unforeseen faults, transient or permanent, without a big increase in hardware complexity?
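The difference between a transient ‘bit-flip’ and a permanently stuck bit matters because the two need different responses: the first can simply be rewritten, the second has to be worked around. The following is a minimal Python model of that distinction, not real memory-test code; the `FaultyByte` class and the `classify` function are hypothetical names invented for this sketch.

```python
class FaultyByte:
    """One byte of memory in which selected bits can be stuck at fixed values."""

    def __init__(self, value: int = 0, stuck_mask: int = 0, stuck_value: int = 0):
        self._value = value & 0xFF
        self._stuck_mask = stuck_mask & 0xFF    # 1 = this bit is stuck
        self._stuck_value = stuck_value & 0xFF  # level the stuck bits are held at

    def write(self, value: int) -> None:
        self._value = value & 0xFF

    def read(self) -> int:
        healthy = self._value & ~self._stuck_mask
        stuck = self._stuck_value & self._stuck_mask
        return (healthy | stuck) & 0xFF

    def flip(self, bit: int) -> None:
        """Simulate a transient upset: invert one stored bit."""
        self._value ^= (1 << bit) & 0xFF


def classify(cell: FaultyByte, expected: int) -> str:
    """Return 'ok', 'transient' (fixed by a rewrite) or 'permanent'."""
    if cell.read() == expected:
        return "ok"
    cell.write(expected)  # try to repair the cell by rewriting it
    if cell.read() == expected:
        return "transient"
    return "permanent"


flipped = FaultyByte(0x5A)
flipped.flip(3)
print(classify(flipped, 0x5A))

stuck = FaultyByte(0x5A, stuck_mask=0x01, stuck_value=0x01)
print(classify(stuck, 0x5A))
```

Note the limitation: a stuck bit that happens to match the expected value goes unnoticed here. A fuller test would write and read back complementary patterns before trusting the cell.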
The heart of any CubeSat is going to be the Microcontroller (MCU) chip – an obvious single point of failure. Bearing in mind the (relatively) low-cost requirement, it’s best not to think about Space-Qualified devices. A very cheap 8-bit MCU is not a good idea either, because it probably won’t have the necessary performance. Neither is a 32-bit, very high-performance, multi-core chip suitable, as the very small circuit elements may make it more vulnerable to cosmic particle impacts. You will also be tempted to fill its vast program memory with code which will take forever to debug. A good compromise would be the Texas Instruments MSP430FR range of 16-bit MCUs with FRAM memory. FRAM, along with MRAM, is part of a new generation of genuinely fast read/write non-volatile random-access memory. Not only that, both technologies are resistant to radiation damage, making them ideal for use in Space. Here is the datasheet for the MSP430FR5964 with 256KB of FRAM. There is also a certified Rad-Hard version, but I’ll talk about that in Part 2.
When selecting a particular MCU device from a family, remember to keep it simple. Modern chips can be overloaded with peripheral interfaces you don’t need. For example, a basic CubeSat controller is unlikely to use USB, Ethernet or CAN communications. Instead, go for UART, I2C and SPI serial buses and make sure you have one channel per peripheral device. These are relatively slow but robust buses with simple protocols. In theory, I2C and SPI can each handle multiple peripherals on a single bus. Try to avoid this, as I2C in particular is vulnerable to one failed peripheral jamming communication with the others sharing the bus.
The developer of an embedded system is used to pushing the Reset button whenever the processor becomes unresponsive or ‘crashes’ due to an as yet undiscovered coding bug. Reset should then return execution to a known working piece of code to await the debugging process. That’s fine on the bench, but if it crashes while in orbit, what then?
- Some form of automatic crash detection and reset is required. Most MCUs feature just such an on-chip circuit, the Watchdog Timer, which forces a reset if the program fails to restart it within a set interval.
- OK, but what if the on-chip watchdog cannot be relied upon, perhaps because faulty code has disabled it or the clock that drives it has stopped? In that case, we need an external Supervisor device that will force a reset if a regular programmed ‘Heartbeat’ pulse from a GPIO pin stops, signalling a program crash.
- Then there is the case where the MCU gets stuck in a loop, perhaps waiting for an unresponsive sensor, but is still sending the heartbeat to the Supervisor chip. The solution adopted on Dellingr was to add a timer chip that forced a full reset every 25 hours regardless.
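The external supervisor and the long-period reset timer described above can be modelled in a few lines. This is a minimal Python simulation, not flight firmware, and it leaves out the on-chip watchdog. The `ResetMonitor` class, the one-second time step and the 2-second heartbeat timeout are assumptions made for illustration; only the 25-hour forced reset comes from the Dellingr account.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

HEARTBEAT_TIMEOUT_S = 2            # assumed supervisor timeout
FORCED_RESET_PERIOD_S = 25 * 3600  # unconditional reset every 25 hours


@dataclass
class ResetMonitor:
    """Models an external supervisor chip plus a long-period reset timer."""
    last_heartbeat_s: int = 0
    last_reset_s: int = 0

    def heartbeat(self, now_s: int) -> None:
        """Called whenever the MCU pulses its heartbeat GPIO pin."""
        self.last_heartbeat_s = now_s

    def check(self, now_s: int) -> Optional[str]:
        """Return the reason for a reset at time now_s, or None if all is well."""
        if now_s - self.last_heartbeat_s > HEARTBEAT_TIMEOUT_S:
            reason = "supervisor"
        elif now_s - self.last_reset_s >= FORCED_RESET_PERIOD_S:
            reason = "periodic"
        else:
            return None
        # A reset restarts both timers.
        self.last_reset_s = now_s
        self.last_heartbeat_s = now_s
        return reason


def first_reset(heartbeat_stops_at_s: Optional[int],
                horizon_s: int) -> Optional[Tuple[int, str]]:
    """Step one second at a time; return (time, reason) of the first reset."""
    monitor = ResetMonitor()
    for now_s in range(1, horizon_s + 1):
        if heartbeat_stops_at_s is None or now_s < heartbeat_stops_at_s:
            monitor.heartbeat(now_s)
        reason = monitor.check(now_s)
        if reason is not None:
            return now_s, reason
    return None


# Program crashes at t = 10 s: the heartbeat stops and the supervisor steps in.
print(first_reset(heartbeat_stops_at_s=10, horizon_s=100))
# Program is stuck in a loop but still sends heartbeats: only the timer catches it.
print(first_reset(heartbeat_stops_at_s=None, horizon_s=26 * 3600))
```

The second case shows why the forced reset is worth having: a program that is alive enough to keep pulsing the heartbeat pin looks perfectly healthy to the supervisor.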
If you’re lucky, a reset might clear a ‘glitch’ and the system will run again normally. But if it promptly goes down again, then either there is a bug in the firmware, a chip has been damaged by a cosmic particle, or a manufacturing fault has surfaced. Under these circumstances, recovery is going to require analysis of downloaded telemetry from the satellite followed by an upload of a firmware ‘patch’, either fixing a bug or working around a hardware fault. To this end, all resets, from whatever source, must lead to a mission control input routine where the satellite waits for instructions via the radio on what to do next.
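The ‘every reset leads to a command-wait routine’ rule might be sketched as follows. This is only an outline in Python of the control flow, under the assumption that the reset cause can be read at start-up; the command names (`SEND_TELEMETRY`, `PATCH`, `RESUME`) and the `boot` function are hypothetical and not taken from any real CubeSat.

```python
from enum import Enum
from typing import Iterable, List


class ResetCause(Enum):
    POWER_ON = "power-on"
    WATCHDOG = "watchdog"
    SUPERVISOR = "supervisor"
    PERIODIC = "periodic"


def boot(cause: ResetCause, uplink: Iterable[str], log: List[str]) -> str:
    """Post-reset routine: record the cause, then obey ground commands.

    Returns 'mission' once the ground station says RESUME, or 'waiting'
    if the uplink ends without that command (hypothetical protocol).
    """
    log.append(f"reset:{cause.value}")  # kept for later telemetry
    for command in uplink:
        if command == "SEND_TELEMETRY":
            log.append("telemetry sent")
        elif command.startswith("PATCH "):
            log.append(f"patch stored: {command[len('PATCH '):]}")
        elif command == "RESUME":
            return "mission"
        else:
            log.append(f"ignored: {command}")
    return "waiting"


events: List[str] = []
state = boot(ResetCause.SUPERVISOR,
             ["SEND_TELEMETRY", "PATCH skip-sensor-2", "RESUME"],
             events)
print(state, events)
```

The point of the design is that the satellite never resumes the mission by itself after a reset, so a fault that recurs immediately cannot lock the operators out.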
The radio communications link is another potential SPoF. Without it, no mission data or diagnostic telemetry can be sent back to Earth, and no commands or code patches uploaded. Unfortunately, for a tiny CubeSat it’s a very weak link. Power is very limited, so most Commercial Off The Shelf (COTS) boards will only offer low data rates. However much money is spent, communicating with a satellite in low orbit (around 200 miles up) is going to be difficult. It’s moving at about 17,000 mph, so it takes about 90 minutes to complete a full Earth orbit. From the point of view of an observer on the surface, the satellite is only ‘in view’ for 5 minutes or so as it moves across the sky. Just to communicate for that short interval – 5 minutes in every 90 – will probably require a fully steerable high-gain antenna on the ground. You can see that large firmware updates are just not feasible.
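The orbit figures quoted above, and what they mean for uploads, can be checked with some simple arithmetic. The Python sketch below assumes a circular orbit and uses standard values for Earth’s radius and gravitational parameter. The 1200 bit/s data rate and the 50% protocol efficiency are assumptions chosen for illustration, not figures for any particular radio board.

```python
import math

MU_EARTH_KM3_S2 = 398_600.4418  # Earth's gravitational parameter
EARTH_RADIUS_KM = 6_371.0       # mean Earth radius
KM_PER_MILE = 1.609344


def circular_orbit(altitude_miles: float):
    """Return (period in minutes, speed in mph) for a circular orbit."""
    r_km = EARTH_RADIUS_KM + altitude_miles * KM_PER_MILE
    speed_km_s = math.sqrt(MU_EARTH_KM3_S2 / r_km)
    period_s = 2 * math.pi * r_km / speed_km_s
    return period_s / 60, speed_km_s * 3600 / KM_PER_MILE


def bytes_per_pass(bit_rate: float, pass_seconds: float, efficiency: float) -> int:
    """Usable payload bytes in one pass after protocol overhead."""
    return int(bit_rate * pass_seconds * efficiency / 8)


def passes_needed(image_bytes: int, per_pass_bytes: int) -> int:
    """Number of complete passes needed to upload image_bytes."""
    return math.ceil(image_bytes / per_pass_bytes)


period_min, speed_mph = circular_orbit(200)
per_pass = bytes_per_pass(bit_rate=1200, pass_seconds=5 * 60, efficiency=0.5)

print(f"period: {period_min:.1f} min, speed: {speed_mph:.0f} mph")
print(f"usable bytes per pass: {per_pass}")
print(f"passes to upload 256KB: {passes_needed(256 * 1024, per_pass)}")
```

With these assumed link figures, a 5-minute pass carries roughly 22KB, so rewriting an entire 256KB program memory would take about a dozen good passes. Small, targeted patches are the only practical option.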
Essential Reading (and Viewing)
NASA’s CubeSat 101 Beginners Guide, Basic Concepts and Processes for First-Time CubeSat Developers. AMSAT and NASA have recently created a useful Educational tool: The AMSAT CubeSat Simulator: A New Tool for Education and Outreach. The following video presentation on the ArduSat project in 2014 is well worth watching to get a feel for the size of the task placing a CubeSat in Earth orbit:
Next time in Part 2
CubeSats only last about 6 months before they fall back to Earth. That’s why you can ‘get away with’ COTS parts and limited testing. Longer lifetimes require a lot more design effort, more rigorous testing and expensive components to make a satellite more reliable.
Are there any DS members out there who are/have been involved in a CubeSat project? We would really like to hear about your experiences!