What does it take for AI to build space software?
On June 4, 1996, the highly anticipated maiden flight of the Ariane 5 rocket was cut short when it self-destructed only 37 seconds after liftoff. The culprit behind the $370 million in damages: an integer overflow in the control software, caused by converting a 64-bit floating-point number into a 16-bit signed integer that was too small to hold the value.
This failure became one of the most studied software disasters in aerospace history and served as a catalyst for more rigorous standards set by the European Space Agency (ESA).
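To make the failure mode concrete, here is a minimal Rust sketch of a range-checked conversion from a 64-bit float to a 16-bit signed integer. It is an illustration only, not the Ariane 5 code, and the function name `to_i16_checked` is hypothetical.

```rust
/// Convert a 64-bit float to a 16-bit signed integer, returning `None`
/// when the value cannot be represented. Hypothetical helper for
/// illustration; not taken from any flight software.
fn to_i16_checked(value: f64) -> Option<i16> {
    // NaN has no integer equivalent.
    if value.is_nan() {
        return None;
    }
    // Drop the fractional part, then check the range explicitly.
    let truncated = value.trunc();
    if truncated < f64::from(i16::MIN) || truncated > f64::from(i16::MAX) {
        return None;
    }
    // The cast is exact here because the range was checked above.
    Some(truncated as i16)
}
```

With this sketch, `to_i16_checked(1234.9)` yields `Some(1234)`, while `to_i16_checked(40000.0)` yields `None`, leaving the caller to decide how to handle the out-of-range case.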
Building software for space is one of the hardest invention problems in engineering because the standards leave zero margin for error. As AI becomes increasingly capable of writing and reasoning about code, a new question arises: what does it take for AI to work at that level?
What makes software “space-grade”?
Developing space-grade software is uniquely difficult, and the standards for code that runs in space are far more demanding than for the software that most of us use every day.
The processors on spacecraft are built to survive the extreme conditions of space, which often means sacrificing raw speed. This tradeoff puts more pressure on the software to be efficient while maintaining the required level of precision. And while spacecraft software can be updated after launch, those updates depend on foundational systems working correctly. If any of those contains a flaw, the spacecraft loses its ability to heal itself.
During critical mission phases such as landing, orbit insertion, and docking, the signal delay between Earth and the spacecraft makes real-time human intervention impossible. In those windows, the software is entirely on its own. Every safeguard, every fallback, every recovery procedure has to be built in and verified before launch.
These limitations are just a few examples of why organizations like ESA developed stringent standards that define what correctness means for flight software before it ever launches. Consider libmcs, a mathematical library used in satellite navigation, orbital mechanics, and flight control systems. What makes it space-grade is that it’s purpose-built to guarantee predictable, reliable results regardless of the hardware it runs on.
In our most recent blog post, libmcs was one of four libraries that our AI agents migrated from C to Safe Rust. Memory safety is built into Safe Rust, eliminating entire classes of bugs at the compiler level and making it an increasingly attractive language for safety-critical systems.
But memory safety alone isn't enough. Migrating a library like libmcs also means proving that every reliability guarantee that made the original code space-grade carries over to the new language.
We worked with GTD GmbH, which certifies flight software to ESA standards, to ensure that the code produced by our agents met those astronomically high standards.
What space-grade precision looks like
To understand what meeting those standards looks like under the hood, consider something that appears simple, like computing a square root. Most software computes a square root by offloading the calculation to the hardware, which is fast and accurate enough for general purposes. However, different processors can give slightly different results in the very last digit of precision, which is not consistent enough for space software.
Instead, libmcs uses an algorithm that reconstructs the answer bit by bit, testing whether each candidate answer squared still stays within bounds (think long division for square roots). This guarantees that the result is the closest possible value to the true answer regardless of the input or the hardware it's running on.
Take √2 as an example. At each step, the algorithm asks a simple yes/no question: "If I turn this bit on, does my answer squared still stay ≤ 2?" If yes, keep the bit. If not, discard it, because it would make the answer too big. The table below traces the first six steps.
| Iteration | Bit under test (last binary digit shown) | Value of that bit | Candidate (partial result + bit value) | Candidate squared | Turn on bit? | Partial result (binary) |
|---|---|---|---|---|---|---|
| 0 | 0 | 1.0 | 1.0 | 1.0 | ✓ | 1 |
| 1 | 1.0 | 0.5 | 1.5 | 2.25 | ✗ | 1.0 |
| 2 | 1.00 | 0.25 | 1.25 | 1.5625 | ✓ | 1.01 |
| 3 | 1.010 | 0.125 | 1.375 | 1.891 | ✓ | 1.011 |
| 4 | 1.0110 | 0.0625 | 1.4375 | 2.066 | ✗ | 1.0110 |
| 5 | 1.01100 | 0.03125 | 1.40625 | 1.978 | ✓ | 1.01101 |
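The same keep-or-discard loop can be sketched in a few lines of Rust. This is a simplified fixed-point illustration of the idea, not the libmcs implementation (which works on floating-point values); the function name `sqrt_bits` and the `frac_bits` parameter are assumptions made for this example.

```rust
/// Bit-by-bit square root in fixed point (illustrative sketch only).
/// Returns floor(sqrt(x) * 2^frac_bits), i.e. the square root of `x`
/// with `frac_bits` binary digits after the point.
fn sqrt_bits(x: u32, frac_bits: u32) -> u64 {
    // Keep every intermediate value comfortably inside a u64.
    assert!(frac_bits <= 15);
    // Scale the target so the comparison stays in integers:
    // (r / 2^f)^2 <= x  is the same as  r^2 <= x * 2^(2f).
    let target: u64 = (x as u64) << (2 * frac_bits);
    let mut result: u64 = 0;
    // The square root of a u32 needs at most 16 integer bits.
    for pos in (0..16 + frac_bits).rev() {
        // Tentatively turn the bit on...
        let candidate = result | (1u64 << pos);
        // ...and keep it only if the square does not overshoot.
        if candidate * candidate <= target {
            result = candidate;
        }
    }
    result
}
```

Calling `sqrt_bits(2, 5)` returns 45, which is binary 101101. Read with five fractional bits, that is 1.01101, the partial result in the last row of the table.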
This kind of bit-level rigor is exactly what makes these systems so hard to build, and so meaningful to get right. It's also what makes the code trickier to migrate.
How our agents caught what space experts missed
Because C and Rust have different rules for how they handle memory, error conditions, and program safety, our agents had to deeply understand the algorithms themselves: why each operation produces the correct result, and what guarantees it provides.
During the code migration, our agents continuously verified their work against GTD's test suite. When a test failed, they investigated by running debugging passes, checking results against independent math libraries, and analyzing the code to trace the source of every discrepancy. Sometimes the issue was in their translation. Other times, it was in the test suite itself.
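As a rough illustration of what checking results against an independent implementation can look like, the sketch below compares two functions input by input and reports where they differ by more than a chosen number of units in the last place (ULPs). It is not GTD's test suite or our agents' tooling; the names `ordered_bits`, `ulp_distance`, and `find_mismatches` are hypothetical.

```rust
/// Map an f64 onto a signed integer line where neighbouring floats differ by 1.
fn ordered_bits(x: f64) -> i64 {
    let bits = x.to_bits() as i64;
    // Negative floats have the sign bit set; flip them so the ordering is
    // monotonic and -0.0 lands on the same point as +0.0.
    if bits < 0 { i64::MIN - bits } else { bits }
}

/// Distance between two non-NaN floats in units in the last place (ULPs).
fn ulp_distance(a: f64, b: f64) -> u64 {
    ordered_bits(a).abs_diff(ordered_bits(b))
}

/// Return every input where `candidate` and `reference` disagree by more
/// than `max_ulps`, or where exactly one of them returns NaN.
fn find_mismatches(
    inputs: &[f64],
    candidate: fn(f64) -> f64,
    reference: fn(f64) -> f64,
    max_ulps: u64,
) -> Vec<f64> {
    inputs
        .iter()
        .copied()
        .filter(|&x| {
            let (c, r) = (candidate(x), reference(x));
            if c.is_nan() || r.is_nan() {
                // Mismatch unless both sides agree that the result is NaN.
                !(c.is_nan() && r.is_nan())
            } else {
                ulp_distance(c, r) > max_ulps
            }
        })
        .collect()
}
```

A migrated function would be passed as `candidate` and a trusted implementation (for example `f64::sqrt`) as `reference`. An empty result means the two agree within the tolerance on every input tried. A non-empty result only says that they disagree; deciding whether the port, the reference, or the expected value is at fault still takes investigation.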
That process surfaced subtle inconsistencies in GTD's test suite that had gone undetected through years of expert review. As GTD noted, the findings were significant enough to hold their next release of libmcs until the issues were resolved.
This points to something broader: rebuilding software in a new language forces every assumption to be reexamined, making rigorous code migration a powerful form of verification. AI agents are uniquely suited to this work, as they can apply the same level of scrutiny to every edge case at a scale and consistency that is difficult to sustain through human review alone.
Space is an extreme case, but the principle generalizes: AI that can reason at this depth, catching what experts miss and preserving guarantees when code moves from one language to another, is AI that can help invent at the frontier. That’s what we’re building toward, and we’ll be sharing more thoughts and examples about co-invention with AI in future posts.
Curious about the technical details? Our previous post on migrating critical systems to Safe Rust covers the full translation process, including a deep dive into the challenges our agents navigated. The Safe Rust translation of libmcs produced by our agents is also open sourced on GitHub.