Migrating critical systems to Safe Rust with reliable agents
Our mission is to empower people to co-invent the future with AI agents. This is a grand challenge: agents must reason consistently over long horizons, manage complex architectural tradeoffs, and meet rigorous verification standards. These skills are crucial when inventing new systems that go beyond the training data of today’s AI models.
Translating C to Rust involves both direct line-by-line conversions and language-specific reimplementations. Color coding shows which code segments translate directly and which require Rust-specific idioms, with test-coverage indicators marking equivalent and unique validation approaches.
As a foundational step, we've applied our AI agents to translating critical software systems from C to Safe Rust. This provides an ideal proving ground for developing agents that reliably complete long-horizon tasks while addressing serious security concerns.
The result? The Rust implementations are memory safe, performant, and adhere to the meticulous testing standards necessary for essential infrastructure. For code designed to run on spacecraft, our agents produced a version that actually surpasses the original C version in accuracy.
Code that flies
For libmcs, code designed to run in space, our agents not only produced a Rust translation passing all ESA-standard tests, but also improved the test suite itself by finding subtle errors expert reviewers had missed.
“Asari AI were able to find a very subtle double error in our test suite which we never spotted throughout years of reviews.”
The need for safe software infrastructure
C code powers critical infrastructure systems worldwide, but its lack of memory safety creates vulnerabilities that account for 70% of serious security bugs in major codebases. Rust eliminates these vulnerabilities by design through its ownership system and compile-time memory safety checks.
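To make "by design" concrete, here is a minimal illustrative snippet (ours, not drawn from any of the translated libraries): the Rust compiler rejects a use-after-move at build time, where the equivalent C pattern would compile and fail at runtime.

```rust
fn main() {
    let data = vec![1, 2, 3];
    let moved = data; // ownership of the heap buffer moves to `moved`
    // println!("{:?}", data); // error[E0382]: borrow of moved value: `data`
    println!("{:?}", moved); // the buffer is freed exactly once, when `moved` drops
}
```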
Translating billions of lines of C to memory-safe Rust is a national security imperative, as outlined in DARPA's Translating All C to Rust (TRACTOR) program. Completing this task requires understanding system architecture, redesigning for different memory models, and verifying correctness across complex codebases. This is exactly the kind of intricate work that demands dependable collaboration between humans and AI agents.
One of the key capabilities of our agents is reliably completing long-horizon projects with rigorous testing standards. We challenged our agents with migrating four widely-used production C libraries to Safe Rust. These four libraries are part of the TRACTOR initiative and collectively total 40K+ lines of low-level, performance-critical systems code:
- libmcs (Mathematical library for Critical Systems): high-precision math library that meets strict European Space Agency (ESA) standards
- gzip: one of the most widely-used command-line tools for file compression
- libyaml: the foundational parsing library used for YAML configuration files
- zlib: compression library used by virtually every operating system and application for data compression
We are releasing the translated Rust libraries here, and are excited to hear what you think!
The challenges of system design
Migrating C code to Safe Rust can appear deceptively straightforward: take C code as input, produce equivalent code in Rust as output. But what does equivalent actually mean? C and Rust rely on fundamentally different principles, with no direct line-for-line mapping between code in the two languages.
Deep Dive: Unsafe C data representation → Safe Rust ownership
C: Anonymous union with type tag - requires manual memory management for each variant
typedef struct yaml_event_s {
yaml_event_type_t type;
union {
struct { yaml_encoding_t encoding; } stream_start;
struct {
yaml_version_directive_t *version_directive; // nullable pointer
struct {
yaml_tag_directive_t *start; // pointer range
yaml_tag_directive_t *end;
} tag_directives;
int implicit; // int as bool
} document_start;
struct { int implicit; } document_end;
struct { yaml_char_t *anchor; } alias; // raw pointer
struct {
yaml_char_t *anchor; // nullable pointer
yaml_char_t *tag; // nullable pointer
yaml_char_t *value; // raw pointer
size_t length; // manual length
int plain_implicit; // int as bool
int quoted_implicit; // int as bool
yaml_scalar_style_t style;
} scalar;
struct {
yaml_char_t *anchor; // nullable pointer
yaml_char_t *tag; // nullable pointer
int implicit; // int as bool
yaml_sequence_style_t style;
} sequence_start;
} data;
} yaml_event_t;
C-isms:
- Tagged union — `type` + `union` to simulate a sum type
- Nullable pointers — `*version_directive`, `*anchor`, `*tag` can all be `NULL`
- Pointer range — start/end pair for the tag directives list
- Raw pointer + length — `*value` and `length` tracked separately
- `int` as boolean — `implicit`, `plain_implicit`, `quoted_implicit` are ints
Rust: Tagged enum with owned data - memory automatically managed
pub enum EventData {
None,
StreamStart { encoding: Encoding },
DocumentStart {
version_directive: Option<VersionDirective>,
tag_directives: Vec<TagDirective>,
implicit: bool,
},
DocumentEnd { implicit: bool },
Alias { anchor: String },
Scalar {
anchor: Option<String>,
tag: Option<String>,
value: String,
plain_implicit: bool,
quoted_implicit: bool,
style: ScalarStyle,
},
SequenceStart {
anchor: Option<String>,
tag: Option<String>,
implicit: bool,
style: SequenceStyle,
},
}
Rust idioms:
- Enum with data — sum type with exhaustive pattern matching
- `Option<T>` — replaces nullable pointers; compiler enforces null checks
- `Vec<T>` — replaces pointer range; handles allocation automatically
- `String` — owned string, no manual length tracking needed
- `bool` — actual boolean type instead of `int`
C lacks sum types, sized strings, dynamic arrays, optional types, and booleans — so it simulates them with tagged unions, pointer-length pairs, pointer ranges, nullable pointers, and integers. Rust has native constructs for all of these, so the conversion replaces each workaround with the feature it was simulating. This also changes memory management. In C, each pointer must be manually freed, and a delete function must check the type to know which fields to free. In Rust, when the enum drops, all owned data drops with it — no delete function needed.
The implementation in Rust eliminates entire bug categories: accessing the wrong union field, forgetting to free, double-freeing, and use-after-free.
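To make this concrete, here is a minimal sketch (the handler is our own illustrative function, built on the EventData enum above) of what consuming an event looks like when ownership does the cleanup:

```rust
// There is no counterpart to libyaml's yaml_event_delete(), because none is needed.
fn handle(event: EventData) {
    match event {
        // The match must be exhaustive, so "read the wrong union field"
        // cannot compile, let alone run.
        EventData::Scalar { value, .. } => println!("scalar: {value}"),
        EventData::Alias { anchor } => println!("alias: {anchor}"),
        // Every other variant, and every String or Vec it owns, is
        // dropped here automatically, exactly once.
        _ => {}
    }
}
```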
The central challenge in defining equivalence is specification ambiguity: expressing what the code should do when moving to a fundamentally different language paradigm. We take a test-driven development approach, where the specification is expressed using test suites.
The challenges in this approach manifest in several ways:
- Incomplete specifications: tests tell you what the code should do or what features should be included, but often tests do not cover the full intended functionalities of the target program.
- Implementation versus intent: Even when there is full test coverage, not all functionality makes sense in Rust, such as tests on pointer arithmetic in C.
- Specification errors: Test suites can be flawed and have contradictions between individual test cases.
- Scale: At production scale, test suites often require far more code than the programs they verify. This challenges agents to reason over a large number of possible test compositions and edge cases.
These complexities in rigorous system design push the limits of AI agents and the architectures behind them.
Long-horizon inconsistency makes it difficult to maintain correctness across thousands of reasoning steps, where small early errors compound into major failures. Context window limits degrade the ability of AI agents to reason about large amounts of code and tests. Agents also need language-specific or application-specific development tools, but AI models may not be trained to use these specialized tools.
Human-AI test-driven development
Building production-ready software is a chicken-and-egg problem: writing comprehensive and correct tests requires understanding how the code will be written, but writing the code requires knowing how it will be tested.
To tackle these challenges, we developed agents centered on rigorous, iterative, and collaborative test-driven development (TDD). Our agents follow high-level goals and guardrails, while ensuring every specification and test case is met. These architectures use primitives and workflows that optimize for expressiveness, efficiency, and correctness across thousands of iterations and large test suites.
While humans provide high-level goals and resolve critical design decisions, our agent autonomously executes the translation through iterative cycles of design, implementation, verification, and optimization.
Our agents also collaborate effectively with humans on key design decisions and test development, adapting existing tests and creating new ones optimized for performance, code coverage, and coding style. A key advantage is their ability to search over design decisions and identify gaps in specifications. For instance, adapting C's memory model to Rust's ownership system often requires rethinking data access and usage patterns, a form of invention that goes beyond mechanical translation.
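As a flavor of specification-as-tests, a behavioral test pins down what the translation must preserve while leaving the internal design free. A hedged sketch (the compress/decompress names are illustrative, not the actual suite):

```rust
// Round-trip equivalence: whatever the internal Rust design, the
// observable behavior must match the C original byte for byte.
#[test]
fn roundtrip_preserves_bytes() {
    let input: &[u8] = b"the quick brown fox jumps over the lazy dog";
    let compressed = compress(input);
    assert_eq!(decompress(&compressed), input);
}
```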
Throughout the translation process, agents required very little human guidance: typically 2-10 critical decision points per library where human judgment shaped architectural choices or resolved specification ambiguities. This approach is also economically efficient: running costs ranged from $50 to $500 per library, without our explicitly optimizing the process for cost.
Verifying software for space
“The additional help of AI in verification and validation processes can contribute to an increase in reliability, understanding, and level of trust in the software we use to fly.”
Building rigorous, safe systems requires a detailed understanding of intent. To test our agents with expert feedback, we worked with GTD GmbH, a trusted test provider that verifies flight software to the strict accuracy and reliability standards set by the European Space Agency. GTD provided a comprehensive test suite for libmcs, a high-precision arithmetic library that performs mission-critical calculations for satellite navigation, orbital mechanics, and flight control systems.
When code runs in space, even seemingly simple operations like computing a square root require high precision, because the slightest numerical error can cascade into mission failure.
Deep Dive: The software implementation of sqrt
On most modern CPUs, sqrt is implemented with a dedicated hardware instruction (e.g., x86 has FSQRT on x87 and SQRTSD/SQRTSS on SSE/AVX). However, when hardware sqrt isn't available, or when a library wants predictable, correctly rounded results without relying on floating-point hardware, it uses a digit-recurrence algorithm.
Unlike the Newton-Raphson root-finding method, which approximates and refines, this method acts like "long division" but for square roots. It determines the result bit-by-exact-bit, ensuring the final answer is perfectly rounded according to IEEE 754 standards.
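The idea is easiest to see on plain integers. Here is a toy digit-recurrence square root (our own illustrative sketch, not the libmcs code); the f64 implementation below applies the same scheme to the significand bits:

```rust
/// Toy bit-by-bit square root: returns floor(sqrt(n)).
fn isqrt(n: u64) -> u64 {
    let mut rem = n;          // running remainder
    let mut root = 0u64;      // result, built one bit at a time
    let mut bit = 1u64 << 62; // largest power of 4 that fits in u64
    while bit > n {
        bit >>= 2;
    }
    while bit != 0 {
        if rem >= root + bit {
            // We can "afford" this bit: pay its cost and set it.
            rem -= root + bit;
            root = (root >> 1) + bit;
        } else {
            root >>= 1;
        }
        bit >>= 2;
    }
    root // e.g. isqrt(2_000_000) == 1414, since 1414² = 1_999_396
}
```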
As with most libm-style implementations, special cases such as NaN, ±∞, negative inputs, and ±0 are handled up front and return immediately. The remainder of the algorithm assumes a finite, non-negative, non-zero input and focuses on computing the correctly rounded significand.
1. Split the float into raw bits
let sign: u32 = 0x80000000;
// Extract the bit representation
let bits = x.to_bits();
let mut ix0 = (bits >> 32) as i32;
let mut ix1 = bits as u32;
A 64-bit f64 is composed of a sign, an exponent, and a significand (fraction, often called the mantissa).
| S | Exponent | Mantissa |
|---|---|---|
| 1 bit | 11 bits | 52 bits |

- `ix0`: contains the sign (1 bit), the exponent (11 bits), and the top 20 bits of the significand.
- `ix1`: contains the lower 32 bits of the significand.
We operate on these integers directly to bypass floating-point hardware completely.
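As a quick sanity check of this layout (our own example, not from the library), take x = 2.0, whose bit pattern is 0x4000_0000_0000_0000:

```rust
let bits = 2.0_f64.to_bits();             // 0x4000_0000_0000_0000
assert_eq!((bits >> 52) & 0x7ff, 1024);   // biased exponent: 1023 + 1
assert_eq!(bits & ((1u64 << 52) - 1), 0); // significand bits: all zero (2.0 = 1.0 × 2¹)
let ix0 = (bits >> 32) as i32; // sign, exponent, and top 20 significand bits
let ix1 = bits as u32;         // lower 32 significand bits
assert_eq!((ix0, ix1), (0x40000000, 0));
```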
2. Normalize and halve the exponent
// Normalize x
let mut m = ix0 >> 20;
if m == 0 {
// Subnormal x
while ix0 == 0 {
m -= 21;
ix0 |= (ix1 >> 11) as i32;
ix1 <<= 21;
}
let mut i = 0;
while (ix0 & 0x00100000) == 0 {
ix0 <<= 1;
i += 1;
}
m -= i - 1;
ix0 |= (ix1 >> (32 - i)) as i32;
ix1 <<= i;
}
m -= 1023; // Unbias exponent
ix0 = (ix0 & 0x000fffff) | 0x00100000;
if (m & 1) != 0 {
// Odd m, double x to make it even
ix0 += ix0 + ((ix1 & sign) >> 31) as i32;
ix1 = ix1.wrapping_add(ix1);
}
m >>= 1; // m = [m/2]
This section simplifies the problem using the property: √x = √(2^E × M) = 2^(E/2) × √M
- Unbias: We remove the exponent bias (1023).
- Parity Check: If the exponent E is odd, E/2 would leave a fraction. To fix this, we double the mantissa (2M) and subtract 1 from the exponent (making it even).
- Halve: We perform a simple bit-shift m >>= 1 to compute the new exponent. Now the algorithm only needs to compute the square root of the significand (a number between 1.0 and 4.0).
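For example, take x = 8.0 = 2³ × 1.0. The exponent 3 is odd, so we rewrite it as 2² × 2.0; halving then gives √8 = 2¹ × √2.0 ≈ 2 × 1.41421 ≈ 2.82843. The loop below only ever has to compute the square root of the doubled significand, 2.0.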
3. The bit-by-bit loop (upper 32 bits)
// Generate sqrt(x) bit by bit
ix0 += ix0 + ((ix1 & sign) >> 31) as i32;
ix1 = ix1.wrapping_add(ix1);
let mut q: i32 = 0;
let mut q1: u32 = 0;
let mut s0: i32 = 0;
let mut s1: u32 = 0;
let mut r: u32 = 0x00200000;
while r != 0 {
let t = s0 + r as i32;
if t <= ix0 {
s0 = t + r as i32;
ix0 -= t;
q += r as i32;
}
ix0 += ix0 + ((ix1 & sign) >> 31) as i32;
ix1 = ix1.wrapping_add(ix1);
r >>= 1;
}
This loop is the core engine. It builds the result one bit at a time, from most significant to least.
The Logic: We want to determine if setting the next bit r in our result q is valid. Usually, checking if (q + r)² ≤ x requires squaring. However, the algorithm uses a differential update trick: (q+r)² - q² = 2qr + r² = r(2q+r)
Here,
- `q` accumulates the result bits
- `s0` tracks 2 × q (avoiding multiplication)
- `r` is the bit position being tested
- `t` represents the “cost” of setting the current bit: `s0 + r`. If the remainder `ix0` is at least `t`, we can “afford” the bit: we subtract `t` from the remainder and set the bit in `q`.
This transforms the complex square root problem into a series of subtractions and shifts. No multiplication in the loop.
Trace for √2:

| Iteration | Candidate bit | Accepted? | Partial result (binary) |
|---|---|---|---|
| 0 | 1.0 | ✓ | 1 |
| 1 | 0.5 | ✗ | 1 |
| 2 | 0.25 | ✓ | 1.01 |
| 3 | 0.125 | ✓ | 1.011 |
| 4 | 0.0625 | ✗ | 1.011 |
| 5 | 0.03125 | ✓ | 1.01101 |
| 6 | 0.015625 | ✗ | 1.01101 |
| 7 | 0.0078125 | ✓ | 1.0110101 |
| 8 | 0.00390625 | ✗ | 1.0110101 |
| … | … | … | … |
This loop computes the high-order bits. Combined with the next step, the two loops perform 54 total iterations (53 result bits plus a guard bit), and the bits converge to: 1.0110101000001...₂ = 1.41421356…
4. Lower 32 bits
r = sign;
while r != 0 {
let t1 = s1.wrapping_add(r);
let t = s0;
if (t < ix0) || ((t == ix0) && (t1 <= ix1)) {
s1 = t1.wrapping_add(r);
if ((t1 as i32 & sign as i32) == sign as i32) && ((s1 as i32 & sign as i32) == 0) {
s0 += 1;
}
ix0 -= t;
if ix1 < t1 {
ix0 -= 1;
}
ix1 = ix1.wrapping_sub(t1);
q1 = q1.wrapping_add(r);
}
ix0 += ix0 + ((ix1 & sign) >> 31) as i32;
ix1 = ix1.wrapping_add(ix1);
r >>= 1;
}
This loop continues the exact same logic for the lower 32 bits of the 53-bit mantissa. Because we are working with u32 chunks to simulate 64-bit arithmetic, we see manual handling of carries and borrows (checking if t1 > ix1, etc.).
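The borrow handling is the standard double-word pattern. A standalone sketch of the subtraction used above (our own, with illustrative names):

```rust
// Subtract the simulated 64-bit value (t_hi, t_lo) from (hi, lo), where
// each 64-bit quantity is represented as an (i32 high, u32 low) pair.
fn sub64(hi: i32, lo: u32, t_hi: i32, t_lo: u32) -> (i32, u32) {
    let (new_lo, borrow) = lo.overflowing_sub(t_lo); // borrow if lo < t_lo
    let new_hi = hi - t_hi - borrow as i32;          // propagate the borrow
    (new_hi, new_lo)
}
```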
5. Round and reassemble
if (ix0 | ix1 as i32) != 0 {
// Inexact result (we don't raise exceptions in Rust)
if q1 == 0xffffffff {
q1 = 0;
q += 1;
} else {
q1 = q1.wrapping_add(q1 & 1);
}
}
ix0 = (q >> 1) + 0x3fe00000;
ix1 = q1 >> 1;
if (q & 1) != 0 {
ix1 |= sign;
}
ix0 += m << 20;
// Reconstruct the result
let result_bits = ((ix0 as u64) << 32) | (ix1 as u64);
f64::from_bits(result_bits)
Note: The result q is calculated with one extra bit of precision (a guard bit). We check this bit and the remainder to round correctly, then shift right (q >> 1) to fit the final format.
- Check Remainder: If `ix0 | ix1` is non-zero, the result was not a perfect square.
- Round: We apply “Round to Nearest, Ties to Even” logic (IEEE 754).
This is the primary advantage over Newton–Raphson-style approaches. By constructing the result one bit at a time and tracking the exact remainder, the algorithm can determine the correct final rounding unambiguously.
Newton–Raphson converges very quickly, but unless it is carried out with extra precision and careful rounding logic, it can produce last-bit (ULP) errors.
Pack: We take our calculated mantissa bits (q, q1), combine them with the halved exponent calculated in step 2, and reinterpret the bits back into an f64.
Why this works
This algorithm is preferred in libm because it guarantees correct rounding for the last bit. As noted above, Newton-Raphson can suffer from “off-by-one-ULP” (Unit in the Last Place) errors unless calculated with significantly higher precision than the target type. The bit-by-bit method is slower, but mathematically rigorous even on hardware with no floating-point unit.
By tracking s = 2q, the test “can I set this bit?” becomes a single comparison with no multiplication. The inner loop uses only:
- Addition / subtraction
- Bit shifts
- Comparison
No division. No multiplication. 54 iterations, one bit each (53 result bits plus the guard bit).
Source: sqrtd.rs from libmcs, a direct descendant of Sun's fdlibm (1993).
Our agents went beyond these exacting standards, efficiently surfacing subtle inconsistencies in the tests and verification process for human review.
As GTD's team notes:
“We have been very positively surprised by the fact that Asari AI has been able to come back to us with such in-depth comments and assessment about the obtained test results and slightest differences in numerical behavior." -GTD
Our agents found several previously undetected errors in the verification process. GTD highlighted one particularly tricky case: “Asari AI's agents were able to find a very subtle double error in our test suite which we never spotted throughout years of reviews."
This thoroughness in understanding tests and design choices proved essential for maintaining the rigorous correctness standards throughout the Rust translation.
The GTD team reflects on the broader significance: “Space software needs to be well understood, but requirements engineering, verification, and validation are open ended tasks. The additional help of AI in verification and validation processes can contribute to an increase in reliability, understanding, and level of trust in the software we use to fly."
Safe, not slow
When migrating software, retaining performance is often non-negotiable.
We evaluated our Safe Rust programs across multiple dimensions: successful memory-safe translation, equivalence testing, performance overhead, memory footprint changes, and tool runtime. In addition, we assessed qualitative measures of code quality and whether the output is idiomatic Rust, not just syntactically correct code.
Besides being correct, our Safe Rust code has performance comparable to the original C code. We also believe our agents can further optimize the Rust implementations to be even more performant than they are currently.
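For readers who want to reproduce this kind of comparison, a minimal benchmark might take the following shape (a sketch using the criterion crate; the compress entry point and corpus path are illustrative, not our actual harness):

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_compress(c: &mut Criterion) {
    // Feed the same corpus to the Rust port and to the C baseline.
    let corpus = std::fs::read("bench/corpus.bin").expect("corpus file");
    c.bench_function("rust_compress", |b| b.iter(|| compress(&corpus)));
}

criterion_group!(benches, bench_compress);
criterion_main!(benches);
```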
Safe, not slop
"I would probably be fooling myself if I could say that I could tell that this code was machine translated from C."
Besides ensuring correctness and good performance, our agents are notably less prone to producing “AI slop”. In many cases, the Safe Rust code written by our agents reads like it was written by a skilled human engineer.
An external senior Rust developer noted that the Rust implementation of gzip felt like the work of a competent engineer: “It feels like work done by a Rust engineer with 3-5 years of experience. The code is actually well structured and components are well encapsulated.”
Importantly, our agents’ code lacks the tell-tale signs of machine generation: “I would probably be fooling myself if I could say that I could tell that this code was machine translated from C.” This idiomatic quality, not just syntactic correctness, is essential for maintainability and long-term adoption.
From faithful translations to full redesigns
One important decision is the degree of redesign. At the two ends of the spectrum, agents can either (1) produce faithful conversions that preserve C's structure, or (2) completely redesign the codebase to embrace Rust idioms. The degree of specification ambiguity shapes this choice: when specifications are less comprehensive, the agents have more freedom in their process. Our agents are flexible and can take either approach. In either case, the final result passes the same verification test suite.
The nature of these tradeoffs depends on the use case. For gzip, the specification is relatively straightforward: gzip is a command-line executable with well-defined input/output behavior that all implementations should adhere to. This makes the tradeoff between faithful translation and idiomatic redesign more explicit, since validation tests can be defined independent of how the internal code is structured.
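A sketch of what such a structure-independent test can look like (the flags shown are standard gzip options; the binary and file paths are illustrative):

```rust
use std::process::Command;

#[test]
fn cli_roundtrip_is_lossless() {
    let original = std::fs::read("tests/data/sample.txt").unwrap();
    // Compress, then decompress, touching only the public CLI surface.
    assert!(Command::new("target/release/gzip")
        .args(["--keep", "--force", "tests/data/sample.txt"])
        .status().unwrap().success());
    assert!(Command::new("target/release/gzip")
        .args(["--decompress", "--force", "tests/data/sample.txt.gz"])
        .status().unwrap().success());
    assert_eq!(std::fs::read("tests/data/sample.txt").unwrap(), original);
}
```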
Deep Dive: C to Rust translation approaches
C Original
static int
longest_match(IPos cur_match)
{
unsigned chain_length = max_chain_length; /* max hash chain length */
register uch *scan = window + strstart; /* current string */
register uch *match; /* matched string */
register int len; /* length of current match */
int best_len = prev_length; /* best match length so far */
IPos limit = strstart > (IPos)MAX_DIST ? strstart - (IPos)MAX_DIST : NIL;
/* Do not waste too much time if we already have a good match: */
if (prev_length >= good_match) {
chain_length >>= 2;
}
register uch *strend = window + strstart + MAX_MATCH;
register uch scan_end1 = scan[best_len-1];
register uch scan_end = scan[best_len];
do {
match = window + cur_match;
if (match[best_len] != scan_end ||
match[best_len-1] != scan_end1 ||
*match != *scan ||
*++match != scan[1]) continue;
scan += 2, match++;
do {
} while (*++scan == *++match && *++scan == *++match &&
*++scan == *++match && *++scan == *++match &&
*++scan == *++match && *++scan == *++match &&
*++scan == *++match && *++scan == *++match &&
scan < strend);
len = MAX_MATCH - (int)(strend - scan);
scan = strend - MAX_MATCH;
if (len > best_len) {
match_start = cur_match;
best_len = len;
if (len >= nice_match) break;
scan_end1 = scan[best_len-1];
scan_end = scan[best_len];
}
} while ((cur_match = prev[cur_match & WMASK]) > limit
&& --chain_length != 0);
return best_len;
}
Characteristics: Global variables, pointer arithmetic (*++scan), register hints, unrolled 8x comparison loop, do-while, no bounds checking.
1-to-1 Translation
pub fn longest_match(
gzip: &crate::gzip::GzipState,
mf: &DeflateMatchFinderState,
deflate: &mut DeflateGlobals,
mut cur_match: IPos,
) -> usize {
let window = &gzip.window;
let prev = &gzip.prev;
let mut chain_length = deflate.max_chain_length;
let mut best_len = deflate.prev_length;
let strstart = deflate.strstart;
// Stop when cur_match becomes <= limit. Prevent matches with index 0.
let limit = strstart.saturating_sub(MAX_DIST);
// Do not waste too much time if we already have a good match.
chain_length = deflate.adjust_chain_length_for_prev_match(chain_length);
let scan0 = window[strstart];
let scan1 = window[strstart + 1];
let mut scan_end1 = window[strstart + best_len - 1];
let mut scan_end = window[strstart + best_len];
let strend = strstart + MAX_MATCH;
while cur_match > limit && chain_length != 0 {
let m = cur_match;
// Skip to next match if the match length cannot increase.
if window[m + best_len] != scan_end
|| window[m + best_len - 1] != scan_end1
|| window[m] != scan0
|| window[m + 1] != scan1
{
cur_match = usize::from(prev[m & WMASK]);
chain_length = chain_length.saturating_sub(1);
continue;
}
// Now compare the strings byte-by-byte.
let mut s = strstart + 2;
let mut p = m + 2;
while s < strend && window[s] == window[p] {
s += 1;
p += 1;
}
let len = s - strstart;
if len > best_len {
deflate.match_start = m;
best_len = len;
if len >= mf.nice_match {
break;
}
scan_end1 = window[strstart + best_len - 1];
scan_end = window[strstart + best_len];
}
cur_match = usize::from(prev[m & WMASK]);
chain_length = chain_length.saturating_sub(1);
}
best_len
}
Characteristics: Direct port - same logic, multiple state structs replace globals (GzipState, DeflateMatchFinderState, DeflateGlobals), array indexing replaces pointers, retains IPos type alias, debug_assert! mirrors C's Assert.
Redesign with Rust Idioms
/// Find the longest match starting at the given hash chain head
#[inline]
pub fn longest_match(&mut self, cur_match: usize) -> usize {
let mut chain_length = self.max_chain_length;
let mut best_len = self.prev_length;
let limit = self.strstart.saturating_sub(MAX_DIST);
// Reduce chain search if we already have a good match
if self.prev_length >= self.good_match {
chain_length >>= 2;
}
// Handle edge case where best_len is 0
if best_len == 0 {
best_len = 1;
}
let scan_start = self.strstart;
let scan_end = (scan_start + MAX_MATCH).min(self.window.len());
if scan_start + best_len >= self.window.len() {
return best_len;
}
let mut scan_end1 = self.window[scan_start + best_len - 1];
let mut scan_end_byte = self.window[scan_start + best_len];
let mut cur_match = cur_match;
loop {
let match_pos = cur_match;
// Quick rejection test using last bytes
if match_pos + best_len >= self.window.len() {
break;
}
if self.window[match_pos + best_len] != scan_end_byte
|| self.window[match_pos + best_len - 1] != scan_end1
|| self.window[match_pos] != self.window[scan_start]
|| self.window[match_pos + 1] != self.window[scan_start + 1]
{
cur_match = self.prev[cur_match & (WSIZE - 1)] as usize;
chain_length = chain_length.saturating_sub(1);
if cur_match <= limit || chain_length == 0 {
break;
}
continue;
}
// Count matching bytes
let mut len = 2;
let max_len = (scan_end - scan_start).min(MAX_MATCH);
while len < max_len && self.window[scan_start + len] == self.window[match_pos + len] {
len += 1;
}
if len > best_len {
self.match_start = match_pos;
best_len = len;
if len >= self.nice_match {
break;
}
if scan_start + best_len < self.window.len() {
scan_end1 = self.window[scan_start + best_len - 1];
scan_end_byte = self.window[scan_start + best_len];
}
}
cur_match = self.prev[cur_match & (WSIZE - 1)] as usize;
chain_length = chain_length.saturating_sub(1);
if cur_match <= limit || chain_length == 0 {
break;
}
}
best_len
}
Characteristics: Self-contained DeflateState struct owns all data (window, hash tables, trees, bits), methods on impl DeflateState, no global/shared mutable state, infallible internal helper (returns direct value).
For the other libraries (libyaml, zlib, libmcs), the specification is more entangled with particular interfaces and public functions. Some interfaces are independent of the programming language, while others are specific to C or Safe Rust. Therefore, the specification depends on how idiomatic the Rust implementation should be, and feature parity and validation tests become much harder to define cleanly. As the Rust design moves away from a 1:1 translation, specification ambiguity increases and achieving correctness becomes much harder.
The final results show that our agents successfully navigated these tradeoffs, from faithful conversion to fundamental redesign, demonstrating capacity for rigorous system design in collaboration with humans.
Pushing the frontier of AI agents
Successfully translating large codebases from C to Safe Rust shows our agents are reliable across thousands of interdependent engineering decisions. This lets humans focus on the craft of system design while the agents build thoroughly tested artifacts.
In upcoming posts, we'll share technical deep dives into specific Safe Rust translations, exploring how our agents handled the unique challenges and architectural decisions of each codebase.
More broadly, our goal is to build versatile AI agents that can rigorously reason about complex engineering systems, continuously learn and make new discoveries, and amplify human capabilities by orders of magnitude. We believe that is the path to economically efficient invention at scale.
We're applying our agents to increasingly complex and diverse engineering challenges, and planning to make them more broadly available.
If this excites you, we’d love to talk and we’re hiring!