We're hiring!
*

Persian Rug, Part 2 - Other ways to make object soups in Rust

Edmund Smith avatar

Edmund Smith
September 27, 2023

Share this post:

Reading time:

In the first part of this series, we looked at a basic pattern, where two types of objects refer to one another, and claimed that it is hard to model in Rust. Patterns like these are common when modeling things from the real world. In this part we'll follow up in more detail and discuss a wider range of approaches.

Why is this hard?

We can solve problems like this easily in C++, it seems clear and straightforward how to do so. Let's go back to the C++ example we gave:

struct group;
struct user {
    std::vector<std::shared_ptr> groups;
};
struct group {
    std::optional<std::shared_ptr> leader;
};

Why doesn't a Rust analogue of this exist? Look at this snippet of compilable C++ which uses these types:

auto u = std::make_shared();
auto g1 = std::make_shared();
u->groups.push_back(g1);
auto & first_group = u->groups[0];
auto g2 = std::make_shared();
u->groups.push_back(g2);
first_group->leader = u;

First we take a reference to g1 via the shared_ptr held inside u, and call it first_group. Then we push to the vector of groups inside u, which might reallocate. Now our reference to first_group may dangle, and then using it on the next line will crash (or even worse, corrupt memory). Rust will not allow this to happen: mutable references to data are only ever available when no other references exist, so the call to push_back on line 6, requiring mutable access, would not compile.

But when we interact with an object in a soup, each object is likely to have at least one other handle (something like a reference or a pointer) pointing to it from other objects. That in turn makes it very difficult for Rust to provide a mutable reference to the object, because nothing prevents you from obtaining a second reference to it from one of the other handles.

Approaches

Use shared references

As we showed in the previous part, you could use std::sync::Arc` as the closest correlate of C++'s std::shared_ptr:

use std::sync::Arc;
struct User {
    groups: Vec<Arc<Group>>,
}
struct Group {
    leader: Option<Arc<User>>,
}

Using Arc on its own, we can represent immutable trees of objects, provided we construct the nodes in the right order, because making trees doesn't require previously constructed objects to be edited. With care we can do slightly more, because there's the possibility of mutation of an object when only one Arc pointing to that object exists.

Decimating a graph into a frozen DAG


So, to use Arc, we may have to remove some links between objects, like removing the leader field in Group for example. The more links we try to save, the more careful we'll have to be about the order of construction. In the end, we'll still have a largely immutable graph, and no amount of care will allow us to represent everything we might like.

Duplicate your data

Trees of objects are far more amenable to the sort of safety guarantees that Rust provides. There's one clear owner of a value, and you can only access data through a reference to that value. So we could write:

struct User {
    groups: Vec<Group> 
}
struct Group {
    leader: Option<User>
}

Now we have expanded our graph into a tree, by duplicating Users and Groups as needed. Mutation is allowed, but different copies may skew.

Use Mutexes

In Rust, a Mutex permits converting a shared reference to _itself_ into a mutable reference to its _contents_. A Mutex will block the thread rather than return a mutable reference when other references exist. This also explains why a Rust Mutex is not reentrant: if it were, we could create dangling references.

struct User {
    groups: Vec<Arc<Mutex<Group>>>,
}
struct Group {
    user: Option<Arc<Mutex<User>>>
}

This approach gives a fully mutable collection of objects, with arbitrary links between them. There will be no unchecked memory errors. So why not use this?

Let's imagine traversing our collection. To look at each node's contents, even to read it, we need to obtain its mutex. That means during traversal we'll build up a chain of mutexes that we hold. If we visit the same node twice, we'll get deadlock. If we have two threads traversing the collection and they obtain locks in different orders, we'll get deadlock. What's more, in a soup there is no obvious lock order we can try to maintain.

Try interior mutability

The final basic approach to this problem is to use *interior mutability* via `RefCell`. If you think of `Mutex` blocking threads to enforce Rust's safety rules, then `RefCell` crashes your program to achieve the same thing. The code might look like this:


struct User {
    groups: Vec<Rc<RefCell<Group>>>,
}
struct Group {
    user: Option<Rc<RefCell<User>>>
}

Just like with a Mutex, with RefCell we can ask for references to the data wherever we wish, and the code will compile. But if we ask for a mutable reference to an object when there are other references in existence, our program will panic. Like everything well-formed in Rust, RefCell cannot permit a mutable reference to data to co-exist with other references to that data.

Once again, there will be no unchecked errors. But now we are responsible for ensuring that we never try to create a mutable reference to data that is referenced elsewhere. This will not be checked for us by the compiler. We have again committed ourselves to a difficult contract to follow in our code.

Next

In the next part, we'll look in more detail at persian-rug: how it works and the trade-offs it offers at present. After that, we'll talk about ways to improve upon its limitations in the future.

Comments (0)


Add a Comment






Allowed tags: <b><i><br>Add a new comment:


Search the newsroom

Latest Blog Posts

Re-converging control flow on NVIDIA GPUs - What went wrong, and how we fixed it

25/04/2024

While I managed to land support for two extensions, implementing control flow re-convergence in NVK did not go as planned. This is the story…

Automatic regression handling and reporting for the Linux Kernel

14/03/2024

In continuation with our series about Kernel Integration we'll go into more detail about how regression detection, processing, and tracking…

Almost a fully open-source boot chain for Rockchip's RK3588!

21/02/2024

Now included in our Debian images & available via our GitLab, you can build a complete, working BL31 (Boot Loader stage 3.1), and replace…

What's the latest with WirePlumber?

19/02/2024

Back in 2022, after a series of issues were found in its design, I made the call to rework some of WirePlumber's fundamentals in order to…

DRM-CI: A GitLab-CI pipeline for Linux kernel testing

08/02/2024

Continuing our Kernel Integration series, we're excited to introduce DRM-CI, a groundbreaking solution that enables developers to test their…

Persian Rug, Part 4 - The limitations of proxies

23/01/2024

This is the fourth and final part in a series on persian-rug, a Rust crate for interconnected objects. We've touched on the two big limitations:…

Open Since 2005 logo

We use cookies on this website to ensure that you get the best experience. By continuing to use this website you are consenting to the use of these cookies. To find out more please follow this link.

Collabora Ltd © 2005-2024. All rights reserved. Privacy Notice. Sitemap.