Can we use Wasmtime to build a web script engine?

Wu Yu Wei published on

A few years ago, there was a discussion in the Servo community about using an alternative JavaScript engine instead of SpiderMonkey. While Servo has solid bindings to SpiderMonkey, we have to deal with several issues that come with them. Servo maintains its own fork, called mozjs, with the patches it needs. At that time, people were wondering whether there could be a better option, or, even better, whether Servo could make its script engine pluggable and switch to others like V8. But in the end, we stuck with the existing choice and kept maintaining mozjs.

I had a wilder idea in mind back then: what if we could build a script engine directly on top of a WebAssembly runtime? I know the majority of browsers can already load WebAssembly modules. However, those modules can't access most DOM objects and Web APIs directly; you need to add bindings to a WebAssembly module to actually obtain those features. What if there were a script engine built directly on WebAssembly? This way, every language could be a first-class citizen in web browsers!

Unfortunately, at that time I found out that WebAssembly hadn't completed its GC spec yet. GC is necessary for building a DOM node tree because of the DOM's design: DOM objects can reference each other very easily. It's common to see a node link to its child while the child links back to its parent, so we must have a garbage collector that can trace these objects. In the end, I left the idea for future me to worry about.
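To make the problem concrete, here is a small std-only Rust sketch with no Wasm involved. `RcNode` and `build_and_drop_cycle` are hypothetical names used only for illustration. A parent and its child point at each other through reference-counted `Rc` pointers, and once every outside handle is gone the two nodes still keep each other alive, so reference counting alone never frees them. Rust's usual workaround is to make back-links `Weak`, but a DOM lets scripts create arbitrary links. That is arguably why a tracing collector, such as the one Wasm GC provides, is the more natural fit.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

/// A naive DOM-like node whose links are all *strong* `Rc` pointers.
struct RcNode {
    parent: RefCell<Option<Rc<RcNode>>>,
    children: RefCell<Vec<Rc<RcNode>>>,
}

/// Links a parent and a child in both directions, lets every local handle go
/// out of scope, and returns a weak pointer so the caller can check whether
/// the parent was actually freed.
fn build_and_drop_cycle() -> Weak<RcNode> {
    let parent = Rc::new(RcNode {
        parent: RefCell::new(None),
        children: RefCell::new(Vec::new()),
    });
    let child = Rc::new(RcNode {
        parent: RefCell::new(Some(Rc::clone(&parent))), // child -> parent
        children: RefCell::new(Vec::new()),
    });
    parent.children.borrow_mut().push(Rc::clone(&child)); // parent -> child
    Rc::downgrade(&parent)
    // `parent` and `child` are dropped here, yet each node still keeps the
    // other alive, so neither allocation is ever freed.
}
```

Calling `build_and_drop_cycle()` and then `upgrade()` on the returned pointer still yields the parent, because the child's `parent` field holds the last strong reference.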

A couple of weeks ago, I found that Wasmtime has already completed its Wasm GC support! So I believe it's time to explore this idea again. In this post, I'll share my current progress, explain a possible design, and outline what's next on the roadmap.

DOM Objects in Wasm GC

If you want to understand Wasmtime's GC types in detail, their RFC repository has a whole page explaining them exhaustively. What we want to know here is what a DOM object's definition looks like when built on these GC types. The choice is the reference types, more precisely the ExternRef type. Using ExternRef is pretty straightforward: we call its new method with a store and the value we want to wrap, and get back a Rooted<ExternRef>. The only problem is that the value's Rust type is erased when the ExternRef is created; you can see that ExternRef doesn't carry any generic parameter. So in my attempt, I created a new type that keeps this type information, along with several utility methods:

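use std::any::Any;
use std::marker::PhantomData;
use wasmtime::{ExternRef, Rooted};

// `Copy` requires `Clone`, so both are derived.
#[derive(Clone, Copy, Debug)]
pub struct Object<T: 'static + Any + Send + Sync> {
    object: Rooted<ExternRef>,
    _phantom: PhantomData<T>,
}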

Note that we can still access the underlying Rooted<ExternRef> by dereferencing. But this type makes the underlying implementation clear when defining new DOM objects. We will see more examples of it later in the post.
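Because a self-contained example can't pull in Wasmtime, the sketch below imitates the idea with std only. A hypothetical `Heap` of type-erased boxes plays the role of the GC heap, and a hypothetical `ErasedRef` plays `Rooted<ExternRef>`. It shows how the `PhantomData<T>` marker lets typed accessors downcast safely, and how `Deref` exposes the untyped handle. It also uses hand-written `Clone`/`Copy` impls. A plain `#[derive(Copy)]` adds a `T: Copy` bound, so `Object<NodeImpl>` would not be `Copy` unless `NodeImpl` were. In real Wasmtime the reads would go through the store, for example via `ExternRef::data`, whose exact signature has changed between releases.

```rust
use std::any::Any;
use std::marker::PhantomData;
use std::ops::Deref;

/// Stand-in for a GC heap: values are stored type-erased, like ExternRef data.
#[derive(Default)]
struct Heap {
    slots: Vec<Box<dyn Any + Send + Sync>>,
}

/// Stand-in for `Rooted<ExternRef>`: an untyped handle into the heap.
#[derive(Clone, Copy, Debug, PartialEq)]
struct ErasedRef(usize);

/// Typed wrapper remembering which `T` the erased handle points to.
struct Object<T: 'static + Any + Send + Sync> {
    object: ErasedRef,
    _phantom: PhantomData<T>,
}

// Manual impls keep `Object<T>` copyable even when `T` itself is not.
impl<T: 'static + Any + Send + Sync> Clone for Object<T> {
    fn clone(&self) -> Self {
        *self
    }
}
impl<T: 'static + Any + Send + Sync> Copy for Object<T> {}

impl<T: 'static + Any + Send + Sync> Object<T> {
    /// Allocates `value` in the heap and returns a typed handle to it.
    fn new(heap: &mut Heap, value: T) -> Self {
        heap.slots.push(Box::new(value));
        Object { object: ErasedRef(heap.slots.len() - 1), _phantom: PhantomData }
    }

    /// Reads the data back; the downcast cannot fail for a handle built by `new`.
    fn data<'a>(&self, heap: &'a Heap) -> &'a T {
        heap.slots[self.object.0]
            .downcast_ref::<T>()
            .expect("Object<T> always points at a T")
    }

    /// Mutable variant of `data`.
    fn data_mut<'a>(&self, heap: &'a mut Heap) -> &'a mut T {
        heap.slots[self.object.0]
            .downcast_mut::<T>()
            .expect("Object<T> always points at a T")
    }
}

/// Like the real `Object<T>`, the untyped handle stays reachable via deref.
impl<T: 'static + Any + Send + Sync> Deref for Object<T> {
    type Target = ErasedRef;
    fn deref(&self) -> &ErasedRef {
        &self.object
    }
}
```

For example, `Object::new(&mut heap, String::from("hello"))` gives a handle that can be copied freely, even though `String` is not `Copy`, and every copy reads and writes the same heap slot.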

Reflector in Wasm Component

Before we dive deeper into DOM object implementation details, we need to determine how reflectors work in Wasm. When a browser creates a DOM object, it needs a reflector object that can be used on the script side. For Servo, you can read its script crate's documentation to understand how Servo uses SpiderMonkey's types to achieve this. For Wasm, each language usually provides codegen or binding libraries to help us instantiate the Wasm module. Take Rust's wasm-bindgen as an example: when we build a Wasm module with wasm-bindgen, it also generates JavaScript/TypeScript binding scripts to instantiate the module. But there's a major issue: this binding code can vary between libraries, which kind of defeats the purpose of making every language a first-class citizen of the script engine. We can't just pick a certain framework to define the reflector bindings.

Fortunately, the WebAssembly community seems to face the same problem, and it has proposed another concept called the Wasm Component Model. You can think of a component as another kind of Wasm module, but one whose interfaces are described with a predefined language called WIT (Wasm Interface Type). Let's say we want to define the Document WebIDL interface. We can define it with a resource type, a type that only exposes behaviour through methods; think of it as an object that implements an interface. Defined as resources, Document and the types around it can be listed as follows:

package ohim:dom;

interface node {
    resource node {
        append-child: func(child: node) -> node;
    }

    resource document {
        constructor();
        url: func() -> string;
        document-element: func() -> option<element>;
    }

    resource element {
        has-attributes: func() -> bool;
    }
}

Since this is for demo purposes, I won't explain every WIT concept or every step of using it. Instead, I'll only list the methods needed to explain our proof of concept. In this example, we define the resources node, document, and element, which correspond to the Node, Document, and Element DOM objects. I'd like to show that once we get a document, we can use it to call several methods, including methods that return other DOM objects linked to it. document-element returns option<element>, where element is another resource representing another DOM object. Next, we will demonstrate how to implement the details on the host side and then use them on the script side, which in Wasm terminology is usually called the guest side.

Implement Resources on the Host Side

To get the bindings on the host side, Wasmtime provides the bindgen! macro, which generates code from the provided WIT files. What we care about most is the exact type of each resource we define. Let's say we already have a few types called Node, Document, and Element (we will explain their details later). They can be specified in the macro as follows:

wasmtime::component::bindgen!({
    path: "wit",
    with: {
        "ohim:dom/node/node": Node,
        "ohim:dom/node/document": Document,
        "ohim:dom/node/element": Element,
    },
    trappable_imports: true,
});

"ohim:dom/node/node" is the full path to the node resource we defined in the WIT file. In the macro, we specify that the implementation type of document is Document, and the macro then generates a HostDocument trait with all the related methods declared. Our job is to implement all of these resource traits. Now it's time to see what's inside the DOM objects and how we implement those traits.

Node and NodeImpl

As mentioned above, we intend to use Object<T> to define DOM objects. The Node type is basically a newtype over Object<T>, with NodeImpl as the actual data:

pub struct Node(Object<NodeImpl>);

pub struct NodeImpl {
    event_target: EventTarget,
    parent_node: Option<Node>,
    first_child: Option<Node>,
    last_child: Option<Node>,
    next_sibling: Option<Node>,
    pub(crate) data: NodeTypeData,
}

Let's focus on the fields related to Node itself first. Node is a GC object created by Object::new, which is essentially ExternRef::new. All the implementation details live in NodeImpl. To the public, Node is an opaque, GC-rooted reference, so users can confidently link it anywhere without worrying about circular references. You can see that NodeImpl's node fields, such as parent_node, take an Option<Node>. When we implement the methods, we just need to get the NodeImpl data from one Node and link the other one to it.
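Here is a std-only model of that linking step. A hypothetical `Heap` owns every `NodeImpl`, and `Node` is just a copyable index standing in for a rooted GC reference. Unlike the real design, nothing is ever collected here. The point is only that parent and child can name each other without any ownership fight. `append_child` is a simplified version of the DOM algorithm that skips validity checks.

```rust
/// Hypothetical stand-in for `Node`: a copyable handle, like a rooted GC reference.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Node(usize);

/// Mirrors the tree fields of `NodeImpl` (`event_target` and `data` omitted).
#[derive(Default)]
struct NodeImpl {
    parent_node: Option<Node>,
    first_child: Option<Node>,
    last_child: Option<Node>,
    next_sibling: Option<Node>,
}

/// Stand-in for the GC heap: it owns every `NodeImpl`, so handles can point
/// anywhere, including back up the tree. (Nothing is ever collected here.)
#[derive(Default)]
struct Heap {
    nodes: Vec<NodeImpl>,
}

impl Heap {
    fn create_node(&mut self) -> Node {
        self.nodes.push(NodeImpl::default());
        Node(self.nodes.len() - 1)
    }

    fn data(&self, node: Node) -> &NodeImpl {
        &self.nodes[node.0]
    }

    fn data_mut(&mut self, node: Node) -> &mut NodeImpl {
        &mut self.nodes[node.0]
    }

    /// Simplified `appendChild`: assumes `child` is not attached anywhere yet
    /// and skips the DOM's pre-insertion validity checks.
    fn append_child(&mut self, parent: Node, child: Node) -> Node {
        let previous_last = self.data(parent).last_child;
        match previous_last {
            // Chain the new child after the current last child...
            Some(last) => self.data_mut(last).next_sibling = Some(child),
            // ...or make it the first child of an empty parent.
            None => self.data_mut(parent).first_child = Some(child),
        }
        self.data_mut(parent).last_child = Some(child);
        // Parent -> child and child -> parent links now form a cycle.
        self.data_mut(child).parent_node = Some(parent);
        child
    }
}
```

After appending two children `a` and `b` to the same parent, the parent's `first_child` is `a`, its `last_child` is `b`, `a.next_sibling` is `b`, and both children point back to the parent.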

Inheriting Node

Next, let's talk about inheritance. We know Document and Element need to inherit from Node, but neither Rust nor the Wasm component model has built-in support for it. I've considered a casting mechanism similar to the one Servo builds on SpiderMonkey. However, that approach involves tons of unsafe code, and it still bothers us whenever we implement some complicated types in the script crate. Wasmtime already provides pretty safe and ergonomic APIs, and I feel we should aim in the same direction. So I decided to represent the types that inherit from an ancestor type as a sum type, which is the NodeTypeData listed in NodeImpl's definition:

pub enum NodeTypeData {
    /// `ELEMENT_NODE`
    Element(ElementImpl),
    /// `DOCUMENT_NODE`
    Document(DocumentImpl),
    /// Similar to `Option::None`.
    None,
}

With From implementations for the related DOM object newtypes, plus methods exclusive to those types that reach the deeper Impl data, a DOM object should be able to cast to the correct type and use the right implementation data.
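A std-only sketch of this casting scheme follows. It reproduces `NodeTypeData` with toy `ElementImpl` and `DocumentImpl` payloads, whose fields are made up for the example. It adds `From` conversions into the sum type and exposes `Option`-returning accessors. The post's own `as_document_mut` is used as if it returned the data directly. Returning `Option` instead is one safe way to handle a node of the wrong type, assumed here rather than taken from the Ohim code.

```rust
/// Toy payloads; the real `ElementImpl`/`DocumentImpl` hold much more.
#[derive(Debug, Default)]
struct ElementImpl {
    has_attributes: bool,
}

#[derive(Debug, Default)]
struct DocumentImpl {
    url: String,
}

#[derive(Debug)]
enum NodeTypeData {
    /// `ELEMENT_NODE`
    Element(ElementImpl),
    /// `DOCUMENT_NODE`
    Document(DocumentImpl),
    /// Similar to `Option::None`.
    None,
}

// "Upcasting": any concrete payload converts into the sum type.
impl From<ElementImpl> for NodeTypeData {
    fn from(data: ElementImpl) -> Self {
        NodeTypeData::Element(data)
    }
}

impl From<DocumentImpl> for NodeTypeData {
    fn from(data: DocumentImpl) -> Self {
        NodeTypeData::Document(data)
    }
}

impl NodeTypeData {
    /// "Downcasting": `None` if this node is not a Document.
    fn as_document(&self) -> Option<&DocumentImpl> {
        match self {
            NodeTypeData::Document(d) => Some(d),
            _ => None,
        }
    }

    fn as_document_mut(&mut self) -> Option<&mut DocumentImpl> {
        match self {
            NodeTypeData::Document(d) => Some(d),
            _ => None,
        }
    }

    fn as_element(&self) -> Option<&ElementImpl> {
        match self {
            NodeTypeData::Element(e) => Some(e),
            _ => None,
        }
    }
}
```

Because the compiler checks every `match`, a wrong cast surfaces as a `None` instead of undefined behaviour, without any `unsafe`.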

Implement Resource Traits

Now that we have explained the definition of DOM objects and their implementation details, it's time to implement the resource traits generated by bindgen!, so that the guest script side can actually use these types. Let's use HostDocument as the demonstration, because it shows both how to implement a constructor and how to link one object to another:

/// `Store` states to use when `[Exposed=Window]`
pub struct WindowStates {
    table: ResourceTable,
    ctx: WasiCtx,
    store: Store<()>,
}

impl HostDocument for WindowStates {
    fn new(&mut self) -> Result<Resource<Document>> {
        // This is only for demo purposes
        let element = Element::new(&mut self.store)?;
        let document = Document::new(&mut self.store)?;
        document
            .data_mut(&mut self.store)
            .as_document_mut()
            .document_element = Some(element);

        Ok(self.table.push(document)?)
    }

    fn drop(&mut self, rep: Resource<Document>) -> Result<()> {
        self.table.delete(rep)?;
        Ok(())
    }
    // ...other methods
}

To implement these traits, we need to define state data that can be put into Wasmtime's Store. I found this is a perfect match for WebIDL's Exposed extended attribute, since it lets us decide which WebIDL interfaces are exposed to a given target. In this example, we want to expose them to Window.

If you look at the signatures of new and drop, you can see that what the Wasm component provides to the guest side isn't the actual object. Instead, it returns a Resource<T>, which you can think of as a handle. We put the actual data, say a Document, into the ResourceTable, and it returns a Resource<Document>. The guest then uses this handle to call all the methods related to the resource. For example, if the guest wants to call document-element, it calls the method through this handle. What we need to do is implement the trait method so that it gets the actual object from the table, gets the Element object the guest wants, and returns the Resource<Element> handle it needs:

fn document_element(&mut self, self_: Resource<Document>) -> Result<Option<Resource<Element>>> {
    let self_ = self.table.get(&self_)?;
    match self_.document_element(&self.store) {
        Some(e) => Ok(Some(self.table.push(e)?)),
        None => Ok(None),
    }
}
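The handle-and-table pattern can be modelled with std alone. In the sketch below, the hypothetical `Table` and `Handle<T>` only imitate Wasmtime's `ResourceTable` and `Resource<T>`. The real types carry more bookkeeping, and their methods return errors rather than `Option`. `document_element` mirrors the shape of the trait method above: look the document up by its handle, clone out its element, and push that element into the table to get a fresh handle for the guest.

```rust
use std::any::Any;
use std::collections::HashMap;
use std::marker::PhantomData;

/// Hypothetical stand-in for `Resource<T>`: an opaque, typed handle for the guest.
struct Handle<T> {
    rep: u32,
    _ty: PhantomData<T>,
}

/// Hypothetical stand-in for `ResourceTable`: host objects keyed by handle.
#[derive(Default)]
struct Table {
    next: u32,
    entries: HashMap<u32, Box<dyn Any>>,
}

impl Table {
    fn push<T: 'static>(&mut self, value: T) -> Handle<T> {
        let rep = self.next;
        self.next += 1;
        self.entries.insert(rep, Box::new(value));
        Handle { rep, _ty: PhantomData }
    }

    fn get<T: 'static>(&self, handle: &Handle<T>) -> Option<&T> {
        self.entries.get(&handle.rep)?.downcast_ref::<T>()
    }

    /// Consumes the handle, like the `drop` method of a host resource trait.
    fn delete<T: 'static>(&mut self, handle: Handle<T>) -> Option<T> {
        let boxed = self.entries.remove(&handle.rep)?;
        boxed.downcast::<T>().ok().map(|b| *b)
    }
}

/// Toy host types; in the post these are GC-backed `Object<T>` newtypes.
#[derive(Clone)]
struct Element {
    has_attributes: bool,
}

struct Document {
    document_element: Option<Element>,
}

/// Same shape as `HostDocument::document_element`, with a `String` error.
fn document_element(
    table: &mut Table,
    doc: &Handle<Document>,
) -> Result<Option<Handle<Element>>, String> {
    let document = table.get(doc).ok_or("invalid document handle")?;
    match document.document_element.clone() {
        Some(e) => Ok(Some(table.push(e))),
        None => Ok(None),
    }
}
```

The guest only ever sees the small `rep` numbers, while the host keeps full ownership of the objects behind them. Deleting the document's handle here leaves the element's entry, and its separate handle, untouched.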

With all the necessary resource traits implemented, it's now possible to create a Wasm component that uses these host resources! The cargo-component tool can compile your crate into a Wasm component. In the code, we just need to use wit_bindgen to get all the WIT bindings we need:

// cargo component build
wit_bindgen::generate!({
    path: "../wit/",
    world: "imports",
});

struct GuestComponent;
export!(GuestComponent);
impl Guest for GuestComponent {
    fn test() -> String {
        let document = dom::node::Document::new();
        let element = document.document_element();
        format!(
            "Document has url: {} with element has attributes: {}",
            document.url(),
            element.unwrap().has_attributes()
        )
    }
}

EventTarget as a Field

Last, we need to talk about EventTarget. If you look at the definition of NodeImpl above, you can see that I treat EventTarget as a field instead of making every other Impl type part of a sum type rooted at EventTarget. The reason for this choice is that hundreds of WebIDL interfaces inherit from EventTarget. They mainly need it so that event listeners can be added to them; otherwise, they share little with each other. So I decided to make it a field instead, to prevent endless casting that starts from it. Node, on the other hand, is needed far more often to update the node tree, and many interfaces that inherit from it edit their nodes frequently. So I think defining a sum type for it is more justified. EventTarget will probably be the only special case here.

There's also another issue: the WIT spec doesn't support function callbacks yet. The closest thing is scoped callbacks, mentioned in the link, but they are still far away. So for now, I define EventListener as a good old FnMut that can be imported/exported to a Wasm module. Hopefully, WIT references can be expanded in the near future, or perhaps the Wasm component model can gain some features from core Wasm modules, which can pass funcs directly.
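The std-only sketch below shows the "EventTarget as a field" layout, with listeners stored as boxed `FnMut` closures. The listener signature, which takes only the event type as a `&str`, and the count returned by `dispatch_event` are simplifications invented for illustration. A real listener would receive an `Event` object, and real dispatch walks the propagation path. Any other interface would embed an `EventTarget` field the same way `NodeImpl` does, without joining a shared sum type.

```rust
use std::collections::HashMap;

/// Hypothetical listener type: a boxed `FnMut`, as the post describes.
type EventListener = Box<dyn FnMut(&str)>;

/// Shared event-target state, embedded as a field rather than a base "class".
#[derive(Default)]
struct EventTarget {
    listeners: HashMap<String, Vec<EventListener>>,
}

impl EventTarget {
    fn add_event_listener(&mut self, ty: &str, listener: EventListener) {
        self.listeners.entry(ty.to_string()).or_default().push(listener);
    }

    /// Calls every listener registered for `ty` and returns how many ran.
    fn dispatch_event(&mut self, ty: &str) -> usize {
        match self.listeners.get_mut(ty) {
            Some(list) => {
                for listener in list.iter_mut() {
                    listener(ty);
                }
                list.len()
            }
            None => 0,
        }
    }
}

/// Node "inherits" EventTarget just by holding one
/// (tree links and `NodeTypeData` omitted here).
#[derive(Default)]
struct NodeImpl {
    event_target: EventTarget,
}
```

Because the listener list lives in a plain field, code that only needs event handling never has to cast through the node hierarchy to reach it.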

Next steps

That's all the code I'd like to explain. The full example can be viewed in the Ohim repository. As next steps, I'd like to collect feedback and start evolving the current example into a full node tree. Whether or not it becomes an alternative script backend, I plan to reuse as many Servo components as possible. Once we have a minimally functioning script engine, I'll try to pass the tree to Servo's layout. Hopefully, it can then generate a display list and ask Servo's WebRender to render the web page! The road is still long, but it seems to be time to kick-start the journey!