Imperfect notes on WebAssembly

A quick write-up of my studies on Wasm.

The official website says:

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.

Let’s break that into parts:

You don’t have to know how to write WebAssembly code by hand. Wasm modules can be imported into a web app, exposing functions to be used from JavaScript. This can lead to big performance gains and new functionality, while staying easy for developers to work with.

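The web platform can be divided into two parts: a virtual machine (VM) that runs the web app’s code, and a set of Web APIs that the app can call.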

Before, the VM could only load javascript. But now, it can load Wasm too.

JS and Wasm can interact with each other. There is a WebAssembly JavaScript API, so we can call Wasm code through JavaScript functions (they wrap the Wasm code). We can also import JavaScript functions into Wasm!

WASM key concepts: Module, Memory, Table and Instance.

We can create all of the above with the JavaScript API. JavaScript can also synchronously call the wasm exports, which are exposed as ordinary JavaScript functions, and wasm can synchronously call JavaScript functions passed as imports to a wasm instance.

Javascript can control how wasm is downloaded, compiled and run, so we can think of wasm as another javascript feature.

It is important to note that, today, wasm cannot call the DOM APIs directly; it needs to call into JavaScript, and JS then makes the call.

According to the MDN page, there are 4 ways to start using wasm in a web app:

Loading and running WASM code

To use Wasm in JavaScript, we need to pull the wasm module into memory (fetch it) before compilation/instantiation.

The old way was to use WebAssembly.compile / WebAssembly.instantiate. We needed to fetch the wasm bytes, read them into an ArrayBuffer and then compile/instantiate it.


  fetch("module.wasm")
    .then((response) => response.arrayBuffer())
    .then((bytes) => WebAssembly.instantiate(bytes, importObject))
    .then((results) => {
      // Do something with the results!
    });

The newer way is to use WebAssembly.compileStreaming / WebAssembly.instantiateStreaming, which perform these actions directly on the raw stream of bytes coming from the network. No need to create an ArrayBuffer.


  var importObject = { imports: { imported_func: (arg) => console.log(arg) } };

  WebAssembly.instantiateStreaming(fetch("simple.wasm"), importObject).then(
    (obj) => {
      // call an exported function
      obj.instance.exports.exported_func();

      // or access the buffer contents of an exported memory:
      var i32 = new Uint32Array(obj.instance.exports.memory.buffer);

      // or access the elements of an exported table:
      var table = obj.instance.exports.table;
      console.log(table.get(0)());
    }
  );

In the case above, importObject is an object containing the values to be imported into the new wasm instance. These can be functions or other objects.

The result from both ways above is an object with the module and the instance:


  {
    module: Module, // The newly compiled WebAssembly.Module object,
    instance: Instance // A new WebAssembly.Instance of the module object
  }

Remember, it is the Instance that we use in JavaScript to run our wasm code. The Module is the compiled, stateless form of the binary, and it’s valuable for caching, for sharing with another worker or browser window using postMessage(), or for creating more Instances.
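
To make the Module/Instance split a bit more concrete, here is a minimal sketch that compiles once and instantiates twice. It is only an illustration: the 8-byte array is the smallest valid module I know of (just the header), and I use the synchronous WebAssembly.Module / WebAssembly.Instance constructors so nothing needs to be fetched. The variable names are mine.

```javascript
// Smallest valid module: the "\0asm" magic number followed by version 1.
const emptyModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, // magic: \0asm
  0x01, 0x00, 0x00, 0x00, // binary format version 1
]);

// Compile once. The synchronous constructors are fine for tiny modules;
// some browsers restrict them for large modules on the main thread.
const compiledModule = new WebAssembly.Module(emptyModuleBytes);

// Instantiate twice from the same compiled Module.
const instanceA = new WebAssembly.Instance(compiledModule, {});
const instanceB = new WebAssembly.Instance(compiledModule, {});

console.log(instanceA !== instanceB); // true: two distinct instances
console.log(Object.keys(instanceA.exports).length); // 0: this module exports nothing
```

In a real app the Module would usually come from compileStreaming or from the module field of the instantiate result, as described above.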

Instead of using fetch, we could also use an XMLHttpRequest approach.

WebAssembly text format

The text format is a human-readable representation of the wasm binary format, meant to be shown in text editors, browser developer tools and the like.

As we said, the most basic unit of wasm is a module. In the text format, a module is represented as one big S-expression. S-expressions are a very simple textual format for representing a tree, in this case a tree of nodes. Unlike the AST of a typical programming language, the wasm tree is fairly flat and mostly consists of lists of instructions.

In S-expressions, each node in the tree goes between parentheses:


  (module (memory 1) (func))

The first label inside the parentheses is the type of the node, here module. The nodes that follow, separated by spaces, are its children: one child node memory with the attribute 1, and another child node func.
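
To see the tree shape, here is a tiny S-expression reader in JavaScript. It is a toy of my own (parseSExpression is a made-up helper, not part of any wasm tooling) and it ignores string literals and ;; comments, so it is nowhere near a real .wat parser.

```javascript
// Hypothetical helper: turns S-expression text into nested arrays.
function parseSExpression(source) {
  // Split the text into "(", ")" and bare atoms.
  const tokens = source.match(/[()]|[^\s()]+/g) || [];
  let position = 0;

  function readNode() {
    const token = tokens[position++];
    if (token === "(") {
      const node = [];
      while (position < tokens.length && tokens[position] !== ")") {
        node.push(readNode());
      }
      if (position >= tokens.length) throw new SyntaxError("missing )");
      position++; // skip the closing ")"
      return node;
    }
    if (token === ")" || token === undefined) {
      throw new SyntaxError("unexpected token");
    }
    return token; // an atom such as "module" or "1"
  }

  const tree = readNode();
  if (position !== tokens.length) throw new SyntaxError("trailing tokens");
  return tree;
}

const tree = parseSExpression("(module (memory 1) (func))");
console.log(JSON.stringify(tree)); // ["module",["memory","1"],["func"]]
```

Each array is a node: the first element is the node type and the rest are its children.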

All code inside a wasm module is grouped into functions. These functions have the following structure:

( func <signature> <locals> <body> )

It’s kind of similar to functions in other languages

Signature and parameters

Each parameter has an explicit type. Wasm currently has 4 number types available: i32, i64, f32 and f64.

A param is declared like (param i32) and the return type is (result f64). So a binary function with two parameters and a return type would be:


  (func (param i32) (param i32) (result f64) ...)

The result can be omitted, which means the function doesn’t return a value.

After the signature, the locals are listed with their type, like (local i32).

Parameters are basically locals initialized with the value of the corresponding argument passed by the caller.

Getting and setting locals and parameters

Locals and parameters can be written/read by the body of the function with local.get and local.set instructions.

They refer to the numeric index of the item. Remember parameters are a list, ordered by declaration. And they are followed by the list of locals, also ordered by declaration.

So:


  (func (param i32) (param f32) (local f64)
    local.get 0
    local.get 1
    local.get 2)

Above, local.get 0 gets the i32 parameter, local.get 1 gets the f32 parameter and local.get 2 gets the f64 local.

Using numeric indexes can be confusing and annoying, so the text format allows you to name parameters, locals and most other items by putting a name prefixed with $ just before the type declaration:

(func (param $p1 i32) (param $p2 f32) (local $loc f64) ...)

Then to get these values simply:

local.get $p1 for example

Stack Machines

Before we can write the function body, we need to talk about stack machines.

Wasm execution is defined in terms of a stack machine, where basically every instruction pushes and/or pops a certain number of values (i32/i64/f32/f64) to/from a stack.

For example, local.get pushes the value of the local it reads onto the stack, and i32.add pops two i32 values, adds them and pushes the result.

When a function is called, the stack begins empty and is gradually filled up and emptied as the body instructions are executed.

The following function:


  (func (param $p i32)
    (result i32)
    local.get $p
    local.get $p
    i32.add)

It will leave exactly one i32 value on the stack at the end of execution: the result of ($p + $p), computed by the i32.add instruction. Wasm has a validation rule to ensure that the stack at the end holds exactly what the result type declares. If there is no result type, the stack must be empty.
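
To watch the stack fill and empty, here is a toy evaluator in JavaScript. It is purely illustrative: runOnStack is a made-up function that only knows three instructions, and it is not how a real engine executes wasm.

```javascript
// Toy evaluator (illustrative only): runs a list of instructions
// against an operand stack, mimicking the push/pop behavior.
function runOnStack(instructions, locals) {
  const stack = [];
  for (const [op, arg] of instructions) {
    switch (op) {
      case "local.get":
        stack.push(locals[arg]); // push the value of the indexed local
        break;
      case "i32.const":
        stack.push(arg | 0); // push a 32-bit integer constant
        break;
      case "i32.add": {
        const right = stack.pop();
        const left = stack.pop();
        stack.push((left + right) | 0); // wrap to 32 bits like i32.add
        break;
      }
      default:
        throw new Error("unsupported instruction: " + op);
    }
  }
  return stack;
}

// Same body as the function above: local.get $p, local.get $p, i32.add
const finalStack = runOnStack(
  [["local.get", 0], ["local.get", 0], ["i32.add"]],
  [21]
);
console.log(finalStack.length, finalStack[0]); // 1 42
```

The stack ends with exactly one value, which matches the single (result i32) declared by the function.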

Our first function body

The function body is just a series of instructions that are followed through execution. Let’s define a module containing our simple function:


  (module
    (func (param $lhs i32) (param $rhs i32) (result i32)
      local.get $lhs
      local.get $rhs
      i32.add))

This function gets two parameters, adds them together and returns the result.

Now we need to call this function!

Like locals and params, functions are identified by an index by default, but we can also assign names to them.

And just like ES2015, we need to export the function declaration inside the module.


  (module
    (func $add (param $lhs i32) (param $rhs i32) (result i32)
      local.get $lhs
      local.get $rhs
      i32.add)
    (export "add" (func $add))
  )

The exported “add” is the name of the function we can call inside the javascript, whereas $add is the wasm function name.
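
To try the exported add from JavaScript without a .wat toolchain, here is a small sketch. The byte array is my own hand-assembled binary encoding of the module above (an assumption of this example, normally a tool like a .wat assembler would produce it), and it uses the synchronous constructors so nothing has to be fetched.

```javascript
// Hand-assembled binary for the "add" module shown above.
const addModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // "\0asm" magic + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0: (i32, i32) -> i32
  0x03, 0x02, 0x01, 0x00, // function section: one function using type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add" = function 0
  0x0a, 0x09, 0x01, 0x07, 0x00, // code section: one body, no extra locals
  0x20, 0x00, // local.get 0
  0x20, 0x01, // local.get 1
  0x6a, // i32.add
  0x0b, // end
]);

const addInstance = new WebAssembly.Instance(
  new WebAssembly.Module(addModuleBytes)
);
const add = addInstance.exports.add;

console.log(add(2, 3)); // 5
console.log(add(2147483647, 1)); // -2147483648, i32 arithmetic wraps around
```

Note that JavaScript only ever sees the export name "add"; the $add name exists only in the text format.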

Exploring fundamentals

Calling other functions from the same module in WASM

We use the call instruction for it, using the function index or name to be called.


  (module
      (func $getAnswer (result i32)
        i32.const 42)
      (func (export "getAnswerPlus1") (result i32)
        call $getAnswer
        i32.const 1
        i32.add))

Above we have two functions in the same module. The function $getAnswer just returns an i32 number, in this case 42. The other function is exported and also returns an i32 result: it calls $getAnswer, pushes an i32 constant with value 1, adds the two together and returns the sum.

i32.const 42 just defines a 32-bit integer and pushes to the stack, with the value declared in front of const. We could change the type or the value, to match what we want.

Also in the example above, we use another syntax to export a function. func (export "getAnswerPlus1") is a shorthand for export function, but is the same as:

(export "getAnswerPlus1" (func $functionName))

Importing functions from JavaScript

We know that Javascript can call Wasm function like:


  WebAssembly.instantiateStreaming(fetch("call.wasm")).then((obj) => {
    console.log(obj.instance.exports.getAnswerPlus1()); // "43"
  });

But what about Wasm calling JavaScript functions? Wasm doesn’t have any built-in knowledge of JavaScript, but it has a way to import functions.


  (module
    (import "console" "log" (func $log (param i32)))
    (func (export "logIt")
      i32.const 13
      call $log))

The import statement above is saying to import a function called log from the console module. In the exported function “logIt” we are calling the imported function, which we named “$log”.

Any JavaScript function can be passed! Once the wasm module declares an import, we have to pass an object as the second argument to WebAssembly.instantiate() (or instantiateStreaming()), which we can call importObject. This object has to have the properties that the import statement in wasm expects:


  var importObject = {
    console: {
      log: function (arg) {
        console.log(arg);
      },
    },
  };

  WebAssembly.instantiateStreaming(fetch("logger.wasm"), importObject).then(
    (obj) => {
      obj.instance.exports.logIt();
    }
  );
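
Here is a self-contained sketch of the same idea that runs without a server. The bytes are my own hand-assembled encoding of the logger module above (an assumption of this example), and the imported function also records what it receives so we can inspect it afterwards.

```javascript
// Hand-assembled binary for the module that imports console.log
// and exports logIt.
const loggerBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version 1
  0x01, 0x08, 0x02, 0x60, 0x01, 0x7f, 0x00, 0x60, 0x00, 0x00, // types: (i32) -> (), () -> ()
  0x02, 0x0f, 0x01, // import section with one import
  0x07, 0x63, 0x6f, 0x6e, 0x73, 0x6f, 0x6c, 0x65, // module name "console"
  0x03, 0x6c, 0x6f, 0x67, // field name "log"
  0x00, 0x00, // kind: function, type 0
  0x03, 0x02, 0x01, 0x01, // function section: one function of type 1
  0x07, 0x09, 0x01, 0x05, 0x6c, 0x6f, 0x67, 0x49, 0x74, 0x00, 0x01, // export "logIt" = function 1
  0x0a, 0x08, 0x01, 0x06, 0x00, // code section: one body, no locals
  0x41, 0x0d, // i32.const 13
  0x10, 0x00, // call function 0 (the import)
  0x0b, // end
]);

const received = [];
const importObject = {
  console: {
    log: (value) => {
      received.push(value); // remember what wasm passed us
      console.log(value);
    },
  },
};

const loggerModule = new WebAssembly.Module(loggerBytes);
const loggerInstance = new WebAssembly.Instance(loggerModule, importObject);
loggerInstance.exports.logIt(); // logs 13

// Leaving the import out makes instantiation fail.
let missingImportError = null;
try {
  new WebAssembly.Instance(loggerModule, {});
} catch (error) {
  missingImportError = error;
}
console.log(missingImportError !== null); // true
```

The imported function gets index 0 inside the module, which is why the exported logIt is function 1.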

Declaring globals in WASM

We can create global variables accessible from both JavaScript and wasm, importable/exportable across one or more wasm module instances. This is very useful for allowing dynamic linking between multiple modules.


  (module
    (global $g (import "js" "global") (mut i32))
    (func (export "getGlobal") (result i32)
          (global.get $g))
    (func (export "incGlobal")
          (global.set $g
              (i32.add (global.get $g) (i32.const 1))))
  )

Above, we are declaring a global $g which is defined as import, from “js” “global”. We also define the type as i32 and the mut keyword, meaning this global variable is mutable.

The “js” “global” we are importing is created in JavaScript with the WebAssembly.Global() constructor:

const global = new WebAssembly.Global({value: "i32", mutable: true}, 0);

And in Javascript we can do:


  const global = new WebAssembly.Global({ value: "i32", mutable: true }, 0);

  var importObject = {
    js: {
      global,
    },
  };

  WebAssembly.instantiateStreaming(fetch("mywasm.wasm"), importObject).then(
    (obj) => {
      const result = obj.instance.exports.getGlobal();
      console.log(result);

      obj.instance.exports.incGlobal();
      console.log(global.value);
    }
  );
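
The JavaScript side of a global can be explored on its own, without any wasm module. A small sketch (the variable names are mine):

```javascript
// A mutable i32 global, starting at 0.
const counter = new WebAssembly.Global({ value: "i32", mutable: true }, 0);
counter.value = counter.value + 1; // what incGlobal does on the wasm side
console.log(counter.value); // 1

// An immutable global can be read but not written.
const fixed = new WebAssembly.Global({ value: "i32", mutable: false }, 7);
let writeRejected = false;
try {
  fixed.value = 8;
} catch (error) {
  writeRejected = error instanceof TypeError;
}
console.log(fixed.value, writeRejected); // 7 true
```

This matches the mut keyword in the text format: without it, the global is a constant.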

WebAssembly Memory

What if we want to work with strings or other data types in wasm? We can use Memories for that! (or the new Reference types). But talking about memory, memory is just a large array of bytes that can grow over time.

Wasm has instructions like i32.load or i32.store for reading and writing from linear memory.

From JavaScript’s point of view, memory is a big (resizable) ArrayBuffer. So a string is just a sequence of bytes somewhere inside this linear memory.

Let’s assume we wrote a string into memory. How do we pass it to JavaScript? JavaScript can create a memory, or access an existing one, through a WebAssembly.Memory object (we can only have one memory per module) and use its methods. There is, for example, a buffer getter that returns an ArrayBuffer covering the whole memory.

We can also use Memory.grow() to grow the memory. But since an ArrayBuffer cannot change size, the current ArrayBuffer is detached and a new one is created, pointing to the newer and bigger memory.
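
A quick sketch of what growing looks like from JavaScript. The names and the maximum of 2 pages are just choices for this example.

```javascript
const PAGE_SIZE = 64 * 1024; // one wasm page is 64 KiB

const growableMemory = new WebAssembly.Memory({ initial: 1, maximum: 2 });
const bufferBefore = growableMemory.buffer;
console.log(bufferBefore.byteLength === PAGE_SIZE); // true

// grow() takes a number of pages and returns the previous size in pages.
const previousPages = growableMemory.grow(1);
const bufferAfter = growableMemory.buffer;

console.log(previousPages); // 1
console.log(bufferBefore.byteLength); // 0: the old buffer was detached
console.log(bufferAfter.byteLength === 2 * PAGE_SIZE); // true
```

So any typed array created over the old buffer must be recreated from memory.buffer after a grow.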

To pass a string to javascript, we need to declare the offset (position) of the string in memory, as well as the length of it (to know where the string ends in the linear memory)

There are ways to encode a string length into the string itself (eg. C strings) but let’s pass the offset and length as parameters:


  (import "console" "log" (func $log (param i32) (param i32)))

In JavaScript, we can now use TextDecoder to decode our bytes into a JavaScript string:


  function consoleLogString(offset, length) {
    var bytes = new Uint8Array(memory.buffer, offset, length);
    var string = new TextDecoder("utf8").decode(bytes);
    console.log(string);
  }
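
To convince ourselves that offset + length is enough, here is a round trip done entirely in JavaScript. writeString and readString are made-up helpers for this sketch; in the real flow the write would happen on the wasm side.

```javascript
const stringMemory = new WebAssembly.Memory({ initial: 1 });

// Hypothetical helper: encodes text as UTF-8 at the given offset
// and returns how many bytes were written.
function writeString(text, offset) {
  const bytes = new TextEncoder().encode(text);
  new Uint8Array(stringMemory.buffer).set(bytes, offset);
  return bytes.length;
}

// Hypothetical helper: decodes `length` bytes starting at `offset`.
function readString(offset, length) {
  const bytes = new Uint8Array(stringMemory.buffer, offset, length);
  return new TextDecoder("utf-8").decode(bytes);
}

const hiLength = writeString("Hi", 0);
console.log(hiLength, readString(0, hiLength)); // 2 Hi

// The length is in bytes, not characters: "é" takes two bytes in UTF-8.
const accentLength = writeString("héllo", 16);
console.log(accentLength, readString(16, accentLength)); // 6 héllo
```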

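The last thing to do is to give our consoleLogString function access to the wasm memory. We could do it two ways: create a Memory object in JavaScript and have wasm import it, or have the wasm module create the memory and export it to JavaScript.

Let’s create it in JavaScript and import it into WebAssembly: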


  // in JS
  var memory = new WebAssembly.Memory({initial:1});

  // in WASM
  (import "js" "mem" (memory 1))

The memory 1 indicates that the imported memory must have at least 1 page of memory (a page is 64 KiB).

Our final Wasm looks like:


  (module
    (import "console" "log" (func $log (param i32 i32)))
    (import "js" "mem" (memory 1))
    (data (i32.const 0) "Hi")
    (func (export "writeHi")
      i32.const 0  ;; pass offset 0 to log
      i32.const 2  ;; pass length 2 to log
      call $log))

Since we are writing our own assembly, we write the string contents into the linear memory using a data segment. It allows a string of bytes to be written at a given offset at instantiation time.

data can be used to initialize regions of linear memory with bytes.

If we were compiling a C program, we would just call a function to allocate some memory for the string.

The ;; are comments in wasm

Now our javascript is:


  var memory = new WebAssembly.Memory({ initial: 1 });

  var importObject = { console: { log: consoleLogString }, js: { mem: memory } };

  WebAssembly.instantiateStreaming(fetch("logger2.wasm"), importObject).then(
    (obj) => {
      obj.instance.exports.writeHi();
    }
  );

This results in “Hi” being written in the console

WebAssembly Tables

Tables are basically resizable arrays of references that can be accessed by index from WebAssembly code.

Remember the call instruction? It can only call one fixed function, chosen statically, but what about calling a function that is only known at runtime?

In JavaScript, functions are first-class values; in C, we can use pointers to reference a function.

In Wasm we store function references in a table and pass around the table indices, which are just i32 values. So now we can use the call_indirect instruction (which calls a function chosen dynamically), passing simply an i32 index value.

Defining a table

elem can be used to initialize regions of tables with functions


  (module
    (table 2 funcref)
    (elem (i32.const 0) $f1 $f2)
    (func $f1 (result i32)
      i32.const 42)
    (func $f2 (result i32)
      i32.const 13)
    ;; ...
  )

Explaining above:

In Javascript, to create a similar table we would do:


  function() {
    // table section
    var tbl = new WebAssembly.Table({initial:2, element:"anyfunc"}); // the JS API names the funcref element kind "anyfunc"

    // function sections:
    var f1 = ... /* some imported WebAssembly function */
    var f2 = ... /* some imported WebAssembly function */

    // elem section
    tbl.set(0, f1);
    tbl.set(1, f2);
  };
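
A runnable variation of the snippet above, as a sketch. Since a table can only hold wasm functions (or null), I take one from a hand-assembled "add" module (the bytes are my own encoding, an assumption of this example), and I use the element kind "anyfunc", which is the name the JS API uses for funcref.

```javascript
// Hand-assembled module exporting add: (i32, i32) -> i32.
const addBytesForTable = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version 1
  0x01, 0x07, 0x01, 0x60, 0x02, 0x7f, 0x7f, 0x01, 0x7f, // type 0
  0x03, 0x02, 0x01, 0x00, // one function of type 0
  0x07, 0x07, 0x01, 0x03, 0x61, 0x64, 0x64, 0x00, 0x00, // export "add"
  0x0a, 0x09, 0x01, 0x07, 0x00, 0x20, 0x00, 0x20, 0x01, 0x6a, 0x0b, // body
]);
const wasmAdd = new WebAssembly.Instance(
  new WebAssembly.Module(addBytesForTable)
).exports.add;

// table section: two slots, both empty (null) at first
const jsTable = new WebAssembly.Table({ initial: 2, element: "anyfunc" });

// elem section: put the wasm function in slot 0
jsTable.set(0, wasmAdd);

console.log(jsTable.length); // 2
console.log(jsTable.get(0)(2, 3)); // 5
console.log(jsTable.get(1)); // null, nothing stored there yet

// grow() adds slots and returns the previous length
console.log(jsTable.grow(1)); // 2
console.log(jsTable.length); // 3
```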

Using the table


  (type $return_i32 (func (result i32)))
  (func (export "callByIndex") (param $i i32) (result i32)
    local.get $i
    call_indirect (type $return_i32))

The call_indirect instruction implicitly pops the $i value from the stack, but we could also write it explicitly like this:

(call_indirect (type $return_i32) (local.get $i))

But wait, what connects our call_indirect with our table? The original (MVP) version of Wasm only supports one table per module, so call_indirect implicitly uses that one.

The full module would be:


  (module
    (table 2 funcref)
    (func $f1 (result i32)
      i32.const 42)
    (func $f2 (result i32)
      i32.const 13)
    (elem (i32.const 0) $f1 $f2)
    (type $return_i32 (func (result i32)))
    (func (export "callByIndex") (param $i i32) (result i32)
      local.get $i
      call_indirect (type $return_i32))
  )

and the javascript to call the function is:


  WebAssembly.instantiateStreaming(fetch("wasm-table.wasm")).then((obj) => {
    console.log(obj.instance.exports.callByIndex(0)); // returns 42
    console.log(obj.instance.exports.callByIndex(1)); // returns 13
    console.log(obj.instance.exports.callByIndex(2)); // throws an error, because there is no index position 2 in the table
  });
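
The same experiment can be run without fetching a file. In this sketch the bytes are my own hand-assembled encoding of the full table module above (an assumption of this example), loaded with the synchronous constructors.

```javascript
// Hand-assembled binary for the full table module shown above.
const tableModuleBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00, // magic + version 1
  0x01, 0x0a, 0x02, // type section with two types
  0x60, 0x00, 0x01, 0x7f, // type 0 ($return_i32): () -> i32
  0x60, 0x01, 0x7f, 0x01, 0x7f, // type 1: (i32) -> i32
  0x03, 0x04, 0x03, 0x00, 0x00, 0x01, // $f1, $f2 use type 0; callByIndex uses type 1
  0x04, 0x04, 0x01, 0x70, 0x00, 0x02, // table section: (table 2 funcref)
  0x07, 0x0f, 0x01, 0x0b, // export section, name of 11 bytes
  0x63, 0x61, 0x6c, 0x6c, 0x42, 0x79, 0x49, 0x6e, 0x64, 0x65, 0x78, // "callByIndex"
  0x00, 0x02, // kind: function, index 2
  0x09, 0x08, 0x01, 0x00, 0x41, 0x00, 0x0b, 0x02, 0x00, 0x01, // elem at offset 0: $f1, $f2
  0x0a, 0x13, 0x03, // code section with three bodies
  0x04, 0x00, 0x41, 0x2a, 0x0b, // $f1: i32.const 42
  0x04, 0x00, 0x41, 0x0d, 0x0b, // $f2: i32.const 13
  0x07, 0x00, 0x20, 0x00, 0x11, 0x00, 0x00, 0x0b, // local.get 0, call_indirect (type 0)
]);

const tableInstance = new WebAssembly.Instance(
  new WebAssembly.Module(tableModuleBytes)
);
const callByIndex = tableInstance.exports.callByIndex;

console.log(callByIndex(0)); // 42
console.log(callByIndex(1)); // 13

let outOfBoundsError = null;
try {
  callByIndex(2); // no slot 2 in a table of size 2
} catch (error) {
  outOfBoundsError = error;
}
console.log(outOfBoundsError instanceof WebAssembly.RuntimeError); // true
```

The out-of-range call traps inside wasm, and the trap surfaces in JavaScript as a WebAssembly.RuntimeError.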

Mutating tables and dynamic linking

Javascript has full access to function references, so the Table Object can be mutated from javascript with grow(), get() and set(). We could also manipulate the table in Wasm using table.set and table.get.

We can use dynamic linking schemes, where multiple instances share the same memory and table. This leads to great performance.

shared0.wat


  (module
    (import "js" "memory" (memory 1))
    (import "js" "table" (table 1 funcref))
    (elem (i32.const 0) $shared0func)
    (func $shared0func (result i32)
    i32.const 0
    i32.load)
  )

We have a single import object called “js”, containing a Memory object and a Table object. We will pass this same import object to multiple instantiate calls.

shared1.wat


  (module
    (import "js" "memory" (memory 1))
    (import "js" "table" (table 1 funcref))
    (type $void_to_i32 (func (result i32)))
    (func (export "doIt") (result i32)
    i32.const 0
    i32.const 42
    i32.store  ;; store 42 at address 0
    i32.const 0
    call_indirect (type $void_to_i32))
  )

How it works:

Again, we are relying on the instructions implicitly popping their operands from the stack. We could write it explicitly like:


  (i32.store (i32.const 0) (i32.const 42)) ;; puts value 42 into index 0
  (call_indirect (type $void_to_i32) (i32.const 0)) ;; call function at index 0 of table

In the javascript, we use both wasm files:


  var importObj = {
    js: {
      memory: new WebAssembly.Memory({ initial: 1 }),
      table: new WebAssembly.Table({ initial: 1, element: "anyfunc" }), // "anyfunc" is the JS API name for funcref
    },
  };

  Promise.all([
    WebAssembly.instantiateStreaming(fetch("shared0.wasm"), importObj),
    WebAssembly.instantiateStreaming(fetch("shared1.wasm"), importObj),
  ]).then(function (results) {
    console.log(results[1].instance.exports.doIt()); // prints 42
  });

Each of the compiled modules imports the same Memory and Table objects, so they share the same linear memory and table address space.