Imperfect notes on WebAssembly

A quick write-up of my studies on wasm.

The official website says:

WebAssembly (abbreviated Wasm) is a binary instruction format for a stack-based virtual machine. Wasm is designed as a portable compilation target for programming languages, enabling deployment on the web for client and server applications.

Let’s break that into parts:

virtual machine: software that mimics a physical computer and provides the same functionality. It creates an environment to run applications and operating systems, with its own CPU, memory, network interface and storage. VMs live inside a host computer (the actual physical machine) and get their resources from the host.
portable compilation target: wasm is a new type of code that can run in modern browsers. It is not intended to be written by hand; rather, it is designed to be a compilation target for low-level languages such as C, C++, Rust…

You don’t have to know how to write WebAssembly code. Wasm modules can be imported into a web app, exposing functions to be used by javascript. This can bring large performance gains and new functionality, while staying easy for developers to work with.

The web platform can be divided into two parts:

A VM that runs the web app code, such as javascript
Web APIs the web app can call to control functionality (DOM, CSSOM, WebGL, IndexedDB, etc.)

Before, the VM could only load javascript. But now, it can load Wasm too.

JS and Wasm can interact with each other. There is a WebAssembly Javascript API that wraps Wasm code in javascript functions, so we can call it from javascript. We can also import javascript functions into Wasm!

WASM key concepts:

Module: is a WebAssembly binary that has been compiled by the browser into executable machine code.
Memory: A resizable ArrayBuffer that contains the linear array of bytes read and written by WebAssembly’s memory access instructions.
Table: A resizable typed array of references (to functions, for example) that could not otherwise be stored as raw bytes in Memory (for safety and portability reasons).
Instance: a Module paired with all the state it uses at runtime, including a Memory, a Table and a set of imported values. An Instance is like an ES2015 module that has been loaded.

We can create all of the above with the Javascript API. Also, Javascript can synchronously call the wasm exports, which are exposed as ordinary javascript functions, and wasm can synchronously call javascript functions passed as imports to a wasm instance.
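As a rough sketch of how these four pieces look from the Javascript API: the 8-byte array below is only the header of an empty module, used as a stand-in for a real .wasm file, and the variable names are my own.

```javascript
// Memory: sizes are given in pages of 64 KiB.
const memory = new WebAssembly.Memory({ initial: 1 });

// Table: two empty slots for function references.
// (The JS API spells the function-reference element type "anyfunc".)
const table = new WebAssembly.Table({ initial: 2, element: "anyfunc" });

// Module: compiled from bytes. "\0asm" plus version 1 is a valid, empty module
// (an assumption made to keep the sketch self-contained).
const emptyModuleBytes = new Uint8Array([0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00]);
const wasmModule = new WebAssembly.Module(emptyModuleBytes);

// Instance: the module plus its runtime state (this module needs no imports).
const instance = new WebAssembly.Instance(wasmModule, {});

console.log(memory.buffer.byteLength);      // 65536
console.log(table.length, table.get(0));    // 2 null
console.log(Object.keys(instance.exports)); // []
```

The synchronous constructors are fine for a tiny module like this one; for real modules the asynchronous functions covered in the loading section are the usual choice.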

Javascript can control how wasm is downloaded, compiled and run, so we can think of wasm as another javascript feature.

It is important to note that, today, wasm cannot call the DOM APIs directly; it needs to call into javascript, and JS then makes the call.

According to the MDN page, there are 4 ways to start using wasm in a web app:

Port C/C++ with Emscripten, a compiler from C/C++ to a .wasm module plus javascript “glue code”. The glue code makes it possible for WebAssembly to interact with the browser APIs through javascript. For example, libraries like SDL, OpenGL and OpenAL, normally used from C/C++, are implemented in terms of web APIs, so they need javascript to be called. The glue code implements the functionality of each library used by the C/C++ code, and also contains the logic to fetch, load and run .wasm files.
Writing or generating Wasm directly at the assembly level. If we want to build our own compiler or tools, we can use the text representation of the WebAssembly binary format (the two have a 1:1 correspondence). We can write the text format by hand and then convert it into the binary format with a text-to-binary tool.
Writing Rust and targeting WebAssembly as the output
Using AssemblyScript, which is similar to Typescript and compiles to WebAssembly

Loading and running WASM code

To use Wasm in Javascript, we first need to pull the wasm module into memory (fetch) before compilation/instantiation.

The old way was to use WebAssembly.compile / WebAssembly.instantiate. We needed to fetch the wasm bytes, put them into an ArrayBuffer and then compile/instantiate it.


  fetch("module.wasm")
    .then((response) => response.arrayBuffer())
    .then((bytes) => WebAssembly.instantiate(bytes, importObject))
    .then((results) => {
      // Do something with the results!
    });

The newer way is to use WebAssembly.compileStreaming / WebAssembly.instantiateStreaming, which perform the actions directly on the raw stream of bytes coming from the network. No need to create an ArrayBuffer.


  var importObject = { imports: { imported_func: (arg) => console.log(arg) } };

  WebAssembly.instantiateStreaming(fetch("simple.wasm"), importObject).then(
    (obj) => {
      // call an exported function
      obj.instance.exports.exported_func();

      // or access the buffer contents of an exported memory:
      var i32 = new Uint32Array(obj.instance.exports.memory.buffer);

      // or access the elements of an exported table:
      var table = obj.instance.exports.table;
      console.log(table.get(0)());
    }
  );

In the case above, importObject is an object containing the values to be imported into the new wasm instance. These can be functions or other objects.

The result of both approaches above is an object with the module and the instance:


  {
    module: Module, // The newly compiled WebAssembly.Module object,
    instance: Instance // A new WebAssembly.Instance of the module object
  }

Remember, it is the Instance we want to use in javascript to run our wasm code. The Module is the compiled code without any state, and it’s valuable for caching, sharing with another worker or browser window using postMessage(), or creating more Instances.

Instead of using fetch, we could also use an XMLHttpRequest approach.

WebAssembly text format

The text format is a human-readable representation of the binary format, meant to be shown in text editors, browser developer tools and the like.

As we said, the most basic unit of wasm is a module. In the text format, a module is represented as one big S-expression. S-expressions are a very simple textual format for representing a tree, in this case a tree of nodes. Unlike the AST of a programming language, the wasm tree is fairly flat and consists mostly of lists of instructions.

In S-expressions, each node in the tree goes between parentheses:


  (module (memory 1) (func))

The first label inside the parentheses is the type of the node, here module. What follows, separated by spaces, are its child nodes: a memory child with the attribute 1 and a func child.

All code inside a wasm module is grouped into functions. These functions have the following structure:

( func <signature> <locals> <body> )

signature: declares the parameters and the returned values
locals: are like variables, with explicit types
body: linear list of low-level instructions

It’s kind of similar to functions in other languages

Signature and parameters

Each parameter has an explicit type. Wasm currently has 4 number types available:

i32: 32-bit integer
i64: 64-bit integer
f32: 32-bit float
f64: 64-bit float

A param is declared like (param i32) and the return type like (result f64). So a binary function that takes two 32-bit integers and returns a 64-bit float would be:


  (func (param i32) (param i32) (result f64) ...)

The result can be omitted, which means the function doesn’t have a return value.

After the signature, the locals are listed with their type, like (local i32).

Parameters are basically locals initialized with the value of the corresponding argument passed by the caller.

Getting and setting locals and parameters

Locals and parameters can be written/read by the body of the function with local.get and local.set instructions.

They refer to the numeric index of the item. Remember parameters are a list, ordered by declaration. And they are followed by the list of locals, also ordered by declaration.

So:


  (func (param i32) (param f32) (local f64)
    local.get 0
    local.get 1
    local.get 2)

Above, local.get 0 will get the “param i32”, local.get 1 would get “f32” and local.get 2 would get “local f64”

Using numeric indexes can be confusing and annoying, so the text format allows us to name parameters, locals and most other items by putting a name prefixed with $ just before the type declaration:

(func (param $p1 i32) (param $p2 f32) (local $loc f64) ...)

Then, to read one of these values, we use its name instead of its index, for example:

local.get $p1

Stack Machines

Before we can write the function body, we need to talk about stack machines.

Wasm execution is defined in terms of a stack machine, where basically every instruction pushes and/or pops a certain number of values (i32/i64/f32/f64) to/from a stack.

For example:

local.get pushes the value it is reading onto the stack.
i32.add pops two i32 values (the two previous values), computes their sum and pushes the resulting value onto the stack again.

When a function is called, the stack begins empty and is gradually filled up and emptied as the body instructions are executed.

The following function:


  (func (param $p i32)
    (result i32)
    local.get $p
    local.get $p
    i32.add)

It will have exactly one i32 value on the stack at the end of execution: the result of ($p + $p), computed by i32.add. Wasm has a validation rule to ensure the stack matches the declared result type exactly. If there is no result type, the stack must be empty.
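To make the push/pop behaviour concrete, here is a toy simulation of that function body in javascript. It is only an illustration under my own assumptions (the run function and the array-based instruction encoding are made up for this note), not how a real engine executes wasm.

```javascript
// Toy stack machine: supports just the three instructions used in these notes.
function run(instructions, locals) {
  const stack = [];
  for (const [op, arg] of instructions) {
    switch (op) {
      case "local.get": // push the value of a parameter/local
        stack.push(locals[arg]);
        break;
      case "i32.const": // push a constant, truncated to 32 bits
        stack.push(arg | 0);
        break;
      case "i32.add": { // pop two values, push their 32-bit sum
        const b = stack.pop();
        const a = stack.pop();
        stack.push((a + b) | 0);
        break;
      }
      default:
        throw new Error(`unknown instruction: ${op}`);
    }
  }
  return stack;
}

// Body of the function above, with $p = 21 stored as local 0.
const finalStack = run([["local.get", 0], ["local.get", 0], ["i32.add"]], [21]);
console.log(finalStack); // [ 42 ]
```

The final stack holds exactly one value, which is what the validation rule requires for a function declared with (result i32).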

Our first function body

The function body is just a series of instructions that are followed through execution. Let’s define a module containing our simple function:


  (module
    (func (param $lhs i32) (param $rhs i32) (result i32)
      local.get $lhs
      local.get $rhs
      i32.add))

This function gets two parameters, adds them together and returns the result.

Now we need to call this function!

Like locals and params, functions are identified by an index by default, but we can also assign names to them.

And just like ES2015, we need to export the function declaration inside the module.


  (module
    (func $add (param $lhs i32) (param $rhs i32) (result i32)
      local.get $lhs
      local.get $rhs
      i32.add)
    (export "add" (func $add))
  )

The exported “add” is the name of the function we can call inside the javascript, whereas $add is the wasm function name.
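To try this module without a text-to-binary tool, the sketch below hand-assembles what I believe is an equivalent binary and calls it from javascript. The encodeName and section helpers are my own, and they assume ASCII names and sections shorter than 128 bytes; names like $add, $lhs and $rhs do not appear because the binary only stores indices.

```javascript
// Helpers for hand-assembling (assumption: every length stays below 128,
// so each one fits in a single byte; names are plain ASCII).
const encodeName = (s) => [s.length, ...new TextEncoder().encode(s)];
const section = (id, payload) => [id, payload.length, ...payload];
const I32 = 0x7f;

const addBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,      // "\0asm" magic, version 1
  ...section(1, [1, 0x60, 2, I32, I32, 1, I32]),        // types: (i32, i32) -> i32
  ...section(3, [1, 0]),                                 // functions: one, using type 0
  ...section(7, [1, ...encodeName("add"), 0x00, 0]),     // exports: "add" is function 0
  ...section(10, [1, 7, 0,                               // code: one 7-byte body, no locals
    0x20, 0,                                             //   local.get 0
    0x20, 1,                                             //   local.get 1
    0x6a,                                                //   i32.add
    0x0b]),                                              //   end
]);

const { add } = new WebAssembly.Instance(new WebAssembly.Module(addBytes)).exports;
console.log(add(2, 3));          // 5
console.log(add(2147483647, 1)); // -2147483648, since i32 arithmetic wraps around
```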

Exploring fundamentals

Calling other functions from the same module in WASM

We use the call instruction for it, using the function index or name to be called.


  (module
    (func $getAnswer (result i32)
      i32.const 42)
    (func (export "getAnswerPlus1") (result i32)
      call $getAnswer
      i32.const 1
      i32.add))

Above we have two functions in the same module. The function $getAnswer just returns an i32 number, in this case 42. The other function is exported and also returns an i32 result: it calls $getAnswer, pushes an i32 constant with value 1, adds the two together and returns the sum.

i32.const 42 pushes a 32-bit integer constant onto the stack; the value is the operand written after const. We could change the type or the value to match what we want.

Also in the example above, we use another syntax to export a function. (func (export "getAnswerPlus1") ...) is a shorthand that declares and exports the function in one go, and is the same as:

(export "getAnswerPlus1" (func $functionName))

Importing functions from Javascript

We know that Javascript can call Wasm function like:


  WebAssembly.instantiateStreaming(fetch("call.wasm")).then((obj) => {
    console.log(obj.instance.exports.getAnswerPlus1()); // logs the number 43
  });

But what about Wasm calling javascript functions? Wasm doesn’t have any built-in knowledge of javascript, but it has a general way to import functions.


  (module
    (import "console" "log" (func $log (param i32)))
    (func (export "logIt")
      i32.const 13
      call $log))

The import statement above is saying to import a function called log from the console module. In the exported function “logIt” we are calling the imported function, which we named “$log”.

Any javascript function can be passed! Once the wasm module declares an import, WebAssembly.instantiate() (or instantiateStreaming()) must be given an object as its second parameter, which we can call importObject. This object must have the properties that the import statements in wasm expect, here a console object with a log function:


  var importObject = {
    console: {
      log: function (arg) {
        console.log(arg);
      },
    },
  };

  WebAssembly.instantiateStreaming(fetch("logger.wasm"), importObject).then(
    (obj) => {
      obj.instance.exports.logIt();
    }
  );
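The import flow can also be checked without fetching a file. The sketch below hand-assembles a binary that I believe matches the logger module above (the helpers, the seen array and the synchronous constructors are my own choices for the sake of a self-contained example), then instantiates it with and without the expected import.

```javascript
// Helpers for hand-assembling (assumption: ASCII names, lengths below 128).
const encodeName = (s) => [s.length, ...new TextEncoder().encode(s)];
const section = (id, payload) => [id, payload.length, ...payload];
const I32 = 0x7f;

const loggerBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,       // "\0asm" magic, version 1
  ...section(1, [2, 0x60, 1, I32, 0,                     // type 0: (i32) -> ()
                    0x60, 0, 0]),                        // type 1: () -> ()
  ...section(2, [1, ...encodeName("console"),
                    ...encodeName("log"), 0x00, 0]),     // import console.log as a func of type 0
  ...section(3, [1, 1]),                                 // one function defined here, of type 1
  ...section(7, [1, ...encodeName("logIt"), 0x00, 1]),   // export func 1 (func 0 is the import)
  ...section(10, [1, 6, 0, 0x41, 13, 0x10, 0, 0x0b]),    // i32.const 13; call 0; end
]);

const wasmModule = new WebAssembly.Module(loggerBytes);

// Record the calls instead of printing, so the result is easy to inspect.
const seen = [];
const importObject = { console: { log: (value) => seen.push(value) } };
const { logIt } = new WebAssembly.Instance(wasmModule, importObject).exports;
logIt();
console.log(seen); // [ 13 ]

// Leaving out an import fails at instantiation time, before any wasm code runs.
let failed = false;
try {
  new WebAssembly.Instance(wasmModule, {});
} catch (err) {
  failed = true;
}
console.log(failed); // true
```

WebAssembly.Module.imports(wasmModule) lists what a compiled module expects, which helps when building the importObject for a module we did not write ourselves.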

Declaring globals in WASM

We can create global variables accessible from javascript and importable/exportable across one or more wasm module instances. This is very useful for allowing dynamic linking between multiple modules.


  (module
    (global $g (import "js" "global") (mut i32))
    (func (export "getGlobal") (result i32)
          (global.get $g))
    (func (export "incGlobal")
          (global.set $g
              (i32.add (global.get $g) (i32.const 1))))
  )

Above, we are declaring a global $g which is defined as an import, from “js” “global”. We also declare its type as i32 and use the mut keyword, meaning this global variable is mutable.

The “js” “global” value we are importing is created in javascript using the WebAssembly.Global() constructor:

const global = new WebAssembly.Global({value: "i32", mutable: true}, 0);

And in Javascript we can do:


  const global = new WebAssembly.Global({ value: "i32", mutable: true }, 0);

  var importObject = {
    js: {
      global,
    },
  };

  WebAssembly.instantiateStreaming(fetch("mywasm.wasm"), importObject).then(
    (obj) => {
      const result = obj.instance.exports.getGlobal();
      console.log(result);

      obj.instance.exports.incGlobal();
      console.log(global.value);
    }
  );
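The javascript side of a global can be explored on its own, without a wasm module. A small sketch (the counter and fixed names are mine) showing that the mutable flag is enforced from javascript too:

```javascript
// A mutable i32 global, like the one imported as "js" "global" above.
const counter = new WebAssembly.Global({ value: "i32", mutable: true }, 0);
counter.value = 41;
counter.value += 1;
console.log(counter.value); // 42

// An immutable global can be read, but assigning to it throws a TypeError.
const fixed = new WebAssembly.Global({ value: "i32", mutable: false }, 7);
let rejected = false;
try {
  fixed.value = 8;
} catch (err) {
  rejected = err instanceof TypeError;
}
console.log(rejected, fixed.value); // true 7
```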

WebAssembly Memory

What if we want to work with strings or other more complex data types in wasm? We can use memory for that (or the newer reference types). Memory is just a large array of bytes that can grow over time.

Wasm has instructions like i32.load or i32.store for reading and writing from linear memory.

From the Javascript point of view, memory is one big ArrayBuffer. So a string is just a sequence of bytes somewhere inside this linear memory.

Let’s assume we wrote a string into memory. How do we pass it to Javascript? Javascript can create a memory with the WebAssembly.Memory() constructor, or access an existing one exported by the instance (currently we can only have one memory per module), and use its methods. There is, for example, a buffer getter that returns an ArrayBuffer of the whole memory.

We can also use Memory.grow() to grow the memory. But since an ArrayBuffer cannot change size, the current ArrayBuffer is detached and a new ArrayBuffer is created, pointing to the newer and bigger memory.
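A short sketch of what growing looks like from javascript (the maximum of 3 pages and the variable names are arbitrary choices for the example). The practical consequence is that any typed array created over the old buffer must be recreated after a grow.

```javascript
const memory = new WebAssembly.Memory({ initial: 1, maximum: 3 });

const before = memory.buffer;
new Uint8Array(before)[0] = 99; // write one byte before growing

const previousPages = memory.grow(1); // grow by one page; returns the old size in pages
const after = memory.buffer;

console.log(previousPages);            // 1
console.log(before.byteLength);        // 0, the old ArrayBuffer is detached
console.log(after.byteLength);         // 131072, two pages of 64 KiB
console.log(new Uint8Array(after)[0]); // 99, the contents are preserved
console.log(before === after);         // false
```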

To pass a string to javascript, we need to pass the offset (position) of the string in memory, as well as its length (to know where the string ends in the linear memory).

There are ways to encode a string’s length into the string itself (e.g. C strings), but let’s pass the offset and length as parameters:


  (import "console" "log" (func $log (param i32) (param i32)))

In Javascript, we can now use TextDecoder to decode our bytes into a javascript string:


  function consoleLogString(offset, length) {
    var bytes = new Uint8Array(memory.buffer, offset, length);
    var string = new TextDecoder("utf8").decode(bytes);
    console.log(string);
  }

The last thing to do is make our consoleLogString function get access to the wasm memory. We could do it two ways:

creating a Memory object in Javascript and have the Wasm module import the memory
inside the wasm module, create the memory and export it to javascript

Let’s create in javascript and import it into webassembly:


  // in JS
  var memory = new WebAssembly.Memory({initial:1});

  // in WASM
  (import "js" "mem" (memory 1))

The memory 1 indicates that the imported memory must have at least 1 page of memory (a page is 64 KiB).

Our final Wasm looks like:


  (module
    (import "console" "log" (func $log (param i32 i32)))
    (import "js" "mem" (memory 1))
    (data (i32.const 0) "Hi")
    (func (export "writeHi")
      i32.const 0  ;; pass offset 0 to log
      i32.const 2  ;; pass length 2 to log
      call $log))

Since we are writing our own assembly, we write the string content into linear memory using a data section. It allows a string of bytes to be written at a given offset at instantiation time.

In other words, data can be used to initialize regions of linear memory with bytes.

If we were compiling a C program, we would just call a function to allocate some memory for the string.

In the text format, ;; starts a line comment.

Now our javascript is:


  var memory = new WebAssembly.Memory({ initial: 1 });

  var importObject = { console: { log: consoleLogString }, js: { mem: memory } };

  WebAssembly.instantiateStreaming(fetch("logger2.wasm"), importObject).then(
    (obj) => {
      obj.instance.exports.writeHi();
    }
  );

This results in “Hi” being written in the console
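The decoding step can be tried on its own, with javascript standing in for the wasm side. In this sketch writeString plays the role of the data section (it is a hypothetical helper, not part of any API), and readString is the same decoding logic as consoleLogString but returning the string. It also shows that the length we pass is a number of bytes, not of characters.

```javascript
const memory = new WebAssembly.Memory({ initial: 1 });

// Stand-in for the wasm side: copy UTF-8 bytes into linear memory at an offset.
function writeString(text, offset) {
  const bytes = new TextEncoder().encode(text);
  new Uint8Array(memory.buffer).set(bytes, offset);
  return bytes.length; // length in bytes
}

// Same idea as consoleLogString: view the bytes, then decode them as UTF-8.
function readString(offset, length) {
  const bytes = new Uint8Array(memory.buffer, offset, length);
  return new TextDecoder("utf-8").decode(bytes);
}

const hiLength = writeString("Hi", 0);
console.log(hiLength, readString(0, hiLength)); // 2 Hi

// "h\u00e9llo" has 5 characters, but the accented letter takes 2 bytes in UTF-8.
const accentedLength = writeString("h\u00e9llo", 16);
console.log(accentedLength, readString(16, accentedLength)); // 6 héllo
```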

WebAssembly Tables

Tables are basically resizable arrays of references that can be accessed by index from WebAssembly code.

Remember the call instruction? It takes a static function index, so it can only ever call one fixed function. But what if the function to call is a runtime value?

In Javascript functions are first-class values, and in C we can use pointers to reference a function.

In Wasm we store function references in a table and pass around table indices, which are just i32 values. Then we can use the call_indirect instruction (which calls a function chosen at runtime), simply passing an i32 index value.

Defining a table

elem can be used to initialize regions of tables with functions


  (module
    (table 2 funcref)
    (elem (i32.const 0) $f1 $f2)
    (func $f1 (result i32)
      i32.const 42)
    (func $f2 (result i32)
      i32.const 13)
    ;; ...
  )

Explaining above:

table 2 funcref: 2 is the initial size of table, meaning it will store two references. funcref is the type, function reference.
elem can list any subset of functions in a module, allowing duplicates. These functions will be referenced by the table, in the order declared.
i32.const 0 inside the elem is an offset. This specifies at what index the table function references start to be populated. Since we start at index 0 and the table has a length of 2, our function references will be at index 0 and 1.

In Javascript, to create a similar table we would do:


  function createTable() {
    // table section
    // (the JS API spells the funcref element type "anyfunc")
    var tbl = new WebAssembly.Table({initial:2, element:"anyfunc"});

    // function sections:
    var f1 = ... /* some imported WebAssembly function */
    var f2 = ... /* some imported WebAssembly function */

    // elem section
    tbl.set(0, f1);
    tbl.set(1, f2);
  };
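For a version that actually runs, f1 and f2 have to be real WebAssembly functions. The sketch below gets them from a hand-assembled module that exports two functions returning 42 and 13. The bytes, the helpers and the export names "f1" and "f2" are my own assumptions for the example.

```javascript
// Helpers for hand-assembling (assumption: ASCII names, lengths below 128).
const encodeName = (s) => [s.length, ...new TextEncoder().encode(s)];
const section = (id, payload) => [id, payload.length, ...payload];
const I32 = 0x7f;

const twoFuncBytes = new Uint8Array([
  0x00, 0x61, 0x73, 0x6d, 0x01, 0x00, 0x00, 0x00,     // "\0asm" magic, version 1
  ...section(1, [1, 0x60, 0, 1, I32]),                 // type 0: () -> i32
  ...section(3, [2, 0, 0]),                            // two functions, both of type 0
  ...section(7, [2, ...encodeName("f1"), 0x00, 0,
                    ...encodeName("f2"), 0x00, 1]),    // export both functions
  ...section(10, [2, 4, 0, 0x41, 42, 0x0b,             // f1: i32.const 42
                     4, 0, 0x41, 13, 0x0b]),           // f2: i32.const 13
]);
const { f1, f2 } = new WebAssembly.Instance(new WebAssembly.Module(twoFuncBytes)).exports;

// Table and elem sections, done from javascript.
const tbl = new WebAssembly.Table({ initial: 2, element: "anyfunc" });
tbl.set(0, f1);
tbl.set(1, f2);
console.log(tbl.get(0)(), tbl.get(1)()); // 42 13

// Only WebAssembly functions (or null) are accepted: a plain JS function is rejected.
let plainRejected = false;
try {
  tbl.set(0, () => 1);
} catch (err) {
  plainRejected = err instanceof TypeError;
}
console.log(plainRejected); // true
```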

Using the table


  (type $return_i32 (func (result i32)))
  (func (export "callByIndex") (param $i i32) (result i32)
    local.get $i
    call_indirect (type $return_i32))

The first line above, (type $return_i32 (func (result i32))), specifies a type with the reference name $return_i32. This type is used to perform type checking on the table function reference call. Here, we specify that the reference needs to be a function that returns an i32 as its result.
Then we define the function, which receives a parameter named $i. We push this parameter’s value onto the stack with local.get $i and then call a function from the table with call_indirect. $i is the index of the function we are calling from the table: we are calling the $i’th function in the table (in javascript it would be like tbl.get(i)()).

The call_indirect implicitly pops the $i value from the stack, but we could also write it explicitly like this:

(call_indirect (type $return_i32) (local.get $i))

Above, we also typecheck the called function. It has to match the declared type (no parameters, i32 result), otherwise a WebAssembly.RuntimeError is thrown.

But wait, what connects our call_indirect with our table? Wasm originally supported only one table per module, so call_indirect implicitly uses that table (table index 0).

The full module would be:


  (module
    (table 2 funcref)
    (func $f1 (result i32)
      i32.const 42)
    (func $f2 (result i32)
      i32.const 13)
    (elem (i32.const 0) $f1 $f2)
    (type $return_i32 (func (result i32)))
    (func (export "callByIndex") (param $i i32) (result i32)
      local.get $i
      call_indirect (type $return_i32))
  )

and the javascript to call the function is:


  WebAssembly.instantiateStreaming(fetch("wasm-table.wasm")).then((obj) => {
    console.log(obj.instance.exports.callByIndex(0)); // returns 42
    console.log(obj.instance.exports.callByIndex(1)); // returns 13
    console.log(obj.instance.exports.callByIndex(2)); // throws a WebAssembly.RuntimeError, because there is no index position 2 in the table
  });

Mutating tables and dynamic linking

Javascript has full access to function references, so the Table Object can be mutated from javascript with grow(), get() and set(). We could also manipulate the table in Wasm using table.set and table.get.

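We can use dynamic linking schemes, where multiple instances share the same memory and table. This is analogous to a native application where several shared libraries live in one process address space, and it means instances can exchange data and function references without copying.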

shared0.wat


  (module
    (import "js" "memory" (memory 1))
    (import "js" "table" (table 1 funcref))
    (elem (i32.const 0) $shared0func)
    (func $shared0func (result i32)
      i32.const 0
      i32.load)
  )

Both modules import from a single namespace called “js”, which contains a Memory and a Table object. We will pass this same import object to multiple instantiate calls.

shared1.wat


  (module
    (import "js" "memory" (memory 1))
    (import "js" "table" (table 1 funcref))
    (type $void_to_i32 (func (result i32)))
    (func (export "doIt") (result i32)
      i32.const 0
      i32.const 42
      i32.store  ;; store 42 at address 0
      i32.const 0
      call_indirect (type $void_to_i32))
  )

How it works:

The function shared0func is defined and stored in our imported table. shared0func pushes a constant with value 0 and uses i32.load to load the value at the given memory address, which is implicitly 0 because that is the last value pushed onto the stack. So it loads and returns the value stored at memory index 0.
In shared1, we export a function called doIt. This function pushes two constants and calls i32.store to store a given value at a given index of the imported memory. It pops both values from the stack, so it stores the value 42 at memory index 0. In the last part, we push a constant with value 0 and call the function at index 0 of the table, which is shared0func, stored there by the elem block in shared0.wat.
shared0func then loads the 42 that shared1.wat stored in memory with the i32.store instruction.

Again, we are implicitly popping values from the stack. We could write it explicitly like this:


  (i32.store (i32.const 0) (i32.const 42)) ;; puts value 42 into index 0
  (call_indirect (type $void_to_i32) (i32.const 0)) ;; call function at index 0 of table

In the javascript, we use both wasm files:


  var importObj = {
    js: {
      memory: new WebAssembly.Memory({ initial: 1 }),
      table: new WebAssembly.Table({ initial: 1, element: "anyfunc" }), // the JS API spells funcref "anyfunc"
    },
  };

  Promise.all([
    WebAssembly.instantiateStreaming(fetch("shared0.wasm"), importObj),
    WebAssembly.instantiateStreaming(fetch("shared1.wasm"), importObj),
  ]).then(function (results) {
    console.log(results[1].instance.exports.doIt()); // prints 42
  });

Each module being compiled can import the same memory and table objects, and thus share the same linear memory and table “address space”.