Understanding ES modules in the browser

A deep dive into ES modules and why you should care

Programming has always leveraged the idea of reusing blocks of code to build a program. Much like building with Lego bricks, we can assemble a system from separate blocks of code, and programmers all over the world consider that good practice. It favours reusability, maintenance and debugging, for example.

Before modules, a common approach in browsers was to put everything in the global scope. This made code tricky to maintain: scripts had to be loaded in a specific order, and all other code depended on whatever was in the global scope. Every dependency was implicit and not secure at all, since malicious code could tamper with the global scope too.

We all know browsers have evolved a lot in the last few years, and web applications become more complex each year. Until a few years ago, we didn’t have a native way to share blocks of code in the browser. We had to use tools like Webpack to import code on demand, when needed. But now browsers support ES modules, and that may change the way web developers create and build applications.

A module has its own scope, grouping functions and variables that belong together. But the best part is that modules can share code with one another, explicitly. Each module declares an explicit dependency on other modules, which makes the code easier to understand and manage.

From the Mozilla website:

Browsers can optimize loading of modules, making it more efficient than having to use a library and do all of that extra client-side processing and extra round trips

When creating a module file for the browser, you can use the .js or the .mjs file extension. According to the V8 documentation, the .mjs extension makes it clear that the file is a JavaScript module, and it also lets runtimes and tools such as Node.js and Babel parse the file as a module. Ultimately, though, browsers don’t care whether you use .js or .mjs, as long as the server sends your files with the Content-Type: text/javascript header. If it doesn’t, you’ll get a wrong MIME type error when the browser loads the modules.

In the end, the browser knows a file is a module because of the type="module" attribute.

To export values from a module, you use the following syntax:

export const name = "Ricardo";

export function logMessage(message) {
  console.log("The message is " + message);
}

To import them, you use the following syntax:

import { name, logMessage } from "./module-path.js";

Notice the relative path when importing a module. This is preferred because it keeps working even if you move your files to a different level of the directory structure, as long as the importing and exporting files keep the same relative position. This path is called the “module specifier”, and there are other ways to write it:

import { logMessage } from "./module-path.mjs";
import { logMessage } from "../module-path.mjs";
import { logMessage } from "/modules/module-path.mjs"; // resolved from the root of the site
import { logMessage } from "https://simple.example/modules/module-path.mjs";

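But the following bare specifiers are not resolved by the browser on its own (they only work if an import map tells the browser where to find them):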

import { shout } from "jquery";
import { shout } from "lib.mjs";
import { shout } from "modules/module-path.mjs";
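To make the difference between the two lists more concrete, here is a minimal sketch of how a specifier could be resolved against the URL of the importing module. It is only an approximation of what a browser without an import map does, not the actual algorithm; the resolveSpecifier helper and the importerUrl value are assumptions made up for this illustration.

```javascript
// Hypothetical URL of the module that contains the import statement.
const importerUrl = "https://simple.example/app/main.mjs";

// Sketch: accept absolute URLs and specifiers starting with "/", "./" or "../";
// reject everything else as a bare specifier.
function resolveSpecifier(specifier, baseUrl) {
  const isRelative = /^\.{0,2}\//.test(specifier);
  if (isRelative) {
    // Relative specifiers are resolved against the importing module's URL.
    return new URL(specifier, baseUrl).href;
  }
  try {
    // Absolute URLs parse on their own, without a base.
    return new URL(specifier).href;
  } catch {
    throw new TypeError(`Bare specifier not supported: "${specifier}"`);
  }
}

console.log(resolveSpecifier("./module-path.mjs", importerUrl));
console.log(resolveSpecifier("/modules/module-path.mjs", importerUrl));

try {
  resolveSpecifier("jquery", importerUrl);
} catch (error) {
  console.log(error.message);
}
```

The relative forms always produce a full URL, while "jquery" or "lib.mjs" give the browser nothing to resolve against.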

To load your module from an HTML file, you can use either an external or an inline module script:

<script type="module" src="main.js"></script>

// or

<script type="module">
  /* JavaScript module code here */
</script>

You can also rename modules on export:

export { function1 as newFunctionOne };

And on import:

import { function1 as newFunctionOne } from "./modules/module.js";

Or even import everything from a module as a module object:

import * as Canvas from "./modules/canvas.js";

Canvas.function1();
Canvas.function2();

Another nice feature is dynamic module loading: we pass a module path to import() and the module is loaded only when needed. The call returns a Promise, which fulfills with a module object:

import("./modules/circle.js").then((Module) => {
  let circle = new Module.Circle();
});

Another cool thing: a module can await something at its top level (top-level await). Any module importing it waits until the awaited value is resolved, without blocking the loading of sibling modules:

const colors = fetch("../data/colors.json").then((response) => response.json());

export default await colors;

And then when importing:

import colors from "./modules/getColors.js";
import { Canvas } from "./modules/canvas.js"; // won't be blocked by colors

All these import statements build up a graph of dependencies. There’s always an entry point, from which the browser starts importing other modules, and so on. The browser cannot use the files themselves, so each one is parsed into a data structure called a Module Record. A Module Record holds several pieces of information about the file, such as import entries, the AST of the code, requested modules and more.

Then the Module Record is turned into a Module Instance, which combines code + state. Code is a set of instructions, and state is the actual values of the variables (which are just boxes in memory that hold those values).

There is a Module Instance for each module, and to load the whole application you need a graph of Module Instances. Building it takes 3 steps:

Construction: find, download and parse all files into Module Records
Instantiation: find boxes in memory to place all exported values, then make exports and imports point to those places in memory (this is called linking)
Evaluation: run the code to fill the boxes in memory with values

Since these 3 phases can be done separately, ES modules can be loaded asynchronously. In CommonJS, for example, all these phases are done in one shot, without breaks.

There is a specification for how module files should be parsed into Module Records and how they should be instantiated and evaluated. But it’s the loader’s job to fetch these files in the first place, and the ES module specification doesn’t say how to do that. For browsers, the HTML spec covers it. It is also the loader’s job to control how the modules are loaded into the environment.

The Construction step:

During this phase, the loader needs to download the file containing the module (from a URL or from the file system) and parse it into a Module Record. The loader takes care of finding the file and downloading it. It first needs to find the entry point, which will usually be declared in the HTML file. From that file, it uses the import statements (more specifically, the module specifiers) to find all the other modules.

In the construction phase, the loader goes through each file: it downloads it, parses it, finds the other modules through the import statements and repeats the process until it has the whole graph. Since the browser downloads the files over the network and downloading takes a long time, doing all of this on the main thread would block it and be very slow. To deal with that, ES modules split the work into phases, which is why construction is a phase of its own. Because it is separate, the loader can build a complete picture of the module graph before getting to the synchronous part, instantiation.
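The graph walk described above can be sketched in a few lines. This is a simplified simulation, not how a real loader is implemented: the sources map stands in for the network, findSpecifiers uses a rough regular expression instead of a real parser, and all names and URLs are made up for the example.

```javascript
// Hypothetical in-memory "server": URL -> source text (replaces real downloads).
const sources = new Map([
  ["https://simple.example/main.js", 'import { count } from "./counter.js";\nimport "./render.js";'],
  ["https://simple.example/counter.js", "export let count = 0;"],
  ["https://simple.example/render.js", 'import { count } from "./counter.js";'],
]);

// Rough stand-in for parsing: collect the specifiers of static import statements.
function findSpecifiers(source) {
  const pattern = /import\s+(?:[^"']*?\s+from\s+)?["']([^"']+)["']/g;
  return [...source.matchAll(pattern)].map((match) => match[1]);
}

// Start at the entry point and follow the imports until the whole graph is known.
function buildGraph(entryUrl) {
  const records = new Map(); // URL -> simplified module record
  const queue = [entryUrl];
  while (queue.length > 0) {
    const url = queue.shift();
    if (records.has(url)) continue; // already downloaded and parsed
    const source = sources.get(url);
    if (source === undefined) throw new Error(`Cannot fetch ${url}`);
    const requestedModules = findSpecifiers(source).map(
      (specifier) => new URL(specifier, url).href
    );
    records.set(url, { url, requestedModules });
    queue.push(...requestedModules);
  }
  return records;
}

const graph = buildGraph("https://simple.example/main.js");
console.log([...graph.keys()]);
```

Note that counter.js is requested by two modules but ends up in the graph only once, and that no module code has been executed at this point.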

In CommonJS the phases are not separated, because loading files from the filesystem is a lot faster. It can afford to block the main thread to load, parse, instantiate and evaluate a module all in one go. So in CommonJS, by the time require returns the module, everything is already instantiated and evaluated.

That’s why we can use variables in a CommonJS module specifier (they are already evaluated by the time require runs) but cannot use them in a static ES module import:

require(`${path}/counter.js`).count; // it works!

import { count } from `${path}/counter.js` // doesn't work =/

However, nowadays we have dynamic imports, which make it possible to use variables in ES module specifiers: each dynamically imported module becomes the entry point of a separate graph, which is processed separately.
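As a small sketch of that idea, the specifier below is computed at runtime and passed to import(). The counterSpecifier and loadCount helpers, the basePath parameter and the file layout in the usage comment are assumptions for illustration only.

```javascript
// Build the specifier at runtime; basePath is a hypothetical variable.
function counterSpecifier(basePath) {
  return `${basePath}/counter.js`;
}

// import() accepts any expression, so a computed specifier is allowed here.
async function loadCount(basePath) {
  const module = await import(counterSpecifier(basePath));
  return module.count;
}

// Usage (assumes ./modules/counter.js exists and exports `count`):
// loadCount("./modules").then((count) => console.log(count));
```

The price is that the import is no longer statically analyzable: the loader only learns about counter.js when loadCount actually runs.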

An important note: if a module is shared across multiple graphs, meaning it is imported from multiple files, every importer uses the same Module Instance. This happens because the loader caches the Module Instances, which makes things easier for the engine. The cache is managed by a Module Map, which is basically a set of key-value pairs keyed by module URL. When the loader fetches a file, it notes the URL in the map, so the map both keeps track of the files being fetched and serves as a cache.
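A very reduced sketch of that caching behavior could look like the following. The real Module Map also tracks modules that are still being fetched; here fetchAndParse is a hypothetical stand-in that just builds an object, and fetchCount exists only to show how many times it ran.

```javascript
const moduleMap = new Map(); // module URL -> simplified module record
let fetchCount = 0;          // counts the simulated fetches

// Hypothetical stand-in for downloading and parsing a file.
function fetchAndParse(url) {
  fetchCount += 1;
  return { url, namespace: {} };
}

// Look in the Module Map first; fetch only when the URL has not been seen.
function loadModule(url) {
  if (!moduleMap.has(url)) {
    moduleMap.set(url, fetchAndParse(url));
  }
  return moduleMap.get(url);
}

const first = loadModule("https://simple.example/modules/canvas.js");
const second = loadModule("https://simple.example/modules/canvas.js");

console.log(first === second); // true
console.log(fetchCount);       // 1
```

Both importers receive the very same object, which is why state inside a module is shared by everyone who imports it.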

The Parsing step:

After fetching the file, it’s time to transform it into a Module Record. Once the record is created, it is placed into the Module Map (for caching, remember?).

In this parsing phase, the unique characteristics of a module are very important because they define the result of the parsing: a module is always treated as “use strict”, await is a reserved keyword, this is undefined at the top level, and so on. These different ways of parsing are called parse goals, and parsing the same file with a different goal may give a different result. That’s why it’s important to tell the browser that a file is a module with type="module" (the .mjs extension plays that role for runtimes such as Node.js).

The Instantiation step:

This part is dedicated to code + state. State lives in memory, so this step is about wiring things up to memory. In this phase, the JS engine creates a box in memory for each export, but without a value yet (values are filled in during the evaluation step). The only exception is exported function declarations, which are initialized right away.

The engine starts instantiating (assigning boxes in memory to each export) at the bottom of the graph, with a leaf module that doesn’t depend on any other module.

After wiring up the exports of a module, the engine goes one layer up and starts wiring the imports of the parent module. An export and the imports that refer to it point to the same location in memory, since one module is importing what the other exports.

For ES modules, this pointing behavior means that if the exporting module changes a value, the importing module sees the updated value too. In CommonJS, meanwhile, there is no such pointing behavior: the importing module gets a copy of the exported values, so it won’t see the change.
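The difference can be simulated in plain JavaScript, without real modules. In this sketch a closure variable plays the role of the box in memory, a getter plays the role of an ES module import, and a plain property plays the role of a CommonJS copy. The createCounterModule function and both view objects are invented for the illustration.

```javascript
// Simulated exporting module: `count` lives in one box (a closure variable).
function createCounterModule() {
  let count = 0;
  const increment = () => {
    count += 1;
  };

  // ES-module-like view: the getter reads the box every time it is accessed.
  const liveView = {
    get count() {
      return count;
    },
    increment,
  };

  // CommonJS-like view: the value is copied once, when the object is built.
  const copiedView = { count, increment };

  return { liveView, copiedView };
}

const { liveView, copiedView } = createCounterModule();
liveView.increment();

console.log(liveView.count);   // 1
console.log(copiedView.count); // 0
```

After the exporting side increments the counter, only the live view reflects the new value.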

It’s good to note that an importing module cannot reassign the values it imports; only the exporting module can change them.

Up to this point, we haven’t run any code yet! We have only downloaded the files, parsed them into Module Records and assigned the imports and exports to boxes in memory.

The Evaluation step:

Now it’s time to evaluate the code and fill the boxes in memory with values. The engine again begins with the leaf modules at the bottom of the graph. It evaluates a module by executing its top-level code (the code outside functions). A module can only be evaluated once, because top-level code may have side effects that would produce different results every time it runs. That’s why the Module Record is cached in the Module Map: it guarantees the module is executed only once.

In this phase, the pointing behavior of ES modules helps with cyclic dependencies: eventually one module finishes evaluating, and the module that imports it sees the updated value.
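The evaluation order can be pictured as a depth-first walk in which dependencies run before the module that imports them. The sketch below simulates that on a hypothetical three-module graph; pushing a name into the order array stands in for running the top-level code of that module.

```javascript
// Hypothetical graph: each module lists the modules it imports.
const graph = {
  "main.js": ["counter.js", "render.js"],
  "render.js": ["counter.js"],
  "counter.js": [],
};

const evaluated = new Set();
const order = [];

function evaluate(name) {
  if (evaluated.has(name)) return; // each module runs only once
  evaluated.add(name);             // marked before recursing, so a cycle cannot loop forever
  for (const dependency of graph[name]) {
    evaluate(dependency);          // dependencies run before the importer
  }
  order.push(name);                // stands in for running the top-level code
}

evaluate("main.js");
console.log(order.join(" -> ")); // counter.js -> render.js -> main.js
```

The leaf module counter.js is evaluated first and only once, even though two modules import it.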

The Mozilla docs also list some nice characteristics of JavaScript modules, such as:

Loading a module locally from the filesystem (file://) in an HTML file causes a CORS error, due to JavaScript module security requirements
Modules are deferred by default in the browser. That means the module (and its dependencies) is downloaded in parallel with HTML parsing, but executed only after the HTML has been parsed
Modules are executed only once, no matter how many times you reference them
Code inside a module is not available in the global scope, only in the modules that import it

One very interesting blog post from Mozilla, written in the early days of ES modules, tells us what happens when we import a module. There are basically four steps:

Parsing: reads the source code of the module and checks for syntax
Loading: loads all imported modules recursively
Linking: creates a module scope and fills it with the bindings declared in that module. If you import a function that does not exist in another module, it will give an error
Runtime: runs the statements in the body of the module. By this time, everything is already loaded, so it just has to run

As these steps were not standardized at the time, each implementation could do the loading the way it wanted. Webpack, for example, figures out all the dependencies of a module ahead of time (by following the import statements recursively) and, at compile time, bundles all the modules into a single file to be shipped over the network. This is very nice, because if we had to resolve the imports at runtime, it would take a LOT of round trips to load the imported modules, can you imagine!? It’s also still recommended to bundle your code before shipping to production: since import and export are statically analyzable, the tooling you use can optimize them, removing unused module dependencies for example.

But how are modules different from regular scripts?

Modules are strict mode by default
Modules have a lexical top-level scope, meaning a variable declaration won’t make the variable available in the global scope of window
The this keyword doesn’t refer to the global object at the top level of a module (it is undefined). Instead, you have to use globalThis for that
import and export only work on modules

To tell modules and scripts apart in the browser, we use the type="module" attribute on the script tag:

<script type="module" src="module.js"></script>