visit
One of the trickiest tasks in code generation is what to call the code generation function. For this article, I have opted to call the main function compile
. The compile function will take our root AST (a BlockNode
) and return a new binaryen.Module
. The binaryen.Module
exposes functions that can optimize and emit wasm in various formats.
// src/compiler.mts
import binaryen from "binaryen";
import { AstNode, BlockNode, IdentifierNode } from "./types/index.mjs";
export const compile = (block: BlockNode): binaryen.Module => {
// Creates a new binaryen module that our helper functions will fill in
const mod = new binaryen.Module();
// The function map is used to track all the functions and their types. More on this later
const functionMap = generateFunctionMap(block);
// This function registers all the standard library functions we'll include with our language.
// This includes functions like add, subtract, etc.
registerStandardFunctions(mod, functionMap);
// This is where the magic happens. Because `BlockNode` is an expression, this
// function can recursively compile every instruction in a wispy program file
compileExpression({
expression: block,
mod,
functionMap,
parameters: new Map(),
});
// Finally, we return the binaryen module
return mod;
};
Next, we define the generateFunctionMap
function. This function crawls the entire expression tree to find and register function definitions. It is important we do this before actually compiling the functions as some functions may call other functions before they've been defined.
The return type of generateFunctionMap
is a map where the key is the function name and the value is an object containing all the important information about the function the compiler needs to know about. For now, all we need is returnType
.
// src/compiler.mts
type FunctionMap = Map<string, { returnType: number }>;
// src/compiler.mts
const generateFunctionMap = (block: BlockNode): FunctionMap => {
// Preview the first node (i.e. expression) of the block
const firstNode = block.expressions[0];
// If the first node is an identifier and the identifier is "fn", then we know this block represents a function definition.
if (isNodeType(firstNode, "identifier") && firstNode.identifier === "fn") {
// Grab the function identifier / name, and it's return type. This is the second expression of
// the function definition, a typed identifier.
const { identifier, returnType } = getFunctionIdentifier(block); // We'll define this next
// Return the function map
return new Map([
[identifier, { returnType }],
// It's possible this function may contain function definitions inside of it. So we
// We put all the remaining expressions of the function into a new block and scan it
// then we merge the resulting map with this one.
...generateFunctionMap({ type: "block", expressions: block.expressions.slice(3) }),
]);
}
// A block can contain multiple expressions. So, we must scan each one to see if it is a function
// definition. The root `BlockNode` for instance, will almost always have multiple functions.
return block.expressions.reduce((map, expression) => {
// Only block expressions can be functions
if (expression.type === "block") {
return new Map([...map, ...generateFunctionMap(expression)]);
}
// We can ignore all other expression
return map;
}, new Map());
};
Onto getFunctionIdentifier
we called earlier. This function is simple. It takes a BlockNode
, ensures that the second identifier is a TypedIdentifierNode
, and then returns the identifier and return type:
// src/compiler.mts
const getFunctionIdentifier = (block: BlockNode) => {
// Grab the second expression
const node = block.expressions[1];
// Ensure the expression is a typed identifier
if (!isNodeType(node, "typed-identifier")) {
throw new Error("Expected typed function name");
}
return {
identifier: node.identifier,
// We have to map the return type to a type binaryen understands.
returnType: mapBinaryenType(node.typeIdentifier),
};
};
As noted in the getFunctionIdentifier
function, Binaryen doesn't understand what type the string typeIdentifier
is. To handle this we have to map our defined types to binaryen types. For now, we'll just support i32
and f32
. Thankfully binaryen uses the same nomenclature we do. So the map function is pretty simple:
// src/compiler.mts
const mapBinaryenType = (typeIdentifier: string): binaryen.Type => {
if (typeIdentifier === "i32") return binaryen.i32;
if (typeIdentifier === "f32") return binaryen.f32;
throw new Error(`Unsupported type ${typeIdentifier}`);
};
getFunctionIdentifier
Made a call to a new function, isNodeType
. This function is essentially the same concept as the isTokenType
function we ASTNode
instead of Token
.
// src/compiler.mts
export const isNodeType = <T extends AstNode["type"]>(
item: unknown,
type: T
): item is Extract<AstNode, { type: T }> => {
return (
// Ensure the type exists
!!item &&
// Ensure the type is an object
typeof item === "object" &&
// Cast the type as a record, so TypeScript doesn't get mad at us and then compare the
// type field with the type parameter. If they are equal, we know the node is the
// the type we were looking for.
(item as Record<string, unknown>)["type"] === type;
)
};
The compileExpression
function is where we really start to make use of binaryen
to model the generated machine code. Because of the tree structure of an, ahem, abstract syntax tree, compileExpression
is highly recursive. This is one of my favorite things about programming languages, their patterns tend to lend themselves to elegant recursive functions with high levels of code re-use.
Let's start with defining the parameters of compileExpression
. We will need to pass the binaryen.Module
and the functionMap
we created earlier, the actual expression we are compiling, and any parameters of the function this expression may be a part of (if it's inside) of a function. When there are more than two parameters of a function it can be difficult to visually keep track of what is what. So I like to make it clear by grouping them together in an object. This enforces labeling the parameters on call and as a result, improves code readability.
Now that we have the options for compileExpression
defined, we can define the actual function. compileExpression
takes CompileExpressionOpts
as a parameter and returns a number
. The job of this function is to take an expression and determine what type of expression it is; from there, it can pass the expression to another compiler function that can handle that specific type of expression.
Why return a number? When we build an expression with
binaryen
it returns a number as an identifier for that expression. This allows us to compile an expression ahead of time and then reference that expression later down the line.
// src/compiler.mts
const compileExpression = (opts: CompileExpressionOpts): number => {
// Grab the expression and the binaryen module (mod) from the options.
// The other fields are used by child function calls
const { expression, mod } = opts;
// Map the expression node to it's corresponding specific compiler
if (isNodeType(expression, "block")) return compileBlock({ ...opts, expression });
// Numbers are simple enough to compiler that we can just inline the compiler here.
// They are represented as constants
if (isNodeType(expression, "int")) return mod.i32.const(expression.value);
if (isNodeType(expression, "float")) return mod.f32.const(expression.value);
if (isNodeType(expression, "identifier")) return compileIdentifier({ ...opts, expression });
// Throw a helpful error message if we don't recognize the expression
throw new Error(`Unrecognized expression ${expression.type}`);
};
Let's define the compileBlock
functions. Because this function is also compiling an expression, we can re-use the previously defined CompileExpressionOpts
, but we'll narrow the expression field to the BlockNode
type, since we know we are compiling a block by the time this function is called:
interface CompileBlockOpts extends CompileExpressionOpts {
expression: BlockNode;
}
const compileBlock = (opts: CompileBlockOpts): number => {
// We re-map the expression field to block here for clarity.
const { expression: block, mod } = opts;
// When a block has multiple expressions and the first one is an identifier, that means
// the block is actually a function call.
if (isNodeType(block.expressions[0], "identifier") && block.expressions.length > 1) {
// If it is a function call, transfer responsibility to the `compileFunctionCall` function (defined next)
return compileFunctionCall(opts);
}
// This is where the recursive beauty starts to show. Since every value of a block
// is an expression, we can map each one back to the compileExpression function.
const expressions = block.expressions.map((expression) => {
return compileExpression({ ...opts, expression });
});
// Now we generate the machine code by calling the block function of binaryen
// This function takes a block name, an array of compiled expressions, and a block return type.
// Named blocks are mostly useful for looping constructs like `for` and `while`. In this
// case we can pass null as we're not compiling a loop construct. Additionally, we can
// pass `auto` as the type since binaryen is smart enough to determine the return type
// of blocks automatically.
return mod.block(null, expressions, binaryen.auto);
};
Note: If you're curious to see how looping works in binaryen / WebAssembly works, check out my blog
. Spoiler alert, its pretty weird.
The last simple expression we'll compile in this section is the identifier expression. If compileExpression
was passed a lone IdentifierNode
it means that the expression evaluates to the value of the identifier. In wispy, we don't have variables and function identifiers are caught before they could've been passed here. That means the only thing IdentifierNode
can resolve to is a parameter.
interface CompileIdentifierOpts extends CompileExpressionOpts {
expression: IdentifierNode;
}
const compileIdentifier = (opts: CompileIdentifierOpts): number => {
// We remap expression to node to keep our lines a little shorter
const { expression: node, parameters, mod } = opts;
// Since we know the identifier has to be a parameter, we look it up in our
// parameter map. Don't worry, we'll define the parameter map in the next section
const info = parameters.get(node.identifier);
if (!info) {
throw new Error(`Unrecognized identifier ${node.identifier}`);
}
// Finally, we use the local.get instruction to return the parameter value.
// Binaryen needs to know the parameters index and type. We'll get into
// the index when we define our parameter mapping function.
return mod.local.get(info.index, info.type);
};
In wispy function calls are blocks with multiple expressions where the first expression is an identifier. The job of compileFunction
is to determine which function is being called, what its parameters and return type are, and finally, building the call instruction with binaryen.
// src/compiler.mts
// Because function calls are blocks, we can re-use CompileBlockOpts
const compileFunctionCall = (opts: CompileBlockOpts): number => {
const { expression, functionMap, mod } = opts;
// The first expression of a function call is the functions identifier
const identifierNode = expression.expressions[0];
// Here we just ensure the identifierNode is *actually* an identifier. Otherwise we throw an error.
if (!isNodeType(identifierNode, "identifier")) {
throw new Error("Expected identifier when compiling function call");
}
// Next we create a reference to what the actual identifier is
const identifier = identifierNode.identifier;
// If the identifier is "fn", the function we are calling is the function to define functions!
// That's right! Functions are created by another function. Pretty neat if you ask me.
if (identifier === "fn") return compileFunction(opts); // We'll define this next
// Ifs are special functions. They may or may not have an else block. Binaryen needs to know
// if the else block exists at compile time, so we have a special if compiler for this.
if (identifier === "if") return compileIf(opts); // We'll define this later
// Every other function is either part of the standard library, or is defined
// within the wispy code itself.
const functionInfo = functionMap.get(identifier);
if (!functionInfo) {
throw new Error(`Function ${identifier} not found`);
}
const params = expression.expressions
// Every other expression in the block are parameters to the function, so we compile them
// and then pass them to the call
.slice(1)
.map((expression) => compileExpression({ ...opts, expression }));
// Now we use binaryen to construct the call expression. The first parameter
// is the functions identifier, the second are the compiled parameter expression,
// and the third is the return type which has already been determined by generateFunctionMap
return mod.call(identifier, params, functionInfo.returnType);
};
Let's define the compileIf
function before we move onto the compileFunction
... function.
// src/compiler.mts
const compileIf = (opts: CompileBlockOpts): number => {
const { expression, mod } = opts;
// The first expression, expression.expressions[0], is the "if" identifier, we don't need
// to do anything with it since we already know we are compiling an if expression
// The second expression is the if condition
const conditionNode = expression.expressions[1];
// The third expression is the ifTrueNode, it's what is executed if the conditionNode evaluates to
// true
const ifTrueNode = expression.expressions[2];
// Finally the fourth expression (which may or may not exist) is what is executed if the condition
// evaluates to false
const ifFalseNode = expression.expressions[3];
// Compile the condition expression
const condition = compileExpression({ ...opts, expression: conditionNode });
// Compile the ifTrue Expression
const ifTrue = compileExpression({ ...opts, expression: ifTrueNode });
// Check to see if the ifFalseNode exists, if it does, compile it, otherwise set ifFalse to undefined
const ifFalse = ifFalseNode ? compileExpression({ ...opts, expression: ifFalseNode }) : undefined;
// Finally we use binaryen to compile the if expression
return mod.if(condition, ifTrue, ifFalse);
};
Function definitions are a whole lot like function calls, so the function structure is pretty similar. We take CompileBlockOpts
and return a number (the binaryen expression reference).
// src/compiler.mts
const compileFunction = (opts: CompileBlockOpts): number => {
const { expression: block, mod } = opts;
// We need to tell binaryen what the identifier and return type of the function is
// Thankfully, we already wrote a function for that, getFunctionIdentifier. We
// could also have just looked up this information with the functionMap, but
// this is more fun.
const { identifier, returnType } = getFunctionIdentifier(block);
// Next we grab the function parameters. This is the third expression of the function
const { parameters, parameterTypes } = getFunctionParameters(block); // Defined later
// The rest of the expressions in the function are the functions block. So we create
// a new BlockNode from the remaining expression.
const body = compileBlock({
...opts,
expression: {
type: "block",
expressions: block.expressions.slice(3),
},
// We need to pass the parameters of this function, so they can be referenced in child
// expressions
parameters,
});
// Now we register the function with binaryen. Binaryen takes the function identifier,
// an array of parameter types (each item being the type of parameter in order),
// the function's return type, a list of variable types (wispy doesn't have any, so we pass an empty array)
// and finally the compiled body of the function.
mod.addFunction(identifier, parameterTypes, returnType, [], body);
// To make things easy we export every single function defined in a wispy file
// so it can be called by the WebAssembly host.
mod.addFunctionExport(identifier, identifier);
// Because function definitions are *technically* expressions that can be a part of another function
// body, we need to return an expression pointer. For this, we just return a nop (do nothing instruction),
// to make things consistent.
return mod.nop();
};
Now, let's define the getFunctionParameters
function. This function takes the function BlockNode
, that is, the entire unmodified function definition, and extracts its parameters. The function returns two values, parameters and parameterTypes.
The first returned value, parameters
, is a map where the key is the parameter identifier, and the value is the information needed to access the parameter down the line within the function body.
The second returned value is an array of binaryen types. There is one type for each defined parameter, and they must remain in the order they are defined. This is because binaryen doesn't reference parameters by their names, instead it references them by the index of the array in which they are defined. Don't worry if this is confusing to you, the code should make things a little more clear. If you need, refer to the compileIdentifier
definition, to get a better understanding of how this works in practice.
// src/compiler.mts
type ParameterMap = Map<string, { index: number; type: number }>;
const getFunctionParameters = (block: BlockNode) => {
// The parameters are defined in the third expression of the function definition
const node = block.expressions[2];
// Check to make sure the third expression is a block
if (!isNodeType(node, "block")) {
throw new Error("Expected function parameters");
}
// Now we reduce the parameters into a parameter map, and a list of binaryen types
const { parameters, types } = node.expressions.reduce(
(prev, node, index) => {
// First, ensure that the node is a typed-identifier. Every parameter must be a
// typed identifier, therefore, every node in this reducer must be a typed identifier.
if (!isNodeType(node, "typed-identifier")) {
throw new Error("All parameters must be typed");
}
// Determine the correct binaryen type of the parameter
const type = mapBinaryenType(node.typeIdentifier);
// Add the parameter's type to the list of types we've defined so far
const types = [type, ...prev.types];
// Now add the parameter to the parameter map. We save the parameters index and type.
// The index and type is used binaryen to access the parameter when it is used
// later in the function body
const parameters = new Map([[node.identifier, { index, type }], ...prev.parameters]);
// Return updated parameters map and types array
return {
parameters,
types,
};
},
// Here we are setting the starting values for our reducer function and casting the default
// type so typescript can correctly infer the `prev` parameter type
{ parameters: new Map(), types: [] } as {
parameters: ParameterMap;
types: number[];
}
);
// Finally we return the parameter map and the parameterTypes
return {
parameters,
// Note: parameterTypes is a number, instead of an array of numbers as you'd expect.
// So we have to use binaryen.createType to create a new type that is referenced
// the mod.addFunction function. This is one inconsistency with the binaryen API. Parameters
// are defined as a number, and variables are defined as an array of numbers. I'm sure there
// is a reason for this, but I don't know what that reason is.
parameterTypes: binaryen.createType(types),
};
};
// src/compiler.mts
const registerStandardFunctions = (mod: binaryen.Module, map: FunctionMap) => {
const { i32, f32 } = binaryen;
const { i32: i32m, f32: f32m } = mod;
const common = { mod, map };
registerLogicFunction({ name: "lt_i32", type: i32, operator: i32m.lt_s, ...common });
registerLogicFunction({ name: "gt_i32", type: i32, operator: i32m.gt_s, ...common });
registerLogicFunction({ name: "eq_i32", type: i32, operator: i32m.eq, ...common });
registerLogicFunction({ name: "lt_f32", type: f32, operator: f32m.lt, ...common });
registerLogicFunction({ name: "gt_f32", type: f32, operator: f32m.gt, ...common });
registerLogicFunction({ name: "eq_f32", type: f32, operator: f32m.eq, ...common });
registerMathFunction({ name: "add_i32", type: i32, operator: i32m.add, ...common });
registerMathFunction({ name: "sub_i32", type: i32, operator: i32m.sub, ...common });
registerMathFunction({ name: "mul_i32", type: i32, operator: i32m.mul, ...common });
registerMathFunction({ name: "add_f32", type: f32, operator: f32m.add, ...common });
registerMathFunction({ name: "sub_f32", type: f32, operator: f32m.sub, ...common });
registerMathFunction({ name: "mul_f32", type: f32, operator: f32m.mul, ...common });
registerMathFunction({ name: "div_f32", type: f32, operator: f32m.div, ...common });
};
const registerMathFunction = (opts: {
mod: binaryen.Module;
name: string;
type: number;
operator: (left: number, right: number) => number;
map: FunctionMap;
}) => {
const { mod, name, type, operator, map } = opts;
return registerBinaryFunction({
mod,
name,
paramType: type,
returnType: type,
operator,
map,
});
};
const registerLogicFunction = (opts: {
mod: binaryen.Module;
name: string;
type: number;
operator: (left: number, right: number) => number;
map: FunctionMap;
}) => {
const { mod, name, type, operator, map } = opts;
return registerBinaryFunction({
mod,
name,
paramType: type,
returnType: binaryen.i32,
operator,
map,
});
};
const registerBinaryFunction = (opts: {
mod: binaryen.Module;
name: string;
paramType: number;
returnType: number;
operator: (left: number, right: number) => number;
map: FunctionMap;
}) => {
const { mod, name, paramType, returnType, operator, map } = opts;
mod.addFunction(
name,
binaryen.createType([paramType, paramType]),
returnType,
[],
mod.block(
null,
[operator(mod.local.get(0, paramType), mod.local.get(1, paramType))],
binaryen.auto
)
);
map.set(name, { returnType });
};
First, replace the contents of src/index.mts
with this:
// src/index.mts
#!/usr/bin/env node
import { readFileSync } from "fs";
import { compile } from "./compiler.mjs";
import { lex } from "./lexer.mjs";
import { parse } from "./parser.mjs";
const file = process.argv[2];
const input = readFileSync(file, "utf8");
const tokens = lex(input);
const ast = parse(tokens);
// !! New !!
const mod = compile(ast);
// This is sneakily where the code gen is *actually* happening
const binary = mod.emitBinary();
// Use the standard WebAssembly API to convert the wasm binary to a compiled module
// our host NodeJS/v8 can use
const compiled = new WebAssembly.Module(binary);
// Build the instance, here you would add any external functions you might want to import into
// the WebAssembly module
const instance = new WebAssembly.Instance(compiled, {});
// Finally, run the main function and log the result. We have to cast instance.exports to any
// The standard TypeScript types appear to be wrong.
console.log((instance.exports as any).main());
npx tsc
wispy example.wispy
If all goes well (and you passed the number 15 to fib), you should see the number 610
in the output of your console. If so, you've done it, you've made a working WebAssembly language.
A full copy of the wispy language implementation is available under an open-source license at