Abstract
This paper proposes $\lambda_{mmm}$, a call-by-value, simply typed lambda calculus-based intermediate representation for a music programming language that handles synchronous signal processing and introduces a virtual machine and instruction set to execute $\lambda_{mmm}$. Digital signal processing is represented by a syntax that incorporates the internal states of delay and feedback into the lambda calculus. $\lambda_{mmm}$ extends the lambda calculus, allowing users to construct generative signal processing graphs and execute them with consistent semantics. However, a challenge arises when handling higher-order functions because users must determine whether execution occurs within the global environment or during DSP execution. This issue can potentially be resolved through multi-stage computation.
* This is the HTML version of the pre-print. You can download PDF version from Zenodo Repository(https://doi.org/10.5281/zenodo.13855343). If you want to cite this article, please refer to the published version.
Introduction
Many programming languages have been developed for sound and music; however, only a few possess strongly formalized semantics. A language that is both rigorously formalized and practical is Faust [1]; it combines blocks with inputs and outputs with five primitive operations: parallel, sequential, split, merge, and recursive connection. Almost any type of signal processing can be written in Faust by providing basic arithmetic, conditionals, and delays as primitive blocks. In a later extension, a macro based on a term rewriting system was introduced that allowed users to parameterize blocks with an arbitrary number of inputs and outputs [2].
This strong abstraction capability through formalization enables Faust to be translated into various backends, such as C, C++, Rust, and LLVM IR. On the other hand, Faust’s Block Diagram Algebra (BDA) lacks theoretical and practical compatibility with common programming languages. Although it is possible to call external C functions in Faust, these functions are assumed to be pure functions that do not have internal states. Therefore, while it is easy to embed Faust in another language, it is not easy to call another language from Faust.
In addition, a macro for Faust is an independent term rewriting system that generates a BDA based on pattern matching. Consequently, the numeric arguments for pattern matching are implicitly required to be integers, which can sometimes lead to compile-time errors despite the fact that BDA does not distinguish between real and integer types. However, the implicit typing rules are not intuitive for novice users.
Proposing a computational model for signal processing based on more generic computational models such as lambda calculus has the potential to enable interoperability between many different general-purpose languages and facilitate the appropriation of existing optimization methods and the implementation of compilers and runtimes.
It has been demonstrated that BDA can be converted into a general-purpose functional language using an arrow, which is a higher-level abstraction of monads [3]. However, higher-order functions in general-purpose functional languages are often implemented using dynamic memory allocation and deallocation, making them difficult to use in host languages designed for real-time signal processing.
In addition, Kronos [4] and W-calculus [5] are examples of lambda calculus-based abstractions influenced by Faust. Kronos is based on the theoretical foundation of the System-$F\omega$, a variation of lambda calculus in which the types themselves can be abstracted (i.e., a function that takes a type as input and returns a new type can be defined). In Kronos, type calculations correspond to signal graph generation, whereas the value calculations correspond to actual processing. Delay is the only special primitive operation in Kronos, and feedback routing can be represented as a recursive function application in type calculations.
W-calculus includes feedback as a primitive operation, along with the ability to access the value of a variable from the past (i.e., delay). W-calculus restricts systems to those that can represent linear-time-invariant processes, such as filters and reverberators, and defines more formal semantics, aiming for automatic proofs of linearity and the identity of graph topologies.
Previously, the author designed the music programming language mimium [6]. By incorporating basic operations such as delay and feedback into the lambda calculus, signal processing can be concisely expressed while maintaining a syntax similar to that of general-purpose programming languages. Notably, mimium’s syntax was designed to resemble the Rust programming language.
An earlier issue with mimium was its inability to compile code that contained combinations of recursive or higher-order functions with stateful functions involving delay or feedback because the compiler could not determine the data size of the internal state used in signal processing.
In this paper, I propose the syntax and semantics of $\lambda_{mmm}$, an extended call-by-value simply typed lambda calculus, as a computational model intended to serve as an intermediate representation for mimium1. In addition, I propose a virtual machine and its instruction set, based on Lua’s VM, to execute this computational model in practice. Finally, I discuss both the challenges and potential of the current model, one of which is that users must differentiate whether a calculation occurs in a global context or during actual signal processing; the other is that runtime interoperability with other programming languages could be easier than in existing DSP languages.
Syntax
The types and terms of $\lambda_{mmm}$ are presented in Figure 1.
Two terms are introduced in addition to the standard simply typed lambda calculus: $delay\ n\ e_1\ e_2$, which refers to the previous value of $e_1$ by $e_2$ samples (with a maximum delay of $n$ to limit memory usage to a finite size), and $feed\ x.e$, an abstraction that allows the user to refer to the result of evaluating $e$ from one time unit earlier as $x$ during the evaluation of $e$ itself.
Syntactic Sugar of the Feedback Expression in mimium
The programming language mimium, developed by the author, includes a keyword $self$ that can be used in function definitions to refer to the previous return value of the function. An example of a simple one-pole filter function, which mixes the input and last output signals such that the sum of the input and feedback gains is 1, is shown in Listing 1. This code can be expressed in as illustrated in Figure 2.
Typing Rules
Additional typing rules for typical simply typed lambda calculus are shown in Figure 3.
The primitive types include a real number type, used in most signal processing, and a natural number type, used for the indices of delay.
In W-calculus, which directly inspired the design of $\lambda_{mmm}$, the function types can only take tuples of real numbers and return tuples of real numbers. This restriction prevents the definition of higher-order functions. While this limitation is reasonable for a signal processing language—since higher-order functions require data structures such as closures that depend on dynamic memory allocation—it also reduces the generality of lambda calculus.
In $\lambda_{mmm}$, the problem of memory allocation for closures is delegated to runtime implementation (see Section 4), which allows the use of higher-order functions. However, $feed$ abstraction does not permit function types to be either input or output. Allowing function types in the $feed$ abstraction enables the definition of functions whose behavior could change over time. While this is theoretically interesting, there are no practical examples in real-world signal processing, and such a feature would likely further complicate the implementation.
Semantics
An excerpt of the operational semantics for $\lambda_{mmm}$ is shown in Figure 4. This big-step semantics conceptually explains the evaluation process. When the current time is $n$, the evaluation environment from $t$ prior samples can be referred to as $E^{n-t}$. If the time is less than 0, any term is evaluated as the default value of its type (0 for numeric types).
Naturally, if we attempt to execute these semantics directly, we would need to recalculate from time 0 to the current time for every sample, saving all the variable environments at each step. However, in practice, a virtual machine is defined to account for the internal memory space used by $delay$ and $feed$, and terms are compiled into instructions for this machine before execution.
VM Model and Instruction Set
The virtual machine (VM) model and its instruction set for running $\lambda_{mmm}$ are based on Lua version 5 VM [7].
A key challenge when executing a computational model based on lambda calculus is handling the data structure, which is known as a closure. A closure captures the variable environment in which the inner function is defined, allowing it to refer to the variables from the outer function’s context. If the inner function is paired with a dictionary of variable names and values, the compiler (or interpreter) implementation is straightforward; however, the runtime performance is limited.
Conversely, the runtime performance can be improved using a process called closure conversion (or lambda lifting). This process analyzes all the outer variables referenced by the inner function and transforms the inner function by adding arguments; thus, the outer variables can be referred to explicitly. However, the implementation of this transformation in the compiler is relatively complex.
The Lua VM adopts a middle-ground approach between these two methods by adding the VM instructions GETUPVALUE
and SETUPVALUE
, which allow the outer variables to be dynamically referenced at runtime. The implementation of the compiler and VM using upvalues is simpler than full closure conversion while still avoiding significant performance degradation. In this approach, the outer variables are accessed via the call stack rather than the heap memory unless the closure escapes the context of the original function [8].
In addition, upvalues facilitate interoperability with other programming languages. Lua can be easily embedded through its C API, and when implementing external libraries in C, programmers can access the upvalues of the Lua runtime, not just the stack values available via the C API.
Instruction Set
The VM instructions for $\lambda_{mmm}$ differ from those for the Lua VM in the following aspects:
- Since mimium is a statically typed language, unlike Lua, instructions for basic arithmetic operations are provided for each type2.
- The call operation is split into normal function calls and closure calls owing to the static typing and to manage higher-order stateful functions (see 4.2 for details).
- Conditional statements are implemented using a combination of two instructions,
JMP
andJMPIFNEG
, whereas the Lua VM employs a dedicatedTEST
instruction. - Instructions related to for-loops, the
SELF
instruction used in object-oriented programming, and theTABLE
-related instructions for metadata references to variables are omitted in mimium as they are unnecessary. - Instructions related to list-like data structures are also excluded from this paper, as the implementation of data structures such as tuples and arrays is outside the scope of the $\lambda_{mmm}$ description here.
The VM for $\lambda_{mmm}$ operates as a register machine similar to the Lua VM (post version 5). However, unlike traditional register machines, it does not employ physical registers. Instead, the register number simply refers to an offset index on the call stack relative to the base pointer during VM execution. The first operand of most instructions specifies the register number where the result of the operation is stored.
A list of instructions is presented in Figure 4 (basic arithmetic operations are partially omitted). The notation for the instructions follows the format outlined in the Lua VM documentation [7, p. 13]. The operation name, list of operands, and pseudocode of the operation are displayed from left to right. When each of the three operands is used as an unsigned 8-bit integer, it is represented as A B C
. If an operand is used as a signed integer, then it is prefixed with s
. When the two operand fields are combined into a 16-bit value, the suffix x
is added. For example, when B
and C
are merged and treated as a signed 16-bit value, they are represented as sBx
.
In the pseudocode, R(A)
denotes the data being moved in and out of the register (or call stack) at the base pointer + A
for the current function. K(A)
refers to the A
-th entry in the static variable section of the compiled program, and U(A)
accesses the A
-th upvalue of the current function.
In addition to Lua’s upvalue operations, four new operations— GETSTATE
, SETSTATE
, SHIFTSTATE
, and DELAY
—have been introduced to handle the compilation of the $delay$ and $feed$ expressions in $\lambda_{mmm}$.
Overview of the VM Structure
The overall structure of the virtual machine program, and instantiated closures for $\lambda_{mmm}$ is depicted in Figure 5. In addition to the usual call stack, the VM has a dedicated storage area (a flat array) to manage the internal state data for feedback and delay.
This storage area is accompanied by pointers that indicate the positions from which the internal state data are retrieved via the GETSTATE
and SETSTATE
instructions. These positions are shi-fted forward or backward using the SHIFTSTATE
instruction. The actual data layout in the state storage memory is statically determined during compilation by analyzing function calls involving references to self
, delay
, and other stateful functions, including those that recursively invoke such functions. The DELAY
operation takes two inputs: B
, representing the input value, and C
, representing the delay time in the samples.
However, for higher-order functions—functions that take another function as an argument or return one—the internal state layout of the passed function is unknown at compile time. Consequently, a separate internal state storage area is allocated to each instantiated closure, which is distinct from the global storage area maintained by the VM instance. The VM also uses an additional stack to keep track of the pointers in the state storage of instantiated closures. Each time a CALLCLS
operation is executed, the VM pushes the pointer from the state storage of the closure onto the state stack. Upon completing the closure call, the VM pops the state pointer off the stack.
Instantiated closures also maintain their own storage areas for upvalues. Until a closure exits the context of its parent function (known as an “Open Closure”), its upvalues hold a negative offset that references the current execution’s stack. This offset is determined at compile time and stored in the function’s prototype in the program. Furthermore, an upvalue may reference not only local variables but also upvalues from the parent function (a situation that arises when at least three functions are nested). Thus, the array of upvalue indices in the function prototype stores a pair of values: a tag indicating whether the value is a local stack variable or an upvalue from a parent function and the corresponding index (either the negative stack offset or the parent function’s upvalue index).
For example, consider a scenario where the upvalue indices in the program are specified as [upvalue(1), local(3)]
. In this case, the instruction GETUPVALUE 6 1
indicates that the value located at index 3
from the upvalue list (referenced by upvalue(1)
) should be retrieved from R(-3)
relative to the base pointer, and the result should be stored in R(6)
.
When a closure escapes its original function context through the RETURN
instruction, the inserted CLOSE
instruction moves the active upvalues from the stack to heap memory. These upvalues may be referenced from multiple locations, particularly in cases involving nested closures. Thus, a garbage collection mechanism is required to free memory once these upvalues are no longer in use.
In $\lambda_{mmm}$’s VM, since the paradigm is call-by-value and there is no reassignment expression, the SETUPVALUE
instruction is omitted. If reassignment is allowed, open upvalues would need to be implemented as shared memory cells, as the values might be accessed by multiple closures that could trigger a CLOSE
operation.
Compilation to the VM Instructions
Listing 2 shows a basic example of how the mimium code in Listing 1 is compiled into VM bytecode. When self
is referenced, the value is retrieved using the GETSTATE
instruction, and the internal state is updated by storing the return value with the SETSTATE
instruction before returning it via the RETURN
instruction. In this case, the actual return value is obtained using the second GETSTATE
instruction, which ensures that the initial state value is returned at time = 0.
For example, if a time counter is written as $feed x. x+1$, the decision on whether the return value at time = 0 should be 0 or 1 is left to the compiler design. Although returning 1 does not strictly follow the semantics of E-FEED in Figure 4, if the compiler is designed to return 1 at time = 0, the second GETSTATE
instruction can be omitted, and the value for the RETURN
instruction should be R(2)
.
A more complex example, along with its expected bytecode instructions, is shown in Listings 3 and 4. The code defines a delay with feedback as fbdelay
, while another function, twodelay
, uses two feedback delays with different parameters. Finally, dsp
uses two twodelay
functions.
After each reference to self
through the GETSTATE
instruction or after calling another stateful function, the SHIFTSTATE
instruction is inserted to advance the state storage position in preparation for the next non-closure function call. Before the function exits, the state position is reset to where it was at the beginning of the current function context using the SHIFTSTATE
instruction. The total operand value for SHIFTSTATE
within a function must always sum to 0. Figure 7 illustrates how the state position shifts with the SHIFTSTATE
operations during the execution of the twodelay
function. The argument for the SHIFTSTATE
operation is a word size (a number of 64 bit values) and the word size for delay is maximum delay time + 3
since the read index, write index and the length of the ring buffer are added.
The state data can be stored as a flat array by representing the internal state as a relative position within the state storage, thereby simplifying compiler implementation; this avoids the need to generate a tree structure from the root, which was required in the previous implementation of mimium. This approach is similar to how upvalues simplify the compiler implementation by treating free variables as relative positions on the call stack.
Listing 5 shows an example of a higher-order function filterbank
, which takes another function filter
—accepting an input and a frequency as arguments—duplicates n
instances of filter
and adds them together3.
The previous mimium compiler was unable to compile code that took a function with an internal state as an argument because the entire tree of internal states had to be statically determined at compile time. However, the VM in $\lambda_{mmm}$ can handle this dynamically. Listing 6 shows the translated VM instructions for this code. Recursive calls on the first line of filterbank
, as well as calls to functions passed as arguments or obtained through upvalues (like filter
), are executed using the CALLCLS
instruction rather than the CALL
instruction. The GETSTATE
and SETSTATE
instructions are not used in this function because the internal state storage is switched dynamically when the CALLCLS
instruction is interpreted.
Discussion
As demonstrated in the example of the filterbank, in $\lambda_{mmm}$, a signal graph can be parametrically generated during the evaluation of the global context, whereas Faust uses a term-rewriting macro and Kronos employs type-level computation, as shown in Table 1.
The ability to describe both the generation of parametric signal processing and its execution content within single semantics makes it easier for novice users to understand the mechanics of the language. In addition, unified semantics may simplify runtime interoperability with other general-purpose languages.
However, there is a drawback: unified semantics can cause $\lambda_{mmm}$ to deviate from the behavior typically expected in standard lambda calculus.
Parametric Signal Graph | Actual DSP | |
---|---|---|
Faust | Term Rewriting Macro | BDA |
Kronos | Type-level Computation | Value Evaluation |
mimium | Global Context Execution | dsp Function Execution |
Table 1. Comparison of the way of signal graph generation and actual signal processing between Faust, Kronos and $\lambda_{mmm}$.
Different Behaviour Depending on the Location of Let Binding
By using functions with internal states that change over time in mimium, there is counterintuitive behavior when higher-order functions are used compared to general functional programming languages.
Listing 7 presents an example of incorrect code that is slightly modified from the filterbank example in Listing 5. The main difference between Listing 7 and Listing 5 is whether the recursive calls in the filterbank
function are written directly or bound using a let
expression outside the inner function. Similarly, in the dsp
function, which is called by the audio driver in mimium, the difference lies in whether the filterbank
function is executed within dsp
or bound with let
once in the global context.
In a typical functional programming language, if none of the functions in the composition involve destructive assignments, the calculation process remains unchanged even if the variable bound by let
is replaced with its term (via beta reduction), as seen in the transformation from Listing 7 to Listing 5.
However, in mimium, there are two distinct stages of evaluation. 0: The code is first evaluated in a global environment (where the signal-processing graph is concretized). 1: The dsp
function is executed repeatedly (handling the actual signal processing) and may involve implicit updates to the internal states.
Although the code contains no destructive assignments, the recursive execution of the filterbank
function occurs only once in Listing 5 during the global environment evaluation. Conversely, in Listing 7, the recursive function is executed, and a closure is generated each time the dsp
function runs on every sample. Because the internal state of the closure is initialized at the time of closure allocation, in the example of Listing 7, the internal state of the closure is reset at each time step, following the evaluation of filterbank
.
This implies that major compiler optimization techniques, such as constant folding and function inlining, cannot be directly applied to mimium. These optimizations must be performed after global context evaluation and before the evaluation of the dsp
function.
To address this issue, it is necessary to introduce a distinction in the type system to indicate whether a term should be used during global context evaluation (stage 0) or actual signal processing (stage 1). This can be achieved with Multi-Stage Computation [9]. Listing 8 provides an example of the filterbank
code using BER MetaOCaml’s syntax: .<term>.
, which generates a program to be used in the next stage, and ~term
, which embeds the terms evaluated in the previous stage [10].
The filterbank
function is evaluated in stage 0 while embedding itself with ~
. In contrast to Faust and Kronos, this multi-stage computation code retains the same semantics for both the generation of the signal processing graph and the execution of signal processing.
A Possibility of the Foreign Stateful Function Call
The closure data structure in combines functions with the internal states, as shown in Figure 3. The fact that filterbank
samples do not require special handling for internal states means that external signal processors (Unit Generators: UGens), such as oscillators and filters written in C or C++, can be called from mimium, just like normal closure calls. Additionally, it is possible to parameterize, duplicate, and combine external UGens4. This capability is difficult to implement in Faust and similar languages but is easily achievable in the paradigm.
However, mimium currently uses sample-by-sample processing and cannot handle buffer-by-buffer value passing. Because most native unit generators process data on a buffer-by-buffer basis, there are few practical cases where external UGens are currently used. Nonetheless, in , only $feed$ terms require sample-by-sample processing. Therefore, it is possible to differentiate the functions that can process only one sample at a time from those that can process concurrently at the type level. As the multi-rate specification is being considered in Faust [11], it may be possible to facilitate buffer-based processing between an external Unit Generator by having the compiler automatically determine the parts that can be processed buffer-by-buffer.
Conclusion
This paper proposed $\lambda_{mmm}$, an intermediate representation for programming languages for music and signal processing, along with a virtual machine and an instruction set to execute it. $\lambda_{mmm}$ enables the description of generative signal graphs and their contents within a unified syntax and semantics. However, users are responsible for ensuring that their code does not create escapable closures during the iterative execution of a DSP, which can be challenging for novice users to grasp.
In this paper, the translation of terms into VM instructions was illustrated by showing examples of code and the corresponding expected bytecode alongside pseudocode to describe the behavior of the VM. More formal semantics and a detailed translation process should be considered, particularly with the introduction of multi-stage computation.
I hope that this research will contribute to more general representations of music and sound on digital computers and foster deeper connections between the theory of languages for music and the broader field of programming language theory.
Acknowledgments
This study was supported by JSPS KAKENHI (Grant No.23K12059). I would also like to thank the many anonymous reviewers.
References
- [1] Y. Orlarey, D. Fober, and S. Letz, “Syntactical and semantical aspects of Faust,” Soft Computing, vol. 8, no. 9, pp. 623–632, 2004, doi: 10.1007/s00500-004-0388-1.
- [2] A. Gräf, “Term Rewriting Extension for the Faust Programming Language,” in International Linux Audio Conference, 2010. Available: [https://hal.archives-ouvertes.fr/hal-03162973 https://hal.archives-ouvertes.fr/hal-03162973/document](https://hal.archives-ouvertes.fr/hal-03162973 https://hal.archives-ouvertes.fr/hal-03162973/document)
- [3] B. R. Gaster, N. Renney, and T. Mitchell, “OUTSIDE THE BLOCK SYNDICATE: TRANSLATING FAUST’S ALGEBRA OF BLOCKS TO THE ARROWS FRAMEWORK,” in Proceedings of the 1st International Faust Conference, Mainz,Germany, 2018.
- [4] V. Norilo, “Kronos: A Declarative Metaprogramming Language for Digital Signal Processing,” Computer Music Journal, vol. 39, no. 4, pp. 30–48, 2015, doi: 10.1162/COMJ_a_00330.
- [5] E. J. G. Arias, P. Jouvelot, S. Ribstein, and D. Desblancs, “The [W-calculus]{.nocase}: A Synchronous Framework for the Verified Modelling of Digital Signal Processing Algorithms,” in Proceedings of the 9th ACM SIGPLAN international workshop on functional art, music, modelling, and design, New York, NY, USA: Association for Computing Machinery, 2021, pp. 35–46. doi: 10.1145/3471872.3472970.
- [6] T. Matsuura and K. Jo, “Mimium: A self-extensible programming language for sound and music,” in Proceedings of the 9th ACM SIGPLAN International Workshop on Functional Art, Music, Modelling, and Design, in FARM 2021. New York, NY, USA: Association for Computing Machinery, Aug. 2021, pp. 1–12. doi: 10.1145/3471872.3472969.
- [7] R. Ierusalimschy, L. H. de Figueiredo, and W. Celes, “The Implementation of Lua 5.0,” JUCS - Journal of Universal Computer Science, vol. 11, no. 7, pp. 1159–1176, Jul. 2005, doi: 10.3217/jucs-011-07-1159.
- [8] R. Nystrom, Crafting Interpreters. Daryaganj Delhi: Genever Benning, 2021.
- [9] W. Taha and T. Sheard, “Multi-Stage Programming with Explicit Annotations,” SIGPLAN Notices (ACM Special Interest Group on Programming Languages), vol. 32, no. 12, pp. 203–214, Dec. 1997, doi: 10.1145/258994.259019.
- [10] O. Kiselyov, “The Design and Implementation of BER MetaOCaml,” in [Proceedings of the 12th International Symposium on Functional and Logic Programming]{.nocase}, M. Codish and E. Sumii, Eds., Cham: Springer International Publishing, 2014, pp. 86–102. doi: 10.1007/978-3-319-07151-0_6.
- [11] P. Jouvelot and Y. Orlarey, “Dependent vector types for data structuring in multirate Faust,” Computer Languages, Systems & Structures, vol. 37, no. 3, pp. 113–131, 2011.
The newer version of mimium compiler and VM based on the model presented in this paper is on the GitHub. https://github.com/tomoyanonymous/mimium-rs ↩︎
In the actual implementation, instructions such as
MOVE
include an additional operand to specify the word size of values, particularly for handling aggregate types like tuples. ↩︎In the previous specification of mimium [6], the syntax for the variable binding and destructive assignment was the same (
x = a
). However, in the current syntax, variable binding uses thelet
keyword. ↩︎In fact, in the actual implementation of mimium, an interoperation between VM and audio driver is realized by passing and calling Rust’s closure. ↩︎