A self-descriptive stack-based PC platform
git clone git://git.luxferre.top/equi.git

# Equi

Equi is a general-purpose 16-bit stack-based platform (and a programming language/VM of the same name) aimed at low-cost, low-energy computing. It was inspired by Forth, Uxn, VTL-2, SIMPL and some other similar projects.

The name Equi comes from the fact that each source code instruction is **equi**valent to a machine instruction. No, it isn't mapped to one machine instruction. It **is the** machine instruction. All the instructions and data in Equi are represented with printable ASCII characters only. This makes it possible to bootstrap Equi code directly from the keyboard (any standard keyboard/keypad that allows serial input) using a tiny interpreter stored, for instance, in the hardware ROM.

This document describes a more-or-less formal specification. A tutorial book on how to use it is a work in progress.

## Specification

Main features of an Equi machine:

- Instruction bus: 8-bit;
- Data bus: 16-bit;
- Address bus: 16-bit;
- Up to 65536 bytes of RAM;
- Up to 64 MiB flat persistent storage (tape, disk, flash etc);
- Serial terminal input and output;
- Up to 65535 peripheral extension ports, including several virtual ports;
- Multitasking support with up to 8 concurrently running tasks (by default);
- Two 256-byte (128-word) stacks, main and return, per task;
- One 32-byte literal stack per task;
- 16-bit input buffer pointer and global and individual mode flags.

The default Equi RAM layout is:

Size (bytes)|Purpose
------------|---------------------
2           |Main/return stack size in words
1           |Literal stack size in bytes (up to 255)
2           |Command buffer start address
2           |Command buffer size in bytes
2           |IBP - input buffer pointer
1           |II - instruction ignore mode flag
1           |MM - minification/bypass pseudo-mode flag
2           |Currently running task ID
varies      |Task context table (see the next layout)
varies      |Command buffer area

And the Equi program task context layout in the task context table is:

Size (bytes)|Purpose
------------|------------
2           |Task ID
1           |Active flag
1           |Privileged flag
1           |CM - compilation mode flag
1           |LSP - literal stack pointer
2           |MSP - main stack pointer
2           |RSP - return stack pointer
2           |CLTP - compilation lookup table pointer
2           |CBP - compilation buffer pointer
2           |Task's GPD start address
2           |Task's command buffer start address
2           |Task's command buffer length in bytes
2           |PC - program counter
varies      |Main stack
varies      |Return stack
varies      |Literal stack
varies      |Compilation lookup table
varies      |General purpose data (GPD) area

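To make the two layouts easier to check, here is a minimal Python sketch that derives byte offsets for the fixed-size fields. It assumes the fields are packed in table order with no padding, which the tables suggest but do not state outright. The field names and the `offsets` helper are made up for illustration and are not taken from `equi.c`.

```python
# Field sizes in bytes, copied from the two layout tables above.
# Only the fixed-size fields are listed; the "varies" areas follow them.
RAM_HEADER = [
    ("stack_size_words", 2),
    ("lit_stack_size", 1),
    ("cmd_buf_start", 2),
    ("cmd_buf_size", 2),
    ("IBP", 2),
    ("II", 1),
    ("MM", 1),
    ("current_task_id", 2),
]

TASK_HEADER = [
    ("task_id", 2),
    ("active", 1),
    ("privileged", 1),
    ("CM", 1),
    ("LSP", 1),
    ("MSP", 2),
    ("RSP", 2),
    ("CLTP", 2),
    ("CBP", 2),
    ("gpd_start", 2),
    ("cmd_buf_start", 2),
    ("cmd_buf_len", 2),
    ("PC", 2),
]


def offsets(fields):
    """Return ({name: offset}, total_size) for a list of (name, size) pairs.

    Assumption: fields are packed back to back, in table order.
    """
    table, pos = {}, 0
    for name, size in fields:
        table[name] = pos
        pos += size
    return table, pos


ram_offsets, ram_header_size = offsets(RAM_HEADER)
task_offsets, task_header_size = offsets(TASK_HEADER)
print(ram_header_size, task_header_size)
```

Under these assumptions the global header takes 13 bytes and each task context starts with a 22-byte fixed part, followed by that task's stacks, CLT and GPD area.
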
Equi is strictly case-sensitive: all uppercase basic Latin letters, as well as a number of special characters, are reserved for machine instructions, and all custom words must be defined in lowercase only (the `_` character is also allowed in identifiers). Within comments (see below), any characters can be used.

All whitespace characters (space, tabulation, CR or LF) are discarded in Equi upon loading the program and can be used for code clarity any way the author wants.

The interpreter can run in one of four modes: command (default), interpretation (IM), compilation (CM) and instruction ignore (II) mode. An Equi machine always starts in the command mode. The latter three are triggered by certain instructions that set the corresponding flags. The semantics of the compilation mode is similar to that of Forth, and will be covered in detail here later on.

In the command mode, the interpreter doesn't perform any instruction execution and doesn't manipulate the program counter (PC). Instead, it accumulates all characters typed from the standard input into the so-called command buffer. The only instruction Equi must react to in this mode is `Q`, the quit instruction, which loads the currently input command buffer contents into a task context and starts its execution in the interpretation mode. Note that this also means that every Equi program file, even when run in a non-interactive environment, must end with a `Q` character, and as long as every program has a halting `Q` instruction, you can safely concatenate several Equi programs in a single file to be executed sequentially.

In the instruction ignore mode (II flag set), all instructions or arbitrary characters except `)` (which unsets the II flag) are skipped and discarded. This can be used to write comments. In a well-formed Equi program, the characters enclosed between the II instructions `(` and `)`, as well as any whitespace characters, will never enter the command buffer upon loading.

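The loading rules above (drop whitespace, drop comments, stop at `Q`) can be sketched in a few lines of Python. This is only an illustration, not the reference loader. It assumes that a `Q` inside a comment is ignored like any other commented character, and that the terminating `Q` itself is not stored in the buffer. The function name `load_command_buffer` is hypothetical.

```python
WHITESPACE = " \t\r\n"  # space, tabulation, CR, LF


def load_command_buffer(source: str) -> str:
    """Return what would end up in the command buffer for `source`.

    Assumptions (not confirmed by the specification text):
    - loading stops at the first `Q` found outside a comment;
    - that `Q` is not copied into the buffer.
    """
    buf = []
    ignoring = False  # models the II flag
    for ch in source:
        if ignoring:
            if ch == ")":
                ignoring = False  # `)` unsets the II flag
            continue  # everything else inside a comment is discarded
        if ch == "(":
            ignoring = True
        elif ch in WHITESPACE:
            continue
        elif ch == "Q":
            break
        else:
            buf.append(ch)
    return "".join(buf)


print(load_command_buffer("41 . (print the letter A)\nQ"))
```

For the sample input the sketch keeps only `41.`, the three characters that actually carry meaning.
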
In the interpretation mode, when the interpreter encounters any of the following characters - `_0-9A-Fa-z` (not including `-`) - it pushes their ASCII values bytewise onto the literal stack (32 bytes long). When any other character (except `:`, `"` or `'`) is encountered when the literal stack is not empty, the `#` instruction logic (see below) is performed automatically. If `:` is encountered, compilation mode logic is performed instead. If a `Q` instruction or a non-printable character is encountered, Equi returns to the command mode immediately.

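As an illustration of the automatic `#` logic, the Python sketch below turns the literal stack contents into a 16-bit value. It follows the description in the instruction table (discard `_a-z`, keep the top 4 characters, fill missing ones with 0). One point is an assumption: missing digits are filled in on the most significant side, so that `1F` reads as `0x001F`. The name `literal_value` is hypothetical.

```python
HEX_DIGITS = "0123456789ABCDEF"  # only uppercase letters count as hex digits


def literal_value(literal_stack: str) -> int:
    """Compute the value `#` would push for the given literal stack contents.

    `literal_stack` lists the characters in the order they were pushed.
    Assumption: missing digits are replaced with leading zeroes.
    """
    # Discard `_` and lowercase letters, keep hex digits in push order.
    digits = [c for c in literal_stack if c in HEX_DIGITS]
    # Keep the top (most recently pushed) four digits, pad on the left.
    top4 = "".join(digits[-4:]).rjust(4, "0")
    return int(top4, 16)


print(hex(literal_value("1F")))     # 0x1f
print(hex(literal_value("12345")))  # 0x2345
print(hex(literal_value("ab_7")))   # 0x7
```
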
In the compilation mode, all instructions except `;` are skipped while the CM flag is set. When the interpreter encounters `;` instruction, it performs the finalizing logic to save the compiled word into CLT (see below) and returns to the interpretation mode.

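A rough Python sketch of how word definitions could be collected into the CLT is given below. It is a simplification under several assumptions: the program is scanned linearly instead of being executed, the CLT is modelled as a dictionary from hash to CBP, and the hash is computed with Python's `binascii.crc_hqx` seeded with `0xFFFF` as a stand-in for whatever CRC16 an implementation uses. The function name `build_clt` is hypothetical.

```python
import binascii

LITERAL_CHARS = set("_0123456789ABCDEF" "abcdefghijklmnopqrstuvwxyz")


def build_clt(program: str) -> dict:
    """Map CRC16(word name) -> CBP for every `name:body;` definition.

    Simplified model: no jumps or calls are followed, the text is
    scanned once from left to right.
    """
    clt = {}
    lit = []    # literal stack contents
    cbp = None  # None means the CM flag is unset
    for pc, ch in enumerate(program):
        if cbp is None:
            if ch in LITERAL_CHARS:
                lit.append(ch)
            elif ch == ":":
                cbp = pc + 1  # `:` sets CM and CBP = PC+1
            else:
                lit.clear()   # any other instruction consumes the literals
        elif ch == ";":
            # `;` hashes the name left on the literal stack and stores CBP.
            name = "".join(lit).encode("ascii")
            clt[binascii.crc_hqx(name, 0xFFFF)] = cbp
            lit.clear()
            cbp = None
        # While compiling, every character other than `;` is skipped.
    return clt


program = "sq:$*;5#sq'H"
clt = build_clt(program)
body_start = clt[binascii.crc_hqx(b"sq", 0xFFFF)]
print(body_start, program[body_start:body_start + 2])
```

In the sample program the word `sq` is recorded as starting at offset 3, right after the `:`, which is where its body `$*` begins.
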
Equi's core instruction set is:

Op |Stack state                     |Meaning
---|--------------------------------|----------------------------------------------------------
`#`|`( -- )`                        |Literal: pop all characters from the literal stack, discard all `_a-z` characters, leave the top 4 characters (replacing the missing ones with 0) and push the 16-bit value from them (in the order they were pushed) onto the main stack
`"`|`( -- lit1 lit2 ... )`          |Pop all the values from the literal stack and push them onto the main stack as 16-bit values
`(`|`( -- )`                        |Set the II flag: when it is set, the interpreter must ignore all instructions except `)`, used for writing comments
`)`|`( -- )`                        |Unset the II flag, returning to the normal interpretation or compilation mode
`:`|`( -- )`                        |Compilation mode start: set CM flag and set CBP to PC+1 value
`;`|`( -- )`                        |Compilation mode end: replace this instruction in-memory with `R` instruction, pop all characters from the literal stack, append the lookup table with their CRC16 hash and CBP value, unset the CM flag and increment CLTP value
`'`|`( -- )`                        |Call the compiled word: pop all characters from the literal stack, compute their CRC16 hash and look it up in the CLT for a CBP value; if found, push PC onto the return stack and set PC to the CBP value, otherwise error out
`R`|R: `( a -- )`                   |**R**eturn: pop and assign the PC value from the return stack
`]`|M: `( a -- )` R: `( -- a )`     |Pop the value from main stack and push onto return stack
`[`|M: `( -- a )` R: `( a -- )`     |Pop the value from return stack and push onto main stack
`L`|`( addr -- a )`                 |**L**oad a 16-bit value from `addr`
`S`|`( a addr -- )`                 |**S**tore a 16-bit value into `addr`
`W`|`( a addr -- )`                 |**W**rite an 8-bit value into `addr` (note that both value and address still must be 16-bit, the higher byte of the value is discarded)
`!`|`( a -- )`                      |Drop the top value from the stack
`$`|`( a -- a a )`                  |Duplicate the top value on the stack
`%`|`( a b -- b a )`                |Swap top two values on the stack
`@`|`( a b c -- b c a )`            |Rotate top three values on the stack
`\`|`( a b -- a b a )`              |Copy over the second value on the stack
`J`|`( rel -- )`                    |**J**ump: increase or decrease PC according to the relative value (treated as signed, from -32768 to 32767)
`I`|`( cond rel -- )`               |Pop relative value and condition. **I**f the condition value is not zero, `J` to the relative value
`X`|`( -- pc )`                     |Locate e**X**ecution point: push PC+1 value onto the main stack
`G`|`( -- gpd_start )`              |Locate **G**PD area start: push its flat offset onto the main stack
`>`|`( a b -- a>b )`                |Push 1 onto the stack if the second popped value is greater than the first, 0 otherwise
`<`|`( a b -- a<b )`                |Push 1 onto the stack if the second popped value is less than the first, 0 otherwise
`=`|`( a b -- a==b )`               |Push 1 onto the stack if the two popped values are equal, 0 otherwise
`+`|`( a b -- a+b )`                |Sum
`-`|`( a b -- a-b )`                |Difference
`*`|`( a b -- a*b )`                |Product
`/`|`( a b -- a/b rem )`            |Integer division (with remainder)
`N`|`( a -- -a )`                   |Single-instruction negation (complement to 65536)
`T`|`( a XY -- [a >> X] << Y )`     |Bitwise shif**t**: by the first nibble to the right and then by the second nibble to the left
`~`|`( a -- ~a )`                   |Bitwise NOT
`&`|`( a b -- a&b )`                |Bitwise AND
`\|`|`( a b -- a\|b )`              |Bitwise OR
`^`|`( a b -- a^b )`                |Bitwise XOR
`.`|`( a -- )`                      |Output a character by the ASCII (or Unicode, if supported) value into the standard terminal
`H`|`( a -- )`                      |Output the hexadecimal 16-bit value from the stack top into the standard terminal
`,`|`( -- a )`                      |Non-blocking key input of an ASCII (or Unicode, if supported) value from the standard terminal
`?`|`( -- a )`                      |Blocking key input of an ASCII (or Unicode, if supported) value from the standard terminal
`P`|`( p1 p2 port -- r1 r2 status )`|**P**ort I/O: pass two 16-bit parameters to the port and read the operation status and results into the words on the stack top
`}`|`( blk len maddr -- status )`   |Persistent storage write operation. Stack parameters: block number (x1K), data length, RAM address
`{`|`( blk len maddr -- status )`   |Persistent storage read operation. Stack parameters: block number (x1K), data length, RAM address
`Y`|`( addr len priv -- taskid )`   |Fork an area from the command buffer starting at `addr` into a new task, activate it (see below) and push the task ID onto the stack
`Q`|`( -- )`                        |**Q**uit the interpretation mode (unset IM flag if set), or the interpreter shell itself if in command mode (halt the machine when there is nowhere to exit to)

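A few rows of the table are easier to follow with concrete numbers, so here is a small Python sketch of some stack and arithmetic instructions on 16-bit words. The stack is modelled as a Python list with the top at the end. Two things are assumptions: division is treated as unsigned, and `shift` takes the two shift amounts of `T` separately because the table does not spell out how they are packed into the `XY` operand. All function names are hypothetical.

```python
MASK = 0xFFFF  # every stack cell is a 16-bit word


def rot(stack):
    """`@` : ( a b c -- b c a )"""
    c, b, a = stack.pop(), stack.pop(), stack.pop()
    stack.extend([b, c, a])


def over(stack):
    """`\\` : ( a b -- a b a )"""
    stack.append(stack[-2])


def negate(a):
    """`N` : complement to 65536."""
    return (0x10000 - a) & MASK


def divide(a, b):
    """`/` : ( a b -- a/b rem ), assuming unsigned operands and b != 0."""
    return a // b, a % b


def to_signed(rel):
    """Interpret a 16-bit word as the signed offset used by `J` and `I`."""
    return rel - 0x10000 if rel & 0x8000 else rel


def shift(a, x, y):
    """`T` : shift right by x bits, then left by y bits (x, y in 0..15)."""
    return ((a >> x) << y) & MASK


s = [1, 2, 3]
rot(s)
print(s)                         # [2, 3, 1]
print(hex(negate(1)))            # 0xffff
print(divide(7, 2))              # (3, 1)
print(to_signed(0xFFFF))         # -1
print(hex(shift(0x1234, 4, 4)))  # 0x1230
```
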
Note that, due to the dynamic nature of word allocation and ability to reconfigure the runtime environment for different offsets depending on the target, absolute jumps are not directly supported in Equi and generally not recommended, although one can easily do them with `]R` sequence and/or calculate absolute positions using `X` instruction.

Please also note that Equi doesn't specify any graphical or sound output capabilities. If such support is required, it generally must be implemented, as with any other peripheral, via the port I/O interface (`P`) instruction specific to a particular hardware/software implementation. Same goes for how standard serial terminal input/output is processed: Equi specification doesn't enforce any particular way. On the desktop/laptop PCs, however, it is advised, especially for software-based implementations/VMs, that the terminal I/O should be VT100-compatible, including, for instance, control character support and the output of an audiovisual bell for ASCII 0x07 (`\a` or `^G`). Depending on the target, these features may already be supported by the underlying OS's terminal emulator or may be implemented as a part of the VM itself.

See [FizzBuzz](examples/fizzbuzz.equi) for a more thorough example of how different features of the current Equi specification are used.

## Reference implementation

Being a purely PC-oriented low-level runtime/programming environment, Equi has the reference implementation emulator/VM written in C (ANSI C89 standard), `equi.c`, compilable and runnable on all the systems supporting standard I/O. Note that, for portability reasons, this emulator:

- accepts the program from a single file at a time only,
- only implements four ports for `P` instruction: 0 as an echo port (returns passed parameters as corresponding result values), 1 as a random port (returns two random values in the results in the range between the two parameter values), 2 as a CRC16 calculation port for a given memory location and its length, and 3 for task control (see below); for any other port value it outputs its parameters to the standard error stream and puts three 0x0000 values back onto the stack,
- implements `s` command line parameter that runs the emulator in the silent mode without printing any welcome banners or interactive prompts,
- sandboxes the `{` and `}` operations using the file with the name you supply at compile time to the `PERSIST_FILE` constant. The file must already be created and accessible. If it doesn't exist, these operations will effectively do nothing except putting 0x0000 (success status) onto the stack.

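To illustrate how the `{` and `}` parameters relate to the sandbox file, here is a hedged Python sketch. It assumes the block number is simply multiplied by 1024 to get a byte offset in the file, which matches the "x1K" note in the instruction table and the 64 MiB limit (65536 blocks of 1 KiB). It always reports success, since error statuses are not described here. The function names, the `bytearray` RAM model and the in-memory file are all stand-ins, not the emulator's actual code.

```python
import io

BLOCK_SIZE = 1024  # block numbers address the storage in 1 KiB steps


def storage_write(store, ram, blk, length, maddr):
    """`}` : copy `length` bytes from RAM at `maddr` into block `blk`."""
    store.seek(blk * BLOCK_SIZE)
    store.write(ram[maddr:maddr + length])
    return 0x0000  # success status; failure codes are not modelled


def storage_read(store, ram, blk, length, maddr):
    """`{` : copy up to `length` bytes from block `blk` into RAM at `maddr`."""
    store.seek(blk * BLOCK_SIZE)
    data = store.read(length)
    ram[maddr:maddr + len(data)] = data
    return 0x0000


ram = bytearray(64)
ram[0:5] = b"hello"
store = io.BytesIO()  # stands in for the PERSIST_FILE sandbox file

storage_write(store, ram, 2, 5, 0)   # write 5 bytes to block 2
storage_read(store, ram, 2, 5, 16)   # read them back to RAM address 16
print(bytes(ram[16:21]))
print(65536 * BLOCK_SIZE == 64 * 1024 * 1024)
```
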
Additionally, this emulator implements `m` command line parameter that means that, instead of execution, the VM shall output the current command buffer contents upon reaching the `Q` instruction. This is particularly useful to save minified versions of `.equi` files to further reuse them in more space-restricted environments. Note that minified and non-minified files load and run fully identically, but the size difference can be significant. For instance, for the current FizzBuzz example version, the source is 1544 bytes long but its actual application snapshot in the command buffer (which can be dumped with the `m` parameter as a minified variant) is just [180 bytes long](examples/fizzbuz.min.equi). The rest is comments and whitespace characters that are skipped while loading the program into the command buffer.

The source code file should compile using any mainstream C compiler with C89 support, like GCC/DJGPP, Clang, TCC etc. However, it is also being developed to be compilable with CC65 compiler for targets like Apple II or Atari 800. All the machine/target specific configuration is done at compile time, using compiler command-line switches. Build instructions for several known C compilers are given below.

The following constants can be adjusted at compile time:

- `STACK_SIZE` - main and return stacks size in bytes (65535 max);
- `LIT_STACK_SIZE` - literal stack size in bytes (255 max);
- `GPD_AREA_SIZE` - GPD area size in bytes;
- `CMD_BUF_SIZE` - command buffer size in bytes (65535 max);
- `CLT_ENTRIES_MAX` - size (in entries) of the compilation lookup table (CLT), each entry taking exactly 4 bytes;
- `PERSIST_FILE` - the name of persistent storage sandbox file (`PERS.DAT` by default);
- `EQUI_TASKS_MAX` - maximum amount of concurrently running tasks on the system.

Please keep in mind that the reference implementation code primarily serves as a, well, reference on how the specification should be implemented, so it emphasizes code portability and readability over performance whenever such a choice arises.

The project Makefile, provided for convenience, supports passing these constants with `-DFLAGS="..."` switch. Below are the steps to build Equi without a Makefile from the `equi.c` source file alone, with a corresponding `make` target specified as well.

### Building with GCC/Clang/MinGW (for current mainstream targets): `make`

Build with default parameters (you can override any of the above constants with `-D` switch):

```
cc -std=c89 -Os -o equi equi.c [-DSTACK_SIZE=... ...]
```

### Building with TCC (TinyCC, Tiny C Compiler): `make tcc`

Equi's codebase detects TCC and attempts to save size by linking against tcclib instead of the standard libraries. Note that the most recent TCC versions support neither size optimization switches nor the C89 standard, so it will fall back to C99 instead. Anyway, the most sensible command to build Equi with TCC is:

```
tcc -std=c89 -o equi equi.c [-DSTACK_SIZE=... ...]
```

### Building with CC65 for Enhanced Apple IIe: `make a2`

This is where things start to get interesting, as we need to specify the exact target machine for CC65 and perform certain target-dependent post-build manipulation. For now, Equi reference implementation is only being tested for 65C02-based Enhanced Apple IIe (as the earliest model both supported by CC65 suite and supporting lowercase character I/O), so the command to build it would be:

```
cl65 --standard c89 -O -Os -t apple2enh -o equi.a2enh [-DSTACK_SIZE=... ...] equi.c
```

Then, if there are no compiler/linker errors, we can proceed with building the image (assuming we're using Java and AppleCommander with an empty 140K ProDOS 8 image bundled in the repo for image assembly):

```
cp platform-build-tools/apple2/tpl.dsk equi.dsk
java -jar platform-build-tools/apple2/ac.jar -p equi.dsk equi.system sys < $(cl65 --print-target-path)/apple2enh/util/loader.system
java -jar platform-build-tools/apple2/ac.jar -as equi.dsk equi bin < equi.a2enh
```

This will build a bootable disk image with Equi for Apple II that can be tested on emulators or real hardware.

You can also add a 96K-sized `PERS.DAT` file shipped in the repo to use the persistent storage capabilities (done automatically with the Makefile target):

```
java -jar platform-build-tools/apple2/ac.jar -dos equi.dsk PERS.DAT bin < platform-build-tools/PERS.DAT
```

## Multitasking in Equi

Equi supports running several tasks concurrently scheduled instruction-by-instruction in a round-robin fashion. The general rules are as follows:

1. Every task context has an ID, starting from 0 and ending with `EQUI_TASKS_MAX - 1`, and two specific attributes - `active` and `privileged`. The `active` attribute determines whether or not the task is running, the `privileged` attribute determines whether or not the task can write to the command buffer area not belonging to itself.
2. The program code passed into Equi on start is loaded into task 0 and its `privileged` attribute is always set. This way, any code initially run in the machine can act as a loader and launcher for other tasks.
3. A privileged task can spawn either another privileged task or non-privileged task. A non-privileged task can only spawn another non-privileged task.
4. No task, whether privileged or not, can write into any RAM area outside its own GPD area and the command buffer. Non-privileged tasks are additionally limited to the command buffer area they already take and cannot write anywhere else.
5. When a task has ended, its `active` flag is unset. Equi runtime then may use its task slot to allocate another task when necessary.
6. Equi machine halts/quits when no active task is left.

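The scheduling rules above can be sketched in Python as follows. This is a toy model rather than the reference scheduler: tasks are plain dictionaries, `step` is a caller-supplied function that executes one instruction and reports whether the task is still running, and `child_privilege` assumes that a privileged-spawn request from a non-privileged task is silently downgraded (it could just as well be rejected with an error). All names here are hypothetical.

```python
def run_round_robin(tasks, step):
    """Run tasks one instruction at a time until none is active.

    `step(task)` executes a single instruction and returns False once
    the task has ended. Returns the order in which task IDs were run.
    """
    trace = []
    while any(t["active"] for t in tasks):  # rule 6: halt when none left
        for t in tasks:
            if t["active"]:
                trace.append(t["id"])
                if not step(t):
                    t["active"] = False     # rule 5: ended task frees its slot
    return trace


def child_privilege(parent_privileged, requested):
    """Rule 3: only a privileged parent can spawn a privileged child.

    Assumption: an invalid request is downgraded instead of failing.
    """
    return bool(parent_privileged and requested)


def demo_step(task):
    """Pretend to execute one instruction of a task with `left` remaining."""
    task["left"] -= 1
    return task["left"] > 0


tasks = [
    {"id": 0, "active": True, "left": 2},
    {"id": 1, "active": True, "left": 3},
]
trace = run_round_robin(tasks, demo_step)
print(trace)  # [0, 1, 0, 1, 1]
```
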
New tasks are created (and instantly activated) with `Y` instruction that accepts the code address, code length and privileged flag from the stack, and returns the task ID on top of the stack. Using this task ID, you can further control the status of the task using system port 3, passing the task ID as `p1` parameter and one of the following operation codes as `p2` parameter to the `P` instruction:

- 0: get task status (active or not) as `r1`,
- 1: set active status of the task (start/resume it) if your own task is privileged,
- 2: unset active status of the task (pause/terminate it) if your own task is privileged,
- 3: get the privilege status of the task as `r1`.

See [this snippet](examples/multitask.equi) for a very simple example of using `Y` instruction to allocate new tasks from existing code.

## FAQ

### Why does the world need another Forth-like system?

Because it aims for a different set of goals than typical Forth systems, mainly to explore the realms of blurring the borders between source and machine code, and to create a VM that can be easily programmed with printable text on the lowest level with no assembly required. Equi is to a typical Forth what VTL-2 was to BASIC, except in this case it is much more capable and extensible at its core.

### What is the main niche for Equi? With a hard 16-bit address bus, is it a Uxn's competitor?

No, not at all. Although Equi was partially inspired by Uxn, it aims for a totally different goal. Uxn was primarily designed for an esoteric computer, Varvara, with graphical, non-blocking input and sound capabilities in mind, and for compact **binary** machine code size, requiring preprocessing and assembly to obtain it. Equi was primarily designed for a more old-school serial terminal experience, and for machine code being readable and writable by humans at some expense of compactness. Still, FizzBuzz is 180 bytes in Equi when minified, and this size can be reduced even further by switching to single-character words and removing zeroes in hex literals where possible. And the resulting `.equi` file would still be readable by outputting its contents to a terminal, compared to 99-byte FizzBuzz in Uxn that would only have to be read in a hex viewer or via special disassembly tools.

### I want to use Equi programs in a relatively modern POSIX environment as a part of a scripted process. Is this possible?

Totally! The Makefile for the reference implementation includes sensible default parameters for all targets. Just call `cat program.equi | /path/to/equi - s | [other program]` in your scripts, where `s` parameter is used to suppress all banners and prompts and terminal initialization code from the standard output stream. Just make sure to place `PERS.DAT` file in the appropriate place if you need the persistence capabilities in your Equi-based scripts, and not use input instructions in your programs if unsure what they will do with the streamed input. You can, of course, call Equi programs in a usual way just as well, with `/path/to/equi /path/to/program.equi s`, for instance.

### Too few core instructions! There still are lots of unused uppercase Latin letters, why not utilise them?

Yes, Equi was designed to be usable from a standard keyboard but this doesn't mean every possible letter should be covered by an instruction. Implementation complexity should be kept low. Besides, new core features not present in every target system are much more convenient to implement via port I/O mechanism.

### Too many core instructions! E.g. `-` can be easily replaced with `N+`, and all bitwise operations can be done using NAND or NOR alone!

While Equi definitely is a minimalist runtime, it's not limited to a 16- or 32-instruction set and tries to keep the balance between simplicity of implementation and simplicity of usage (as far as it can go for a machine-level language). Omitting too many primitive operations would require programmers to paste more instructions instead of one or define them as custom words where it would be totally unnecessary. That being said, Equi's instruction set still might be optimised a little in future versions.

### Why is there a distinction between instructions and custom-defined words? Forth doesn't have one!

This distinction only exists to simplify program interpretation flow. Forth uses whitespace as an essential syntactic feature to delimit words and literals, Equi does not. Therefore, the only way to distinguish between a string literal and a compiled word definition is by the means of a special instruction. And using for the compiled words the same approach as for the hexadecimal short literals (automatically try to detect one before an instruction) would be too resource-heavy for the oldest systems as it would involve computing CRC16 on the literal stack contents every single instruction. A dedicated instruction that denotes what to do with the literal stack is much more convenient and straightforward to implement.

### Is Equi self-hosted, i.e. can it compile and run a new version of itself?

Depends on what exactly you mean by this. If you mean something like [Uxntal assembler written in Uxntal](https://wiki.xxiivv.com/site/drifblim.html), the beauty of Equi is that it doesn't need such a tool, because what you type is what gets directly executed. A single-pass Equi code minifier, similar to what `equi m` does in the reference implementation, surely can be implemented in Equi itself, and a proof of this concept is under development now. With Equi being Turing-complete, a full Equi VM running inside an Equi VM is theoretically also possible, although it would be rather slow, complex to implement and of little to no practical use. If, however, you mean a compiler of Equi to the target's machine language, implemented in Equi itself, the amount of work required to do that would be comparable to implementing compilers on a Forth system, and would most likely hit the 64K RAM limit. But for the simplest targets this is also possible if you throw enough time and effort into this.

### Where are labels? Macros? Includes? Why doesn't Equi have them? Even Uxntal has them!

Being flexible and human-readable, but a machine language nevertheless, Equi deliberately doesn't include any features that would qualify as preprocessing and require more than a single pass when loading a program into the command buffer. The principle "one source instruction is one machine instruction" is paramount for the entire platform. One can, however, create a translator that compiles a higher level programming language into Equi, with that compiler/language having any required preprocessing features.

Some features can be simulated with tools external to Equi, for instance, includes can be achieved by concatenating several files, as long as only the last of them contains the `Q` instruction, and loop labels can be emulated with saving jump addresses to the return stack and calling them back when necessary. Only whitespace and comments, which are absolutely needed in order to write readable programs directly in Equi, are being stripped during the single-pass program bootup.

### What are the minimum system requirements to implement/port and/or run Equi?

For the reference implementation in ANSI C, at least 32K of RAM and 6502 or better CPU are recommended, and 64K RAM and above are ideal. For your own implementations, make sure that the CPU speed is enough to perform 16-bit integer multiplication and division, as well as CRC-16 calculation, without noticeable lags, and that the command buffer, CLT and GPD areas are large enough to fit programs for your tasks. Also, persistent storage and realtime clock facilities are nice to have as the bare minimum.

### Which CRC-16 variant is required for Equi?

There is no **required** variant of CRC-16. Implementations using different CRC-16 algorithms do not make programs incompatible with each other: the hash only affects the internal storage of compiled words in the CLT. The **recommended** CRC-16 variant though is the one used in the reference implementation, CRC-16-CCITT (0xFFFF). This one is simple to implement and provides a good pseudo-random distribution even for long sequences of zero bytes.

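For reference, a bitwise CRC-16 of this kind fits in a dozen lines of Python. The sketch below assumes that "CRC-16-CCITT (0xFFFF)" means the variant commonly catalogued as CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no bit reflection and no final XOR. It has not been checked against `equi.c`, and the function name is hypothetical.

```python
def crc16_ccitt(data: bytes, crc: int = 0xFFFF) -> int:
    """Bitwise CRC-16 with polynomial 0x1021, MSB first.

    Assumption: initial value 0xFFFF, no reflection, no final XOR.
    """
    for byte in data:
        crc ^= byte << 8  # feed the next byte into the high half
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


# Standard check string for this CRC variant.
print(hex(crc16_ccitt(b"123456789")))  # 0x29b1
```

A table-driven version would be faster, but the bitwise form needs no lookup table in RAM, which matters on the smaller targets mentioned above.
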
### Is non-blocking key input implemented for the targets that support it?

Not yet, but it may come in future versions. For now, the focus is on more essential features.

## Credits

Created by Luxferre in 2022, released into public domain.

Made in Ukraine.