equi

A self-descriptive stack-based PC platform
git clone git://git.luxferre.top/equi.git

commit f68975f5c806a15c6f6910deb913339b67cd992d
parent c0ee893f2510284fc56863825a420b8ce129ff6f
Author: Luxferre <lux@ferre>
Date:   Thu, 11 Aug 2022 11:43:05 +0300

Whitespace logic revamped, now Q at the end is mandatory but one can use any whitespace in the programs
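
The loading rule introduced by this commit can be illustrated with a small stand-alone C function. This is a simplified sketch, not the code from `equi.c`. The function `equi_load` and its parameters are hypothetical. It reads from a C string instead of `cgetc()`, and it stops after the first program instead of returning to the `> ` prompt. It follows the same rules as the new command-mode loop:

- Whitespace and backspace are dropped.
- `(` and `)` toggle the II flag without being stored.
- Anything seen while II is set is discarded.
- The first `Q` outside a comment ends loading, after which the buffer is terminated with a 0 byte.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical loader sketch: copies an Equi program from src into cmdbuf,
   mirroring the command-mode rules of this commit. Returns 1 if a Q outside
   a comment ended the program, 0 if the input ran out first. */
static int equi_load(const char *src, char *cmdbuf, size_t size) {
    size_t len = 0;
    int ii = 0; /* instruction ignore (comment) flag */

    assert(src != NULL && cmdbuf != NULL && size > 0);
    for (; *src != '\0'; ++src) { /* a zero byte ends the input */
        char c = *src;
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n' || c == '\b')
            continue; /* whitespace and backspace never enter the buffer */
        if (c == '(') { ii = 1; continue; } /* comment start, not stored */
        if (c == ')') { ii = 0; continue; } /* comment end, not stored */
        if (ii)
            continue; /* everything inside a comment is discarded, even Q */
        if (c == 'Q') {
            cmdbuf[len] = '\0'; /* terminate right after the last stored instruction */
            return 1;
        }
        if (len + 1 < size) /* keep one byte free for the terminator */
            cmdbuf[len++] = c;
    }
    cmdbuf[len] = '\0';
    return 0;
}
```

For example, `1F # ( a comment )` followed by a line break and `Q` loads as `1F#`. In `(Quit) AB Q CD` the first `Q` sits inside a comment, so loading stops at the second one and the buffer holds `AB`. Because whitespace is dropped, `1 2` also loads as `12`.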

Diffstat:
M README.md | 10 +++++-----
M equi.c    | 38 +++++++++++++++++---------------------
2 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/README.md b/README.md
@@ -52,16 +52,18 @@ To preserve runtime integrity, Equi implementations are allowed to (but not requ

 Equi is strictly case-sensitive: all uppercase basic Latin letters, as well as a number of special characters, are reserved for machine instructions, and all custom words must be defined in lowercase only (additionally, `_` character is allowed in the identifiers). Within comments (see below), any characters can be used.

+All whitespace characters (space, tabulation, CR or LF) are discarded in Equi upon loading the program and can be used for code clarity any way the author wants.
+
 The interpreter can run in one of the four modes: command (default), interpretation (IM), compilation (CM) and instruction ignore (II) mode. An Equi machine always starts in the command mode. The latter three are triggered by certain instructions that set the corresponding flags. The semantics of the compilation mode is similar to that of Forth, and will be covered in detail here later on.

-In the command mode, the interpreter doesn't perform any instruction execution and doesn't manipulate program counter (PC). Instead, it accumulates all characters typed from the standard input into the so-called command buffer. The only instruction Equi must react to in this mode is CR, the carriage return character, that sets PC to the command buffer start, sets the IM flag, **clears the CLT** and starts execution in the interpretation mode. Note that this also means that every Equi program file, even when run in a non-interactive environment, must end with a CR character, and as long as it does and every program has a halting `Q` instruction, you can safely concatenate several Equi programs in a single file to be executed sequentially.
+In the command mode, the interpreter doesn't perform any instruction execution and doesn't manipulate program counter (PC). Instead, it accumulates all characters typed from the standard input into the so-called command buffer. The only instruction Equi must react to in this mode is Q, the quit instruction, that sets PC to the command buffer start, sets the IM flag, **clears the CLT** and starts execution in the interpretation mode. Note that this also means that every Equi program file, even when run in a non-interactive environment, must end with a Q character, and as long as every program has a halting `Q` instruction, you can safely concatenate several Equi programs in a single file to be executed sequentially.
+
+In the instruction ignore more (II flag set), all instructions or arbitrary characters except `)` (that unsets the II flag), are skipped and discarded. This can be used to write comments. In a well-formed Equi program, the characters braced in the II instructions `(` and `)`, as well as any whitespace characters, will never enter the command buffer upon loading.

 In the interpretation mode (IM flag set), when the interpreter encounters any of the following characters - `_0-9A-Fa-z` (not including `-`) - it pushes their ASCII values bytewise onto the literal stack (32-byte long). When any other character (except `:`, `"` or `'`) is encountered when the literal stack is not empty, the `#` instruction logic (see below) is performed automatically. If `:` is encountered, compilation mode logic is performed instead. If a `Q` instruction or a on-printable character is encountered, Equi returns to the command mode immediately.

 In the compilation mode, all instructions except `;` are skipped while the CM flag is set. When the interpreter encounters `;` instruction, it performs the finalizing logic to save the compiled word into CLT (see below) and returns to the interpretation mode.

-In the instruction ignore more (II flag set), all instruction except `)` (that unsets the II flag), are skipped and discarded. PC, however, does increase as usual in this mode.
-
 Note that II flag has the precedence over IM and CM flags and CM flag has the precedence over IM flag. I.e. you cannot exit the interpretation mode while being in the compilation mode, and you can't exit any other mode while being in the II mode. And surely enough you can't exit the command mode (interpreter shell itself) unless all three mode flags are unset.

 Equi's core instruction set is:
@@ -70,13 +72,11 @@ Op |Stack state |Meaning
 ---|--------------------------------|----------------------------------------------------------
 `#`|`( -- )` |Literal: pop all characters from the literal stack, discard all `_a-z` characters, leave the top 4 characters (replacing the missing ones with 0) and push the 16-bit value from them (in the order they were pushed) onto the main stack
 `"`|`( -- lit1 lit2 ... )` |Pop all the values from the literal stack and push them onto the main stack as 16-bit values
-` `|`( -- )` |No operation: whitespace can be used in the code for clarity and not affect anything except PC
 `(`|`( -- )` |Set the II flag: when it is set, the interpreter must ignore all instructions except `)`, used for writing comments
 `)`|`( -- )` |Unset the II flag, returning to the normal interpretation or compilation mode
 `:`|`( -- )` |Compilation mode start: set CM flag and set CBP to PC+1 value
 `;`|`( -- )` |Compilation mode end: replace this instruction in-memory with `R` instruction, pop all characters from the literal stack, append the lookup table with their CRC16 hash and CBP value, unset the CM flag and increment CLTP value
 `'`|`( -- )` |Call the compiled word: pop all characters from the literal stack, compute their CRC16 hash, look it up in CLT for a CBP value, set PC to CBP if found, error out if not, then push PC to return stack and set PC to the CBP value
-CR |`( -- )` |In the command mode, output a line break and switch to the interpretation mode (see above); in all other modes, identical to whitespace
 `R`|`( -- )` |**R**eturn: pop and assign the PC value from the return stack
 `]`|`( a -- )` |Pop the value from main stack and push onto return stack
 `[`|`( -- a )` |Pop the value from return stack and push onto main stack
diff --git a/equi.c b/equi.c
@@ -337,9 +337,6 @@ void equi_main_loop() {
     if(ram.lsp > 0 && instr != INS_LITSTR && instr != INS_LITCALL && instr != INS_LITINT && instr != INS_CMSTART)
       pushLitVal();
     switch(instr) { /* then perform all main interpretation logic */
-      case INS_IISTART: /* instruction ignore start */
-        ram.II = 1; /* raise II flag */
-        break;
       case INS_CMSTART: /* compilation start */
         ram.cbp = ram.pc + 1U; /* save CBP */
         ram.CM = 1U; /* raise CM flag */
@@ -540,49 +537,48 @@ int main(int argc, char* argv[]) {

   /* Start both execution and input buffering from the start of command buffer (-1 because we use prefix increment) */
   ram.pc = ram.ibp = 65535U;

-  printf("Welcome to Equi v" EQUI_VER " by Luxferre, 2022\n\nCLT: 0x%04X (%d bytes)\nGPD: 0x%04X (%d bytes)\nCommand buffer: 0x%04X (%d bytes)\nEqui ready\n\n",
+  printf("Welcome to Equi v" EQUI_VER " by Luxferre, 2022\n\nCLT: 0x%04X (%d bytes)\nGPD: 0x%04X (%d bytes)\nCommand buffer: 0x%04X (%d bytes)\nEqui ready\n\n> ",
     (unsigned int) ((uchar *)&ram.clt - (uchar *)&ram.main_stack),
     (unsigned int) ((uchar *)&ram.gpd - (uchar *)&ram.clt),
     ram.gpd_start, ram.cmd_start - ram.gpd_start,
     ram.cmd_start, ram.cmd_size);

-  cputc('>');
-  cputc(' ');
-
   while(1) { /* Now, we're in the command mode loop */
-    instr = ram.cmdbuf[++ram.ibp] = cgetc(); /* Fetch the next instruction from the keyboard */
-    if(instr == 0xFFU || instr == 0U) /* exit */
+    instr = cgetc(); /* Fetch the next instruction from the keyboard/stdin */
+    if(instr == 0xFFU || instr == 0U) /* exit on zero byte */
       break;
-    else if(instr == BS && ram.ibp > ram.cmd_start) { /* process the backspace */
+    else if(instr == BS || instr == CR || instr == LF || instr == ' ' || instr == '\t') { /* ignore the backspace or whitespaces */
 #ifdef __CC65__
       cputc(instr); /* echo it */
 #endif
-      --ram.ibp;
-    } else if(instr == INS_IISTART) { /* process II just to avoid quitting in command mode */
+    } else if(instr == INS_IISTART) { /* process II start */
 #ifdef __CC65__
       cputc(instr); /* echo it */
 #endif
       ram.II = 1;
-    } else if(instr == INS_IIEND) { /* process II just to avoid quitting in command mode */
+    } else if(instr == INS_IIEND) { /* process II end */
 #ifdef __CC65__
       cputc(instr); /* echo it */
 #endif
       ram.II = 0;
-    } else if(instr == CR || instr == LF) { /* process carriage return or linefeed */
-      cputc(CR); /* echo it */
-      cputc(LF); /* echo it */
-      ram.cmdbuf[ram.ibp] = 0; /* replace itself with 0 */
+    } else if(!ram.II && (instr == 0xFFU || instr == INS_QUIT)) { /* if not in II mode, process EOF or Q instruction: trigger interpreter loop */
+      cputc(CR); /* echo CR */
+      cputc(LF); /* echo LF */
+      ram.cmdbuf[+ram.ibp] = 0; /* end program with 0 */
       ram.IM = 1; /* set the mandatory interpretation mode flag */
       equi_main_loop(); /* and run the interpreter loop */
-      cputc(CR); /* echo it */
-      cputc(LF); /* echo it */
+      cputc(CR); /* echo CR */
+      cputc(LF); /* echo LF */
       cputc('>');
       cputc(' ');
-    }
+    } else { /* append the instruction/character to the command buffer if and only if it doesn't match the above criteria and we're not in II mode */
+      if(!ram.II)
+        ram.cmdbuf[++ram.ibp] = instr;
 #ifdef __CC65__
-    else cputc(instr); /* echo it */
+      cputc(instr); /* echo it */
 #endif
+    }
   } /* command mode loop end */
   return 0;
 }
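
The `#` row of the instruction table in the README diff describes how literal characters become a number. Below is a minimal C sketch of that rule. It relies on two assumptions that the README does not state outright:

- The surviving `0-9A-F` characters are read as hexadecimal digits.
- The first-pushed of the top four characters is the most significant, so missing positions act as leading zeros.

`equi_literal` is a hypothetical name, and the literal stack is modeled as a plain array of `n` pushed characters.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch of the `#` literal rule. lit[0] is the first character
   pushed onto the literal stack and lit[n - 1] the most recent one (the top). */
static uint16_t equi_literal(const char *lit, size_t n) {
    char digits[4] = {'0', '0', '0', '0'}; /* missing characters become 0 */
    size_t kept = 0;

    assert(lit != NULL || n == 0);
    /* Walk down from the top of the stack, keeping at most 4 characters. */
    for (size_t i = n; i > 0 && kept < 4; --i) {
        char c = lit[i - 1];
        if (c == '_' || (c >= 'a' && c <= 'z'))
            continue; /* `_a-z` characters are discarded */
        digits[3 - kept] = c; /* fill from the least significant position */
        ++kept;
    }

    /* Assumption: the kept characters are hex digits, most significant first. */
    uint16_t value = 0;
    for (int i = 0; i < 4; ++i) {
        char c = digits[i];
        unsigned d = (c >= 'A' && c <= 'F') ? (unsigned)(c - 'A' + 10)
                                             : (unsigned)(c - '0');
        value = (uint16_t)((value << 4) | d);
    }
    return value;
}
```

Under these assumptions:

- `1F` yields 0x001F.
- `12345` keeps only `2345` and yields 0x2345.
- `a1b_F` drops `a`, `b` and `_` and yields 0x001F.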
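
The `;` and `'` rows describe the compiled lookup table (CLT) as a list of pairs, each holding the CRC16 hash of a word name and its CBP value. The C sketch below shows such a lookup. The README only says "CRC16", so the variant used here is an assumption and may not match `equi.c`. It is CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF, no reflection and no final XOR. `struct clt_entry`, `clt_lookup` and the linear search order are also hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed CRC16 variant: CRC-16/CCITT-FALSE, computed bit by bit. */
static uint16_t crc16_ccitt_false(const char *s, size_t n) {
    uint16_t crc = 0xFFFF;
    for (size_t i = 0; i < n; ++i) {
        crc ^= (uint16_t)((uint8_t)s[i] << 8);
        for (int b = 0; b < 8; ++b)
            crc = (crc & 0x8000) ? (uint16_t)((crc << 1) ^ 0x1021)
                                 : (uint16_t)(crc << 1);
    }
    return crc;
}

/* Hypothetical CLT entry: hash of the word name and its compiled body pointer. */
struct clt_entry {
    uint16_t hash;
    uint16_t cbp;
};

/* Look up a word among the first cltp entries. On success, store the CBP and
   return 1; return 0 when the word is unknown (where Equi would error out). */
static int clt_lookup(const struct clt_entry *clt, size_t cltp,
                      const char *name, size_t n, uint16_t *cbp) {
    assert(cbp != NULL && (clt != NULL || cltp == 0));
    uint16_t h = crc16_ccitt_false(name, n);
    for (size_t i = 0; i < cltp; ++i) { /* first match wins in this sketch */
        if (clt[i].hash == h) {
            *cbp = clt[i].cbp;
            return 1;
        }
    }
    return 0;
}
```

The sketch stores only the 16-bit hash, not the name. As a result, two different word names with the same CRC16 would be indistinguishable in it.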