BASIC for Arduinos

This is a long post. If you want to skip to the code it can be downloaded from the book website. There is also a page showing how to set up PuTTY as a suitable terminal emulator to run the BASIC.

Why BASIC? 

While writing a section of my Arduino C book I had a sudden urge to write a simple BASIC language interpreter that would run on an Arduino. Not for any particular reason other than it sounded like a fun challenge. I filed the idea away for a while as the book needed to be finished but found myself returning to the idea more than once.

Take a minute or three to read how Steve Wozniak described writing the first BASIC interpreter for the original Apple computer. His enthusiasm was clearly unbounded as was his understanding that at that moment in time BASIC was fundamental to the Apple computer’s commercial success.

Why BASIC today though you might still be asking? Why not JavaScript or Python which are all the rage now? Now JavaScript is a strong contender for my favourite language. I love the apparent simplicity and the way that those simplicities lead to incredibly subtle emergent properties but there is one snag. JavaScript sort of depends upon an external environment (usually a web browser) to manage user interaction. Similar (but not identical) issues apply to Python. Even the most primitive BASIC implements both INPUT and PRINT statements and it is impossible to overstate the historical importance of those capabilities in a programming language that came into existence in an era where almost all user input (data) was read from punch cards and tape drives. Admittedly, Lisp/Scheme might have been an interesting option for those with a strong interest in computational models and Forth could have been a good choice for this platform as well.

Once given some consideration, the BASIC idea can be split into two potential paths. You might consider writing a tiny BASIC that would actually be interpreted and run on an Arduino or you might consider writing a BASIC that could be used to compile a program to run on and control an Arduino. Both of those project ideas are interesting and share some challenges.

I decided to at least start with the mini BASIC actually running on an Arduino. First reason was that at least some of the problems challenges of the alternate project would be met (and maybe solved) in this first stab plus there is the bonus of needing to implement some new and interesting structures and techniques in the C code. In addition, the code will execute arbitrary commands in the BASIC language on an Arduino – and that fundamental technique should be good for executing any arbitrary command syntax on that platform.

Defining an Arduino run BASIC

One of the joys of working with a language that has not been submitted to a standard body (like C or C#) is that you can implement what parts of any pre-existing implementations you like without having to apologise. My BASIC or your BASIC is as valid as any other even if mine is less capable than some.

Early BASIC language versions were designed to be used by programmers without access to sophisticated text editing facilities. The simple solution to this technology lack was to use line numbers. Line numbers solved multiple problems. Line numbers set the order that lines should be executed while (through the GOTO and later GOSUB commands) allowing the next line due to be executed to be changed in response to program logic. The convention was to start writing code using program line numbers that incremented in 10s. This made it easy to insert additional lines as and when required. If the programmer wanted to change a line then all they needed to do was retype the same line number with the corrected code to overwrite any existing line. Similarly, entering a line number without any additional BASIC code would effectively delete a given line from the program. It seemed sensible to adopt this approach for an Arduino based BASIC interpreter.


After making the fairly obvious decision to use BASIC program line numbers I then made the arbitrary decision to break the program down into three components. The first being based upon the statutory Arduino loop() function and that would manage user input from the USB port. The second component would store basic program lines and manage the execution of program lines. The third would be the BASIC interpreter that would parse individual lines and execute them. These choices may not, of course, have been optimal but you have to start somewhere. This exercise was not going to be a master class in compiler writing (very far from it) but is intended to catch just a little of the buzz The Woz experienced creating his BASIC.

Line numbers, statements and controlling commands

Line numbers should be positive integers between 1 and 32767.  The lowest line number is used for the first program line to be executed. Execution will then continue in line order (except where explicit branching with a GOTO statement redirects) until an END statement is met or the highest line number has been executed and there are no further statements.

Program statements are restricted to 132 characters including the line number. This is arbitrary and the relevant constant can be varied.

Statements may be added to a program in any order. A statement may be replaced by re-entering it from the terminal using the same line number. A line number entered without a related statement will effectively delete any existing statement with the same line number. The program may be displayed at the terminal by entering the command LIST. The program is executed using the RUN command. Program execution will then continue until an END statement is met or the user presses the escape key (Esc).

A new program can be written after entering the NEW command which will remove any existing program from the program storage area in memory.

The language data types

Following in the footsteps of many early BASIC implementations, this version deals with variable names in a pragmatic manner. There are 26 predefined integer variables (A% to Z%) and 26 predefined string variables (A$ to Z$). While this approach can be simply extended to include floats and precludes the need for a DIM statement it does not provide support for arrays or more descriptive variable names. This issue is discussed later.

String variables are arbitrarily restricted to 132 characters in length. The range of values that can be stored in any integer variable is device dependant. String variable values may be treated as mutable. All numeric constants are converted to integers. Negative integer constants have a leading minus sign but a leading plus sign is not supported. String literals (within double quotes) are supported.

The language statement types
Statement Type
Components
Comments
INPUT
INPUT, [optional string literal or variable as a prompt], variable name (to be assigned any user input).
Prompts for user input from the terminal (emulator).
PRINT
PRINT, [An expression that can be resolved as a string and displayed at the terminal]
a PRINT statement without any following expression will just print a new line character to the terminal.
LET
LET, a variable name, “=”, a string or integer expression.
here the “=” operator is an assignment operator
IF
IF, conditional expression, THEN, BASIC statement to be executed if condition resolves as true.
THEN is followed by a valid BASIC statement. A FOR statement or another IF statement not allowed here but anything else should be fine.
FOR
FOR, integer count variable, “=”, expression for initial value, TO, expression for target value, [optional STEP, followed by expression for value to increment (or decrement) the count variable when NEXT is executed]
Together with a following NEXT statement will execute a set of statements for the number of times required to increment the integer variable to the target value. If STEP is not specified the default is +1. For/NEXT loops may be nested.
NEXT
NEXT
Increments the controlling integer of the last executed FOR loop statement by the value of STEP. Tests if FOR condition now complete and redirects program execution to the appropriate line.
GOTO
GOTO, Line Number
Redirects program execution to the specified line number.
GOSUB
GOSUB, Line Number
Redirects program execution to the specified line number. Subroutines may include another GOSUB.
RETURN
RETURN
Redirects program execution to the line after the last GOSUB
REM
REM, any text
This statement type holds an optional programmer comment and can be used as a target for a GOTO or GOSUB. Otherwise it is not actively executed.
END
END
terminates the program

The language operators
Operator Group
Operators
comments
Arithmetic
+, -, * and /
Normal arithmetic precedence rules applied and parentheses may be used
Logical
=, <, >, <=, >=, AND, OR
No precedence assumed except where parentheses are used. No short circuiting (all conditional expressions are evaluated).
Other
=, (, ), &, “
= is also an assignment operator in a LET statement.
& is a concatenation operator for string expressions or to force a string evaluation of a following integer expression. Double quotes delimit string literals.
Implicit

An integer variable or expression assigned to a string variable will be automatically converted to a string. A string assigned to an integer variable will be converted to an integer if at all possible.

User input

The normal Serial Monitor supported by the Arduino IDE is designed primarily as an output facility with occasional user input. The task of writing a BASIC program to be run on an Arduino felt like it needed better interactive support. I decided to use PuTTY running on a personal computer as a simple terminal emulator and this shaped the code within the loop() function of the Arduino sketch. The code there echoes each input character and triggers a process when the user presses the return key to complete a given entry (command, new program statement or response to a prompt from a running BASIC program).

One thing common to all BASIC implementations is that they are case insensitive. In an attempt to simplify my implementation, I decided to convert all user input to upper case except for characters within a string literal. This has an authentic “feel” even if these days we are somewhat sensitive to upper case text.

Some guidance on setting up PuTTY (available on all major PC platforms) is available on the book website Other terminal emulation software will probably work just as well with a minimum of adjustment.

Parsing BASIC statements.

One of the great design features of the BASIC language is that the primary keywords (IF, PRINT, INPUT FOR etc.) come at the beginning of each statement. This greatly simplifies the task of breaking each program statement down into its component parts. That task is normally done by reading through a line and identifying the type of each element (might be a variable name, constant or subsidiary keyword such as TO) and creating a representation of the program line as a set of tokens. Each type being represented by a specific token. The interpreter can then use the tokens as a guide on how to process each statement. The set of tokens can also be used to validate a given statement and ensure that it will probably be processed correctly.

Compiler and interpreter writers will commonly call upon the services of any available regular expression facility [https://en.wikipedia.org/wiki/Regular_expression ] to identify individual statement components ready to be “tokenised”. ANSI C does not have an inbuilt regular expression capability although there are libraries available. It turns out that tokenising a BASIC language statement is not terrifically complex and can be managed with some relatively straightforward C code. This is a good thing in many ways as the oft quoted aphorism (attributed to Jamie Zawinski [https://www.jwz.org/about.html ]) - ‘Some people when confronted with a problem, think “I know, I’ll use regular expressions.” Now they have two problems.’ more than hints a warning. That’s not strictly fair – regular expressions are great at analysing regular grammar and so fit nicely into projects such as this one. Programmers run into trouble when they try to use regular expressions more adventurously.

The syntax of a programming language is often described using BNF (Backus-Naur Form) which is a formal syntax for describing the syntax of a programming language. A language compiler (or interpreter, as in this case) will take a stream of characters (a program statement) and apply a lexical analysis to break the input down into a stream of tokens. The tokens are then sent to a parser which can apply the language rules represented in BNF. If the program statement passes muster then it can be converted to executable code ready to run on the target processor.

The interpretation and validation requirements of this BASIC are fairly simple but the code follows a similar route with a less formal language definition. As program statements are entered they are parsed and validated with feedback in the form of error messages sent to the terminal. Valid statements are stored in memory. The line numbers are stored in an ordered linked list to support program execution and the text of each statement is stored ad hoc in the heap.

When the BASIC program is running, statements are retrieved in turn from memory, parsed again (to assemble the tokens) and then executed.

The Arduino choice

I started working up the initial terminal support and statement line storage elements using an Arduino Uno. This proved a bit of a pain as I needed to keep connecting and disconnecting the terminal emulator to allow program updates from the IDE. I know, I thought, I will use my Arduino Due and “hook” the terminal emulator up to the native USB and keep the IDE connected to the programming port. This does not work. I am not at all sure why the Due has two USB connections as (certainly in this scenario) you can’t use both at once. Perhaps a future revision will fix this.

Despite the continued USB sharing issue, I carried on developing this program on a Due and made use of the greater data memory availability on that device. The final program code can be made to fit on a Uno (with a minimum of surgery) but running any meaningful BASIC program on that platform proved impossible. The ATMega chip was never intended as a general purpose processing unit and parsing and executing text lines is pretty far from what its designers had in mind. If you only have a Uno (or similar) available then you can still have a play with one of the core features of any BASIC interpreter (stack based evaluations) and the links for this can be found at the end of this post. Having said all that, you can get short BASIC programs running on a Mega 2560 although I took the precaution (when trying this) of moving the error messages into PROGMEM (which required a change to the error display function).

The code

The program makes use of three C++ classes. One class (ProgLines) manages the storage and retrieval of BASIC program statements, passing them to the larger class (Basic) for parsing and execution. The Basic class makes use of a Stack class when evaluation expressions. There are several downloadable stack classes available for Arduinos but I have cooked my own. Both the Basic and ProgLines classes declare and create their own instances as they are designed as singleton classes.
The Basic class runs to around 750 lines of code which is pushing things a wee bit within the Arduino IDE. The IDE has a limited set of features for managing large code bases and perhaps I should have looked at using an external code editor like Notepad++ or even Visual Studio. However with a lot of scrolling about the code was written.

 The .ino program tab content is the simplest to follow. The setup() function establishes a serial link to the terminal emulator and sends a user prompt (BASIC>). The loop() function echoes and adds arriving characters to a char array (translating lower case to upper case when required) until the user presses the <Return> key to complete the entry of a program line, command or response to a prompt. If the current BASIC program is in a “wait” state then the characters are assumed to be in response to a prompt from an INPUT statement and are sent to the relevant Basic.h method. Otherwise the character buffer content is passed to the ProgLines class addLine() method. If the current BASIC program is in a RUN state then the loop() function will trigger the execution of the next program statement.

The ProgLines class has a number of methods and these are outlined next.

Public methods

bool addLine(char*)
If the string array starts with a line number then any statement is passed to the Basic class for validation. If an error code is returned then this is passed to the displayError() method, otherwise the line number and statement are passed to storeLine(). If no line number then this may be a command (LIST, RUN, NEW LOAD) or will be treated as an “immediate execution” request.
void executeNext()
When a BASIC program is running this method selects the next program line from storage and passes it to the Basic class to be parsed. Assuming no error code is returned the method then calls the basic class executeLine() method. After the line is executed, the pointer to the next line to be executed is updated by calling the advanceLine() method.
bool setNextLine(short)
This method can be called from the basic class to change the next line number to be executed to the argument value.
void advanceLine()
Updates the pointer to the next line to be executed. If the last line in the program was just executed then the program status flag is changed to terminate further execution.
int getNextLine()
Fetches the next line number. This is called from the Basic class to manage loops and gosub statements.
Private methods

bool storeLine(short, char*, int)
Stores he line number (short) and line text with the final int argument passing the text length.
bool isLine(short)
Checks to see if the argument is a currently stored line number.
void listLines()
Lists the program to the serial port. Starts by initialising a buffer for the line text and then reads each line in turn from the heap and sending the line number and text to the serial port.
void newProgram()
Frees all heap memory used by any existing program ready to start storing a new one.
void runProg()
Starts the execution of a stored BASIC program from the lowest line number.
bool deleteLine(short)
Removes a specified line number and the associated code text from memory, rtesetting the line number linked list.
void displayError(int, short)
Sends the text associated with the specified error code. Error messages are retrieved via the errMess[] array.
void loadTest()
Loads a test program stored in testProg[] array. The content of this array can be varied to exercise any chosen language feature.

The integer and conditional evaluations use two stacks. This is for code clarity as the same stack could have been used for both operators and values although the stack items would have needed to differentiate the item type. There are several downloadable stack classes for the Arduino so you don’t have to “cook” your own as I have here. See the post on Stack.h here.

The Basic class:

Public Members

void userInput(char *uLine);
If the class has just executed an INPUT statement it switches status to USER. The next text entry from the serial port is then sent to this function where it will be evaluated and the result assigned to the nominated variable. GThe status is then re-set to RUN.
int parseLine(char *pLine);
Does what it says. Takes a text line and parses it to check if it is a valid BASIC code line. An error code may be returned if the parsed code fails validation.
int executeLine();
Executes the most recently parsed statement. If you read the code you will see keywords being identified and executed. Expressions are evaluated by private methods.
void setForRun();
Initialises all variables and sets the status to RUN
int rStatus;
The current class status.
enum {USER,RUN,WAIT, STOPPED};
Class status values
Selected Private Methods

int validateStatement()
The parseLine() function can detect some syntax errors but that is complemented by this function that validates all of the statement components.
byte getNextToken()
Extracts the next token from a statement and advances the parse position within a statement ready for the next call.
bool evalCondition(
Evaluates a conditionsl expression returning a bool true or false.
void evalString()
Evaluates a string expression including implicit casts from int
int evalInt()
Evaluates an int expression returning the resulting value. Manages implicit casts from string to int.

The Basic class maintains separate stacks for “for loops” and “gosubs” and this approach allows these code constructs to be nested.

Limitations

There is yet another arbitrary limit in this BASIC – allowance is only made for up to 30 tokens in any program statement. My project originally started by storing tokens in a queue maintained in the heap but the code associated with navigating the queue started to reduce readability – plus it was clear that the token list was going to be a very active structure while the program was in use. Hence a fixed array of structs.

There are several places where buffer overflows may occur – but this is a demonstration piece not production code so “mea culpa”.

So what’s missing? String manipulation – perhaps a LEFT, RIGHT and MID. An explicit conversion from string to int and vice versa (though implicit coercions work). An ELSE option for an IF statement. Floats of course (which would make numeric evaluations much more interesting and accurate) and arrays. Then there are the WHILE constructs. These are slightly tricky to implement
Do/UNTIL (or WHILE if available) would be straightforward to implement but WHILE/WEND presents problems as the while condition may be false at the point the WHILE statement is first evaluated. This would require an implicit GOTO to after the related WEND but this is an unknown location. A solution might be built by adding the statement type to the struct that stores line numbers in the linked list. A function could then search for a matching WEND (skipping over nested WHILE/WENDS) to deal with this. I think one would have to be very keen.

What would be changed in a possible second version of this program?

I think I might merge the ProgLines class into the basic class as this would make any changes to the line number linked list simpler to implement. It would also make it straightforward to use the parser (or at least the getnextToken() function) to parse any extensions of the LIST (and unimplemented RENUM) commands to allow some parameters.

I would seriously consider pushing all of the tokens onto a single stack while implementing the rules of both the arithmetic stack processing and the logical comparison stack processing. This would allow validation to be merged into that process and should result in a fairly straightforward final “stack” to be collapsed and passed to a process executing the Statement types.

Implementing meaningful variable names

One could “cheat” of course and allow the user to define (DIM is the standard statement keyword) up to 26 string variables and 26 numeric variables. A simple “look-up” table could be used to map the variable names to an array of values or pointers as already demonstrated.


No reason why the numeric variables have to be integers other than a nod back to Steve Wozniak’s efforts (Bill Gates and Paul Allen’s original version had floats as a “stand-out” feature). Implementing float values would be a straightforward substitution or could add a new data type to the BASIC. The stack demo for an Arduino Uno switches the numeric evaluation to floats.

The Code

All of the code ready for criticism, correction and enhancement is available on the book website.

Don't have a Due?

Take a look at the Uno Stack demo if you don’t have access to an Arduino with the memory capability of a Due but would like to explore parsing and tokenising statements and using a stack to evaluate an expression. 

Comments

Popular posts from this blog

Unicode output from an Arduino.

Arduino: How to scroll long text on OLED screen

Arduino Regular Expressions