BASIC for Arduinos
This is a long post. If you want to skip to the code it can be downloaded from the book website. There is also a page showing how to set up PuTTY as a suitable terminal emulator to run the BASIC.
Why BASIC?
While writing a section of my Arduino C book I had a sudden
urge to write a simple BASIC language interpreter that would run on an Arduino.
Not for any particular reason other than it sounded like a fun challenge. I
filed the idea away for a while as the book needed to be finished but found
myself returning to the idea more than once.
Take a minute or three to read how Steve Wozniak described writing the first BASIC interpreter for the
original Apple computer. His enthusiasm was clearly unbounded as was his
understanding that at that moment in time BASIC was fundamental to the Apple computer’s
commercial success.
Why BASIC today though you might still be asking? Why not
JavaScript or Python which are all the rage now? Now JavaScript is a strong
contender for my favourite language. I love the apparent simplicity and the way
that those simplicities lead to incredibly subtle emergent properties but there
is one snag. JavaScript sort of depends upon an external environment (usually a
web browser) to manage user interaction. Similar (but not identical) issues
apply to Python. Even the most primitive BASIC implements both INPUT and PRINT
statements and it is impossible to overstate the historical importance of those
capabilities in a programming language that came into existence in an era where
almost all user input (data) was read from punch cards and tape drives. Admittedly,
Lisp/Scheme might have been an interesting option for those with a strong
interest in computational models and Forth could have been a good choice for this
platform as well.
Once given some consideration, the BASIC idea can be split
into two potential paths. You might consider writing a tiny BASIC that would
actually be interpreted and run on an Arduino or you might consider writing a
BASIC that could be used to compile a program to run on and control an Arduino.
Both of those project ideas are interesting and share some challenges.
I decided to at least start with the mini BASIC actually
running on an Arduino. First reason was that at least some of the problems
challenges of the alternate project would be met (and maybe solved) in this
first stab plus there is the bonus of needing to implement some new and
interesting structures and techniques in the C code. In addition, the code will
execute arbitrary commands in the BASIC language on an Arduino – and that fundamental
technique should be good for executing any arbitrary command syntax on that
platform.
Defining an Arduino run BASIC
One of the joys of working with a language that has not been
submitted to a standard body (like C or C#) is that you can implement what
parts of any pre-existing implementations you like without having to apologise.
My BASIC or your BASIC is as valid as any other even if mine is less capable than
some.
Early BASIC language versions were designed to be used by
programmers without access to sophisticated text editing facilities. The simple
solution to this technology lack was to use line numbers. Line numbers solved
multiple problems. Line numbers set the order that lines should be executed
while (through the GOTO and later GOSUB commands) allowing the next line due to
be executed to be changed in response to program logic. The convention was to
start writing code using program line numbers that incremented in 10s. This
made it easy to insert additional lines as and when required. If the programmer
wanted to change a line then all they needed to do was retype the same line
number with the corrected code to overwrite any existing line. Similarly,
entering a line number without any additional BASIC code would effectively
delete a given line from the program. It seemed sensible to adopt this approach
for an Arduino based BASIC interpreter.
After making the fairly obvious decision to use BASIC
program line numbers I then made the arbitrary decision to break the program
down into three components. The first being based upon the statutory Arduino
loop() function and that would manage user input from the USB port. The second
component would store basic program lines and manage the execution of program
lines. The third would be the BASIC interpreter that would parse individual
lines and execute them. These choices may not, of course, have been optimal but
you have to start somewhere. This exercise was not going to be a master class
in compiler writing (very far from it) but is intended to catch just a little
of the buzz The Woz experienced creating his BASIC.
Line numbers, statements and controlling commands
Line numbers should be positive integers between 1 and
32767. The lowest line number is used
for the first program line to be executed. Execution will then continue in line
order (except where explicit branching with a GOTO statement redirects) until
an END statement is met or the highest line number has been executed and there
are no further statements.
Program statements are restricted to 132 characters
including the line number. This is arbitrary and the relevant constant can be
varied.
Statements may be added to a program in any order. A
statement may be replaced by re-entering it from the terminal using the same
line number. A line number entered without a related statement will effectively
delete any existing statement with the same line number. The program may be
displayed at the terminal by entering the command LIST. The program is executed
using the RUN command. Program execution will then continue until an END
statement is met or the user presses the escape key (Esc).
A new program can be written after entering the NEW command
which will remove any existing program from the program storage area in memory.
The language data types
Following in the footsteps of many early BASIC
implementations, this version deals with variable names in a pragmatic manner.
There are 26 predefined integer variables (A% to Z%) and 26 predefined string
variables (A$ to Z$). While this approach can be simply extended to include
floats and precludes the need for a DIM statement it does not provide support
for arrays or more descriptive variable names. This issue is discussed later.
String variables are arbitrarily restricted to 132
characters in length. The range of values that can be stored in any integer
variable is device dependant. String variable values may be treated as mutable.
All numeric constants are converted to integers. Negative integer constants
have a leading minus sign but a leading plus sign is not supported. String
literals (within double quotes) are supported.
The language statement types
Statement Type
|
Components
|
Comments
|
INPUT
|
INPUT, [optional string literal or variable as a prompt], variable
name (to be assigned any user input).
|
Prompts for user input from the terminal (emulator).
|
PRINT
|
PRINT, [An expression that can be resolved as a string and displayed
at the terminal]
|
a PRINT statement without any following expression will just print a
new line character to the terminal.
|
LET
|
LET, a variable name, “=”, a string or integer expression.
|
here the “=” operator is an assignment operator
|
IF
|
IF, conditional expression, THEN, BASIC statement to be executed if
condition resolves as true.
|
THEN is followed by a valid BASIC statement. A FOR statement or another
IF statement not allowed here but anything else should be fine.
|
FOR
|
FOR, integer count variable, “=”, expression for initial value, TO,
expression for target value, [optional STEP, followed by expression for value
to increment (or decrement) the count variable when NEXT is executed]
|
Together with a following NEXT statement will execute a set of
statements for the number of times required to increment the integer variable
to the target value. If STEP is not specified the default is +1. For/NEXT
loops may be nested.
|
NEXT
|
NEXT
|
Increments the controlling integer of the last executed FOR loop
statement by the value of STEP. Tests if FOR condition now complete and
redirects program execution to the appropriate line.
|
GOTO
|
GOTO, Line Number
|
Redirects program execution to the specified line number.
|
GOSUB
|
GOSUB, Line Number
|
Redirects program execution to the specified line number. Subroutines
may include another GOSUB.
|
RETURN
|
RETURN
|
Redirects program execution to the line after the last GOSUB
|
REM
|
REM, any text
|
This statement type holds an optional programmer comment and can be
used as a target for a GOTO or GOSUB. Otherwise it is not actively executed.
|
END
|
END
|
terminates the program
|
The language operators
Operator Group
|
Operators
|
comments
|
Arithmetic
|
+, -, * and /
|
Normal arithmetic precedence rules applied and parentheses may be
used
|
Logical
|
=, <, >, <=, >=, AND, OR
|
No precedence assumed except where parentheses are used. No short
circuiting (all conditional expressions are evaluated).
|
Other
|
=, (, ), &, “
|
= is also an assignment operator in a LET statement.
& is a concatenation operator for string expressions or to force
a string evaluation of a following integer expression. Double quotes delimit
string literals.
|
Implicit
|
An integer variable or expression assigned to a string variable will
be automatically converted to a string. A string assigned to an integer
variable will be converted to an integer if at all possible.
|
User input
The normal Serial Monitor supported by the Arduino IDE is
designed primarily as an output facility with occasional user input. The task
of writing a BASIC program to be run on an Arduino felt like it needed better
interactive support. I decided to use PuTTY running on a personal computer as a
simple terminal emulator and this shaped the code within the loop() function of
the Arduino sketch. The code there echoes each input character and triggers a
process when the user presses the return key to complete a given entry
(command, new program statement or response to a prompt from a running BASIC
program).
One thing common to all BASIC implementations is that they
are case insensitive. In an attempt to simplify my implementation, I decided to
convert all user input to upper case except for characters within a string
literal. This has an authentic “feel” even if these days we are somewhat
sensitive to upper case text.
Some guidance on setting up PuTTY (available on all major PC
platforms) is available on the book website Other terminal emulation software will probably work
just as well with a minimum of adjustment.
Parsing BASIC statements.
One of the great design features of the BASIC language is
that the primary keywords (IF, PRINT, INPUT FOR etc.) come at the beginning of
each statement. This greatly simplifies the task of breaking each program
statement down into its component parts. That task is normally done by reading
through a line and identifying the type of each element (might be a variable
name, constant or subsidiary keyword such as TO) and creating a representation
of the program line as a set of tokens. Each type being represented by a
specific token. The interpreter can then use the tokens as a guide on how to
process each statement. The set of tokens can also be used to validate a given
statement and ensure that it will probably be processed correctly.
Compiler and interpreter writers will commonly call upon the
services of any available regular expression facility [https://en.wikipedia.org/wiki/Regular_expression
] to identify individual statement components ready to be “tokenised”. ANSI C
does not have an inbuilt regular expression capability although there are
libraries available. It turns out that tokenising a BASIC language statement is
not terrifically complex and can be managed with some relatively
straightforward C code. This is a good thing in many ways as the oft quoted
aphorism (attributed to Jamie Zawinski [https://www.jwz.org/about.html ]) - ‘Some people when confronted with a problem,
think “I know, I’ll use regular expressions.” Now they have two problems.’
more than hints a warning. That’s not strictly fair – regular expressions are
great at analysing regular grammar and so fit nicely into projects such as this
one. Programmers run into trouble when they try to use regular expressions more
adventurously.
The syntax of a programming language is often described
using BNF (Backus-Naur Form) which is a formal syntax for describing the syntax
of a programming language. A language compiler (or interpreter, as in this
case) will take a stream of characters (a program statement) and apply a
lexical analysis to break the input down into a stream of tokens. The tokens
are then sent to a parser which can apply the language rules represented in
BNF. If the program statement passes muster then it can be converted to
executable code ready to run on the target processor.
The interpretation and validation requirements of this BASIC
are fairly simple but the code follows a similar route with a less formal
language definition. As program statements are entered they are parsed and
validated with feedback in the form of error messages sent to the terminal.
Valid statements are stored in memory. The line numbers are stored in an
ordered linked list to support program execution and the text of each statement
is stored ad hoc in the heap.
When the BASIC program is running, statements are retrieved in
turn from memory, parsed again (to assemble the tokens) and then executed.
The Arduino choice
I started working up the initial terminal support and
statement line storage elements using an Arduino Uno. This proved a bit of a
pain as I needed to keep connecting and disconnecting the terminal emulator to
allow program updates from the IDE. I know, I thought, I will use my Arduino Due
and “hook” the terminal emulator up to the native USB and keep the IDE connected
to the programming port. This does not work. I am not at all sure why the Due
has two USB connections as (certainly in this scenario) you can’t use both at
once. Perhaps a future revision will fix this.
Despite the continued USB sharing issue, I carried on
developing this program on a Due and made use of the greater data memory
availability on that device. The final program code can be made to fit on a Uno
(with a minimum of surgery) but running any meaningful BASIC program on that
platform proved impossible. The ATMega chip was never intended as a general
purpose processing unit and parsing and executing text lines is pretty far from
what its designers had in mind. If you only have a Uno (or similar) available
then you can still have a play with one of the core features of any BASIC
interpreter (stack based evaluations) and the links for this can be found at the end of this post. Having said all that, you can get short BASIC programs running on a Mega 2560
although I took the precaution (when trying this) of moving the error messages into PROGMEM (which
required a change to the error display function).
The code
The program makes use of three C++ classes. One class
(ProgLines) manages the storage and retrieval of BASIC program statements,
passing them to the larger class (Basic) for parsing and execution. The Basic
class makes use of a Stack class when evaluation expressions. There are several
downloadable stack classes available for Arduinos but I have cooked my own.
Both the Basic and ProgLines classes declare and create their own instances as
they are designed as singleton classes.
The Basic class runs to around 750 lines of code which is
pushing things a wee bit within the Arduino IDE. The IDE has a limited set of
features for managing large code bases and perhaps I should have looked at
using an external code editor like Notepad++ or even Visual Studio. However
with a lot of scrolling about the code was written.
The .ino program tab
content is the simplest to follow. The setup() function establishes a serial
link to the terminal emulator and sends a user prompt (BASIC>). The loop()
function echoes and adds arriving characters to a char array (translating lower
case to upper case when required) until the user presses the <Return> key
to complete the entry of a program line, command or response to a prompt. If
the current BASIC program is in a “wait” state then the characters are assumed
to be in response to a prompt from an INPUT statement and are sent to the
relevant Basic.h method. Otherwise the character buffer content is passed to
the ProgLines class addLine() method. If the current BASIC program is in a RUN
state then the loop() function will trigger the execution of the next program
statement.
The ProgLines class has a number of methods and these are
outlined next.
Public methods
|
|
bool addLine(char*)
|
If the string array starts with a line number then any statement is
passed to the Basic class for validation. If an error code is returned then
this is passed to the displayError() method, otherwise the line number and
statement are passed to storeLine(). If no line number then this may be a
command (LIST, RUN, NEW LOAD) or will be treated as an “immediate execution”
request.
|
void executeNext()
|
When a BASIC program is running this method selects the next program
line from storage and passes it to the Basic class to be parsed. Assuming no
error code is returned the method then calls the basic class executeLine()
method. After the line is executed, the pointer to the next line to be
executed is updated by calling the advanceLine() method.
|
bool setNextLine(short)
|
This method can be called from the basic class to change the next
line number to be executed to the argument value.
|
void advanceLine()
|
Updates the pointer to the next line to be executed. If the last line
in the program was just executed then the program status flag is changed to
terminate further execution.
|
int getNextLine()
|
Fetches the next line number. This is called from the Basic class to manage
loops and gosub statements.
|
Private methods
|
|
bool storeLine(short, char*, int)
|
Stores he line number (short) and line text with the final int
argument passing the text length.
|
bool isLine(short)
|
Checks to see if the argument is a currently stored line number.
|
void listLines()
|
Lists the program to the serial port. Starts by initialising a buffer
for the line text and then reads each line in turn from the heap and sending
the line number and text to the serial port.
|
void newProgram()
|
Frees all heap memory used by any existing program ready to start
storing a new one.
|
void runProg()
|
Starts the execution of a stored BASIC program from the lowest line
number.
|
bool deleteLine(short)
|
Removes a specified line number and the associated code text from memory,
rtesetting the line number linked list.
|
void displayError(int, short)
|
Sends the text associated with the specified error code. Error
messages are retrieved via the errMess[] array.
|
void loadTest()
|
Loads a test program stored in testProg[] array. The content of this
array can be varied to exercise any chosen language feature.
|
The integer and conditional evaluations use two stacks. This
is for code clarity as the same stack could have been used for both operators
and values although the stack items would have needed to differentiate the item
type. There are several downloadable stack classes for the Arduino so you don’t
have to “cook” your own as I have here. See the post on Stack.h here.
The Basic class:
Public Members
|
|
void userInput(char *uLine);
|
If the class has just executed an INPUT statement it switches status
to USER. The next text entry from the serial port is then sent to this
function where it will be evaluated and the result assigned to the nominated
variable. GThe status is then re-set to RUN.
|
int parseLine(char *pLine);
|
Does what it says. Takes a text line and parses it to check if it is
a valid BASIC code line. An error code may be returned if the parsed code
fails validation.
|
int executeLine();
|
Executes the most recently parsed statement. If you read the code you
will see keywords being identified and executed. Expressions are evaluated by
private methods.
|
void setForRun();
|
Initialises all variables and sets the status to RUN
|
int rStatus;
|
The current class status.
|
enum {USER,RUN,WAIT, STOPPED};
|
Class status values
|
Selected Private Methods
|
|
int validateStatement()
|
The parseLine() function can detect some syntax errors but that is
complemented by this function that validates all of the statement components.
|
byte getNextToken()
|
Extracts the next token from a statement and advances the parse
position within a statement ready for the next call.
|
bool evalCondition(
|
Evaluates a conditionsl expression returning a bool true or false.
|
void evalString()
|
Evaluates a string expression including implicit casts from int
|
int evalInt()
|
Evaluates an int expression returning the resulting value. Manages
implicit casts from string to int.
|
The Basic class maintains separate stacks for “for loops”
and “gosubs” and this approach allows these code constructs to be nested.
Limitations
There is yet another arbitrary limit in this BASIC –
allowance is only made for up to 30 tokens in any program statement. My project
originally started by storing tokens in a queue maintained in the heap but the
code associated with navigating the queue started to reduce readability – plus
it was clear that the token list was going to be a very active structure while
the program was in use. Hence a fixed array of structs.
There are several places where buffer overflows may occur –
but this is a demonstration piece not production code so “mea culpa”.
So what’s missing? String manipulation – perhaps a LEFT,
RIGHT and MID. An explicit conversion from string to int and vice versa (though
implicit coercions work). An ELSE option for an IF statement. Floats of course
(which would make numeric evaluations much more interesting and accurate) and
arrays. Then there are the WHILE constructs. These are slightly tricky to
implement
Do/UNTIL (or WHILE if available) would be straightforward to
implement but WHILE/WEND presents problems as the while condition may be false
at the point the WHILE statement is first evaluated. This would require an
implicit GOTO to after the related WEND but this is an unknown location. A
solution might be built by adding the statement type to the struct that stores
line numbers in the linked list. A function could then search for a matching
WEND (skipping over nested WHILE/WENDS) to deal with this. I think one would
have to be very keen.
What would be changed in a possible second version of this program?
I think I might merge the ProgLines class into the basic
class as this would make any changes to the line number linked list simpler to
implement. It would also make it straightforward to use the parser (or at least
the getnextToken() function) to parse any extensions of the LIST (and
unimplemented RENUM) commands to allow some parameters.
I would seriously consider pushing all of the tokens onto a
single stack while implementing the rules of both the arithmetic stack
processing and the logical comparison stack processing. This would allow
validation to be merged into that process and should result in a fairly
straightforward final “stack” to be collapsed and passed to a process executing
the Statement types.
Implementing meaningful variable names
One could “cheat” of course and allow the user to define
(DIM is the standard statement keyword) up to 26 string variables and 26
numeric variables. A simple “look-up” table could be used to map the variable
names to an array of values or pointers as already demonstrated.
No reason why the numeric variables have to be integers other than a nod back to Steve Wozniak’s efforts (Bill Gates and Paul Allen’s
original version had floats as a “stand-out” feature). Implementing float
values would be a straightforward substitution or could add a new data type to
the BASIC. The stack demo for an Arduino Uno switches the numeric
evaluation to floats.
The Code
All of the code ready for criticism, correction and enhancement is available on the book website.
Don't have a Due?
Take a look at the Uno Stack demo if you don’t have access to an Arduino with the memory capability of a Due but would like to explore parsing and tokenising statements and using a stack to evaluate an expression.
Don't have a Due?
Take a look at the Uno Stack demo if you don’t have access to an Arduino with the memory capability of a Due but would like to explore parsing and tokenising statements and using a stack to evaluate an expression.
Comments
Post a Comment