daveor

Apr 6, 202116 min read

Reverse engineering PDP-11 BASIC: Part 25

This post describes the BASIC FOR and NEXT loop commands. For context and a list of other posts on this topic, see the PDP-11 BASIC reverse engineering project page.

Introduction

Loops are implemented in PDP-11 BASIC using the FOR...NEXT syntax. Any commands between the FOR and NEXT commands will be executed repeatedly until the condition specified in the FOR command is met.

Here is the general syntax:

FOR <variable> = <expression> TO <expression> [STEP <expression>]
.
.
.
NEXT <variable>

For example:

FOR X = 1 TO 10 STEP 2
. 
.
NEXT X

In this case, the commands between FOR and NEXT are executed, first with the variable X (known as the control variable) having the value 1. Then, when the NEXT command is reached, control branches back up to the FOR command and the value of X is increased by the amount specified in the STEP expression (default is to increase by 1), in this case 2, to 3. This repeats with X=5, 7 and 9. When the X=9 iteration is complete, control loops back up to the FOR command but when the STEP value is added X will have the value 11, which is greater than 10 and therefore the FOR loop ends.

As anyone who as programmed will know, a FOR loop can also be written in "while" syntax, although there is no "while" syntax in PDP-11 BASIC. The while() loop corresponding to the example above would be:

int x = 1; //initial condition
while (x <= 10) { //end condition
   .
   .
   x += 2; //step
}

In the PDP-11 BASIC code there is separate code for the FOR and NEXT commands, which will be analysed below.

BASIC FOR command

The FOR command code is quite long so I have ommitted the full, unannotated listing that I normally include. In this case, we'll just dive straight in with the analysis.

Recall that when these command handlers are executing, they parse the entire command line except for the command word itself. So, when this command handler is running for the command "FOR X = 1 to 10" this code will be processing the string "X=1 to 10". So, with that in mind, note that the first thing that needs to be parsed is the variable name.

007302 104526 TRAP 126

So, TRAP 126 is used to parse the variable name and return the variable name identifier word in R4.

007304 020227 CMP R2, #75
007310 001117 BNE 7550

The next non-whitespace character after the variable name is also returned by TRAP 126 in R2. This is compared to the equals sign (ASCII 75) and if it is not equal control branches to 7550 to return an error.

007312 005000 CLR R0
007314 010446 MOV R4, -(SP)

R0 is cleared and R4 is copied onto the stack.

007316 104534 TRAP 134
007320 104514 TRAP 114

TRAP 134 is used to set R3 to the beginning of the running state storage. TRAP 114 is used to search the running state storage for the variable identified contained in R4. If the variable is located in the running state storage, the value is pointed to by R3. If the variable is not located in the running state storage, R3 will be zero and the zero flag will be set.

007322 001003 BNE 7332
007324 010400 MOV R4, R0
007326 104546 TRAP 146
007330 000402 BR 7336

If the zero flag is not set (i.e. the variable entry is found in the running state storage) then control branches to 7332. Otherwise, in the case the variable entry is not found in the running state storage, R4, the variable identifier value, is copied into R0 and then TRAP 146 is used to allocate space for the variable with the identifier specified in R0. When TRAP 146 returns R0 will point at the location of the value storage of the newly allocated variable record. Afterwards, control branches to 7336.

007332 010300 MOV R3, R0
007334 022020 CMP (R0)+, (R0)+

In the case when the variable is located in the running state storage, R3, which points at the variable entry in the running state storage, is copied to R0. R0 is then incremented by two words - one word is the variable identifier and the other is a zero word (used to hold dimensions for a multi-dimensional variable). After these two commands, the value in R0 will point at the location of the value storage of the variable record in the running state storage.

007336 010046 MOV R0, -(SP)
007340 005000 CLR R0

Whether or not the variable originally existed, it will now have been created, and R0 will point at the value storage of the variable record in the running state storage. This value is now pushed onto the stack. R0 is then cleared.

007342 104534 TRAP 134

TRAP 134 is used to set R3 to point at the beginning of the running state storage.

007344 052704 BIS #40000, R4
007350 104514 TRAP 114
007352 001407 BEQ 7372

The constant value 40000 is AND'd to the variable identifier stored in R4. The running state storage is then searched for an entry with this identifier. This entry is used to store the information required by the NEXT command, when it is reached, to determine whether to loop back to the FOR statement and run the commands again.

TRAP 114 will return zero in R3, and set the zero flag, if an entry with the identifier in R4 is not found. If the variable identifier is not found, control branches to 7372.

If the variable identifier is found, R3 will contain the memory address of the matching runtime storage entry, in which case:

007354 010446 MOV R4, -(SP)
007356 010146 MOV R1, -(SP)
007360 012704 MOV #20, R4
007364 104520 TRAP 120
007366 012601 MOV (SP)+, R1
007370 012604 MOV (SP)+, R4

R4 and R1 are pushed onto the stack. The constant value 20 is moved into R4 and then TRAP 120 is used to delete the number of bytes specified in R4 (2o octal) from the location in the running state storage specified in R3. In other words, the variable identifier entry is deleted from the running state storage. After the deletion, R1 and R4 are restored from the stack.

Whether or not the variable identifier originally existed, it will now have been deleted and execution continues:

007372 010400 MOV R4, R0
007374 104512 TRAP 112
007376 016700 MOV 13660, R0
007402 104512 TRAP 112

The variable identifier is copied from R4 to R0 and then TRAP 112 is used to store this value into the running state storage. Then, the currently running line number is copied into R0 and TRAP 112 is used to store this value into the running state storage.

007404 104536 TRAP 136
007406 011600 MOV (SP), R0
007410 010220 MOV R2, (R0)+
007412 010320 MOV R3, (R0)+
007414 010420 MOV R4, (R0)+

TRAP 136 is used to evaluate the initial condition expression. The result will be stored as a floating point value in R2/R3/R4. The value at the top of the stack (which is the memory location in the runtime state storage where the control variable value is located) is copied into R0. R2 is copied to this memory location and then R0 is incremented. R3 is copied to the memory location in R0 and then R0 is incremented again. Finally, R4 is copied to the memory location in R0 and then R0 is incremented again. So, in summary, these lines assign the initial condition expression value to the control variable.

007416 104540 TRAP 140
007420 020427 CMP R4, #52117
007424 001051 BNE 7550

TRAP 140 is used to convert the next two non-whitespace characters from the command string into a single word, returned in R4. The value in R4 is then compared to the value 52117, which represents the two characters "TO". If the value in R4 is not equal to the expected value, control branches to 7550 to return an error.

007426 104536 TRAP 136
007430 104550 TRAP 150

TRAP 136 is used to evaluate the end condition expression and then TRAP 150 is used to save the resulting value into the running state storage. So far, the running state storage entry currently being assembled consists of:

The entry identifier, which is the the control variable identifier AND the value 40000
The line number of the FOR statement
The floating point value of the end condition expression

007432 121127 CMPB (R1), #123
007436 001013 BNE 7466

The end condition expression may or may not be followed by a STEP expression. When a STEP expression is not specified, the default STEP value is 1.

The value at the memory location specified in R1 is compared to the value "S" (ASCII 123). If not equal, in other words there is no STEP expression, control branches to 7466.

Otherwise:

007440 104540 TRAP 140
007442 020427 CMP R4, #51524
007446 001040 BNE 7550
007450 104540 TRAP 140
007452 020427 CMP R4, #42520
007456 001034 BNE 7550

TRAP 140 is used to convert the next two non-whitespace characters from the command string into a single word, returned in R4. The returned value is compared to the value 51524 (representing the two characters "ST"). If the value in R4 does not equal this two character value control branches to 7550 to return an error.

Otherwise, TRAP 140 is used again to convert the next two non-whitespace characters from the command string into a single word. The returned value is compared to the value 42520 (representing the two characters "EP"). Again, if the value in R4 does not equal this two character value control branches to 7550 to return an error.

007460 104536 TRAP 136
007462 104550 TRAP 150
007464 000406 BR 7502

If the STEP string has been successfully parsed by the code above, then the expression is evaluated using TRAP 136 and then pushed into the running state storage using TRAP 150. The running state storage entry currently being assembled now consists of:

The variable identifier, which is the the control variable identifier AND with the value 40000
The line number of the FOR statement
The floating point value of the end condition expression
The step value

Control now branches to 7502.

007466 005002 CLR R2
007470 012703 MOV #40000, R3
007474 012704 MOV #100001, R4
007500 000770 BR 7462

In cases where a STEP is not specified the following register values are loaded:

R2 has the value 000 000
R3 has the value 040 000
R4 has the value 100 001

Together these three words represent the floating point value 1. Once this floating point value has been loaded into R2/R3/R4 control branches up to 7462, described above, to store the value in the running state storage entry currently being assembled.

At this point the running state storage entry containing the loop information has been completely assembled. Note that the entry is 16 bytes (20 in octal) in length. Remember that when the entry, if found, was deleted by the instructions starting at address 007354, this was achieved by deleting 16 bytes (20 in octal) from the location identified in the running state storage.

Moving on:

007502 011600 MOV (SP), R0
007504 010146 MOV R1, -(SP)
007506 010501 MOV R5, R1
007510 162701 SUB #14, R1
007514 010146 MOV R1, -(SP)
007516 104434 TRAP 34

The value at the top of the stack, which contains the memory location of the starting expression value, is copied into R0. R1 is then pushed onto the stack and the value of R5 (containing the location of the limit of the running state storage) is copied into R1. The value 14 (decimal 12) is subtracted from R1. This will mean that R1 now points at the location of the ending expression. This value is then pushed onto the stack as well.

TRAP 34 is now used to compare the starting expression (pointed to by R0) and the ending expression (pointed to by R1). There are a number of possibilities:

If the starting expression equals the ending expression then we have nothing to do in the FOR loop so control returns to the main syntax parsing loop.
If the starting expression is less than the ending expression, we need to confirm (a) that the STEP is positive and (b) that there is a corresponding NEXT statement later in the code.
If starting expression is greater than the ending expression, we need to confirm (a) that the STEP is negative and (b) that there is a corresponding NEXT statement later in the code.

007520 001406 BEQ 7536
007522 002413 BLT 7552

So, if the result of TRAP 34 is zero, that means the starting and ending expressions are equal so we branch to 7536 to return to the main syntax parsing loop. If the result of TRAP 34 is less than zero, that means that the starting expression is less than the ending expression. Otherwise, the starting expression is greater than the ending expression, so:

007524 012601 MOV (SP)+, R1
007526 005761 TST 10(R1)
007532 002413 BLT 7562
007534 000401 BR 7540

The value at the top of the stack, which contains the location of the ending expression is popped from the stack into R1. The value at R1 + 10, which is the high order word of the floating point STEP significand value, is then tested. If it is less than zero that means the STEP value is negative, which means we will make progress towards the ending expression with each iteration, so control branches to 7562 to make sure a NEXT command can be located. Otherwise, when the STEP value is positive, that means we will never make further progress towards the ending expression and therefore we are finished so control branches to 7540 to return to the main syntax parsing loop.

007536 005726 TST (SP)+
007540 012601 MOV (SP)+, R1
007542 022626 CMP (SP)+, (SP)+
007544 000167 JMP 2762

In the case where the starting and ending expressions are equal, the location of the ending expression is popped from the stack and discarded, R1 is restored from the stack and two more values are discarded from the stack before control returns to the syntax parsing loop.

007552 012601 MOV (SP)+, R1
007554 005761 TST 10(R1)
007560 002767 BLT 7540

Finally, in the case where the starting expression is less than the ending expression, we again test the value at R1 + 10, which is the high order word of the floating point STEP significand value. If it is less than zero that means the STEP value is negative, which means we will not make progress towards the ending expression with each iteration, so control branches to 7540 to return to the main syntax parsing loop. Otherwise, the STEP value is positive and we can make further progress, so execution continues to make sure a NEXT command can be located.

007562 012601 MOV (SP)+, R1
007564 005726 TST (SP)+

To achieve this, firstly the value from the top of the stack is popped into R1. This contains the location of the end of the FOR command in the command string. Another word, the location of the starting expression value, is then popped off the stack and discarded.

007566 104534 TRAP 134

TRAP 134 is then used to position R3 at the beginning of the running state storage, which is also the end of the command string.

007570 122127 CMPB (R1)+, #157
007574 001403 BEQ 7604
007576 020103 CMP R1, R3
007600 103773 BCS 7570
007602 104457 TRAP 57

The value pointed to by R1 is compared to "o" (ASCII 157), which is the token for the NEXT command, and then R1 is incremented. If the value pointed to by R1 equals "o" that means we have located a NEXT command so we branch to 7604 to see if the variable specified in the NEXT command matches.

Otherwise we compare R1 to R3, to see if we have reached the end of the command sequence and if not we branch back to 7570 to keep searching. Otherwise TRAP 57 is used to return an error.

007604 104526 TRAP 126
007606 020416 CMP R4, (SP)
007610 001367 BNE 7570

If a NEXT command has been identified, TRAP 126 is used to get the variable identifier from the NEXT command. The resulting value, stored in R4, is compared to the value on the top of the stack, which is the variable identifier for the variable in the FOR command. If these are not equal, control branches back up to 7570 to keep searching the command string.

007612 005726 TST (SP)+
007614 005301 DEC R1
007616 000752 BR 7544

Otherwise, if we have identified the matching NEXT command, the top value off the stack (the variable identifier for the FOR command variable) is popped and discarded. R1 is decremented and then control branches to 7544 to return to the syntax parsing loop.

This concludes the analysis of the FOR command.

BASIC NEXT command

The NEXT command is the other part of the loop syntax, and it is used to identify the end of the sequence of commands that are to be executed within the FOR...NEXT loop structure. Briefly, the NEXT code decides, based on the current value of the loop variable, whether or not whether to loop back up to the FOR command or to continue by executing the next command in the command sequence.

Again, the NEXT command code is quite long so I have ommitted the full, unannotated listing so let's dive straight in with the analysis.

007620 005000 CLR R0
007622 104526 TRAP 126
007624 010446 MOV R4, -(SP)

Firstly R0 is cleared and then TRAP 126 is used to get the variable identifier from the command, which is returned in R4. The value in R4 is then pushed onto the stack.

007626 104534 TRAP 134
007630 104514 TRAP 114
007632 001451 BEQ 7756
007634 010346 MOV R3, -(SP)

TRAP 134 is used to position R3 at the beginning of the running state storage and then TRAP 114 is used to search the running state storage for the entry with the identifier specified in R4. TRAP 114 will return zero, and set the zero flag, if the entry with the identifier specified in R4 is not found. In this case that would be an error condition, so if TRAP 114 sets the zero flag, control branches to 7756 to return an error. Otherwise, R3 contains the location of the variable entry in the running state storage and this value is pushed onto the stack.

007636 052704 BIS #40000, R4
007642 104534 TRAP 134
007644 104514 TRAP 114
007646 001443 BEQ 7756

The variable identifier in R4 is then AND'd with the constant value 40000 to create the identifier for the loop condition values, described above in the analysis of the FOR command. TRAP 134 is then used to position R3 at the beginning of the runtime state storage and then TRAp 114 is used to search the runtime state for the entry with the identifier contained in R4. If the entry with the identifier specified in R4 is not found, as indicated by the zero flag being set, control branches to 7756 to return an error.

If an error does not occur, R3 will now point at the loop condition values entry in the running state storage. Recall that this entry is of the following form:

The entry identifier, which is the the control variable identifier AND with the value 40000
The line number of the FOR statement
The floating point value of the end condition expression
The step value

007650 010146 MOV R1, -(SP)
007652 022323 CMP (R3)+, (R3)+
007654 010301 MOV R3, R1
007656 062701 ADD #6, R1

The value in register R1 is pushed onto the stack. Then R3 is moved forward by two words, so that it now points at the floating point value of the end condition expression. The value in R3 is then copied into R1 and 6 is added to R1.

After these instructions, R3 points at the floating point value of the end condition expression and R1 points at the floating point value of the step expression.

007662 016600 MOV 2(SP), R0
007666 010346 MOV R3, -(SP)
007670 022020 CMP (R0)+, (R0)+
007672 010046 MOV R0, -(SP)

The value at SP+2, which contains the memory location of the control variable entry in the running state storage, is copied into R0. R3 is then pushed onto the stack. The value in R0 is incremented twice (skipping two words of the variable entry) so that R0 now points at the actual control variable value, and this value is then pushed onto the stack.

007674 104420 TRAP 20

TRAP 20 is then used to add the STEP expression value (pointed to by R1) to the control variable value (pointed to by R0) with the result being returned in the value pointed to by R0.

007676 012600 MOV (SP)+, R0
007700 011603 MOV (SP), R3
007702 010301 MOV R3, R1

Afterwards, the value of R0 (i.e. the location of the control variable value in the running state storage) is restored from the stack. Also, R3 (the value of the end condition expression) is restored from the stack, but the value is not popped off. This value is also copied into R1.

007704 005763 TST 10(R3)
007710 100003 BPL 7720

The value at R3 + 10 (octal) is tested. R3 points at the beginning of the floating point value of the end expression, which is followed in memory by the floating point value of the STEP expression. Skipping ahead by 10 from the beginning of the end expression is testing the second word of the STEP floating point value. If the STEP value is positive this value will be positive and if the STEP value is negative this value will be negative.

If the value is positive, control branches to 7720. Otherwise, when the value is negative:

007712 104434 TRAP 34
007714 003021 BGT 7760
007716 000402 BR 7724

TRAP 34 is used to compare the values in R0 (the current value of the control variable) and the value in R1 (the end expression value). If the end expression value is greater than the value of the control variable, that means we should not loop back up to the FOR statement and instead we should continue with the next command after the NEXT statement, so we branch to 7760. Otherwise we need to loop back to the FOR statement, so control branches to 7724.

007720 104434 TRAP 34
007722 002416 BLT 7760

In the case where the STEP value is positive, again, TRAP 34 is used to compare the values in R0 (the current value of the control variable) and the value in R1 (the end expression value). In this case, if the end expression is less than the control variable value, that means we should not loop back to the FOR statement, so we branch to 7760. Otherwise we need to loop back to the for statement, so we continue execution at 7724:

007724 012600 MOV (SP)+, R0
007726 005726 TST (SP)+
007730 010046 MOV R0, -(SP)

The value from the top of the stack (which contains the location of end expression value in running state storage) is popped into R0. The next value from the stack (which contains the current parse location of the command string, normally held in R1) is popped from the stack and discarded. The value in R0 is pushed onto the stack again.

007732 005740 TST -(R0)
007734 011000 MOV (R0), R0

The value in R0 is decremented and tested. R0 now points at the word containing the line number of the FOR statement. The line number of the FOR statement is then copied into R0.

007736 104474 TRAP 74
007740 104510 TRAP 110

TRAP 74 is then used to seek to the location in the code identified by the line number contained in R0. After this TRAP, R1 will point at the beginning of the command with the line number in R0. TRAP 110 is then used to seek R1 forward to the end of the FOR command, so that the next command to be executed is the command after the FOR statement.

007742 010104 MOV R1, R4
007744 012601 MOV (SP)+, R1
007746 011600 MOV (SP), R0
007750 010446 MOV R4, -(SP)

R1 is copied into R4. The location of the value of the end condition expression is popped from the stack into R1. The location of the control variable entry in the running state storage is copied from the stack into R0 and R4 is pushed onto the stack again.

007752 022020 CMP (R0)+, (R0)+
007754 000657 BR 7514
007756 104461 TRAP 61

R0 is incremented twice so it will now point at the value of the control variable value and control then branches to 7514 which will (indirectly) return to the main syntax parsing loop.

The TRAP 61 is used in the code above to return an error.

The alternative execution path is used in situations where execution should continue after the NEXT statement:

007760 012601 MOV (SP)+, R1
007762 062701 ADD #6, R1

The value of the end condition expression is popped from the stack and six is added to the value, which means that R1 will point at the step expression value.

007766 016600 MOV 2(SP), R0
007772 022020 CMP (R0)+, (R0)+

The value from SP+2, which is the location of the control variable entry in the running state storage, is copied into R0. The value in R0 is incremented twice so that now R0 points at the value of the control variable.

007774 104422 TRAP 22

TRAP 22 is used to subtract the STEP value from the control variable value.

007776 012601 MOV (SP)+, R1
010000 022626 CMP (SP)+, (SP)+
010002 000704 BR 7614

The value at the top of the stack, which is the current parse location in the command string, is popped into R1. The remainind two values are popped of the stack and discarded, before control branches at 7614 to (indirectly) branch back to the syntax parsing loop.

Summary This concludes the analysis of the FOR...NEXT loop, which is the only loop syntax available in PDP-11 BASIC.

Reverse engineering PDP-11 BASIC: Part 25

Recent Posts

Comentarios