This post describes the BASIC IF command. For context and a list of other posts on this topic, see the PDP-11 BASIC reverse engineering project page.
BASIC IF command
The PDP-11 BASIC IF command is simpler than the equivalent command in other languages such as C. The general syntax is as follows:
IF <expression> <operator> <expression> THEN <statement>
IF <expression> <operator> <expression> THEN <line number>
IF <expression> <operator> <expression> GOTO <line number>
The list of available operators is "=", "<", "<=", ">", ">=" and "<>".
Note that there is no "else if" or "else" syntax. There are also no boolean operators, since expressions can only produce numeric results.
Below is the code for the IF command handler. Remember that these command handlers parse the remainder of the statement after the actual command word itself, so this code parses an IF statement, minus the "IF". See Part 8 for more background on BASIC command handling in general.
006140 104536 TRAP 136
006142 102775 BVS 6136
006144 104542 TRAP 142
006146 104540 TRAP 140
006150 020227 CMP R2, #76
006154 001405 BEQ 6170
006156 020227 CMP R2, #75
006162 001402 BEQ 6170
006164 005301 DEC R1
006166 105004 CLRB R4
006170 012702 MOV #6250, R2
006174 020422 CMP R4, (R2)+
006176 001404 BEQ 6210
006200 020227 CMP R2, #6264
006204 103773 BCS 6174
006206 104423 TRAP 23
006210 062702 ADD #171526 (-6252), R2
006214 006302 ASL R2
006216 062702 ADD #6264, R2
006222 010246 MOV R2, -(SP)
006224 104536 TRAP 136
006226 102743 BVS 6136
006230 010146 MOV R1, -(SP)
006232 010601 MOV SP, R1
006234 022121 CMP (R1)+, (R1)+
006236 104542 TRAP 142
006240 010600 MOV SP, R0
006242 104434 TRAP 34
006244 000176 JMP @10(SP)
006250 036076 ; this is the two characters "<>"
006252 036075 ; this is the two characters "<="
006254 036000 ; this is the character "<", plus a zero byte
006256 037075 ; this is the two characters ">="
006260 037000 ; this is the character ">", plus a zero byte
006262 036400 ; this is the character "=", plus a zero byte
006264 001023 BNE 6334
006266 000411 BR 6312
006270 003421 BLE 6334
006272 000407 BR 6312
006274 002417 BLT 6334
006276 000405 BR 6312
006300 002015 BGE 6334
006302 000403 BR 6312
006304 003013 BGT 6334
006306 000401 BR 6312
006310 001411 BEQ 6334
006312 062706 ADD #6, SP
006316 012601 MOV (SP)+, R1
006320 062706 ADD #10, SP
006324 104502 TRAP 102
006326 005301 DEC R1
006330 000167 JMP 2762
006334 062706 ADD #6, SP
006340 012601 MOV (SP)+, R1
006342 062706 ADD #10, SP
006346 104540 TRAP 140
006350 020427 CMP R4, #52110
006354 001015 BNE 6410
006356 104540 TRAP 140
006360 020427 CMP R4, #42516
006364 001020 BNE 6426
006366 104472 TRAP 72
006370 005301 DEC R1
006372 104470 TRAP 70
006374 102403 BVS 6404
006376 001013 BNE 6426
006400 000167 JMP 4216
006404 000167 JMP 3234
006410 020427 CMP R4, #43517
006414 001004 BNE 6426
006416 104540 TRAP 140
006420 020427 CMP R4, #52117
006424 001765 BEQ 6400
006426 104425 TRAP 25
Let's see how this works.
006140 104536 TRAP 136
006142 102775 BVS 6136
006144 104542 TRAP 142
Firstly, TRAP 136 is used to parse the expression before the operator. The result will be stored as a floating point value in R2/R3/R4. If TRAP 136 sets the overflow flag, that means an additional closing bracket was identified at the end of the expression, which is an error, so control branches to 6136 to return an error. Otherwise, TRAP 142 is used to push the floating point value of the expression onto the stack.
We now move on to parsing the operator. Recall that the list of available operators is "=", "<", "<=", ">", ">=" and "<>". If you take a look at this list, you will see that all of the operators are one character in length except for "<=", ">=" and "<>". In these three cases, the second character of the operator is either "=" (for the first two cases) or ">" (for the third case).
006146 104540 TRAP 140
TRAP 140 is used to parse the next two non-whitespace characters and merge them into a single word value, which is returned in R4. The two ASCII characters, A and B, are stored in the high and low bytes of the returned value, respectively. R2 will also contain the second of the two characters parsed.
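To make the packing concrete, here is a small Python model. It is an illustration rather than a transcription of TRAP 140, and the name `pack_pair` is hypothetical. It places the first character in the high byte and the second in the low byte, which reproduces the octal constants used later in this handler.

```python
def pack_pair(first: str, second: str) -> int:
    """Pack two ASCII characters into one 16-bit word, first character in
    the high byte, as TRAP 140 is described as doing for R4."""
    return ((ord(first) & 0o377) << 8) | (ord(second) & 0o377)


# Print each pair in six-digit octal, the notation used in the listing.
for pair in ("<>", "<=", ">=", "TH", "EN", "GO", "TO"):
    print(pair, format(pack_pair(pair[0], pair[1]), "06o"))
```

Running it prints 036076 for "<>" and 052110 for "TH", matching the operator table and the comparison constants in the listing.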
006150 020227 CMP R2, #76
006154 001405 BEQ 6170
006156 020227 CMP R2, #75
006162 001402 BEQ 6170
006164 005301 DEC R1
006166 105004 CLRB R4
The value in R2 is compared to ">" (octal 076), which identifies the "<>" case. If it matches, control branches to 6170. Next, R2 is compared to "=" (octal 075), which identifies the "<=" and ">=" cases; again, if it matches, control branches to 6170. Otherwise we are dealing with a single-character operator, so R1 is decremented to return the extra character to the input string, and the low byte of R4 is cleared, leaving just the operator character in the high byte.
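The same decision can be sketched in Python. This is a simplified model under stated assumptions: `read_operator` is a hypothetical name, the input string is assumed to contain no whitespace (TRAP 140 skips it in the real code), and the integer `pos` plays the role of R1.

```python
def read_operator(text: str, pos: int) -> tuple[int, int]:
    """Model of 6146-6166: read two characters, keep both only if the
    second is '>' or '=', otherwise give one back and clear the low byte.

    Assumes no whitespace and at least two characters from `pos`.
    Returns (packed operator word, new position).
    """
    first, second = text[pos], text[pos + 1]
    pos += 2                          # TRAP 140 consumed two characters
    word = (ord(first) << 8) | ord(second)
    if second not in (">", "="):      # CMP R2, #76 / CMP R2, #75
        pos -= 1                      # DEC R1: return the extra character
        word &= 0o177400              # CLRB R4: clear the low byte
    return word, pos


print(format(read_operator("A<B", 1)[0], "06o"))
```

For "A<B" the sketch returns the word 036000 and leaves `pos` pointing at "B", just as the real code leaves R1 pointing at the start of the second expression.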
006170 012702 MOV #6250, R2
006174 020422 CMP R4, (R2)+
006176 001404 BEQ 6210
006200 020227 CMP R2, #6264
006204 103773 BCS 6174
006206 104423 TRAP 23
Next, the value 6250 is moved into R2. At this address, and the subsequent addresses, we find the following strings:
006250 036076 ; this is the two characters "<>"
006252 036075 ; this is the two characters "<="
006254 036000 ; this is the character "<", plus a zero byte
006256 037075 ; this is the two characters ">="
006260 037000 ; this is the character ">", plus a zero byte
006262 036400 ; this is the character "=", plus a zero byte
The value in R4 is then compared against the value at the memory location in R2, after which R2 is incremented to point at the next entry in the list of operation strings. If the value in R4 matches, control branches to 6210. Otherwise, the value in R2 is compared to 6264, to see if we have reached the end of the list. If not, control branches to 6174 to compare the next entry. If we have reached the end of the list and no match is found, an error is generated.
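A Python sketch of this search loop follows. The dictionary stands in for memory at 6250-6262 (contents copied from the listing), `find_operator` is a hypothetical name, and raising an exception stands in for TRAP 23.

```python
# Memory at 6250-6262: the operator table from the listing.
OPERATOR_TABLE = {
    0o6250: 0o36076,  # "<>"
    0o6252: 0o36075,  # "<="
    0o6254: 0o36000,  # "<" plus a zero byte
    0o6256: 0o37075,  # ">="
    0o6260: 0o37000,  # ">" plus a zero byte
    0o6262: 0o36400,  # "=" plus a zero byte
}
TABLE_END = 0o6264


def find_operator(r4: int) -> int:
    """Model of 6170-6206: return the value of R2 after a match, which is
    the address one word past the matching entry."""
    r2 = 0o6250                       # MOV #6250, R2
    while True:
        entry = OPERATOR_TABLE[r2]
        r2 += 2                       # CMP R4, (R2)+ advances R2 by one word
        if entry == r4:
            return r2                 # BEQ 6210
        if r2 >= TABLE_END:           # BCS 6174 loops only while R2 < 6264
            raise ValueError("unknown operator")  # stands in for TRAP 23
```

Note that, as in the real code, the match test comes before the end-of-table test, so the last entry ("=") is still checked.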
When a match is found, the location of the corresponding test is calculated and saved:
006210 062702 ADD #171526 (-6252), R2
006214 006302 ASL R2
006216 062702 ADD #6264, R2
006222 010246 MOV R2, -(SP)
Because of the auto-increment in CMP R4, (R2)+, R2 now holds the address of the word after the matching table entry. Adding 171526 (the 16-bit two's complement of -6252) converts this into the byte offset of the match within the table: 0, 2, 4, 6, 10 or 12 (octal) for "<>", "<=", "<", ">=", ">" and "=" respectively. This offset is then shifted left, doubling it, because the table entries are one word apart while each operator's test code is two words (four bytes) long. Finally, 6264 is added to give the address of the corresponding test, and the result is pushed onto the stack. The test code starts at address 6264; we'll come back to it shortly.
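The arithmetic can be checked with a few lines of Python that emulate 16-bit addition and the left shift (`handler_address` is a hypothetical name). For each operator, it takes the value R2 holds after the match and computes the test address.

```python
MASK = 0o177777  # keep results to 16 bits, like a PDP-11 word


def handler_address(r2: int) -> int:
    """Model of 6210-6216 using 16-bit arithmetic."""
    r2 = (r2 + 0o171526) & MASK   # ADD #171526, R2 (subtracts 6252)
    r2 = (r2 << 1) & MASK         # ASL R2: double the byte offset
    return (r2 + 0o6264) & MASK   # ADD #6264, R2


# 171526 is the 16-bit two's complement encoding of -6252.
assert 0o171526 == (-0o6252) & MASK

# Value of R2 after each match, paired with its operator.
for r2, op in [(0o6252, "<>"), (0o6254, "<="), (0o6256, "<"),
               (0o6260, ">="), (0o6262, ">"), (0o6264, "=")]:
    print(op, format(handler_address(r2), "o"))
```

This prints 6264, 6270, 6274, 6300, 6304 and 6310, which are the addresses of the BNE, BLE, BLT, BGE, BGT and BEQ tests in the listing.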
006224 104536 TRAP 136
006226 102743 BVS 6136
TRAP 136 is used again to parse the expression after the operator. The result will be stored as a floating point value in R2/R3/R4. As before, if TRAP 136 sets the overflow flag, an additional closing bracket was identified at the end of the expression, which is an error, so control branches to 6136 to return an error.
Now that both expressions have been evaluated, we set up the operands for the comparison:
006230 010146 MOV R1, -(SP)
006232 010601 MOV SP, R1
006234 022121 CMP (R1)+, (R1)+
006236 104542 TRAP 142
006240 010600 MOV SP, R0
R1 is pushed onto the stack, and then the stack pointer is copied into R1. CMP (R1)+, (R1)+ is a compact way of adding 4 to R1, skipping over the saved R1 and the saved test address, so R1 now points at the first expression value. TRAP 142 is then used to push R2/R3/R4 (the value of the second expression) onto the stack, and the stack pointer is copied again, this time into R0.
The two expression values are now pointed to by R0 and R1.
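To see why these offsets work, the following Python model tracks what each push stores. It assumes a floating point value occupies three words on the stack (consistent with the ADD #6, SP used later to discard one), the starting SP of 1000 is arbitrary, and `stack_after_setup` is a hypothetical name.

```python
def stack_after_setup(sp: int = 0o1000) -> dict[str, str]:
    """Model the stack after 6240; `sp` is an arbitrary starting value."""
    memory: dict[int, str] = {}

    def push(label: str, words: int = 1) -> None:
        nonlocal sp
        for _ in range(words):
            sp -= 2                 # the stack grows downwards, one word at a time
            memory[sp] = label

    push("expr1", 3)                # TRAP 142 at 6144 (assumed three words)
    push("handler")                 # MOV R2, -(SP) at 6222: test address
    push("saved R1")                # MOV R1, -(SP) at 6230
    r1 = sp + 4                     # MOV SP, R1 then CMP (R1)+, (R1)+
    push("expr2", 3)                # TRAP 142 at 6236 (assumed three words)
    r0 = sp                         # MOV SP, R0
    return {
        "R0": memory[r0],
        "R1": memory[r1],
        "SP+6": memory[sp + 6],
        "SP+10": memory[sp + 0o10],  # the word used by JMP @10(SP)
    }


print(stack_after_setup())
```

The model shows R0 and R1 pointing at the second and first values, with the saved R1 at SP+6 and the test address at SP+10 (octal).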
006242 104434 TRAP 34
006244 000176 JMP @10(SP)
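TRAP 34 is used to compare the two floating point values. It sets the condition codes as a CMP instruction would, with the first expression as the source operand and the second as the destination, which is the ordering the branch tests below rely on. Control then jumps indirectly through the word at SP+10, which holds the test address pushed at 6222: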
006264 001023 BNE 6334
006266 000411 BR 6312
006270 003421 BLE 6334
006272 000407 BR 6312
006274 002417 BLT 6334
006276 000405 BR 6312
006300 002015 BGE 6334
006302 000403 BR 6312
006304 003013 BGT 6334
006306 000401 BR 6312
006310 001411 BEQ 6334
You'll see that this code consists of a series of tests, one for each operator. Each test takes the same form: if the relevant condition is met, branch to 6334 to evaluate the rest of the IF statement; otherwise, branch to 6312 to return without evaluating the THEN/GOTO part of the IF statement. The final test, for "=", needs no BR, because when the BEQ is not taken execution simply falls through to 6312.
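The following Python sketch shows how the six branch instructions map onto the comparison. It assumes TRAP 34 sets N and Z as a CMP of the first value against the second would, with V clear; the branch conditions themselves are the standard PDP-11 definitions. The names `cmp_flags`, `BRANCH_TAKEN` and `if_condition` are hypothetical.

```python
def cmp_flags(a: float, b: float) -> dict[str, bool]:
    """Condition codes as if CMP had computed a - b (assumed for TRAP 34)."""
    return {"N": a < b, "Z": a == b, "V": False}


# PDP-11 branch conditions, keyed by the operator each test handles.
BRANCH_TAKEN = {
    "<>": lambda f: not f["Z"],                          # BNE at 6264
    "<=": lambda f: f["Z"] or (f["N"] != f["V"]),        # BLE at 6270
    "<": lambda f: f["N"] != f["V"],                     # BLT at 6274
    ">=": lambda f: f["N"] == f["V"],                    # BGE at 6300
    ">": lambda f: not (f["Z"] or (f["N"] != f["V"])),   # BGT at 6304
    "=": lambda f: f["Z"],                               # BEQ at 6310
}


def if_condition(a: float, op: str, b: float) -> bool:
    """True means the branch to 6334 is taken; False means 6312."""
    return BRANCH_TAKEN[op](cmp_flags(a, b))


print(if_condition(1.0, "<", 2.0), if_condition(2.0, ">", 2.0))
```

With the first value as the source operand, each branch is taken exactly when the corresponding BASIC comparison is true.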
Here's the code at 6312:
006312 062706 ADD #6, SP
006316 012601 MOV (SP)+, R1
006320 062706 ADD #10, SP
006324 104502 TRAP 102
006326 005301 DEC R1
006330 000167 JMP 2762
ADD #6, SP discards the second expression value (three words), MOV (SP)+, R1 restores R1, and ADD #10, SP discards four more words: the saved test address and the first expression value. We then use TRAP 102 to move to the end of the current command, decrement R1 by one character and jump back to the main syntax parsing loop. These last three instructions, starting at 6324, are the exact same code that is used to handle REM and DATA, as described in Part 22.
If the operator test above indicated that the relevant condition was met, we continue at address 6334:
006334 062706 ADD #6, SP
006340 012601 MOV (SP)+, R1
006342 062706 ADD #10, SP
This is the same stack cleanup as at 6312: the two floating point expression values and the saved test address are popped from the stack and discarded, and R1 is restored.
006346 104540 TRAP 140
006350 020427 CMP R4, #52110
006354 001015 BNE 6410
TRAP 140 is used to get the next two non-whitespace characters as a single word. The returned value, found in R4, is compared to the character pair "TH" (value 52110). If the two characters are not equal to this value, then control branches to 6410 to test for "GOTO". Otherwise:
006356 104540 TRAP 140
006360 020427 CMP R4, #42516
006364 001020 BNE 6426
TRAP 140 is used again to get the next two non-whitespace characters as a single word. This value is then compared against the character pair "EN" (value 42516). If the two characters are not equal to this value, this is an error and control branches to 6426 to return an error.
006366 104472 TRAP 72
006370 005301 DEC R1
006372 104470 TRAP 70
006374 102403 BVS 6404
006376 001013 BNE 6426
006400 000167 JMP 4216
006404 000167 JMP 3234
In the case where we successfully locate the string "THEN", this can either be followed by another BASIC statement or by a number, representing the line number to jump to.
TRAP 72 is used to get the next non-whitespace character. R1 is then decremented to return this character to the input string, and TRAP 70 is used to classify it.
If the overflow flag is set, meaning that the character is neither numeric nor an upper case letter, control branches to 6404. This is what would be expected if THEN is followed by a BASIC command, because all BASIC commands are parsed into lower case letters, as described in Part 6. In this case, control jumps onwards to 3234, which is part of the syntax parsing loop for parsing a statement.
If the zero flag is clear, the character is an upper case letter rather than a digit, so neither a statement nor a line number follows THEN, and control branches to 6426 to return an error. Otherwise, the zero flag is set, meaning the character is a digit, so control reaches 6400 and jumps to 4216, which is the GOTO code, to jump to the specified line number.
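In Python, the three outcomes might be modeled like this. The flag rules are taken from the description above (V set for anything other than a digit or an upper case letter, Z set for a digit), not from the TRAP 70 code itself, and `classify_after_then` is a hypothetical name.

```python
def classify_after_then(ch: str) -> str:
    """Model of 6366-6404 for the first character after THEN."""
    is_digit = "0" <= ch <= "9"
    is_upper = "A" <= ch <= "Z"
    v_flag = not (is_digit or is_upper)  # assumed TRAP 70 overflow flag
    z_flag = is_digit                    # assumed TRAP 70 zero flag
    if v_flag:
        return "statement"               # BVS 6404, then JMP 3234
    if not z_flag:
        return "error"                   # BNE 6426, then TRAP 25
    return "line number"                 # reach 6400, then JMP 4216


print(classify_after_then("1"), classify_after_then("p"), classify_after_then("X"))
```

A lower case character stands in for a tokenized BASIC command here, since commands are stored as lower case letters.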
006410 020427 CMP R4, #43517
006414 001004 BNE 6426
006416 104540 TRAP 140
006420 020427 CMP R4, #52117
006424 001765 BEQ 6400
006426 104425 TRAP 25
If the first two characters were not "TH", this code checks for "GOTO" instead. The characters still in R4 are compared to "GO" (value 43517). If they are not equal, control branches to 6426 to return an error. If they match, TRAP 140 is used to read the next two characters into a word in R4, and these are compared to "TO" (value 52117). If they are equal, control branches to 6400, and jumps on from there to the GOTO code to jump to the specified line number.
Finally, TRAP 25 at 6426 returns an error whenever neither a valid THEN clause nor a valid GOTO clause is found.
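Putting the keyword checks together, here is a Python sketch of the path from 6346 onwards. `next_pair` is a hypothetical model of TRAP 140 that skips spaces and packs two characters with the first in the high byte; `parse_clause` returns which branch is taken and the input position (the role of R1) where parsing continues, or raises an exception in place of TRAP 25.

```python
TH, EN, GO, TO = 0o52110, 0o42516, 0o43517, 0o52117


def next_pair(text: str, pos: int) -> tuple[int, int]:
    """Hypothetical TRAP 140 model: skip spaces, then pack the next two
    characters (first in the high byte). Returns (word, new position)."""
    chars = []
    while len(chars) < 2:
        ch = text[pos] if pos < len(text) else "\0"  # NUL past the end
        pos += 1
        if ch != " ":
            chars.append(ch)
    return (ord(chars[0]) << 8) | ord(chars[1]), pos


def parse_clause(text: str) -> tuple[str, int]:
    """Sketch of 6346-6426 for the text that follows the second expression."""
    word, pos = next_pair(text, 0)        # TRAP 140 at 6346
    if word == TH:                        # CMP R4, #52110
        word, pos = next_pair(text, pos)  # TRAP 140 at 6356
        if word == EN:                    # CMP R4, #42516
            return "THEN", pos
    elif word == GO:                      # CMP R4, #43517 at 6410
        word, pos = next_pair(text, pos)  # TRAP 140 at 6416
        if word == TO:                    # CMP R4, #52117
            return "GOTO", pos
    raise ValueError("syntax error")      # stands in for TRAP 25 at 6426


print(parse_clause("THEN 100"), parse_clause("GOTO 100"))
```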
That concludes the analysis for this post. Thanks for reading!