daveor

Reverse engineering PDP-11 BASIC: Part 5

In this post I'll be looking at the displaying and handling of the long-form menu options, the memory and tape reader configuration, deleting the extended functions, loading the EXF function and jumping to display "READY".


NOTE: For context and a list of other posts on this topic, see the PDP-11 BASIC reverse engineering project page.


Reminder: What happens when BASIC starts running?

The PDP-11 BASIC Programming Manual (chapter 7) provides an explanation, which I will summarise here. When you load BASIC, the first thing you will see is this:

PDP-11 BASIC, VERSION 007A
*O 

The "*O" is supposed to represent a request for options, and you can enter one or more of the following options as a comma-separated list:

  1. "L" : Specifying "L" tells BASIC to use the low-speed reader/punch for SAVE/OLD commands instead of the high-speed reader/punch. If there is no high-speed reader/punch, the low-speed reader/punch will be automatically selected.

  2. "D" : Specifying "D" tells BASIC to delete the extended functions (SIN, COS, ATN, and SQR), presumably to free some space. Default is to retain these functions.

  3. "E" : Specifying "E" tells BASIC to delete the "EXP" and "LOG" functions as well as the extended functions listed above. Default is to retain these functions.

  4. "H" : Specifying "H" tells BASIC to halt before entering the interpreter to allow loading of the EXF function from paper tape. The EXF function allows BASIC programmers to invoke other code written in assembly language from within a BASIC program. Default is not to halt.

  5. Any number between 4 and 28: By default BASIC will use all memory available to the processor. Specifying a number between 4 and 28 tells BASIC to only use that much memory, stated in increments of 1K.

  6. "?": Specifying a "?" tells BASIC to display the long-form version of the configuration options.

  7. Just press RETURN: In this case the default values are all selected, you will get the "READY" prompt, and can start programming BASIC.

If you enter "?" you get a long-form version of the configuration options, that looks like this:

DO YOU NEED THE EXTENDED FUNCTIONS?
HIGH-SPEED READER/PUNCH?
SET UP THE EXTERNAL FUNCTION?
MEMORY?

For the first three questions you can answer "Y" or "N" and for the "MEMORY?" question you enter a value from 4 to 28.


When all options have been selected you get the "READY" prompt and can start programming BASIC.


Displaying the long-form menu options

In Part 4 I discussed the operation of the short-form menu options, so in this post I'll look at the long-form version. Here's the code that displays the long-form menu:

016312 005067 CLR 17450
016316 005067 CLR 17452
016322 005067 CLR 17454
016326 005067 CLR 17456
016332 005067 CLR 17460
016336 012700 MOV #17304, R0
016342 104552 TRAP 152
016344 010367 MOV R3, 17454
016350 003405 BLE 16364
016352 012700 MOV #17234, R0
016356 104552 TRAP 152
016360 010367 MOV R3, 17456
016364 005767 TST 17464
016370 001005 BNE 16404
016372 012700 MOV #17350, R0
016376 104552 TRAP 152
016400 010367 MOV R3, 17452
016404 012700 MOV #17401, R0
016410 104552 TRAP 152
016412 010367 MOV R3, 17460
016416 012700 MOV #17437, R0
016422 104466 TRAP 66
016424 104500 TRAP 100
016426 104410 TRAP 10
016430 010067 MOV R0, 17450

Let's take a look at how this works.

016312 005067 CLR 17450
016316 005067 CLR 17452
016322 005067 CLR 17454
016326 005067 CLR 17456
016332 005067 CLR 17460

Firstly, the memory locations used to store the options are all cleared.

016336 012700 MOV #17304, R0
016342 104552 TRAP 152

Now, the first question is asked. 17304 is the memory address of the start of the string "DO YOU NEED THE EXTENDED FUNCTIONS?". This address is moved into register R0.


Then, TRAP 152 (described in Part 2) is used to display the string and accept user input. TRAP 152 will display the string pointed to by the memory address in R0 and then wait for input. The first non-whitespace character of the user input is stored in R2. In addition, the resulting input is processed as follows:

  1. If the user enters "Y" (or any string starting with "Y") R3 will have the value -1.

  2. If the user enters "N" (or any string starting with "N") R3 will have the value 1.

  3. Otherwise, R3 will have the value 0.
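
The mapping from the user's reply to the value left in R3 can be sketched in Python. This is only a model of the behaviour described above, not a translation of the TRAP 152 handler, and the name `classify_reply` is made up for illustration.

```python
def classify_reply(reply: str) -> int:
    """Model of the R3 value left by TRAP 152, as described above.

    Hypothetical helper: the real handler is PDP-11 code (see Part 2).
    """
    stripped = reply.lstrip()      # skip leading whitespace
    first = stripped[:1]           # first non-whitespace character (the R2 value)
    if first == "Y":
        return -1                  # any reply starting with "Y" means yes
    if first == "N":
        return 1                   # any reply starting with "N" means no
    return 0                       # anything else, including an empty reply


print(classify_reply("YES"), classify_reply(" N"), classify_reply(""))
```

The sign convention matters later on: the setup code tests these stored values with BLE and BGE rather than comparing against specific characters.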

016344 010367 MOV R3, 17454
016350 003405 BLE 16364

The value from R3 is moved into memory location 17454. If the value is less than or equal to zero (i.e. the user answered "Y", or anything other than "N"), control jumps to 16364, which skips the following question.

016352 012700 MOV #17234, R0
016356 104552 TRAP 152
016360 010367 MOV R3, 17456

The next instruction moves the value 17234 into R0. This is the pointer to the string "DO YOU REQUIRE EXP OR LOG (FLOATING ^)?". This question is only reached if the user answered "N" to the first question. If the user indicated that they do need the extended functions, nothing is going to be deleted, so there is no need to ask about EXP and LOG, which is why the question is skipped in case the user entered "Y" to the first question.


Anyway, then TRAP 152 is used to display the question string and read the response from the user. The resulting value (-1 for yes, 1 for no, 0 otherwise) is stored in address 17456.

016364 005767 TST 17464
016370 001005 BNE 16404
016372 012700 MOV #17350, R0
016376 104552 TRAP 152
016400 010367 MOV R3, 17452

This next question relates to the high-speed reader/punch. You may recall that in the initial setup code, a test is performed to try and access the high-speed reader/punch. If this test fails, the value at memory address 17464 will be non-zero. Therefore, before asking the user if they want to use the high-speed reader/punch, the value 17464 is checked and if it is non-zero, the question is skipped.


Otherwise address 17350, which is the location of the string "HIGH-SPEED READER/PUNCH?", is loaded into register R0. TRAP 152 is again used to display the question and read the user's response. The user's response (-1 for yes, 1 for no, 0 otherwise) is stored from R3 into address 17452.

016404 012700 MOV #17401, R0
016410 104552 TRAP 152
016412 010367 MOV R3, 17460

The next question follows the same familiar pattern. The memory address of the question string (in this case "SET UP THE EXTERNAL FUNCTION?") is loaded into R0. Then, TRAP 152 is used to display the string and read the user input. The result is stored in memory location 17460.

016416 012700 MOV #17437, R0
016422 104466 TRAP 66
016424 104500 TRAP 100
016426 104410 TRAP 10
016430 010067 MOV R0, 17450

The final question relates to the memory usage and this one is handled slightly differently, because it's not a yes/no question. Firstly the memory address of the question string ("MEMORY?") is loaded into R0. Then TRAP 66 is used to display the string.


Next, TRAP 100 is used to read a string from the user and store it at the location specified in R1. Finally, TRAP 10 is used to convert the string representation of a numeric value at R1 into a number, stored in R0. The value in R0 is then stored into memory address 17450.


That's the end of the long-form menu code.
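
To summarise the control flow, here is a rough Python sketch of which questions are asked and where each answer is stored. The function name, the `answers` dictionary and the use of a Python dict in place of memory are all assumptions for illustration; only the addresses, question strings and branch conditions come from the listing above.

```python
def long_form_menu(answers, high_speed_probe_failed):
    """Sketch of the long-form menu flow.

    `answers` maps a question string to an R3-style value (-1 yes, 1 no,
    0 other), except MEMORY?, which maps to a number. Illustrative only.
    """
    # CLR 17450 .. CLR 17460: all option words start at zero
    opts = {0o17450: 0, 0o17452: 0, 0o17454: 0, 0o17456: 0, 0o17460: 0}
    asked = []

    def ask(question):
        asked.append(question)
        return answers.get(question, 0)

    opts[0o17454] = ask("DO YOU NEED THE EXTENDED FUNCTIONS?")
    if opts[0o17454] > 0:                  # BLE 16364 skips unless the reply was "N"
        opts[0o17456] = ask("DO YOU REQUIRE EXP OR LOG (FLOATING ^)?")
    if not high_speed_probe_failed:        # TST 17464 / BNE 16404
        opts[0o17452] = ask("HIGH-SPEED READER/PUNCH?")
    opts[0o17460] = ask("SET UP THE EXTERNAL FUNCTION?")
    # MEMORY? goes through TRAP 66 / TRAP 100 / TRAP 10 rather than TRAP 152
    opts[0o17450] = ask("MEMORY?")
    return opts, asked


opts, asked = long_form_menu({"DO YOU NEED THE EXTENDED FUNCTIONS?": 1}, False)
print(len(asked), "questions asked")
```

In this model a user who answers "Y" to the first question sees four questions, and a user who answers "N" on a machine with a working high-speed reader/punch sees all five.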


I/O device selection

Now that the user has chosen whatever options they want, it is time to configure the code using those options. Firstly, this is the code that configures the high or low speed reader/punch option:

016434 066767 ADD 17464, 17452
016442 005767 TST 17452
016446 003407 BLE 16466
016450 012767 MOV #177560, 13704
016456 012767 MOV #177564, 13706
016464 000406 BR 16502
016466 012767 MOV #177550, 13704
016474 012767 MOV #177554, 13706

Let's see how it works.

016434 066767 ADD 17464, 17452
016442 005767 TST 17452
016446 003407 BLE 16466

Firstly the value in address 17464 is added to the content of address 17452. The value at 17464 may have been set to 1 in the initial setup code, when the test is performed to try and access the high-speed reader/punch. If this test fails, the value at memory address 17464 will be 1, otherwise it will be zero. The value at 17452 is incremented if the user selected the "L" option on the short-form menu. It is also set in response to the "HIGH-SPEED READER/PUNCH?" question in the long-form menu, in which case it will be set to -1 if the user entered "Y", 1 if the user entered "N" and zero otherwise. Remember that the long-form high-speed reader/punch question is not asked if the initial setup test failed.


This gives the following possible values after this addition:

  1. Error in initial setup and user picks "L" in short-form menu - value after addition = 2.

  2. Error in initial setup and user does not pick "L" in short-form menu - value after addition = 1.

  3. No error on initial setup and user picks "L" in short-form menu - value after addition = 1.

  4. No error on initial setup and user does not pick "L" in short-form menu - value after addition = 0.

  5. No error on initial setup and user answers "Y" to select high-speed reader/punch in long-form menu - value after addition = -1

  6. No error on initial setup and user answers "N" to selection of high-speed reader/punch in long-form menu - value after addition = 1.

  7. No error on initial setup and user answers anything other than "Y" or "N" to high-speed reader/punch question in long-form menu - value after addition = 0.

If you look through the options above, you will see that all results of this addition that are greater than zero represent conditions that mean the low-speed reader/punch should be used, and all results that are less than or equal to zero represent conditions where the high-speed reader/punch should be used.


Therefore, the value in 17452 is tested and if it is less than or equal to zero, control jumps to address 16466, otherwise control will continue to the next instructions.

016450 012767 MOV #177560, 13704
016456 012767 MOV #177564, 13706
016464 000406 BR 16502
016466 012767 MOV #177550, 13704
016474 012767 MOV #177554, 13706

In the case where the low-speed reader/punch is being used the address of the TTY receive Control and Status Register (CSR) is loaded to address 13704, and the address of the TTY transmit CSR is loaded to address 13706. Then control branches over the next two instructions.


Otherwise, at address 16466, the address of the high-speed reader/punch receive CSR is loaded to address 13704 and the address of the high-speed reader/punch transmit CSR is loaded to address 13706.


After these instructions, therefore, addresses 13704 and 13706 contain the addresses of the receive and transmit CSRs of the selected I/O device, respectively.
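
The selection logic is compact enough to model directly. The following Python sketch reproduces the ADD / TST / BLE decision and returns the pair of CSR addresses that would end up in 13704 and 13706. The function and constant names are illustrative assumptions; the octal values are taken from the listing.

```python
TTY_RX_CSR, TTY_TX_CSR = 0o177560, 0o177564   # low-speed (TTY) device
HS_RX_CSR, HS_TX_CSR = 0o177550, 0o177554     # high-speed reader/punch


def select_io(probe_failed: int, user_choice: int):
    """probe_failed: value at 17464 (1 if the high-speed probe failed, else 0).
    user_choice: value at 17452 before the ADD.
    Returns the values stored at 13704 and 13706."""
    total = probe_failed + user_choice        # ADD 17464, 17452
    if total <= 0:                            # TST 17452 / BLE 16466
        return HS_RX_CSR, HS_TX_CSR
    return TTY_RX_CSR, TTY_TX_CSR


# The combinations listed above: (probe result, user choice)
for probe, choice in [(1, 1), (1, 0), (0, 1), (0, 0), (0, -1)]:
    rx, tx = select_io(probe, choice)
    print(probe, choice, oct(rx), oct(tx))
```

Adding the two flags means a single signed comparison covers both the hardware probe and the user's preference, which is presumably why the code does not test them separately.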


Memory limit configuration

The next thing that is configured is the memory limit. This is the code:

016502 016701 MOV 17462, R1
016506 016700 MOV 17450, R0
016512 001414 BEQ 16544
016514 000300 SWAB R0
016516 006300 ASL R0
016520 006300 ASL R0
016522 006300 ASL R0
016524 042700 BIC #3777, R0
016530 020027 CMP R0, #20000
016534 103666 BCS 16312
016536 020001 CMP R0, R1
016540 101001 BHI 16544
016542 010001 MOV R0, R1
016544 010106 MOV R1, SP
016546 012767 MOV #6, 4
016554 010167 MOV R1, 13712
016560 012701 MOV #16104, R1 ; Move entry point address into R1

Let's take a look at that.

016502 016701 MOV 17462, R1

Firstly, the maximum memory value that was calculated in the initial setup code is moved to R1. This is the maximum valid memory address in the system.

016506 016700 MOV 17450, R0
016512 001414 BEQ 16544

Next, the numeric value that was specified in the short-form menu or the response provided to the "MEMORY?" question in the long-form menu is moved from memory address 17450 into R0. This should be a numeric value between 4 and 28.


The MOV instruction sets the zero flag if the value moved into R0 was zero. In that case no memory limit was specified, so the BEQ instruction skips the memory configuration by branching to address 16544.

016514 000300 SWAB R0
016516 006300 ASL R0
016520 006300 ASL R0
016522 006300 ASL R0

Now we take the value that the user specified and turn it into a number of bytes of memory to be allocated to BASIC. Firstly, the bytes of the value in R0 are swapped. Since we're talking about values between 4 and 28, this will take the value and move it into the high byte. In other words, multiply by 256. Then there are three shift left instructions, each of which is the same as multiplying by two.


Therefore, in total these four instructions are the same as multiplying by 2048. That got me wondering. If you look at the PDP-11 BASIC programming manual (chapter 7) where the options are described, it says the user can select "any integer between 4 and 28 to override automatic assignment of memory. The automatic process assigns to BASIC all memory available in the processor. This command allows the user to force BASIC to use less than the maximum configuration and is stated in increments of 1K." I had presumed that "1K" meant 1 kilobyte but actually the calculation above, when performed on the value 1, will give the result 2048, which is 2 kilobytes, or 1 kilo-WORD. Therefore, I am wondering now whether it would have been presumed by a reader of the BASIC manual that memory calculations would be carried out in words rather than bytes. Interesting...


In any case, this code multiplies the value specified by the user by 2048, giving the number of bytes of memory to be allocated to BASIC, and that value is stored in R0.
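
As a quick check of the arithmetic, the SWAB and ASL instructions can be modelled on 16-bit values in Python. This is a minimal sketch that ignores the condition codes; the helper names `swab`, `asl` and `k_to_bytes` are my own.

```python
MASK16 = 0xFFFF


def swab(word: int) -> int:
    """Swap the high and low bytes of a 16-bit word (PDP-11 SWAB)."""
    return ((word << 8) | (word >> 8)) & MASK16


def asl(word: int) -> int:
    """Shift left by one bit, truncated to 16 bits (PDP-11 ASL)."""
    return (word << 1) & MASK16


def k_to_bytes(k: int) -> int:
    """Apply SWAB followed by three ASLs, as in the listing above."""
    r0 = swab(k & MASK16)
    for _ in range(3):
        r0 = asl(r0)
    return r0


for k in (4, 8, 28):
    print(k, oct(k_to_bytes(k)), k_to_bytes(k))
```

For every value in the documented range of 4 to 28 the result is the input multiplied by 2048, and it still fits in 16 bits.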

016524 042700 BIC #3777, R0

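Next, the lowest 11 bits of R0 are all cleared. For a value in the expected range these bits are already zero after the multiplication. They can only be non-zero if the value in 17450 was 256 or more, because SWAB moves the original high byte into the low byte, so this instruction discards any such leftovers.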

016530 020027 CMP R0, #20000
016534 103666 BCS 16312

This code confirms that the configured memory limit is not lower than the lower bound.


The value in R0 is compared to the octal value 20000. In decimal, this is 8192 bytes (or 4096 words), which is the minimum required by BASIC. Therefore, if the calculated user input value is less than 20000, then the code branches to address 16312, which causes the long-form option menu to be displayed again.

016536 020001 CMP R0, R1
016540 101001 BHI 16544
016542 010001 MOV R0, R1

This code confirms that the configured memory limit is not higher than the higher bound.

The value in R0 is compared to the value in R1. R1 contains the value that was calculated as the maximum valid memory address in the system. Therefore, a value greater than R1 does not make sense. If R0 is greater than R1, the branch skips the MOV instruction and R1 keeps the maximum value. Otherwise, R0 is moved into R1. In other words, if the calculated value is greater than the maximum available amount of memory, the maximum available amount of memory is used, and in all other cases the value specified by the user is used.
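
Putting the zero check, the scaling, the BIC and the two bounds checks together, the whole memory limit calculation can be sketched as one Python function. This is a model under assumptions: `memory_limit` and its parameters are hypothetical names, `detected_top` stands for the value read from 17462, and a return value of `None` stands for the branch back to the long-form menu at 16312.

```python
MIN_BYTES = 0o20000   # lower bound used by CMP R0, #20000 (8192 bytes)


def memory_limit(k: int, detected_top: int):
    """k: value from 17450; detected_top: value from 17462.
    Returns the value left in R1, or None if the menu would be shown again."""
    if k == 0:                                 # BEQ 16544: no limit requested
        return detected_top
    r0 = ((k << 8) | (k >> 8)) & 0xFFFF        # SWAB R0
    r0 = (r0 << 3) & 0xFFFF                    # three ASL R0 instructions
    r0 &= ~0o3777 & 0xFFFF                     # BIC #3777, R0
    if r0 < MIN_BYTES:                         # BCS 16312 (unsigned lower)
        return None
    if r0 > detected_top:                      # BHI 16544 skips the MOV
        return detected_top
    return r0                                  # MOV R0, R1


print(memory_limit(4, 0o60000), memory_limit(28, 0o60000), memory_limit(3, 0o60000))
```

Note that both comparisons are unsigned (BCS and BHI), which matters because addresses of 100000 (octal) and above would be negative if treated as signed 16-bit values.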

016544 010106 MOV R1, SP

The value in R1 is now moved into the stack pointer. This means that the stack is positioned at the top of the available memory and will grow downwards from there.

016546 012767 MOV #6, 4

Next the value 6 is moved into memory address 4. This is the interrupt vector used to handle a "timeout and other error". Address 6 contains a zero word, meaning that if a "timeout and other error" occurs, the system will HALT (because of the HALT instruction at address 6).

016554 010167 MOV R1, 13712

The high memory address, which is also the location of the stack, is also stored to address 13712. This memory address is used to store a stack base pointer.

016560 012701 MOV #16104, R1

Finally for this section of code, the value 16104 is moved into R1. I'm not sure why this is done, but this memory location is the entry point of the program.


Deleting functions

The next piece of code handles the deletion of the extended functions (SIN, COS, ATN and SQR) and the LOG and EXP functions, if the deletion of those functions has been selected. I'll admit in advance that I'm not totally clear how this code works, so if I get a better understanding later I'll come back and update this post. For now, I'll share what I have inferred about how it works. Here's the code:

016564 005767 TST 17454
016570 003433 BLE 16660
016572 005067 CLR 5734
016576 005067 CLR 5736
016602 005067 CLR 5740
016606 005067 CLR 5750
016612 012701 MOV #15020, R1
016616 005767 TST 17456
016622 003416 BLE 16660
016624 005067 CLR 5742
016630 005067 CLR 5744
016634 012700 MOV #136, R0
016640 006200 ASR R0
016642 012701 MOV #13714, R1
016646 012702 MOV #17046, R2
016652 012221 MOV (R2)+, (R1)+
016654 005300 DEC R0
016656 003375 BGT 16652

Let's work through this.

016564 005767 TST 17454
016570 003433 BLE 16660

Firstly, the value at 17454 is tested. A positive value here represents selection of the "D" option, to delete the extended functions. If the value is less than or equal to zero, control jumps to address 16660, skipping the deletion code.

016572 005067 CLR 5734
016576 005067 CLR 5736
016602 005067 CLR 5740
016606 005067 CLR 5750

These four specific memory addresses are cleared. As I look at what these addresses contain, they don't look like anything particularly special yet, so I suspect preliminarily that these are the entry points for the SIN, COS, ATN and SQR functions. Alternatively, these addresses could hold pointers to the actual function code, but I don't think so. My reasoning is that whenever I see function pointers, they don't tend to make sense as assembly language instructions, but the values at these four words seem to be perfectly reasonable assembly language instructions.


My current hypothesis is that by setting the entry points of the four functions to zero, attempts to call the functions will HALT. The fact that there are four addresses cleared, and four functions are being deleted (SIN, COS, ATN and SQR) also supports this hypothesis.

016612 012701 MOV #15020, R1

Next, the value 15020 is moved into R1. This is used below as code is being moved around.

016616 005767 TST 17456
016622 003416 BLE 16660

Next the value at 17456 is tested. This will be greater than zero if the "E" option was selected, to delete EXP and LOG, in addition to SIN, COS, ATN and SQR. If the value is less than or equal to zero, control jumps to address 16660, skipping the deletion code.

016624 005067 CLR 5742
016630 005067 CLR 5744

Next these two addresses are cleared. My reasoning is the same as for the four addresses cleared earlier. I suspect these to be either pointers to the functions or the entry points to the EXP and LOG functions.

016634 012700 MOV #136, R0
016640 006200 ASR R0

Here, the value 136 is moved into R0 and then shifted right by one bit. This results in the value 57 (octal) being left in R0. This is the number of words that are to be copied between the two addresses below.

016642 012701 MOV #13714, R1
016646 012702 MOV #17046, R2

Now two values are moved into registers R1 and R2, 13714 and 17046 respectively. We are about to copy the number of words specified in R0 from the location specified in R2 to the location specified in R1.

016652 012221 MOV (R2)+, (R1)+
016654 005300 DEC R0
016656 003375 BGT 16652

Here a word is moved from the address in R2 to the address in R1, and then both addresses are incremented (by two, since this is a word move). R0 is decremented and if it is greater than zero the code branches to 16652 to move another word. I am presuming that the purpose of this code is to overwrite some or all of the EXP and LOG functions with code used for some other purpose, to save space.
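
The copy loop can be modelled in Python to see where the two pointers end up. This sketch treats memory as a dictionary from byte address to 16-bit word, which is purely an illustrative assumption, as is the helper name `copy_words`. The data being copied is made up, since only the pointer arithmetic is of interest here.

```python
def copy_words(memory: dict, src: int, dst: int, count: int):
    """Model of MOV (R2)+, (R1)+ / DEC R0 / BGT: copy `count` 16-bit words.

    `memory` maps even byte addresses to word values (illustrative model).
    Returns the final (R2, R1) values.
    """
    r0, r1, r2 = count, dst, src
    while True:
        memory[r1] = memory.get(r2, 0)   # MOV (R2)+, (R1)+
        r2 += 2                          # word-mode autoincrement adds 2
        r1 += 2
        r0 -= 1                          # DEC R0
        if r0 <= 0:                      # BGT 16652 loops while R0 > 0
            break
    return r2, r1


count = 0o136 >> 1                       # MOV #136, R0 / ASR R0
mem = {0o17046 + 2 * i: i for i in range(count)}   # made-up source data
end_src, end_dst = copy_words(mem, 0o17046, 0o13714, count)
print(oct(count), oct(end_src), oct(end_dst))
```

Under this model both pointers advance by 136 (octal) bytes. The source pointer finishes at 17204, which is the same address that is loaded into R2 in the EXF section below, so the copied block appears to sit immediately before the EXF handling code.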


Halting to load the EXF function

The next section of the code is to load the EXF function. The EXF function is used to allow BASIC to invoke external (non-BASIC) programs. Here's the code:

016660 005767 TST 17460
016664 002036 BGE 16762
016666 012767 MOV #16702, 54
016674 010167 MOV R1, 13702
016700 000000 HALT
016702 016701 MOV 13702, R1
016706 012700 MOV #30, R0
016712 006200 ASR R0
016714 012702 MOV #17204, R2
016720 010167 MOV R1, 6006
016724 010167 MOV R1, 5444
016730 012703 MOV #16, R3
016734 060103 ADD R1, R3
016736 010367 MOV R3, 54
016742 012221 MOV (R2)+, (R1)+
016744 005300 DEC R0
016746 003375 BGT 16742
016750 016706 MOV 50, SP ; Move the value at address 50 into SP
016754 010667 MOV SP, 13712 ; Move SP into 13712
016760 000402 BR 16766 ; Branch to 16766

Let's see how it works.

016660 005767 TST 17460
016664 002036 BGE 16762

Firstly, the value at 17460 is tested. This is the value set by the "H" option or the equivalent long-form option. If the value is less than zero that means the EXF function should be loaded. If the value is greater than or equal to zero, control jumps to 16762, skipping the EXF function loading code.

016666 012767 MOV #16702, 54

The value 16702 is loaded into address 54. When the EXF code finishes loading, the value at address 54 is loaded into the PC, therefore the code at memory address 16702 will be executed when the EXF code has finished loading.

016674 010167 MOV R1, 13702

R1 is now loaded into address 13702. This will have a value depending on which functions have been deleted, as follows:

  1. If no functions have been deleted, R1 has the value 16104

  2. If the extended functions are deleted, R1 has the value 15020

  3. If the extended and LOG, EXP functions are deleted, R1 has the value 14052 (13714+136, because the copy loop advances R1 by two bytes for each of the 57 words it copies)

I think this might be the first available address into which program code can be stored. Note that 16104 is the entry point (i.e. the initialisation code). This is not required once the initialisation has been completed. Perhaps the addresses between 15020 and 16104 contain SIN, COS, ATN and SQR, which have been deleted. Similarly, 14052-15020 may have been where LOG and EXP were stored.

016700 000000 HALT

The code then HALTs to allow the loading of the code from tape.


The use of the EXF function is described in the PDP-11 BASIC programming manual (chapter 8). There are a number of requirements for the code that are notable for the purposes of this analysis:

  1. The external routine must be loaded into the highest memory addresses available.

  2. The first word of the loaded routine must be the starting address (entry point) of the routine.

  3. The address of the first word of the routine must be loaded into location 50.

  4. The jump address (absolute loader jump address) must be 52.

016702 016701 MOV 13702, R1

After the code is loaded, the address stored in 13702 is moved back into R1.

016706 012700 MOV #30, R0
016712 006200 ASR R0

The value 30 is moved into R0 and then shifted right (divided by two), leaving 14 (octal), i.e. 12 words to be copied.

016714 012702 MOV #17204, R2

The value 17204 is stored in R2. This is the location of the code used to execute the EXF function.

016720 010167 MOV R1, 6006
016724 010167 MOV R1, 5444

The value in R1 (possibly available memory address for code, as described above) is stored in addresses 6006 and 5444.

016730 012703 MOV #16, R3
016734 060103 ADD R1, R3
016736 010367 MOV R3, 54

16 is added to the value in R1, and this value is stored in memory address 54.

016742 012221 MOV (R2)+, (R1)+
016744 005300 DEC R0
016746 003375 BGT 16742

This loop copies the number of words specified in R0 from the address in R2 to the address in R1. This relocates the EXF handling code to the addresses starting at R1.

016750 016706 MOV 50, SP
016754 010667 MOV SP, 13712

If external code has been loaded, it is loaded at the highest possible memory addresses. The location of the start of the external code must be stored at location 50. The stack pointer, therefore, is set to the highest available address, which is the address of the beginning of external code. The new value of the stack pointer is also stored in address 13712, which is a form of stack base pointer.
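
The bookkeeping around the relocation can be summarised with a small Python sketch that computes the values written for a given starting R1. It only models the arithmetic in the listing, not the meaning of the locations involved; `relocate_exf` and the dictionary keys are hypothetical names.

```python
def relocate_exf(r1: int) -> dict:
    """Model of the values produced while the EXF handler is relocated.

    r1: the address restored from 13702 after the HALT (illustrative input).
    """
    words = 0o30 >> 1                 # MOV #30, R0 / ASR R0
    stored_6006 = r1                  # MOV R1, 6006
    stored_5444 = r1                  # MOV R1, 5444
    stored_54 = r1 + 0o16             # MOV #16, R3 / ADD R1, R3 / MOV R3, 54
    r1_after = r1 + 2 * words         # R1 once the copy loop has finished
    return {"words": words, "6006": stored_6006, "5444": stored_5444,
            "54": stored_54, "r1_after": r1_after}


# Example: no functions deleted, so R1 starts at the entry point 16104
print({name: oct(value) for name, value in relocate_exf(0o16104).items()})
```

One thing this makes visible is that the value written to address 54 (R1 plus 16 octal) falls inside the 30 (octal) byte block that is about to be copied, so it points at a location within the relocated EXF handling code.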


Final setup

The final setup steps are carried out by the following code:

016762 005067 CLR 5760
016766 010167 MOV R1, 13662
016772 016705 MOV 13662, R5
016776 112725 MOVB #12, (R5)+
017002 005067 CLR 13702
017006 000167 JMP 3112

Let's see how this works.

016762 005067 CLR 5760

Firstly, the address 5760 is cleared. I don't know why yet.

016766 010167 MOV R1, 13662
016772 016705 MOV 13662, R5

The lowest available memory address for program code is stored in address 13662. The same value is then loaded into R5, which holds the highest currently used memory address for program code. The size of the current code, therefore, is calculated by getting the difference between the value in address 13662 and R5.

016776 112725 MOVB #12, (R5)+

A linefeed character (octal 12) is stored at the address in R5 and R5 is incremented.

017002 005067 CLR 13702

The memory address 13702 is cleared. This value is set in the TTY interrupt handler (described in Part 1) and is used to indicate that the code has been interrupted. Clearing the value here indicates that the code has not been interrupted yet.

017006 000167 JMP 3112

Finally, after all of the setup has been done, the code jumps to 3112 to display "READY" and enter the main syntax parsing loop.


Displaying "READY"

This is the code to display "READY":

003112 005067 CLR 13664
003116 012700 MOV #4076, R0
003122 104466 TRAP 66

Firstly, the memory address 13664 is cleared. I'm not sure what this is for yet. Then the address 4076 is loaded into R0. This is the location of the string "READY". TRAP 66 is then used to display the string.


Summary

This concludes the analysis of the setup code. The next series of posts will begin the description of the syntax analysis code.

