First Version 23.8.1998
Last Updated 8.1.2000
Pasi Ojala, albert@cs.tut.fi

Burst Fastloader for C64

Commodore disk drives 1570/71 and 1581 implemented a new fast serial protocol to be used with the C128 computer. This synchronous serial protocol speeds up data transfer between the computer and the drive tenfold. The amazing thing is that this kind of serial protocol was originally intended for the VIC-20 and the 1540 drive, until it was discovered that a hardware bug in the 6522 VIA (versatile interface adapter) chip prevented reliable use of the chip's synchronous serial interface.

A working synchronous serial port would've allowed whole bytes to be sent in both directions without processor intervention with the maximum speed of one bit per two clock cycles. Without a bug-free synchronous serial port the transfer had to be slowed down considerably so that the receiver has a chance to detect all changes in the serial bus lines. This became the dead slow software-driven Commodore serial protocol.

Synchronous and Asynchronous Transfer

Synchronous transfer uses a clock line to indicate when data is being transferred. A rising edge (or a falling edge, just a matter of definition) on the clock line indicates that new data is available. In a serial protocol a byte is sent one bit at a time. Each time the clock signal has a rising edge, the hardware samples the state of the data line and remembers the bit. After eight bits are received, the software can read the assembled byte from a hardware register. The software only needs to wait until a byte is ready and read it before the next byte has been received.
             0__   1__   2__   3__   4__   5__   6__   7__
    clock  __|  |__|  |__|  |__|  |__|  |__|  |__|  |__|  |__

             0   __1__   0   __1__   0   __1_____1__   0
    data   _____|     |_____|     |_____|           |________
Synchronous transfer of byte value $56 is performed here most-significant bit first. Each bit is valid between falling edges of the clock, and the data is sampled on the rising edges.
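
The sampling rule in the figure can be tried out with a small simulation. The following Python sketch is not part of the original loader, and the function names clock_out and clock_in are made up for illustration. It clocks out $56 most-significant bit first and reassembles the byte by sampling the data line on every rising clock edge.

```python
def clock_out(byte):
    """Yield (clock, data) line states for one byte, MSB first."""
    for bit_index in range(7, -1, -1):
        bit = (byte >> bit_index) & 1
        yield (0, bit)   # clock low: the sender sets up the data line
        yield (1, bit)   # clock high: rising edge, data is stable


def clock_in(samples):
    """Assemble a byte by sampling data on every rising clock edge."""
    value = 0
    previous_clock = 0
    for clock, data in samples:
        if previous_clock == 0 and clock == 1:   # rising edge detected
            value = ((value << 1) | data) & 0xFF
        previous_clock = clock
    return value


received = clock_in(clock_out(0x56))
print(hex(received))   # prints 0x56
```

Note that nothing in this scheme tells the sender whether the receiver kept up; the receiver simply has to see every edge.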

If the bit collection is performed in software, the clock cycle must be sufficiently long for the software to detect the rising edges. Once a specific speed is selected, a slower receiver cannot be accommodated and a faster receiver cannot transfer bits any faster than the predefined speed.

In an asynchronous transfer protocol the receiving end acknowledges all data it receives, so that the sender knows when to send more. This protocol works independently of receiver speed. Data is acknowledged as soon as the receiver is ready to receive more. A faster receiver means faster transfer, and a slower receiver still works, albeit more slowly. It also doesn't matter whether the protocol is implemented in hardware or in software. This is why an asynchronous protocol is a better choice for compatibility.

             0____1     2_.._____3     4_____5     6_____7
    clock  __|    |_____|        |_____|     |_____|     |________

              0   __1__      0   __1__   0   __1_____1__   0
    data   ______|     |__..____|     |_____|           |_________

                _____          _____       _____       _____
    ack    ____|     |____..__|     |_____|     |_____|     |_____
Asynchronous transfer of byte value $56 is performed here most-significant bit first. A bit is valid after a change in clock. A bit is acknowledged by changing the state of the ack signal.

Note that the receiver can for example handle interrupts or tolerate bad lines while receiving data. The sender will hold off sending more data until it has received an acknowledgement and no data is lost, however long the interruption becomes.
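
The effect of the acknowledge signal can be sketched in a few lines of Python. This is only an illustrative model with made-up names (Receiver, send_byte) and an abstract "tick" as the time unit, not a description of any real Commodore protocol. The receiver may be busy for an arbitrary number of ticks before it notices a bit; the sender does not continue until the ack line changes.

```python
class Receiver:
    """Receiver that may be busy for a while before noticing each bit."""

    def __init__(self, busy_ticks):
        self.busy_ticks = list(busy_ticks)  # delay before each of the 8 bits
        self.last_clock = 0
        self.ack = 0
        self.bits = []
        self.countdown = 0
        self.pending = False

    def tick(self, clock, data):
        if not self.pending and clock != self.last_clock:
            # A clock change means a new bit is waiting on the data line.
            self.pending = True
            self.countdown = self.busy_ticks[len(self.bits)]
        if self.pending:
            if self.countdown > 0:      # still busy (interrupt, bad line, ...)
                self.countdown -= 1
                return
            self.bits.append(data)      # sample the bit
            self.last_clock = clock
            self.ack ^= 1               # acknowledge by changing the ack line
            self.pending = False


def send_byte(byte, receiver):
    """Send one byte MSB first; return (received value, elapsed ticks)."""
    clock = 0
    ticks = 0
    for bit_index in range(7, -1, -1):
        data = (byte >> bit_index) & 1
        clock ^= 1                          # bit is valid after a clock change
        ack_before = receiver.ack
        while receiver.ack == ack_before:   # hold off until acknowledged
            receiver.tick(clock, data)
            ticks += 1
    value = 0
    for bit in receiver.bits:
        value = (value << 1) | bit
    return value, ticks


print(send_byte(0x56, Receiver([0] * 8)))
print(send_byte(0x56, Receiver([0, 0, 50, 0, 0, 0, 0, 0])))
```

Both calls deliver $56; the second one just takes 50 ticks longer because the receiver was held up during the third bit.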

Commodore didn't go down this road, however. They did not include an acknowledge signal. The actual synchronous serial protocol was already decided upon, and they just made the clock cycle long enough for the edges to be detected in software. And as if things were not already slow enough, the protocol had to be slowed down even more when the C64 was introduced, because bad lines could've made the C64 lose bits. This is why a C64 does not work properly with a 1540 (or a 1541 in VIC-20 mode) unless the screen has been blanked so that no bad lines are generated.

Synchronous Serial

The complex interface adapter (6526 CIA) chips used in Commodore 64 and later in Commodore 128 have bug-free synchronous serial interfaces: serial data and serial clock inputs/outputs. In input mode, each time a rising edge is detected in the serial clock pin (CNT), the state of the serial data (SP) is shifted into a register. When 8 bits are received the accumulated bits are moved into the serial data register and a bit is set in the interrupt status register to reflect this. If the corresponding interrupt is enabled, an interrupt is generated.
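
As a rough mental model of the input mode, the behaviour described above can be written as a small Python class. This is a simplified sketch, not a cycle-accurate emulation: the class and method names are invented, the most-significant-bit-first order is taken from the timing diagrams earlier in this article, and the flag value $08 and the clear-on-read behaviour are taken from the polling loops in the loader listing later in this article.

```python
SERIAL_FLAG = 0x08   # serial port bit, as polled with "lda #8 / bit $dc0d"


class SerialInputPort:
    """Simplified model of a CIA synchronous serial port in input mode."""

    def __init__(self):
        self.shift = 0             # internal shift register
        self.count = 0             # bits collected so far
        self.data_register = 0     # what the software reads ($dc0c on CIA1)
        self.interrupt_status = 0  # what the software polls ($dc0d on CIA1)
        self.last_cnt = 0

    def set_lines(self, cnt, sp):
        """Apply new states of the clock (CNT) and data (SP) pins."""
        if self.last_cnt == 0 and cnt == 1:          # rising edge on CNT
            self.shift = ((self.shift << 1) | sp) & 0xFF
            self.count += 1
            if self.count == 8:                      # a whole byte arrived
                self.data_register = self.shift
                self.interrupt_status |= SERIAL_FLAG
                self.count = 0
        self.last_cnt = cnt

    def read_interrupt_status(self):
        """Reading the status register also clears it."""
        value = self.interrupt_status
        self.interrupt_status = 0
        return value


port = SerialInputPort()
for bit_index in range(7, -1, -1):
    bit = (0x56 >> bit_index) & 1
    port.set_lines(0, bit)
    port.set_lines(1, bit)
if port.read_interrupt_status() & SERIAL_FLAG:
    print(hex(port.data_register))   # prints 0x56
```

The software never touches individual bits; it only waits for the flag and then reads the data register.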

In output mode the serial clock line is controlled by Timer A. The serial clock is derived from the timer underflow pulses. When a byte is written to the serial data register, the value is clocked out through the serial data pin (SP) and the corresponding clock signal appears on the serial clock pin (CNT). After all 8 bits are sent, the serial interrupt bit is set in the interrupt status register.

Synchronous serial transfer is used in the C128/157x/1581 fast serial protocol. An obsolete signal in the peripheral serial bus (SRQ) was taken into service as the new fast (synchronous) serial clock line. The old serial data line doubles as both the slow and the fast serial data line, and the old serial clock line doubles as the slow serial clock line and the fast serial (byte) acknowledge line.

The fast serial protocol is basically very simple. The side sending data configures its synchronous serial port into output mode, the other side uses input mode. The old peripheral serial bus clock line is controlled by the receiving side and is used as an acknowledge: when the receiver is ready for data, it toggles the state of the clock line. The actual data is transferred using the synchronous serial ports. The sender writes the data to be sent into the serial data register and waits for the transfer to complete. The receiver waits for a byte to arrive into its serial data register. The actual bit transfer is automatically handled by the hardware.

Both the drive and the computer must detect whether the other side can handle fast serial transfers. The computer sends one fast byte when commanding the drive to listen or talk and the drive sends a fast serial byte back. The computer can in practice send the fast serial byte anytime after the drive is reset and before the drive would send fast serial bytes.

If the computer received a fast byte, it can use the fast serial to talk to the drive. The block transfer rate increases by a factor of ten, which is mostly evident in the C128 loading speed. Even normal byte transfers become faster by a factor of two. Only two, because the CHRIN and CHROUT transfers must still be initiated by the slow serial clock and data lines that are used to generate the EOI (last byte to send/receive) condition.

Modification to C64

To use the burst fastloader with a C64 we need to connect the CIA synchronous serial port to the synchronous serial lines of the Commodore peripheral serial bus. Two wires are needed: one to connect the serial bus data line to the synchronous serial port data line and one to connect the serial bus SRQ (the obsolete line for service request, now fast serial clock) to the synchronous serial port clock line. Select the right connections depending on whether you want to use CIA1 or CIA2.

	1570/1,1581				C64

Pin1	SRQ	Fast serial bus clk		CNT1/2	User port 4/6
Pin5	DATA	Data - slow&fast bus		SP1/2	User port 5/7


Top view - old c64, CIA1
User port	Cass port	Serial connector

||||||||||||	||||||		 HHHHH		behind:
||||||||||||	||||||	       .-1 3 5-.
       ||______________________|  2 4  |	  / \
       |	CNT1		   6   |	 // \\
       |_______________________________|	 |||||
		SP1				1 264 5


Top view - old c64, CIA2
User port	Cass port	Serial connector

||||||||||||	||||||		 HHHHH		behind:
||||||||||||	||||||	       .-1 3 5-.
     ||________________________|  2 4  |	  / \
     |	CNT2			   6   |	 // \\
     |_________________________________|	 |||||
		SP2				1 264 5

Solder the wires either to the resistor pack or directly to the user port connector, but remember to leave the outer half of the connector free so that you can still plug in your user port devices.

Then solder the other ends to the serial connector. Those left- and rightmost pins are 1 and 5, respectively, so it is fairly easy to do the soldering. You can also build a cable which connects those lines externally.

However, this modification differs from the C128 fast serial in one important way. In the C64 the serial SRQ line is also connected to the cassette read input. When this modification is present you can't use burst-capable drives while the cassette drive is connected. By cutting the trace between SRQ and cassette read you can probably make the cassette and fast serial work at the same time, but I don't like this kind of destructive modification. Besides, the trace could be in a different place in different machines.

The C128 hardware includes a buffer driver between SRQ and the cassette read line so that cassette activity or cassette drive presence will not disturb the fast serial port. It also has a two-directional buffer that connects SRQ and DATA to the CIA1 synchronous serial port. The direction is controlled by the MMU chip. These buffers are required to hide the fast serial connection in C64 mode.

In C128 you should either use CIA2 or connect the wires through a "two-channel" on-off switch. Otherwise the CIA1 connection will interfere with the C128 native mode operation. The switch makes it possible to disable the modification in C128 mode and enable it again in C64 mode.

Theoretically you should be able to make the modification work with C128 in both modes by first connecting the wires and then cutting/bending up U58 (74LS03) pins 3 and 8, and U60 (7407) pins 6 and 8. This disables the C128 fast serial hardware and the added wires will perform their function in C128 and C64 modes. I have not tested this, so if you try it, I would be very interested in your results.

Software for C64

Of course the C64 only uses the standard slow serial routines, so we need a separate fastloader routine to take advantage of the fast serial connection we just soldered into our machine. The fastload routine is located in the unused area $2a7-$2ff and in the cassette buffer $334-$3ff. Just load and run the "burster" program. It installs the loader and replaces the default load routine with our routine. The old load routine is still used for verify operations and directory loads ('$'), and it is also possible to select it explicitly by prepending a colon (':') to the filename. This is needed if you use both fast and slow serial devices at the same time. Unfortunately detecting fast-serial-capable devices is not feasible within the loader, because a lot of ROM code would have to be duplicated and the loader would then become too large to fit in low memory. Because of this it is the user's responsibility to prepend the colon (':') whenever a slow serial device is accessed.
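
The decision the installed load vector makes can be summarised in a few lines of Python. This is only a restatement of the checks at the MyLoad label in the listing later in this article, with a made-up function name; the real routine is of course 6502 code. An empty filename is not handled specially by the listing, so the sketch simply treats it like any other name.

```python
def uses_burst_loader(filename: bytes, verify_flag: int) -> bool:
    """Return True if the burst loader handles this LOAD call.

    verify_flag is the accumulator value passed to the load vector:
    0 means load, anything else means verify.
    """
    if verify_flag != 0:
        return False                 # verify: use the normal routine
    first_char = filename[:1]
    if first_char == b"$":
        return False                 # directory: use the normal routine
    if first_char == b":":
        return False                 # user asked for the slow routine
    return True


print(uses_burst_loader(b"GAME", 0))    # True
print(uses_burst_loader(b"$", 0))       # False
print(uses_burst_loader(b":GAME", 0))   # False
print(uses_burst_loader(b"GAME", 1))    # False
```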

The fastloader is available in both CIA1 and CIA2 versions (assembly source and executable for each); uuencoded executables are attached at the end of this article. Only the CIA1 version is discussed here.

Other software supporting the C64 burst modification (CIA1):

Now that was it. Now I just hold back and wait until someone implements this for VIC-20's buggy 6522 chips so that I don't have to.. :-)


Fastload Routine

A short description of the burst protocol and commands can be found in the "1581 Disk Drive User's Guide". FD2000 burst commands are as compatible as possible. Thanks go to Brian Ketterling for typing them from his manual.
; DASM V2.12.04 source
;
; Burst loader routine, minimal version to allow loading of programs up to 63k
; in length ($400-$ffff). Directory is loaded with the normal load routine.
;
; 1987-99 Pasi Ojala, Use where you want, but please give me some credit
;
; This program needs SRQ to be connected to CNT1 and DATA to SP1 (CIA1).
; Cassette drive won't work with those wires connected if the disk drive
; is turned on. (SRQ is connected to cassette read line.)
;
; SRQ = Bidirectional fast clock line for fast serial bus
; DATA= Slow/Fast serial data (software clocked in slow mode)

	processor 6502

	ORG $0801
	DC.B $b,8,$ef,0	; '239 SYS2061'
	DC.B $9e,$32,$30,$36,$31
	DC.B 0,0,0

install:
	; copy first block to $2a7..$2ff
	ldx #block1_end-block1-1	; Max $58
0$	lda block1,x
	sta _block1,x
	dex
	bpl 0$
	; copy second block to $334..$3ff
	ldx #block2_end-block2		; Max $cc
1$	lda block2-1,x
	sta _block2-1,x
	dex
	bne 1$

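	lda $0330	; load vector
	ldx $0331
	cmp #<MyLoad	; is the vector already pointing to MyLoad?
	bne 2$
	cpx #>MyLoad
	beq 3$		; yes -> already installed, do nothing
2$	sta OldVrfy+1	; chain the old load vector
	stx OldVrfy+2
	lda #<MyLoad	; point the load vector to MyLoad
	sta $0330
	lda #>MyLoad
	sta $0331
3$	rts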

block1
#rorg $02a7
_block1
OldLoad	lda #0
OldVrfy	jmp $f4a5	; The 'normal' load.

MyLoad:	;sta $93
	cmp #0		; Is it a prg-load-operation ?
	bne OldVrfy	; If not, use the normal routine
	stx $ae		; Store the load address
	sty $af
	tay		; ldy #0
	lda ($bb),y	; Get the first char from filename
	ldy $af
	cmp #$24	; Do we want a directory ($) ?
	beq OldLoad	; Use the old routine if directory
	cmp #58		; ':'
	beq OldLoad

	; Activate Burst, the drive then knows we can handle it
	sei		; We are polling the serial reg. intr. bit
	ldy #1		; Set the clock rate to the fastest possible
	sty $dc04
	dey		; = ldy #0
	sty $dc05
	lda #$c1
	sta $dc0e	; Start TimerA, Serial Out, TOD 50Hz
	bit $dc0d	; Clear interrupt register
	lda #8		; Data to be sent, and interrupt mask
	sta $dc0c	; (actually we just wake up the other end,
0$	bit $dc0d	;  so that it believes that we can do
			;  burst transfers, data can be anything)
	beq 0$		; Then we poll the serial (data sent)
	; Clears the interrupt status

	; This program assumes you don't try to use it on a 1541
	; If you try anyway, your machine will probably lock up..

	lda #$25	; Set the normal (PAL) frequency to TimerA
	sta $dc04	; Change if you want to preserve NTSC-rate
	lda #$40
	sta $dc05
	lda #$81
	jmp LoadFile

GetByte	lda #8		; Interrupt mask for Serial Port
0$	bit $dc0d	; Wait for a byte
	beq 0$		;  (Serial port int. bit changes, hopefully)
	;ldy $dc0c	; Get the byte from Serial Port Register
ToggleClk:
	lda $dd00	; Toggle the old serial clock (=send Ack)
	eor #$10	;  so that the disk  drive will start
	sta $dd00	;  sending the next byte immediately
	;tya		; return the value in Accumulator, update flags
	lda $dc0c	; Get the byte from Serial Port Register
	rts
#rend
block1_end


block2
#rorg $0334
_block2

LoadFile:
	sta $dc0e	; Start TimerA, Serial IN, TOD 50Hz (PAL)
	;cli

	jsr $f5af	; searching for ..

	lda $b7		; Preserve the filename length
	pha
	lda $b9		; Do the same with secondary address
	sta $a5		; We store it to cassette sync countdown..
			;  No cassette routines are used anyway, as
	lda #0		;  this prg is in cassette buffer..
	sta $b7		; No filename for command channel
	lda #15
	sta $b9		; Secondary address 15 == command channel
	lda #239
	sta $b8		; Logical file number (15 might be in use?)
	jsr $ffc0	; OPEN
	sta ErrNo+1
	pla
	sta $b7		; Restore filename length
	bcs ErrNo	; "device not present",
			; "too many open files" or "file already open"
	; Send Burst command for Fastload
	ldx #239
	jsr $ffc9	; CHKOUT Set command channel as output
	sta ErrNo+1
	bcs NoDev	; "device not present" or other errors

	; Bummer, the interrupt status register bit indicating fast serial
	; will be cleared when we get here..

	ldy #3
3$	lda BCMD-1,y	; Burst Fastload command
	jsr $ffd2
	dey
	bne 3$
	; ldy #0
1$	lda ($bb),y
	jsr $ffd2	; Send the filename byte by byte
	iny
	cpy $b7		; Length of filename
	bne 1$
	jsr $ffcc	; Clear channels

	sei
	jsr $ee85	; Set serial clock on == clk line low
	bit $dc0d	; Clear intr. register
	jsr ToggleClk	; Toggle clk

	jsr HandleStat	; Get Initial status
	pha		; Store the Status

	;jsr $f5d2	; loading/verifying
	; (uses CHROUT, which does CLI, so we can't use it)

; We could add a check here..
; if we don't have at least two bytes, we cannot read load address..

; It seems that for files shorter than 252 bytes the 1581 does not count
; the loading address into the block size.

	jsr GetByte	; Get the load address (low) - We assume
			; that every file is at least 2 bytes long
	tax
	jsr GetByte	; Get the load address (high)
	tay		; keep the high byte in Y
	lda $a5		; The secondary address - do we use load
			;  address in the file or the one given to
	bne Our		;  us by the caller ?
	stx $ae		; We use file's load addr. -> store it.
	sty $af
Our	ldx #252	; We have 252 bytes left in this block
	pla		; Restore the Status
	bne Last	; If not OK, it has to be bytes left
Loop	jsr GetAndStore	; Get X bytes and save them
	jsr HandleStat	; Handle status byte
	beq Loop	; If all was OK, loop..
Last	tax		; Otherwise it is bytes left. Do the last..
	jsr GetAndStore	; Get X number of bytes and save them
	jsr $ee85	; Serial clock on (the normal value)
	lda #239
	jsr $ffc3	; Close the command channel
	clc		; carry clear -> no error indicator
	bcc End

FileNotFound:
	pla		; Pop the return address
	pla
	jsr $ee85	; Serial clock on (the normal value)
	lda #4		; File not found
	sta ErrNo+1
NoDev	lda #239
	jsr $ffc3	; Close the command channel
ErrNo	lda #5		; Device not present
	sec		; carry set -> error indicator
End	ldx $ae		; Loader returns the end address,
	ldy $af		;  so get it into regs..
	cli
	rts		; Return from the loader

HandleStat:
	jsr GetByte	; Get a byte (and toggle clk to start the
			;  transfer for next byte)
	cmp #$1f	; EOI ?
	bne 0$
	jmp GetByte	; Get the number of bytes to follow and RTS
0$	cmp #2		; File Not Found ?
	bcs FileNotFound	; file not found or read error
	; code 0 or 1 -> OK
	ldx #254	; So, the whole block is coming
	lda #0		; No error -> Z set
	rts

GetAndStore:
	jsr GetByte	; Get a byte & toggle clk
	;sta $d020
	ldy #$34
	sty 1		; ROMs/IO off (hopefully no NMI:s occur..)
	ldy #0
	sta ($ae),y	; Store the byte
	ldy #$37
	sty 1		; Restore ROMs/IO (Should preserve the
			;  state, but here it doesn't..)
	inc $ae		; Increase the address
	bne 0$
	inc $af
0$	dex		; X= number of bytes to receive
	bne GetAndStore
	rts

BCMD:	dc.b $1f, $30, $55	; 'U0',$1F == Burst Fastload command
				; If $9F, Doesn't have to be a prg-file
#rend
block2_end
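
For readers who do not read 6502 assembly fluently, the receive logic of LoadFile, HandleStat and GetAndStore above can be paraphrased in Python. This is a sketch derived from the listing only, not from the drive documentation: the function names are invented, the byte stream is assumed to be well formed, the byte count following an EOI status is assumed to be non-zero, and the clock-toggle acknowledge after each byte is not modelled.

```python
EOI = 0x1F   # status value that announces the last block


class BurstError(Exception):
    """Raised when the drive reports file not found or a read error."""


def read_status(stream):
    """Return 0 if a full block follows, else the byte count of the last block."""
    status = next(stream)
    if status == EOI:
        return next(stream)      # number of bytes still to come
    if status >= 2:
        raise BurstError(f"drive reported status ${status:02x}")
    return 0                     # status 0 or 1: OK


def take(stream, count):
    """Read exactly count bytes from the stream."""
    return bytes(next(stream) for _ in range(count))


def burst_load(raw):
    """Decode a burst fastload byte stream into (load address, data)."""
    stream = iter(raw)
    remaining = read_status(stream)          # initial status
    low = next(stream)                       # load address, low byte first
    high = next(stream)
    data = bytearray()
    block_size = 252                         # first block: 254 minus address
    while remaining == 0:
        data += take(stream, block_size)
        block_size = 254                     # later blocks are full blocks
        remaining = read_status(stream)
    data += take(stream, remaining)          # the last, partial block
    return low | (high << 8), bytes(data)


# A three-byte program loading to $0801, sent as a single short block.
address, payload = burst_load(bytes([EOI, 3, 0x01, 0x08, 0xAA, 0xBB, 0xCC]))
print(hex(address), payload.hex())   # prints 0x801 aabbcc
```

As in the listing, a file that fits into the first block gets its byte count without the two load address bytes included.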

begin 644 burster-cia1
M`0@+".\`GC(P-C$```"B5[U"")VG`LH0]Z+'O9D(G3,#RM#WK3`#KC$#R:S0[
M!.`"\!"-J@*.JP*IK(TP`ZD"C3$#8*D`3*7TR0#0^8:NA*^HL;NDK\DD\.K)Y
M.O#F>*`!C`3<B(P%W*G!C0[<+`W<J0B-#-PL#=SP^ZDEC03<J4"-!=RI@4PTB
M`ZD(+`W<\/NM`-U)$(T`W:T,W&"-#MP@K_6EMTBEN86EJ0"%MZD/A;FI[X6X-
M(,#_C<0#:(6WL&NB[R#)_XW$`[!<H`.Y]P,@TO^(T/>QNR#2_\C$M]#V(,S_R
M>""%[BP-W"#S`B#,`T@@[`*J(.P"J*6ET`2&KH2OHOQHT`@@WP,@S`/P^*H@@
MWP,@A>ZI[R##_QB0$FAH((7NJ02-Q`.I[R##_ZD%.*:NI*]88"#L`LD?T`-,A
G[`+)`K#:HOZI`&`@[`*@-(0!H`"1KJ`WA`'FKM`"YJ_*T.A@'S!5/
``
end
size 354
begin 644 burster-cia2
M`0@+".\`GC(P-C$```"B2[U"")VG`LH0]Z+)O8T(G3,#RM#WK3`#KC$#R:S0E
M!.`"\!"-J@*.JP*IK(TP`ZD"C3$#8*D`3*7TR0#0^8:NA*^HL;NDK\DD\.K)Y
M.O#F>*`!C`3=B(P%W:G!C0[=+`W=J0B-#-TL#=WP^TPT`ZD(+`W=\/NM`-U)T
M$(T`W:T,W6"I@(T.W2"O]:6W2*6YA:6I`(6WJ0^%N:GOA;@@P/^-Q@-HA;>PZ
M:Z+O(,G_C<8#L%R@`[GY`R#2_XC0][&[(-+_R,2WT/8@S/]X((7N+`W=(.<"9
M(,X#2"#@`JH@X`*HI:70!(:NA*^B_&C0""#A`R#.`_#XJB#A`R"%[JGO(,/_*
M&)`2:&@@A>ZI!(W&`ZGO(,/_J04XIJZDKUA@(.`"R1_0`TS@`LD"L-JB_JD`H
=8"#@`J`TA`&@`)&NH#>$`>:NT`+FK\K0Z&`?,%4"Y
``
end
size 344
