SAMdiskHelper

If you’ve accessed BDOS-format disks in Windows, you’re probably aware of the need to run with Administrator rights. For security reasons, raw disk devices cannot be opened by normal unprivileged users.

Starting with Windows Vista, processes are launched with basic rights, even if the current user is a member of the Administrators group. To run with elevated rights the user must either manually launch a program by right-clicking and selecting “Run as administrator”, or the program’s manifest file must request it. Both result in a somewhat jarring User Account Control confirmation prompt before the program is launched.

This is a problem for both SAMdisk and SimCoupe, which support BDOS-format disks. Always requesting elevation is not a good option, as raw disk access is currently the only feature that requires elevation. This is where SAMdiskHelper comes to the rescue. It runs as a service under the SYSTEM user with full access to all disks, and can selectively provide access to them. The one-time installation still requires elevated rights, but after that the accessing program can be run with normal rights.

For safety, only disks with a recognised BDOS or Pro-DOS signature are exposed as read-write through SAMdiskHelper. Other disks are seen as read-only by code-signed versions of SAMdisk, and completely inaccessible to all other programs. These rules do mean that new media cards will not be recognised before they’re formatted, ideally on the real Atom device you intend to share the disk with.

To use SAMdiskHelper you just need to have it installed. Supported versions of SAMdisk (v3.8.3+) and SimCoupe (from May 2014) will use it automatically if needed.

You can find the download on my normal website.

TrinLoad v1.0

Developing Trinity-specific code has typically meant assembling directly on real SAM hardware, or assembling on the PC and transferring the program over to SAM. In my case the latter involved writing the disk image out from pyz80 to an SD card using SAMdisk, moving the card over to Trinity, rebooting SAM to have the new card recognised, re-selecting the development record, then booting or loading the program. Despite the benefits of a familiar PC code editor and faster assembler, the transfer process was still a chore.

I had a similar experience using homebrew code on the Sega Dreamcast. The earliest method was to burn content to a CD, but that was terribly slow and wasteful (re-writable CDs didn’t work). Next best was to push code to it over a serial cable, which was better but still became a chore as programs grew in size. The best option was to use the BroadBand Adapter and push code to it over the network. This required booting a helper utility (“dcload-ip”) from CD, which listened for and executed any code sent to it.

Given one of Trinity’s features is an ethernet adapter, it made sense to do something similar for SAM — and so TrinLoad was created!

My initial requirements were:
– be discoverable from a desktop PC
– accept code or data over the network, written to a given page and offset
– execute from a given page and offset
– simple implementation on SAM to minimise RAM footprint (and work!)

Using a UDP broadcast for discovery seemed like a no-brainer. SAM would always be on the same sub-net, and UDP is a simple connectionless protocol to implement. I chose to use UDP port EDB0, with a single byte payload of “?” as my discovery request. Any listening SAM machines would respond with “!” to indicate their availability. The UDP response would automatically include their IP address for any further communications. As an added bonus I included handlers for ARP who-has and ICMP echo, allowing SAM to respond to pings.
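The discovery exchange can be sketched from the PC side in a few lines of Python. This is only an illustration: it assumes "EDB0" is a hexadecimal port number (60848), and the function name, timeout and buffer size are made up for the example rather than taken from SAMdisk.

```python
import socket

TRINLOAD_PORT = 0xEDB0  # assumption: "EDB0" read as hexadecimal (60848)

def discover(timeout=1.0, address="255.255.255.255", port=TRINLOAD_PORT):
    """Broadcast '?' and return the IP addresses of machines that answer '!'."""
    found = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # Broadcast sends must be enabled explicitly on the socket.
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.settimeout(timeout)
        sock.sendto(b"?", (address, port))
        try:
            while True:
                data, (ip, _port) = sock.recvfrom(1500)
                # The reply's source address identifies the responding SAM.
                if data[:1] == b"!" and ip not in found:
                    found.append(ip)
        except socket.timeout:
            pass  # nobody else answered within the timeout
    return found
```

Calling discover() returns a (possibly empty) list of addresses, which can then be used for the data and execute packets.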

TCP would be the natural choice for reliable data communications, but without a network stack we’d have to implement it ourselves. For that reason I decided to stick with UDP for the data transmission, using the same port as before. Each packet would be ACKed on receipt, to confirm successful delivery, and to act as a transmission throttle so Trinity’s receive buffer didn’t overflow. The data format begins with a 4-byte header: “@” for a type indicator, followed by the target page number, then a 16-bit page offset in little-endian format. This is followed immediately by the data to write to that location. No length is needed as it can be calculated from the UDP data length, minus the 4-byte header. Data transfers average 29K/s, which includes the network receive, copying into place, and ACK.

The longest data block we can transfer in a single packet is 1468 bytes. This is calculated from the ethernet packet data size (1500 bytes), minus headers for IPv4 (20 bytes), UDP (8 bytes), and our data header (4 bytes). Longer blocks must be split into multiple packets, with the client program advancing the offset and page in each one.
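As a rough sketch of the client side, the header layout and packet splitting described above might look like this in Python. The 16K page size and the choice never to let one packet cross a page boundary are assumptions for the example, and the function name is hypothetical. The ACK handshake is left out because its format isn't described here.

```python
import struct

MAX_PAYLOAD = 1500 - 20 - 8 - 4   # ethernet data - IPv4 - UDP - our header = 1468
PAGE_SIZE = 0x4000                # assumption: 16K SAM pages

def data_packets(page, offset, data):
    """Yield '@' packets covering data, advancing the offset and page as needed."""
    pos = 0
    while pos < len(data):
        # Assumption: keep each packet within a single page.
        n = min(MAX_PAYLOAD, PAGE_SIZE - offset, len(data) - pos)
        # Header: type byte, page number, 16-bit little-endian page offset.
        yield struct.pack("<cBH", b"@", page, offset) + data[pos:pos + n]
        pos += n
        offset += n
        if offset == PAGE_SIZE:
            page, offset = page + 1, 0
```

A sender would transmit each packet in turn and wait for the acknowledgement before sending the next, which is what keeps Trinity's receive buffer from overflowing.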

After transferring the code we can execute it using a packet beginning with “X”. This is followed by a page number to write to HMPR, and a 16-bit address to start execution. LMPR points to the normal BASIC location, with ROM0 enabled, so you’re free to load small routines at address 0x4000 if you want to. The only area to avoid is 0x6000-0x7fff, which is used for TrinLoad code and ethernet buffer. If the calling environment is preserved, returning from your test code will drop back into TrinLoad, ready to receive the next build.
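Building the execute request is a one-liner. This sketch assumes the 16-bit address is little-endian like the data header, which the description above doesn't state explicitly, and the function name is hypothetical.

```python
import struct

def exec_packet(page, address):
    """Build an 'X' packet: HMPR page number, then the 16-bit start address."""
    # Assumption: the address is little-endian, matching the '@' data header.
    return struct.pack("<cBH", b"X", page, address)
```

For example, exec_packet(1, 0x8000) asks for page 1 to be written to HMPR and execution to start at 0x8000.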

To further streamline the process you can also start TrinLoad automatically on boot-up. This requires special versions of the Trinity flash code and Trinity BDOS, to skip the SD card reporting delay and auto-boot record 1. Then simply add a small auto-boot BASIC program to record 1, to switch to the TrinLoad record and load it. The final piece of the puzzle is an enhanced SAMdisk, with a special sam: target to find a SAM on the local network and send a binary to it from a disk image. Adding this to an existing pyz80 build process makes testing code on real hardware easier than ever.

Potential future uses of the network link:
– read and write records on the SD card from the PC
– dump floppy disks (even custom formats) to a disk image on the PC
– link to modified TurboMON for single stepping and software breakpoints
– custom stream for read/write link to the PC from BASIC

A pre-built disk image can be downloaded here, for use with SAMdisk v3.8.5 or later. Boot this on your real SAM with Trinity attached, then send your pyz80 disk image output to it using: SAMdisk image.dsk sam:

The source code for TrinLoad is available on GitHub.

SimCoupe for Raspberry Pi (SDL 2.0)

Previous versions of SimCoupe used SDL 1.2 on the Pi. SDL 1.2 video surfaces are fully implemented in software, typically giving a fixed-size output window without any fancy features such as alpha transparency (well, not at a reasonable speed).

SimCoupe also supported OpenGL through a thin SDL wrapper to give hardware acceleration on many platforms (including Linux and Mac). Unfortunately, the Pi only supports OpenGL ES 2.0 in hardware, so the plain OpenGL implementation fell back on a slow Mesa software implementation. This was slower than the plain SDL 1.2 video surfaces due to SimCoupe’s use of alpha blending for OpenGL scanlines.

I recently added SDL 2.0 support to SimCoupe, to give hardware acceleration support on most platforms, including the Pi. I was hoping to provide updated build instructions for you to make your own, but Raspbian doesn’t yet come with a binary libsdl2 package. You can build that yourself but it has a few extra package dependencies and the build process takes around an hour.

To save time I’m just releasing a pre-built binary package for now. Matching source is available at SourceForge, and if you really want to build it yourself I can help with build instructions. Things should be a lot simpler once Raspbian includes SDL 2.0.

Here’s how to get it:

wget http://simcoupe.org/files/simcoupi-20140202.zip
unzip simcoupi-20140202.zip

./simcoupe

This version has experimental support for vsync, so connecting your Pi to a modern PAL TV with picture processing should give nice smooth scrolling like the original SAM. Most PC monitors are fixed at 60Hz, even if you force a 50Hz mode using the hdmi_mode setting in config.txt on the Pi, so you probably won’t see any smoothness benefit.

You can run it from the console or under X, but OpenGL ES 2.0 support on the Pi only works in fullscreen mode. If you want to run in a window you’ll need to build with SDL 1.2 instead, which will be used if SDL 2.0 isn’t found. F5 toggles 5:4 mode, F6 toggles smoothing (bilinear filtering), and F7 toggles hi-res scanlines. Those key bindings may change in future versions, but they provide easy access to some of the newer video features.

This is still very much a development version, so there are some known issues:

  • The video options haven’t yet been updated for the new features.
  • Manual speed control supports only 50% and 100%.
  • Minor sound glitches on some setups due to vsync.
  • Higher than expected sound latency (needs investigation).

I’ve only tried it on the current Raspbian release so far, so it may or may not work on other Pi distributions. Please also make sure your system is up-to-date as newer firmware releases can make all the difference. You can do that using:

sudo apt-get update ; sudo apt-get upgrade

If keyboard input stops working or you’re experiencing random hangs, please ensure you’re using a compatible Pi power source. Cheap PSUs and USB ports may appear to supply enough for basic use, but SimCoupe pushes the Pi harder than most apps and that can expose any weaknesses. A typical sign of this is that you’ll lose network access, which leaves only the red LED lit on the Pi board. Of course, if you don’t have your Pi connected to a network it’s normal to have just the red LED 😉

I’d welcome any feedback on how well it works (or doesn’t) for you.

Spectrum Snapshot Tracing

Given a Spectrum snapshot, is it possible to determine which areas are code?

A typical approach would be to modify an emulator to record the location of every instruction executed, and let the snapshot run normally for a while. This gives a guarantee about marked locations, but it’s limited to code that is actually run during the analysis. For a complete picture you need to follow every code path, which would be a challenge even if you made the effort to use every feature and play through every possible outcome of a game. For bulk-processing snapshots it’s even more difficult.

I’ve long wondered whether spidering code was a realistic option. Given a starting point, you recursively follow every possible path, stopping only when you reach a dead-end (RET) or a previously visited location. It felt pretty straight-forward so I knocked up a quick test program to try it.

The initial approach was simple: follow JP instructions, recursively process CALLs and conditional jumps, stop at RETs, and blindly skip over anything else. Skipping instructions requires some opcode filtering, to identify multi-byte instructions with operands and those with CB/ED/DD/FD prefixes. Indexed instructions also need a little extra attention to skip a possible index offset, which isn’t present in the HL version of the same instruction. Despite all that, a while loop, small switch statement, and index flag were enough to cover it.

To track previously visited locations I used a 64K array, shadowing each addressable memory location. The tracing loop marks the array at the current PC offset as it works, stopping processing if the entry is already marked. For completeness I also mark entries for the operands, so a normal run of code is a contiguous block. Before the program exits it converts the array to a 256×192 image to help visualise the code found, with 1 (green) pixel per instruction byte.
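The basic spidering idea can be sketched in Python as below. This is a simplified illustration rather than the actual tracer: it uses an explicit work list instead of recursion, only understands unprefixed opcodes (the trace simply stops at a CB/ED/DD/FD prefix), and does none of the stack tracking described later. All names are invented for the example.

```python
def instr_len(op):
    """Length of an unprefixed Z80 instruction, or None for a CB/ED/DD/FD prefix."""
    if op in (0xCB, 0xDD, 0xED, 0xFD):
        return None
    x, y, z = op >> 6, (op >> 3) & 7, op & 7
    if x == 0:
        if z == 0:
            return 1 if y < 2 else 2        # NOP, EX AF,AF' / DJNZ, JR, JR cc
        if z == 1:
            return 3 if y % 2 == 0 else 1   # LD rp,nn / ADD HL,rp
        if z == 2:
            return 3 if y >= 4 else 1       # LD (nn),HL etc. / LD (BC),A etc.
        return 2 if z == 6 else 1           # LD r,n / everything else
    if x in (1, 2):
        return 1                            # LD r,r' and 8-bit ALU on registers
    if z in (2, 4):
        return 3                            # JP cc,nn / CALL cc,nn
    if z == 3:
        return 3 if y == 0 else 2 if y in (2, 3) else 1  # JP nn / OUT, IN / EX, DI, EI
    if z == 5:
        return 3 if y == 1 else 1           # CALL nn / PUSH
    return 2 if z == 6 else 1               # ALU n / RET cc, POP, RET, RST ...

def rel_target(mem, pc):
    """Destination of the JR/DJNZ at pc (signed 8-bit displacement)."""
    d = mem[(pc + 1) & 0xFFFF]
    return (pc + 2 + (d - 256 if d >= 128 else d)) & 0xFFFF

def trace(mem, start):
    """Return a 64K array with 1 at every byte reached as code from start."""
    visited = bytearray(0x10000)
    todo = [start]
    while todo:
        pc = todo.pop()
        while not visited[pc]:
            op = mem[pc]
            n = instr_len(op)
            if n is None:
                break                       # prefixed opcodes not handled in this sketch
            for i in range(n):              # mark the opcode and its operands
                visited[(pc + i) & 0xFFFF] = 1
            nn = mem[(pc + 1) & 0xFFFF] | (mem[(pc + 2) & 0xFFFF] << 8)
            if op == 0xC3:                  # JP nn: follow it
                pc = nn
            elif op in (0xC9, 0xE9):        # RET, JP (HL): end of this path
                break
            elif op == 0xCD or (op & 0xC7) in (0xC2, 0xC4):
                todo.append(nn)             # CALL nn, JP cc,nn, CALL cc,nn: trace both paths
                pc = (pc + 3) & 0xFFFF
            elif op == 0x18:                # JR e: follow it
                pc = rel_target(mem, pc)
            elif op in (0x10, 0x20, 0x28, 0x30, 0x38):
                todo.append(rel_target(mem, pc))  # DJNZ, JR cc: trace both paths
                pc = (pc + 2) & 0xFFFF
            else:
                pc = (pc + n) & 0xFFFF      # blindly skip anything else
    return visited
```

Here mem is any 64K byte sequence, such as a bytearray loaded from a snapshot, and sum(trace(mem, pc)) gives the number of code bytes found.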

The first run was spectacularly short, stopping after only a few instructions when the first RET was encountered. The problem was that the snapshot was taken at a relatively arbitrary position in the code, not at the top-level entry point. In this case we need to follow the return, taking the return address from the stack. If the parent routine also returned, we’d need to do the same thing again for the next value on the stack. That meant tracking changes to the stack pointer to ensure it was at the correct position for further returns.

Related to this, what if there was data on the stack at the time of the snapshot? The RET would pick that up instead, and it’s likely we’d start tracing a non-code location. To fix that we must process PUSH and POP instructions too, and adjust the SP value appropriately. INC/DEC SP also needed similar treatment. At this point I was starting to worry the code was turning into an emulator!

Of course, returning with data on the stack is still a valid thing to do in some cases. Unfortunately it’s a dynamic value that isn’t known to our static tracing, so the only option is to stop the trace path. Similarly, JP (HL) and friends are also treated as unknown dynamic targets. These are most likely to be used for jump-table lookups, and are probably the biggest limitation with static tracing.

This leads on to the problem of mixing code and data. RST 08 in the Spectrum ROM is followed by a single byte for an error code, which is picked up inside the routine by popping the return address, or using EX (SP),HL. Some games also use the same technique to inline data after a CALL. Under normal circumstances we trace both the called routine and the path beyond the CALL, but this would lead us to trace the trailing data, with unpredictable results. Fortunately, the stack tracking comes to the rescue here, allowing us to detect this as a stack underflow condition. We can’t determine how much data is present, but we can stop the parent trace continuing.

There’s a further complication to this issue. If a different part of the code calls the same routine, it would be skipped as a previously visited location before the stack underflow caught the data access, and would continue into the data. To fix this I added another marker array to allow called locations to be blacklisted, so future calls to the same code are stopped at the call.

Also connected to this is the technique used to discard return addresses from within a routine, usually by simply popping them off the stack. This also looks like an attempt to access data on the stack, and triggers call blacklisting. As a work-around I attempt to recognise code signatures for data access, including POP ss ; LD r,(ss). This is likely to need further improvement as other examples are found.

To help detect tracing escaping genuine code I’ve added some diagnostic tests: If tracing encounters a block of 4 or more NOPs in RAM, it suspects the trace has run into open memory. If an inert LD r,r instruction is encountered it suspects unwanted data tracing. In both cases it displays a warning message and stops the current trace thread, so they can be investigated.
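The two checks are simple enough to sketch in Python. This is an illustration under assumptions, not the tracer's real code: the function name and warning strings are invented, and RAM is taken to start at 0x4000 as on a 48K Spectrum.

```python
# Opcodes that load a register from itself: LD B,B ... LD L,L and LD A,A.
INERT_LOADS = {0x40, 0x49, 0x52, 0x5B, 0x64, 0x6D, 0x7F}
RAM_START = 0x4000  # assumption: 48K Spectrum memory map

def suspicious(mem, pc):
    """Return a warning string if the bytes at pc look like non-code, else None."""
    if pc >= RAM_START and bytes(mem[pc:pc + 4]) == b"\x00" * 4:
        return "run of NOPs: trace may have reached open memory"
    if mem[pc] in INERT_LOADS:
        return "inert LD r,r: trace may be following data"
    return None
```

A tracer would call this before decoding each instruction and abandon the current path when it returns a warning.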

Sometimes even the PC value in the snapshot isn’t enough as a starting point. One of the earlier snapshots I tried was Manic Miner, which sits at a PAUSE 0 after loading, requiring a key press to start the game. This is a problem because there’s no code-only path into the game, due to the start location being encoded as data in the BASIC statements. If the PC trace gives no result in RAM, I fall back to looking up the PROG system variable, and if it’s pointing in roughly the correct location I scan the BASIC listing for USR statements. Whenever a USR n or a USR VAL n$ is found the address is traced as a new entry point. Code traced from a USR statement is shown in red, with overlapping locations using additive colour mixing to give yellow.
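A minimal sketch of the USR scan, assuming the standard 48K BASIC layout: the program runs from the address in PROG (0x5C53) up to VARS (0x5C4B), each line has a 2-byte line number and 2-byte length, USR is token 0xC0, VAL is token 0xB0, and number literals are followed by a 0x0E marker plus 5 hidden bytes. It reads only the visible ASCII digits, ignores strings and REMs, and skips the "roughly the correct location" sanity check on PROG. The function names are hypothetical.

```python
USR, VAL, NUMBER = 0xC0, 0xB0, 0x0E  # BASIC tokens and the hidden-number marker
PROG, VARS = 0x5C53, 0x5C4B          # system variables: program start, variables start

def word(mem, addr):
    """Read a 16-bit little-endian value."""
    return mem[addr] | (mem[addr + 1] << 8)

def usr_targets(mem):
    """Return the addresses named by USR n or USR VAL "n" in the BASIC program."""
    targets = []
    pos, end = word(mem, PROG), word(mem, VARS)
    while pos + 4 <= end:
        # Skip the line number and length, then scan the line body.
        body_end = min(pos + 4 + word(mem, pos + 2), end)
        p = pos + 4
        while p < body_end:
            b = mem[p]
            if b == NUMBER:
                p += 6                  # skip the marker and its 5-byte value
                continue
            p += 1
            if b != USR:
                continue
            q = p
            while q < body_end and mem[q] in (0x20, VAL, 0x22):
                q += 1                  # skip spaces, VAL and an opening quote
            digits = ""
            while q < body_end and 0x30 <= mem[q] <= 0x39:
                digits += chr(mem[q])
                q += 1
            if digits and int(digits) <= 0xFFFF:
                targets.append(int(digits))
        pos = body_end
    return targets
```

Each address returned would then be fed to the tracer as an extra entry point.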

Another entry point option is the interrupt handler. The IM 1 handler is of little interest as it’s completely self-contained, but if the snapshot indicates IM 2 we can determine the handler address and call it as another possible entry point. Code traced from the IM 2 handler is shown in blue, with colour mixing as before.
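Finding the IM 2 handler amounts to a single table lookup. The sketch below assumes the data bus reads as 0xFF during the interrupt acknowledge, which is the usual idle value on a Spectrum but can differ with some attached hardware. The function name is hypothetical.

```python
def im2_handler(mem, i_reg, bus=0xFF):
    """Return the IM 2 handler address read from the vector at (I << 8) | bus."""
    vector = ((i_reg << 8) | bus) & 0xFFFF
    # The handler address is stored little-endian at the vector location.
    return mem[vector] | (mem[(vector + 1) & 0xFFFF] << 8)
```

With the common setup of I=0xFE and a 257-byte table of 0xFD bytes at 0xFE00, this gives a handler at 0xFDFD.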

Static tracing is only suitable for 48K snapshots, as paging may require dynamically calculated values to identify both ports and pages. In many cases 128K games only use the extra memory for music or level data, so I don’t see it as a huge problem. A bigger issue is that it requires snapshots, and most archived software is preserved in tape format. However, generating snapshots from tape images is a trivial task for a modified emulator.

Both static and dynamic tracing have their uses, and a combination of both will be the killer solution. Perhaps dynamic tracing to feel the bones of a program, and static tracing to flesh out the bits that can’t be reached? In some cases dynamic tracing may even be able to determine how to reach areas that weren’t visited on a first run, particularly if decisions are based on keyboard input. It could also help solve the ambiguous cases where static tracing isn’t sure what the code is doing.

Here’s the output from tracing a snapshot of a freshly loaded Dynamite Dan II:

dd2.szx: PC=03D6 SP=5E05 I=3F IM=1
8155: return address popped
714D: return address popped
Traced 7195 code bytes in RAM.

And the trace image it produced:

If you’d like to take a closer look, and maybe even improve on what I’ve done so far, the source code is now available on GitHub.

SimCoupe for Raspbian

Raspbian is the new OS recommendation for the Raspberry Pi. It’s slightly better configured than the previous Debian “squeeze” image, with fewer steps needed to build SimCoupe.

Here’s an update to my previous instructions, plus a new binary:

System Requirements

• Raspberry Pi board
• Raspbian “wheezy” (2012-07-15-wheezy-raspbian) written to SD card
• Ethernet connection for software downloads

Building From Source

Install the SDL development library and source control tool (about 40MB):

sudo apt-get install libsdl1.2-dev subversion

Fetch the SimCoupe source code:

svn co http://simcoupe.svn.sf.net/svnroot/simcoupe/trunk/SimCoupe@1439

It’ll take around 20 seconds before the files begin downloading.

Then compile the code:

cd SimCoupe/SDL && make

After about 10 minutes you’ll be ready to launch SimCoupe:

./simcoupe

Simples!

Binary Download

Here’s one I made earlier:

wget http://simcoupe.org/files/simcoupi-r1439.zip
unzip simcoupi-r1439.zip

./simcoupe

SimCoupe for Raspberry Pi

Raspberry Pi boards are starting to reach more end users, so it seems like a good time to cover what’s needed to get SimCoupe running on it.

The instructions below will lead you through downloading and building SimCoupe on the Pi itself. If you’d prefer to download a ready-to-run binary, skip to the end.

System Requirements

• Raspberry Pi board (or QEMU ARM setup)
• Debian “squeeze” (debian6-19-04-2012) image written to SD card
• Ethernet connection for software downloads

Building From Source

As a first step we add the pi user to the video group, so it has permission to use the framebuffer device (/dev/fb0). This is needed to run SimCoupe from the console:

sudo usermod -a -G video pi

For this to take effect you’ll need to log out and back in again:

exit

The Debian image includes most of the development tools, but we need some additional libraries, and Subversion:

sudo apt-get install libsdl-dev libz-dev subversion

Press Enter when prompted to confirm the downloads (around 22MB). It’ll take a couple of minutes to install them once the downloads complete.

Next we fetch the SimCoupe source code:

svn co https://simcoupe.svn.sourceforge.net/svnroot/simcoupe/trunk/SimCoupe@1413

This will appear to do nothing for around 20 seconds as it determines the list of files to download, so please be patient. The @1413 suffix selects a specific code revision known to work on the Pi. You can remove this suffix to download the latest revision, but there may be additional build requirements.

We’re now ready to compile the code using:

cd SimCoupe/SDL
make

This will take about 8 minutes, so go make yourself a cup of tea.

Once that completes you should have a binary ready to be launched using:

./simcoupe

Before you do that, let’s download a SAM game demo to play:

wget http://tinyurl.com/manicmdemo

As a final step we’ll load the ALSA sound driver for Pi audio support, which isn’t enabled by default. You’ll need to run this on every boot, unless you add it to the system startup scripts:

sudo modprobe snd_bcm2835

The sound driver is still under active development and considered alpha quality. It’s more stable than the previous release but CPU usage is still a bit on the high side, which may interfere with SimCoupe. Future Debian releases should include an updated driver.

To launch SimCoupe and boot the Manic Miner demo use:

./simcoupe ManicMinerDemo.zip

Binary Download

Here’s one I made earlier:

wget http://simcoupe.org/files/simcoupi-r1413.zip
unzip simcoupi-r1413.zip
./simcoupe

Known Issues

It sometimes takes 10 seconds to close the ALSA audio device. This delay may be experienced when quitting the emulator (Ctrl-F12), or after changing sound settings in the SimCoupe options (F10). Hopefully a future driver update will fix this issue.

ToDo

SimCoupe does not yet take full advantage of the Pi hardware. A future release will use OpenGL ES for hardware accelerated stretching and alpha blending. Using a 50Hz/PAL display mode and vsync should also allow perfectly smooth scrolling, with audio scaled slightly to match.

Spectrum Pac-Man

I think I’ve got Pac-Man back out of my system for now, with the new(ish) Spectrum port and updated SAM version.

The Spectrum version turned out to be much bigger than expected, in terms of both conversion effort and community reception. I’d only planned to do a quick conversion of the graphics to monochrome, and spend an evening or two rewriting the graphics routines for the display change. It did start that way, but snowballed from there.

The early work was done using pyz80+SimCoupe, with a mode 1 screen matching what the Spectrum would use. Once I got the basic tile drawing working (still only to 8-pixel boundaries), I switched to Pasmo+Fuse to check the AY sound mapping, and ensure the rest of the game was running correctly. I kept a video of this first playable version, which still lacked sprites.

The tile support includes the flashing power pills, which the arcade version animates by changing the cell attribute colour. The SAM version flashes a spare palette entry, used only for the power pill graphics. Unfortunately, the Spectrum couldn’t use attribute blocks without affecting the sprites passing over them, so the only option was to flash the display data directly.

Adding the sprites was more trouble than expected due to lack of free memory. The SAM version has 102 sprites, but at least 24 of the coloured ghost sprites weren’t needed, since they all looked the same in the Spectrum version. The remaining 78 sprites still required a whopping 21K to be stored fully pre-shifted. On top of that the 256 background tiles in 4 possible shift positions required an additional 10K. Ouch.

To save space I halved the resolution of the frequency-to-AY sound look-up table, and stored only the even sprite shift positions; the odd positions could be made up from those at draw time. Even that extra drawing work was too much at times, causing dropped frames if too many sprites were at odd positions, as they often were in one of the main vertical tunnels.

I really needed the full set of pre-shifted graphics, so I looked for savings in the graphics themselves. The tile set included a number of gaps, which could be filled by relocating other tiles. As with the sprites, the duplicate coloured ghosts (used for the attract screen) could also be removed. The fruit tiles weren’t needed either, since I used the sprite versions to simplify drawing of the relocated fruit to the right of the maze. On the sprites side, I eliminated duplicate segments from the large Pac-Man character, as used for the first intermission sequence. The savings worked, with a little space to spare.

Having all the ghosts look the same was a problem, as each has its own behaviour, and telling them apart is an important part of gameplay. I considered having a symbol stamped on each, but felt that would spoil the appearance. I chose to single out just the red ghost (the most dangerous) with a small mouth, so you could tell him apart from the others. It might even make it look a bit more menacing too!

At that point it was good enough for the first release. I got plenty of feedback and feature requests, one of which was colour support. However, the maze isn’t aligned to Spectrum attribute blocks, as that would require extensive changes to the graphics tile set and/or the ROM (thanks to Andrew Owen for looking into this). I still thought it was worth trying colour, if only to prove how bad it would look. Except it didn’t.

Colour support was added to the sprite save/restore/draw code, with a look-up table mapping sprite number to a single Spectrum attribute value. As a bonus, the lives and fruit indicators to the side of the maze were also in colour, as they were drawn using the sprite code. Unfortunately, the extra work to add colour pushed us back into the danger zone, causing frames to be dropped in some cases (mostly when the fruit sprite was visible). I released a video showing colour support in action, but took care to mask the speed problem by my choice of route through the maze. The video was a hit, so I needed to fix the running speed, fast!

The biggest time saving was a relatively simple one; rather than save and restore the previous attribute blocks for each sprite, I just needed to paint the old location with the current screen attribute. This, combined with other tweaks to the save/restore code was enough, and the colour version was ready for all. At this point it was still an assemble-time option to pick between mono and colour, but the next release added run-time switching, using a sprinkling of self-modifying code.

More recently, some of the Spectrum enhancements have found their way back to the SAM version, just in time for its 8th anniversary update. The save/restore/draw/clip code is more efficient, reducing the risk of frame overrun in later levels when the game speeds up. Adding the ROMs to the disk image is much easier, and the game startup is faster due to a skipped memory check. It also adds joystick support, and our old favourite the Q/A/O/P key mappings.

Barring bugs, I’ll probably not return to this project for a while. That might even give time to look into the feasibility of Mr. Do!

ZXodus Engine

Andrew Owen recently released his ZXodus Engine for the Spectrum, which provides a 9×9 tile grid (144×144 pixels), with independent attribute control for each 8-pixel display byte. He seemed particularly chuffed it achieved a rainbow processing effect across 18 blocks, when most people stopped at 16.

While it was great he made it freely available, I have to admit I didn’t think the technical side was all that special. The LD/PUSH technique for inlining data had been used elsewhere, and there had been plenty of rainbow processors too. Was I missing something? The best way to find out was to attempt to write my own version. I ran the official ZXodus demo to see what it looked like, but avoided looking at the code so I wasn’t influenced in any way.

As with all raster-level effects on the Spectrum, it requires an interrupt mode 2 handler to give a consistent starting point at the beginning of the frame. To that you add a large (~15KT) delay loop to wait until the TV raster is at the required position to begin racing the beam. I used trial and error (and the debugger in MacFuse) to get me close enough to start work on the real code.

The simplest and fastest way to write a block of 18 attribute bytes is:

     ld   sp,xxxx   ; 10
     ld   hl,xxxx   ; 10
     ld   de,xxxx   ; 10
     ld   bc,xxxx   ; 10
     push bc        ; 11
     push de        ; 11
     push hl        ; 11
     ld   hl,xxxx   ; 10
     ld   de,xxxx   ; 10
     ld   bc,xxxx   ; 10
     push bc        ; 11
     push de        ; 11
     push hl        ; 11
     ld   hl,xxxx   ; 10
     ld   de,xxxx   ; 10
     ld   bc,xxxx   ; 10
     push bc        ; 11
     push de        ; 11
     push hl        ; 11
                    ; = 199T

That is comfortably below the 224T per scanline on a 48K Spectrum. However, that doesn’t include memory contention delays caused by the ULA reading lower RAM when drawing the main display. Contention affects 128T of each scanline, leaving a 96T region (right border, retrace, left border) free of delays. The LD instructions and their immediate operands are in upper RAM, so they’re unaffected by contention. That just leaves the PUSH instructions to worry about, which take an additional ~5T in contended areas. If we position the code so the final 9 instructions are within the 96T region, only the first 3 PUSHes will be contended. That gives us a new total of ~214T, which is still below the scanline limit.
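The arithmetic above is easy to check with a few lines of Python. The instruction timings are the ones from the listing, and the ~5T contention penalty per PUSH is the approximate figure quoted in the text rather than an exact model of ULA contention.

```python
SCANLINE_T = 224        # T-states per scanline on a 48K Spectrum
CONTENDED_EXTRA = 5     # approximate extra T-states for each contended PUSH

# ld sp, then three groups of (3 x ld rr,nn + 3 x push rr).
code = [("ld sp", 10)] + 3 * ([("ld", 10)] * 3 + [("push", 11)] * 3)

total = sum(t for _, t in code)              # uncontended total
tail = sum(t for _, t in code[-9:])          # the final 9 instructions
contended = total + 3 * CONTENDED_EXTRA      # only the first 3 PUSHes are delayed

print(total, tail, contended)  # 199 96 214
```

The final 9 instructions come to exactly 96T, which is why they fit so neatly into the uncontended region.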

Another requirement for rainbow processors is that the raster must not catch us mid-draw, or you’ll see a mix of old and new data, spoiling the effect. This is made even more challenging by our use of the stack, which writes top-down; rather than trying to outrun the raster we’re running directly towards it! Our wider 18-block effect further reduces the time available for the drawing code, requiring us to complete it in just (224-18*4)=152T. Using our best-case contended timings from the code above the drawing code takes ~169T, which is too slow.

To fix this we need to cut the time between the first and last write, which means pre-loading more values into registers. AF is no use, and IX/IY are too slow, but the alternate set of main registers are perfect. It does require an extra 8T for two EXX instructions, but we still have enough time to spare.

Here’s the updated code:

     ld   sp,xxxx   ; 10
     ld   hl,xxxx   ; 10
     ld   de,xxxx   ; 10
     ld   bc,xxxx   ; 10
     exx            ;  4
     ld   hl,xxxx   ; 10
     ld   de,xxxx   ; 10
     ld   bc,xxxx   ; 10
     push bc        ; 11 (~16)
     push de        ; 11 (16)
     push hl        ; 11 (16)
     ld   hl,xxxx   ; 10
     ld   de,xxxx   ; 10
     ld   bc,xxxx   ; 10
     push bc        ; 11
     push de        ; 11
     push hl        ; 11
     exx            ;  4
     push bc        ; 11
     push de        ; 11
     push hl        ; 11
                    ; = 207T (~222T)

This new code is just within the scanline limit, and the drawing time of ~143T is within the required 152T window. This confirms we can achieve the required width of 18 blocks, but there’s still the issue of the effect position. Keeping six of the PUSH instructions within the uncontended border region gives no control over the location of the first write, which ultimately determines the position of the right edge of the effect. If we slide the code any earlier or later we’re bitten by extra contention, which pushes (tee-hee) us over the scanline time limit. If we aim to have the final instruction finish just before the main screen on the next scanline, the first write is at scanline offset (224-143)=81T. That’s 20 columns into the contended area, and since the ULA reads ahead of drawing each display block, that puts the start of the effect at column 1 on the display.
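The same figures can be reproduced in Python. The per-instruction timings in the listing sum to 207T uncontended and ~222T with three contended PUSHes. The ~143T drawing time falls out if the first write is taken to land about 5T into the first PUSH (after its opcode fetch), which is an assumption that happens to match the figure quoted above.

```python
SCANLINE_T, COLUMN_T, COLUMNS = 224, 4, 18
CONTENDED_PUSH = 16     # approximate timing of a contended PUSH

code = ([("ld sp", 10)] + [("ld", 10)] * 3 + [("exx", 4)] + [("ld", 10)] * 3
        + [("push", CONTENDED_PUSH)] * 3            # first three PUSHes are contended
        + [("ld", 10)] * 3 + [("push", 11)] * 3
        + [("exx", 4)] + [("push", 11)] * 3)

contended = sum(t for _, t in code)                 # whole line, with contention
uncontended = contended - 3 * (CONTENDED_PUSH - 11)
window = SCANLINE_T - COLUMNS * COLUMN_T            # time before the raster catches up

first_push = [name for name, _ in code].index("push")
draw = sum(t for _, t in code[first_push:]) - 5     # assumption: first write 5T into the PUSH
first_write = SCANLINE_T - draw                     # scanline offset of the first write

print(uncontended, contended, window, draw, first_write, first_write // COLUMN_T)
# 207 222 152 143 81 20
```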

With any raster effect there’s also the issue with timing stability. Before servicing an interrupt the Z80 will finish the current instruction, which could be a modest 4T or a monster 23T. To keep the effect stable you need to build some padding into the effect timing, or ensure the last instruction before every interrupt has the same timing. Traditional rainbow effects have enough time to start early and finish late to mask the issue, but with 18 columns there’s literally no time to spare. Our only option for stability is to rely on a HALT before every interrupt; that’s relatively easy in a machine code program, but it’s difficult to avoid flicker in BASIC when you’re doing other things.

So, I now see that an 18-column rainbow effect is indeed something special (sorry Andrew!). It’s right at the very edge of what’s possible on a 48K Spectrum. For the full effect you just need 144 repeated copies of the code above, starting from T=15900, and with the appropriate values inserted. No extra padding is needed between lines as there’s no time to spare. The only change needed for a 128K version is to the start offset, with the extra 4T scanline time seemingly absorbed by contention alignment.

I’m told that Matt Westcott was first to discover that 18 columns was possible, but don’t know if it was ever used in a demo.

I won’t link my own code here as it’s very much a work in progress, but I’m happy to supply it on request. It may even become part of the official ZXodus code at some point, as it contains a number of enhancements.

Edit: Since it did become part of ZXodus II, here’s my original test program source code, as detailed above.

Space Invaders emulator

I thought it was about time I added the Space Invaders “emulator” (binary port?) to my website, as I’d not touched it in over 3 years. Most of the work to get it running was done, with just sound and display rotation left to add. While mulling over the tricky display code I moved on to other projects and it was pretty much forgotten about.

It’s still unfinished but I’ve cleaned up the code, prepared a bootable disk, and refreshed myself on the technical details. It was an interesting contrast to the Pac-Man project I’d worked on previously. As before, the challenge was to modify as little of the original ROM as possible, with a virgin copy of the ROM patched at runtime.

CPU

The Space Invaders arcade machine uses an Intel 8080 CPU running at just under 2MHz. The Z80 was released 2 years after the 8080 and was designed to be object-code compatible, so the Invaders code runs on SAM (almost) unmodified. The Z80 also added many new features, including: IX/IY index registers, alternate register sets, multiple interrupt modes, CB/ED extended instruction sets, and the relative jump instructions JR [cc] and DJNZ.

The 8080 has a single interrupt mode equivalent to the Z80’s IM0, where an instruction is supplied on the bus at interrupt time. The Invaders hardware supplies both RST 08 and RST 10 instructions at a frequency of 60Hz, which drive the overall game logic, including the attract screen. SAM lacks the extra hardware, but they can both be simulated using IM2 and a line interrupt, without modifying the ROM.

I/O ports 1 to 6 are used for coin and button inputs, as well as a hardware bit-shifter circuit. The shifter takes a 16-bit value (written to port 4 in low/high order), and a left-shift count (written to port 2). Reading from port 3 returns just the high byte of the result (more on this later).
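As a rough illustration of the shifter described above, here is a small Python model. The class and method names are hypothetical. It assumes that each port 4 write pushes the previous high byte down into the low byte, and that only the low 3 bits of the port 2 value are used as the shift count.

```python
class ShiftRegister:
    """Hypothetical model of the Invaders bit-shifter circuit."""

    def __init__(self):
        self.value = 0   # 16-bit shift data
        self.count = 0   # left-shift count, 0 to 7

    def write_data(self, byte):
        # Port 4: the old high byte drops to the low byte, so two writes
        # load a full 16-bit value in low/high order.
        self.value = ((byte & 0xFF) << 8) | (self.value >> 8)

    def write_count(self, count):
        # Port 2: assumption, only the low 3 bits select the shift count.
        self.count = count & 0x07

    def read_result(self):
        # Port 3: high byte of the 16-bit value after the left shift.
        return ((self.value << self.count) >> 8) & 0xFF


shifter = ShiftRegister()
shifter.write_data(0xFF)   # low byte
shifter.write_data(0x00)   # high byte
shifter.write_count(4)
print(hex(shifter.read_result()))
```

With the value &00FF loaded and a shift count of 4, the top four bits of the low byte move into the result, which is how the game positions graphics at non-aligned pixel offsets.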

As we’re running the ROM code natively, trapping the I/O requires patching the instructions that make the requests. The only I/O instructions supported by the 8080 are IN A,(n) and OUT (n),A, which include the port number as an immediate operand. This allows us to use a simple loop to find and patch instructions that access ports 1 to 6 (later checked manually to ensure no false-positive matches). Each occurrence is replaced by a RST 08 instruction, with the original operand modified to include a flag indicating whether the original instruction was IN or OUT. We could have used separate RST calls for each, but that requires duplicating the RST handler and modifying more of the original ROM.
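The find-and-patch loop might look something like the Python sketch below, operating on a copy of the ROM held in a bytearray. The opcode values are the standard 8080/Z80 encodings (&DB for IN, &D3 for OUT, &CF for RST 08), but the choice of bit 7 as the IN/OUT flag and the function name are assumptions made for illustration, not necessarily what the real patcher uses.

```python
IN_OPCODE = 0xDB     # IN A,(n)
OUT_OPCODE = 0xD3    # OUT (n),A
RST_08 = 0xCF        # Z80 RST 08
OUT_FLAG = 0x80      # hypothetical flag bit marking an original OUT


def patch_io(rom, first_port=1, last_port=6):
    """Patch IN/OUT on the given ports in place; return patched offsets."""
    patched = []
    i = 0
    while i < len(rom) - 1:
        opcode, port = rom[i], rom[i + 1]
        if opcode in (IN_OPCODE, OUT_OPCODE) and first_port <= port <= last_port:
            rom[i] = RST_08
            if opcode == OUT_OPCODE:
                rom[i + 1] = port | OUT_FLAG
            patched.append(i)
            i += 2          # skip the operand we have just handled
        else:
            i += 1
    return patched


rom = bytearray([0xDB, 0x01, 0x00, 0xD3, 0x04, 0xDB, 0x07])
print(patch_io(rom), rom.hex())
```

A byte-level scan like this cannot tell instructions from data, which is why the returned offsets still need checking by hand for false positives.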

Since we’re simulating the interrupt calls, we have control over how the original RST 08 and RST 10 handlers are invoked. The ROM code for both starts with 4 register push instructions, which can be moved to our own interrupt handler, freeing the space for our I/O hook.

DISPLAY

Space Invaders uses a monochrome bitmapped display with a linear layout, similar to SAM’s mode 2. The display resolution is 224×256, but like most portrait arcade games the display hardware works in landscape mode. Fitting the 256×224 (rotated) area on SAM’s 256×192 screen means we lose 4 character columns from the width of the play area.

As with SAM’s mode 2 (and the Spectrum), drawing to a non-character aligned position requires bit shifting of data. Invaders uses this for more control over the vertical position of the invaders, as well as the smooth scrolling of player and invader bullets. The hardware shifting circuit makes easy work of this, which is a good thing considering the slow CPU speed! That said, the invader pack does only move one invader at a time, keeping the per-frame drawing to a minimum.

The Invaders display is stored at &2400-3fff, which isn’t compatible with the 16K boundary requirement for SAM’s mode 2. That means redirecting ALL display writes to a suitable upper memory location; something difficult to do from a centralised point in the code. About the only option is to identify ROM routines accessing the display and provide alternative implementations.

Copying the first 6K of Invaders display to a SAM mode 2 screen in upper memory confirmed the game was running, but revealed another issue — the bit order within display bytes was reversed compared to SAM, requiring each byte be flipped before writing. The byte rotation could be avoided by rotating the display in the opposite direction, but that would leave scanline rows in reverse order, requiring a much larger display mapping table to correct.
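Flipping the bit order of each byte is usually done with a 256-entry lookup table. A minimal Python sketch of that idea follows; the names are hypothetical, and on SAM itself the table would be a page-aligned block of memory rather than a bytes object.

```python
# REVERSED[b] is b with its 8 bits in the opposite order.
REVERSED = bytes(int(f"{b:08b}"[::-1], 2) for b in range(256))


def flip_bytes(data):
    """Return a copy of data with the bit order of every byte reversed."""
    return bytes(data).translate(REVERSED)


print(flip_bytes(b"\x01\x03").hex())
```

The table costs 256 bytes but reduces each flip to a single indexed read, which matters when every display write has to go through it.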

To map the display accesses to a SAM-compatible location we offset the high byte of the address. Subtracting an additional 2 from this value also pulls the display up (well, left!) by two columns, centralising the game area on the SAM display. This clips a character from each side of the title area, and half an invader at the left and right edges, but it’s only a small difference. The movement range for the player turret is more limited so it’s unaffected.
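The high-byte offset can be sketched in Python as below. The SAM screen address of &8000 is purely a placeholder, as the real location isn't stated here, and the function names are made up. With 32 bytes per display line, the extra 2 in the high byte is 512 bytes, or 16 lines, which is the two-character shift described above.

```python
INVADERS_SCREEN = 0x2400   # start of Invaders display RAM (from the text)
SAM_SCREEN = 0x8000        # hypothetical mode 2 screen address in upper memory
CENTRE_ADJUST = 2          # extra high-byte offset: 2 * 256 bytes = 16 lines
SAM_SCREEN_SIZE = 6144     # 192 lines of 32 bytes

HIGH_OFFSET = (SAM_SCREEN >> 8) - (INVADERS_SCREEN >> 8) - CENTRE_ADJUST


def map_address(addr):
    """Translate an Invaders display address to the SAM screen copy."""
    return (addr + (HIGH_OFFSET << 8)) & 0xFFFF


def is_visible(addr):
    """True if the mapped address lands inside the SAM display data."""
    return SAM_SCREEN <= map_address(addr) < SAM_SCREEN + SAM_SCREEN_SIZE


print(hex(map_address(0x2600)), is_visible(0x2400), is_visible(0x3DFF))
```

Under these assumptions only &2600-&3DFF is visible, so 16 lines are clipped from each end of the Invaders display.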

The game now looked great, but play-testing revealed some issues. When the invader pack reaches the edge of the display it’s supposed to lower and turn back, but that wasn’t happening. Also, player bullets were passing through the invaders without hitting them. It turned out that collision detection was done by checking the display contents, but it was still reading from the original display location. Hooking an extra couple of routines to look at the new display area soon fixed that.

A final change was to add a splash of colour to match the original machine. As the video hardware didn’t support colour, cellophane strips were added to areas of the monitor: green for lives, bases and player turret, red for the flying saucer at the top. An equivalent effect can be achieved in the SAM version using blocks of mode 2 attributes, which are unaffected by the display data writes.

Rotating the display to the normal SAM orientation remains a challenge. My original approach was to apply rotation and scaling to each display write, preserving the original layout. That meant scaling/masking/combining each byte, so the iconic graphics would suffer some scaling distortion. A better approach might be to relocate some areas of the display, as I did with the score and fruit areas in my Pac-Man emulator. It still requires rotation, but only within simple 8 pixel blocks. Writes from some hook reimplementations could also be optimised for full block writes.
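Rotation within an 8 pixel block amounts to turning eight bytes through 90 degrees. The Python sketch below shows one way to do that, assuming bit 7 is the leftmost pixel and a clockwise turn; the function name is hypothetical and the direction the real display needs may be the opposite.

```python
def rotate_block_cw(rows):
    """Rotate an 8x8 pixel block (8 bytes, bit 7 = leftmost) clockwise."""
    out = []
    for r in range(8):
        byte = 0
        for c in range(8):
            # Output pixel (r, c) comes from input row 7-c, column r.
            bit = (rows[7 - c] >> (7 - r)) & 1
            byte |= bit << (7 - c)
        out.append(byte)
    return out


top_row_only = [0xFF, 0, 0, 0, 0, 0, 0, 0]
print(rotate_block_cw(top_row_only))
```

A solid top row becomes a solid right-hand column, and applying the function four times returns the original block.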

SOUND

The sound effects in the original game are generated using analogue circuits rather than a sound chip, which makes them difficult to emulate in a traditional sense. Most Space Invaders emulators use sound samples taken from the original machine instead. I haven’t implemented the sound yet, but will attempt to create approximate effects with the SAM sound chip.

The source code and bootable disk image are now available on my website, but you’ll need to provide your own Space Invaders ROM image.

Further EDSK extensions

I’ve been involved with various disk preservation groups over the last few years. A large part of that has been for Spectrum +3 and Amstrad CPC disks, with SAMdisk extended to support copy-protected disks. The +3/CPC disks are usually stored in the Extended DSK (EDSK) image file format, designed to hold (almost) any format compatible with the uPD765 floppy controller.

Many problem disks have been reverse-engineered to discover why they didn’t work. A few required emulator enhancements to improve hardware accuracy, but most were missing details from the original disks, due to some creative floppy controller use by the copy protection checks. Not all of these could be supported by the original EDSK specification.

Back in October 2005, I suggested a few EDSK enhancements, designed to address some known limitations of the format. The extensions didn’t involve anything too radical, to maintain as much backwards compatibility as possible.

It’s now three years later, and a number of new gap-related CPC protections have been identified, which are beyond the scope of even the extended Extended DSK format! I’ve made further changes to address the new requirements, as well as a correction to a previous one.

My development version of SAMdisk includes support for all the new features, and will be released if the extensions are approved. Other programs will need similar enhancements to take advantage of them, particularly emulators wanting to run some of the difficult disks.

See the updated extensions document for further details. There are also sample disk images showing each extension.