Archive

Archive for the ‘6502 Assembly’ Category

Dynamic Memory Allocation Part 2

May 18th, 2007 Ant No comments

A bit of research yesterday turned up this site:

The Official Unofficial C= Hacking Homepage

Seems to be a collection of 15 year old docs about C64 programming. The second issue is most pertinent here, as it contains - hurrah! - a description of a dynamic memory allocation technique. It’s designed for the C128, and is intended to support system RAM, expansion RAM, and some other sort of memory the name of which I can’t remember, but all I need to know is how it works.

In brief, the technique works like this (modified slightly to remove the RAM expansion stuff). We create a linked list of free memory blocks, each with a 4-byte header. The header contains the 16-bit (2 byte) address of the next free memory block, and the 16-bit (2 byte) length of the current free memory block. When the routine starts, we have a single header followed by lots of empty RAM. To allocate memory:

  • Jump to the first memory block’s header
  • Check the length of the block - is it large enough to contain our data?
  • If yes:
    • Jump x places from the end of the memory block (where x is the length of our data)
    • Insert our data into that position and return a pointer to the start of our data
    • Reduce the length of the memory block (the header value) by x
  • If no:
    • Jump to the next address (stored as the first two bytes in the block’s header), which moves us to the next free memory block
    • Repeat until memory allocated

The original article suggests that each chunk of memory should be aligned to the next 8-byte boundary; this would make each chunk a multiple of 8-bytes long. I’d have to think about that some more before I decided that it was definitely the block size to use.

When we de-allocate memory, we try to join the freed block up to either a chunk immediately to the left or right (or both), so that we don’t have chunks of contiguous free memory separated by useless headers.

Compared with my FAT system, this has pros and cons. On the positive side, huge chunks of memory can be allocated with very little waste. We could allocate a block of 16K and use only 4 bytes of memory to manage that block, whereas the FAT system would need 2K to manage the block. It’s probably faster than the FAT system, too. On the negative side, allocating a single byte would eat up at least 5 bytes of RAM, which is incredibly wasteful.

This got me thinking. What if you could group variables together and allocate them all in one go? If you were writing Pong, for example, you’d know that each player has an X value (1 byte), a Y value (1 byte), a velocity (1 signed byte) and a score (1 byte stored as a binary coded decimal). That’s 4 bytes. Allocating them individually in the linked list system would use 20 bytes (more if we align along 8-byte boundaries). However, if we decided that we’d structure it so that we knew the sequence of bytes, we could allocate it as one chunk, then use indexed addressing to get at the correct value. To get at the X value, we just use:

LDA POINTER

To get at the Y value:

LDA POINTER+1

Etc. We can store this in just 8 bytes - 4 header bytes plus 4 data bytes. I think I must be on the right track again - I’ve just re-invented the struct.

Categories: 6502 Assembly Tags:

Dynamic Memory Allocation, or Reinventing the Wheel

May 17th, 2007 Ant No comments

In my quest to learn an obsolete flavour of an ancient programming language, the biggest question I’ve found myself asking is what to do if you need to manage memory dynamically. Static memory allocation is easy enough. If I’m writing a game, I know how many enemies will be on-screen, how many players there are, how many levels; in short, I can manually map out the memory usage of the game and leave placeholders in my assembly code to fill with that data as the program is running.

It gets more complicated when you’re trying to write an application whose sole purpose is to store increasing amounts of data, such as a spreadsheet, word processor, paint package, etc. If you need to be able to dynamically allocate and free memory as it is used, how do you go about doing it?

So, I got to pondering. My first thought was that you could decide upon a large block of memory to use as dynamic storage (say the addresses $6000-$9FFF on the C64, which is at the end of the BASIC storage space). You specify that another memory location is used as a lookup table for data in that memory block. We’d use each bit in the lookup table to specify whether or not a whole byte of the memory block is free. So, say our lookup table starts at $4000, and the first value is $FF, that means that the first 8 bytes of our memory block are in use. Allocating memory is a question of scanning along the lookup until we find a clear bit; we then set the bit and use the byte in memory. If we’re trying to allocate more than a single byte, we scan along until we find enough clear bits (ie. empty bytes) to contain our data. When we free memory, we simply clear the relevant bits.

This has two major problems. First of all, our memory block uses up just under 16K of RAM, which means our lookup table uses 2K of RAM. When you’ve only got 64K to start with, and when much of that is taken up with the operating system (simple though it may be), you don’t really want to waste that much memory. One solution would be to divide the memory into chunks of 2 bytes and just remember whether we’re using a chunk or not. That would reduce the size of our lookup table by half, but it’d double the storage space needed for a single byte to two bytes. We’re not doubling the size of a byte, but we’ve reduced the resolution of our lookup to prevent us from seeing that we’ve only used one of the bytes in that 16-bit block. This solution becomes troublesome if we’re storing predominantly 8-bit data instead of 16-bit data, because we’ve effectively halved the available memory.

It occurred to me that this table is, essentially, nothing more than a FAT system implemented in RAM instead of on disk. The “chunks” are, in FAT terminology, “blocks”. Windows uses something like a 4K block size, so if you save a 2K file you end up with 4K used on the disk. The Amiga had variable block sizes, defaulting to a small size which made it great at storing small files efficiently, but hopeless at retrieving large files. First re-invention of the day! Anyway, let’s call this lookup table the “FAT”.

The second problem is that we’ll quickly end up with fragmented memory, with empty bits interspersing the used data, which will greatly increase the amount of work done by the memory allocation routine.

It’s obvious that we need some sort of defragmentation routine here. The simplest way to do this is to write a subroutine that will scan along the FAT until it hits a clear bit. It then scans along until it finds a set bit, and moves all of the back to remove the gap. It manipulates both the FAT and the data to ensure they both stay in sync.

This is, basically, what Tetris does when the game detects a full row - remove the full row, then move the rows above it down to fill the gap. And they say you can’t learn anything from videogames!

The problem with this is that any existing pointers will be thrown out - they’ll now point to the wrong location as all of the data has moved. To solve this, we need a second lookup table. The new lookup table will store pointers to the main data block, and the main program uses these pointers to refer to data (in C terminology, we double-dereference the pointer, or in asm terms we treat the address as an indirect address).

So, when we allocate memory, we store a pointer to the data in the pointer table, and pass the pointer table address back to the program. To access the data, the program would need to go to the pointer table address, go to the memory address stored there, and finally get at the desired data.

Now, there is a huge problem with this. First of all, the obvious way of storing this (one address for every byte in the memory block) would use twice as much memory as the memory block itself. The memory block is 16K in size, which is 16384 bytes of information. The pointer table would, therefore, need to store 16384 addresses, and as each address is a 16-bit, or 2 byte, value, the pointer table would use 32K of RAM. That’s half of the memory in the entire computer.

At this point I realise that there’s already a function in existence to do all of this - the malloc() function in C must do something similar. So, I look it up. It transpires that one of the more popular ways to manage memory dynamically is to produce a memory map, or “mmap”, which works in a very similar way to the FAT system outlined above (it may even be identical; I’d have to read some more to say definitively). It seems that memory fragmentation is one of the major problems with this method. The guy who wrote the Enlightenment window manager for Linux came up with a system almost identical to my pointer system above to solve this.

So, that’s twice more I’ve re-invented the wheel today. Still, it’s nice to know that I’m working on the right lines.

I’ve done some more thinking, and have a few more ideas for auto-defragmenting dynamic memory allocation. The main problem with all of them is that they’re fairly complex and therefore aren’t really suited to the C64. Still, the fragmented memory map system would be easy to implement. I might try that as my first useful 6502 project.

Categories: 6502 Assembly Tags:

Some C64 Observations

May 15th, 2007 Ant Comments off

Reading up on the C64. Interesting things learned so far:

  • The memory map of the C64 leaves the assembler programmer with 3 bytes of zero page memory to use. I’m sure there are other addresses that could be abused on the assumption that, for example, the RS-232 port won’t be used during a game. But - 3 bytes?!
  • BASIC code begins at $0800 and ends at $9FFF, which explains why the C64 code I’ve seen so far positions the code at $0800.
  • The C64 Programmer’s Reference Guide looks like it contains everything I need to get started (except for the C64 asm header code, and some way of producing a file that’ll run in an emulator).
  • The C64’s 6510 CPU lets you bank in and out extra RAM if you don’t need the BASIC ROM or the character set, just by setting a bit at a particular address. Clever!
  • The C64 has a set of built-in functions that control all of the I/O you’d need to perform. For example, printing to the screen is just a matter of loading a hex value representing an ASCII code into the accumulator, then jumping to the memory address for the print subroutine. The memory address is in something called a “jump table” (a list of pointers to the subroutines themselves, I’d imagine) near the end of the 64K address space. The jump table apparently forms the C64’s kernal.

I’ve also noticed that “Introduction to Assembly Language for the Commodore 64” by Stan Krute is selling for $194.94 (roughly £100 if you use the official exchange rate, or £194.94 if you use the Apple exchange rate) on the US Amazon site. As far as I can tell, it is second-hand, and it’s not made of gold. Those crazy Amazon guys!

Categories: 6502 Assembly Tags:

In Another Assembler…

May 13th, 2007 Ant No comments

Here’s the same thing designed to compile with DASM:

        PROCESSOR 6502
        .ORG $0200      ;Locate asm at $0200

CODE    LDA #<LIST      ;Load low byte of LIST into acc
        STA $30         ;Store low byte at address $30
        LDA #>LIST      ;Load high byte of LIST into acc
        STA $31         ;Store high byte at address $31
        JSR LOAD        ;Jump to LOAD subroutine
        BRK             ;Stop running

LOAD    LDY #$1         ;Load 1 into Y register
        LDA ($30),Y     ;Load data pointed to by address $30+1 into accumulator
        RTS             ;Return from subroutine

LIST    DC.B #$20,#$04  ;Define a list of bytes

Great, so I can now compile 6502 programs from an OSX terminal instead of loading up Parallels. The only problem is I can’t run them in OSX. Gahh.

Categories: 6502 Assembly Tags:

Progress So Far

May 13th, 2007 Ant No comments

Things I’ve learnt about the 6502 so far:

  • Memory in the range $0000-$00FF is called “zero page” memory. This can be accessed much faster than any other memory because the addresses are all 8-bit. This means that the 8-bit CPU can process the address in one go, instead of needing to process the second byte in a 16-bit address.
  • Memory in the range $0100-$01FF is used as the stack (this is why most of the programs I’ve seen so far locate themselves at address $0200 - it puts them past both the valuable zero page RAM and the unpredictable system stack).
  • Memory in the range $FFFA-$FFFF is used by the three interrupt commands.
  • The CPU supports several different addressing types (about 13, I think), all of which are far too tedious to detail here.

One thing I haven’t found out is how to get the memory address of a label. Say we have to following code:

        .ORG $0200      ;Locate asm at $0200

CODE    JSR LOAD        ;Jump to LOAD subroutine
        BRK             ;Stop running

LOAD    LDY #$1         ;Load 1 into Y register
        LDA ($30),Y     ;Load data in 1st byte after address in $30 into acc
        RTS             ;Return from subroutine

LIST    .DB #$20        ;Define a list of bytes
        .DB #$04

At the label “CODE”, before the subroutine jump, I want to load the address of the list into memory address $30. That way, my “LOAD” subroutine is completely generic. I can define as many lists as I like (each with different labels, natch), and I can call the “LOAD” subroutine on any of them as long as I load the list’s start address into $30. I’d expect to be able to do this with the commands:

         LDA LIST       ;Load the address of list into the accumulator
         STA $30        ;Store accumulator into address $30
         LDA LIST+1     ;Load other byte of address into acc
         STA $31        ;Store accumulator into address $31

The bytes are probably the wrong way around as the little-endian-ness hasn’t quite sunk in yet. Anyway, that doesn’t work - what happens here is that LIST is treated as an absolute memory address (a pointer), and the first LDA command loads the first list item into the accumulator (in C terms, it automatically dereferences the pointer and gives me the data). There doesn’t seem to be any way to get at the actual address of the list. So how am I going to make my generic list subroutine? I have no idea. I imagine it involves more reading.

(A few minutes later)

Oh, no - it’s actually quite easy. At least, it is in the “6502 Simulator” Windows program I’m using to code with. The code should look like this:

         LDA #<LIST     ;Load low byte of LIST into acc
         STA $30        ;Store low byte at address $30
         LDA #>LIST     ;Load high byte of LIST into acc
         STA $31        ;Store high byte at address $31

Prefixing a label with a hash turns it into an immediate number instead of an absolute memory address (ie. we can get the memory address instead of treating it as a pointer and retrieving the data pointed to by the label). Using the greater than/less than symbols allows us to extract individual bytes from the 16-bit address.

Categories: 6502 Assembly Tags:

Argh! My Brain!

May 13th, 2007 Ant No comments

From “6502 Software Design”, by Leo J. Scanlon, 1981:

Zero page indexed addressing is to zero page addressing as absolute indexed addressing is to absolute addressing. With zero page indexed addressing, the effective zero page address of the operand is computed by adding the contents of the X or Y register to the zero page base address contained in the second byte of the instruction.

That’s a sentence that George Eliot herself would have been proud of. It’s a good job I already understand this stuff…

Categories: 6502 Assembly Tags:

What are the Chances?

May 12th, 2007 Ant No comments

No 6502 assembly books in Hay-on-Wye. That’ll make learning it so much more difficult. Except, in a freaky coincidence, I went to a car boot in Stratford today and came across a stall selling a whole box of BBC-B programming books, including 4 6502 books. 50p each! What are the chances of that happening?

Categories: 6502 Assembly Tags:

Delayed by Wozniak

May 11th, 2007 Ant No comments

Noticed that the GBAX competition is on again. Got until July, so I might try to get Defender DS finished in time for that.

I’ve been temporarily distracted by 6502 assembly, though. I went on a hiking/camping trip to Brecon, and ended up spending one of the more rainy days wandering around Hay-on-Wye’s bookshops. Last time I went (I must have been about 14) I remember nearly buying a book on m68k assembly - in the end I didn’t bother because it suffered from the usual problem these books have:

The first chapter explains how to switch a computer on, whilst the second launches you into the most efficient way to swap two 32-bit integers using just one 16-bit register and a single memory address. So what happened to the chapter in between explaining all of this jargon?

This time I thought that, in all of those shops, I must be able to find something interesting.

As it turned out, there weren’t any interesting computery books. There were a few books on BBC-B programming, but who’d want to program one of those? I did manage to find a copy of iWoz, Steve Wozniak’s autobiography, for the bargain price of £2.95 (£10 off!). As I remembered wanting to buy it when I first heard about it I figured that I couldn’t turn it down.

The book turned out to be fascinating and entirely inspirational, so I heartily recommend that anyone interested in either the history of home computing or practical jokes gets hold of it. Woz goes into the design of the first two Apple computers in a reasonable amount of detail - it’s understandable for the kind of people who’d go out and buy a book about the history of Apple (the nutters). Part of that involves a discussion of the 6502 CPU, and that got me interested in it.

The 6502, compared with something like the 68k or even the Z80, is an incredibly simple chip. It’s completely 8-bit (none of this 16/32-bit hybridisation like the 68k, or 16-bit registers like the Z80). It’s got a total of three registers, can address 64K of RAM, and has roughly 50 documented instructions. It looks like a really good starting point for getting the hang of assembly language, or more specifically, how to translate high-level concepts like linked lists, structs, pointers, etc, into assembly. As a side effect, learning 6502 asm will put me in the position to code for the C64, Atari VCS, NES, SNES, BBC-B, Apple 1, and (I think) the Apple 2, Dragon and Tandy CoCo.

So, which one to concentrate on? The C64 looks to be simple enough. The VCS is weird - you have to slot your code into the space between horizontal scans of the TV’s electron gun, then process the HSYNC, then start again on the next scanline. Yikes. It’s also fantastically limited - it only gives you half a dozen movable objects to play with (that are explicitly designed to be a player graphic, a ball graphic or a missile graphic - spot the VCS’ heritage as a home Pong machine!), and backgrounds are even stranger. The VCS will draw the left side of the background, and then for the right side either draw the left side again, or flip the left side horizontally and draw it as a mirror image.

This raises the obvious question - how did they manage to achieve so many different games with such limited hardware? It also explains why the asteroid count doesn’t increase in VCS Asteroids…

The NES undoubtedly has daft graphics hardware, and the mapper business is off-putting. The SNES uses an 8/16-bit hybrid CPU based on the 6502, so that’d probably mean learning a modified version of 6502 asm. I’m not particularly interested in coding for the others.

So, C64 asm it is. As a wee bairn I had a Speccy, so this is a bit sacrilegious, but I’m sure the techniques I learn will translate to Z80 assembly.

The first such trick I’ve learned is how to find the integer root of a number. I remember the Newton-Raphson method from A-Level maths, but that’s an incredibly inefficient way of doing things, particularly if you’re not interested in the fractional part of the root. I’ve always wondered how you’d go about finding a square root more efficiently, and now I know.

Basically, take the number you want to find the root of, and keep subtracting sequential odd numbers from it until the next subtraction will result in 0 or a negative value. The number of iterations required is the integer square root. So, to find the square root of 25:

#___Value___-
1_____25____1
2_____24____3
3_____21____5
4_____16____7
5_____9_____9
6_____0_____11

The first column is the iteration number (excuse the formatting; WordPress doesn’t want me to show you this for some reason). The next column is the current value, and the last column is the odd number we’re subtracting. Start with a value of 25 and subtract the first odd number (1) and we get 24, which we use in the second iteration. Eventually we reach iteration 6 which gives us a value of 0. Since we don’t want to reach 0 or a negative number, this means that the square root is the last “good” iteration, number 5.

5 is the square root of 25.

Categories: 6502 Assembly Tags: