Archive

Archive for the ‘Algorithms’ Category

A Better UTF-8 String

January 7th, 2010 Ant 4 comments

The WoopsiString Class - Optimised for ASCII

The current WoopsiString class stores its data in the usual way: an array of chars. When the string is initially set, the object creates an array large enough to hold the data it should store. It then copies the data into that new array.

When appending to the string, the object checks to see if it contains enough free space within the existing array to contain the new data. If so, the data is just copied into the free space. If not, a new array is allocated large enough to contain both the existing data and the new data. The existing data is copied into it, and the new data is copied to the end.

Insertion works in a similar way, except two portions of the existing data will be copied separately into disjointed sections of the new array if the insertion point falls within the existing string.

In contrast, removal is very simple. A new terminator character is placed at the new endpoint of the string, and everything following it is deemed to be free space, ready for future insert/append operations.

Although it involves a lot of copying when inserting and appending, this system works exceptionally well for the two most frequent string operations - random access to a character within a string, and iterating sequentially over the characters in a string. Keeping all of the characters in a sequential block of RAM makes these two operations trivial.

Reducing Data Copying Via Linked Lists

When I was writing the WoopsiString class, however, the amount of copying annoyed me. I came up with a solution that performed a lot less copying by breaking the string data up into a linked list of string fragments. For example, imagine that we have this string:

hello

Drawn with pipe characters as byte boundaries, it looks like this:

| h | e | l | l | o | \0 |

Suppose we want to append the text ” world!” onto that string. The current WoopsiString class would do this:

  1. Allocate a new char array of length 13.
  2. Copy “hello” from existing array into new array.
  3. Copy ” world!” from second array into new array.
  4. Append terminator to end of new array.
  5. Delete existing array.
  6. Update array pointer to point to new array.

We end up with this arrangement:

| h | e | l | l | o |   | w | o | r | l | d | ! | \0 |

The proposed new WoopsiString class would avoid all of the tedious copying by appending the ” world!” text as a new string fragment:

  1. Allocate a new linked list item containing a char array of length 7.
  2. Copy ” world!” into the new array.
  3. Point the “next” pointer of the “hello” fragment to the new ” world!” fragment.

The data structure we end up with looks like this (“—>” indicates a pointer):

| h | e | l | l | o | --> |   | w | o | r | l | d | ! |"

This new system has clear advantages over the current class, at least when it comes to appending. Very little copying needs to be performed.

Inserting has similar benefits. If we want to insert a comma after the word “hello” we would need to perform the following operations in the current class:

  1. Allocate a new char array of length 14.
  2. Copy “hello” from existing array into new array.
  3. Copy “,” from second array into new array after the word “hello”.
  4. Copy ” world!” from existing array into new array.
  5. Append terminator to end of new array.
  6. Delete existing array.
  7. Update array pointer to point to new array.

We end up with this:

| h | e | l | l | o | , |   | w | o | r | l | d | ! | \0 |

Again, lots of copying. The new class would do this instead:

  1. Allocate a new linked list item containing a char array of length 1.
  2. Copy “,” into the new array.
  3. Point the “next” pointer of the “,” fragment to the ” world!” fragment.
  4. Point the “next” pointer of the “hello” fragment to the “,” fragment.

We get this:

| h | e | l | l | o | --> | , | --> |   | w | o | r | l | d | ! |"

No copying involved. However, we are cheating a little here; we’re taking advantage of the fact that we’re inserting between two existing fragments. What happens if we try to insert “llo he” into our “hello, world!” string just after the second character?

  1. Allocate a new linked list item containing a char array of length 2.
  2. Copy “he” from the “hello” fragment into the new array.
  3. Allocate a new linked list item containing a char array of length 6.
  4. Copy “llo he” from the second array into the new array.
  5. Allocate a new linked list item containing a char array of length 3.
  6. Copy “llo” from the “hello” fragment into the new array.
  7. Point the “next” pointer of the “he” fragment to the “llo he” fragment.
  8. Point the “next” pointer of the “llo he” fragment to the “llo” fragment.
  9. Point the “next” pointer of the “llo” fragment to the “,” fragment.
  10. Delete the “hello” fragment.

The final data structure looks like this:

| h | e | -- > | l | l | o |   | h | e | --> l | l | o | --> | , | -->
|   | w | o | r | l | d | ! |"

(We could optimise that slightly by making use of the existing “hello” fragment instead of deleting it, which would ultimately give us “hello”—>” hello”->”,”—>” world!” instead. It makes the algorithm slightly more complex but slightly more efficient.)

The existing class would need to make 1 memory allocation, copy 19 characters and perform 1 memory free to be able to perform this operation. The new class would make 3 memory allocations, copy 11 characters and perform 1 memory free to do the same thing. Though this doesn’t look like much of an improvement, imagine that the string we’re inserting into is 5,000 characters long, and has been created by concatenating five 1,000 character strings together. The existing class would copy all 5,000 characters during the insert. The proposed new class would copy just 1,000, as it could work with an individual fragment instead of the entire string.

Problems with Random Access and Iteration

If this is such a great improvement, then, why didn’t I implement it? The answer lies right at the top of this post. The two most frequent operations performed on strings are extracting substrings and iterating over their characters. If a string is stored non-contiguously, these two operations become much more complex. In a sequential string, extracting a character from a string can be done in O(1) time, and is as simple as getting a pointer to the start of the string and adding an offset. Iterating over the string is simply a matter of adding 1 to the pointer.

In the proposed class, extracting a character is far more laborious. Assuming each fragment includes an int that identifies its length, the algorithm for finding a character looks like this:

Store a pointer to the head of the list in "listPointer"
Store its length in a temporary variable "totalLength"
While the desired index is greater "totalLength"
    Point "listPointer" to its own "next" pointer
    Add the length of the "listPointer" fragment to "totalLength"

Retrieve the character from the "listPointer" char array at desired index
minus "totalLength"

The worst-case scenario, in which the desired index is the last character in a string that has been built by appending single characters, will result in a retrieval time of O(n).

For ASCII strings, this is disastrous.

When dealing with UTF-8 strings, however, this approach starts to look far more appealing. In the best-case scenario for retrieving a character from a UTF-8 string the time is O(n), because there is no way of knowing where a character is stored without iterating over the entire string.

In Defense of UTF-8

To explain this, here’s a quick description of the problem that UTF-8 tries to solve. In ASCII text, each character is represented by a single byte. The most significant bit is not used, so each byte can represent one of 128 different characters. This is great for the English-speaking world, as ASCII text can represent all characters from the standard English alphabet, all numerals, and most of the important symbols (though us Brits are screwed when trying to talk about money, as the symbol for the British pound is not in the ASCII set).

The rest of the world is not so lucky. The ASCII set does not include accent marks, so the French are stuck. It doesn’t include umlauts or the Eszett (double ‘s’) so the Germans are stuck. And it definitely doesn’t include the tens of thousands of characters in any of the multiple Chinese languages, so they’re really stuck.

The solution is obviously to make characters longer than a single byte. Suppose we make all characters 32-bits wide, for example. Now we have another problem - all of our existing applications, that assume characters are 8-bits wide, no longer work. Plus we’ve quadrupled the size of all strings in all future applications. If those applications are for American users, 3/4ths of that is wasted space.

UTF-8 solves this by using multiple bytes for characters if necessary. It will represent ASCII characters as single bytes, but other characters will be represented as either 2, 3 or 4-byte sequences.

The problem with variable-length encodings is that it is impossible to know where a particular character is. For example, suppose we decide that the letter “w” should be represented by 4 bytes. “Hello world” now looks like this:

| H | e | l | l | o |   | w | w | w | w | o | r | l | d | \0 |

In order to find the first “r” in the string (character 8 ) we need to retrieve byte 11. Since all of the characters in the string could be multibyte characters, the only way we can possibly guarantee that we retrieve the correct character is by starting at “H” and working forwards, a character at a time.

Optimising UTF-8 with Linked Lists

This is where the new WoopsiString idea becomes extraordinarily helpful. Suppose that the string above has been created by concatenating the strings “Hello ” and “world”:

| H | e | l | l | o |   | --> | w | w | w | w | o | r | l | d |

Locating the “r” in the string is now considerably faster. We can bypass the entire “Hello” portion of the string simply by comparing the index we’re looking for (8) with the length of the “Hello” fragment (5), which tells us that the character we’re looking for isn’t contained in that fragment. For ASCII strings, this was a horrible performance decrease, but for UTF-8 strings represents a considerable performance increase.

If strings were automatically divided into fragments (32 characters - not bytes - wide, for example), and if the fragments included “previous” as well as “next” pointers (ie. a doubly-linked list), random access performance can be improved even further.

Iterating with Linked Lists - The StringIterator Class

As always, there is a penalty to pay for this performance increase. Iterating over a UTF-8 string is more complex than iterating over an ASCII string, but not much more complex. Once the starting character has been located, iterating over the string is simply a question of consuming bytes until a valid codepoint (the “value” of the character, sans the UTF-8 multibyte markers) has been identified, returning the codepoint, moving to the next byte and starting the process again.

If linked lists are introduced into the mix, finding the starting character becomes much faster, but iterating over the string from that point (assuming the string section being iterated over crosses a fragment boundary) becomes more complicated.

At the moment, Woopsi handles iteration over strings by extracting a char pointer from the WoopsiString object, then iterates over it by adding the size of each character to the pointer (the same algorithm as outlined above). If linked lists were introduced, this would not be possible. In fact, the only practical and sturdy solution I can some up with is to have a separate StringIterator class which would handle all string iteration. You’d need to give it a starting character to locate, then use its “MoveNext()” and “MovePrevious()” methods to iterate over the string.

Summary

In summary, if it’s possible to sum up these disparate thoughts, it’s possible to create a string class that uses linked lists in order to require far fewer copy operations than the existing WoopsiString class for standard string functions. Unfortunately, for ASCII text, it would be counterproductive to do so. Conversely, the same design presents a significant performance improvement for UTF-8 text, at the cost of making iteration more complex and time consuming.

Dates and Text Wrapping

October 28th, 2008 Ant No comments

Couple of changes today. The Date class introduced yesterday now has a more optimal “calculateWeekDay()” method, thanks to Jeff. It still seems to work correctly with the calendar, which is curious, because I’m sure it was previously returning “0” as the value for Sunday whereas it now returns “7”. I must be doing some modulus stuff in the Calendar class.

Secondly, the “wrap()” function in the Text class should now be rather faster than before. Instead of re-wrapping the entire string every time something is inserted, appended or removed from the string, it now locates the line of text that contains the modification and re-wraps from that point forward.

I’d tried to implement this a few days ago but came across a couple of problems. First of all, locating the line of text in which the modification fell was slightly tricky. I’d decided to use a binary search (obvious enough), but came unstuck because I was looking for a value that fell within the range indicated by two values in the wrapping data array. I wasn’t searching for an exact match, but a match between two values. Turned out to be fairly easy to implement once I’d scratched it down on paper.

Secondly, the wrap() function remembers the width (in pixels) of the widest line of text to help with other routines elsewhere in the class. If we try to wrap the text from line n, ignoring the previous lines, and the widest line is actually n-1, we won’t have the correct width. We’ll just have the width of the widest line in the range from n to the last line of text (if we ignore the old value we stored) or the width of a line that may no longer exist (if we use the old value).

The solution to that was pretty straightforward, too. I added a second vector to store the widths of every new widest line that the wrap routine comes across, along with the index of that line in the wrapping data vector. So, if the first line is 10px wide, that’s the widest line we’ve seen up until that point, so we add width 10 and index 0 to the vector. If the second line is 8px wide, we ignore it as it is thinner than the currently-identified widest line. If the third line is 20px wide, we add width 20 and index 2 to the vector. That way, when we re-wrap from line n, we discard all of the longest line data from the vector from line n onwards. The last entry in the width vector is then the width of the longest line in the text we’re not re-wrapping.

The upshot of all this is that if a char is appended to a string 100 lines long, only the last line is re-wrapped. Previously, all 100 lines would be re-wrapped.

Categories: Algorithms, Woopsi Tags: , , ,

Splitting Rectangles vs BSP Trees

May 6th, 2008 Ant 5 comments

Back when I started work on Woopsi in September, one of the first problems I had to solve was how to work out which parts of which gadgets were visible. That’s the secret to a good windowing system - make sure you only draw the parts of each gadget that are actually visible, and don’t waste CPU time drawing things that will get overdrawn later.

My solution to this was to write an algorithm that would take two rectangular surfaces, work out which parts of the first rectangle were overlapped by parts of the second, and finally spit out a pair of vectors that contained an array of the overlapped rectangles and the visible rectangles. Based on the timestamps of the blog posts in question, it took about three days to go from basic idea (which took a week or so to develop) to implementation, and the code has been tweaked considerably since then.

I bring this up because I was thinking of optimising the routine last night. My only real idea for this up until now was somehow to merge adjacent rectangles back together if they share common dimensions along one edge. The routine to do this would probably be just as complicated as the routine to split them up in the first place, so I haven’t bothered really looking into it.

Anyway, I came up with the idea that perhaps the merging routine could be simplified if I worked in a tree-like structure instead of just flat vectors. Every time a gadget is split up, the original rectangle is kept in memory (instead of being discarded) and the split up components are set as children of the original rectangle. Something of a revelation later and I realise that I finally understand BSP trees. Not only that, I’ve just re-invented them.

The splitting algorithm currently works like this. Imagine we want a screen to redraw itself. It creates two vectors, one that will store overlapped (and thus invisible) regions, and one that will store non-overlapped (visible) regions. It inserts its own rectangular region into the non-overlapped vector and sends both to its top-most child.

The top-most child works out where the “split points” of each rectangle in the non-overlapped vector are based on its own boundaries. In this example, in which we’re splitting the screen’s rectangle, the split points are illustrated by the red markers. The algorithm has to examine each dimension separately.

Woopsi Splitting Rects 1

Once it has this information, the algorithm has enough data to split up the original rectangle. This will give us this situation:

Woopsi Splitting Step 2

Splitting continues until we run out of child gadgets to split against or the non-overlapped vector is empty. We eventually get to this situation:

Woopsi Splitting Step 3

This shows the final stage, in which everything has been split. (Actually, I think it would be considerably more complicated than this, but you get the idea.)

This works, but it’s not the best way to do it, which brings us back to BSP trees. A BSP tree, or a “binary space partitioning tree”, is a data structure that models a complex polygon as a set of primitives by recursively splitting it into increasingly simpler shapes. In the world of Woopsi, what that means is instead of trying to explode rectangles into 9 smaller rectangles in one go, we divide them in two, then in two again, and in two again, storing each new rectangle as a child of the original rectangle, until we produce a more manageable shape.

Here’s how the split routine looks if we model it with a BSP tree instead. First of all, we work out where to split the screen’s rectangle by looking at the top-most child. Same idea as before, really, but we just divide the rectangle into two at the left edge of the child window:

Woopsi BSP Tree Step 1

These two new rectangles become the left and right children of the original rectangle, which gives us a tiny binary tree. We recurse into the left-hand child trying to split again, working our way down the child list until we hit something that overlaps this rectangle. Eventually we reach this:

Woopsi BSP Tree Step 2

At this point there are no more children to recurse through, so we exit the recursion stack and start on the right-hand child of the original rectangle:

Woopsi BSP Tree Step 3

There are more steps after this, but you can see where I’m going. The data structure we’ve created looks like this:

Woopsi BSP Tree Model

The advantage of this approach is huge:

  • We split less, so we don’t need to merge rectangles as it’s already done for us
  • The algorithm is considerably simpler than the original
  • It uses less memory
  • In theory, it’s faster
  • Gadgets no longer need to maintain vectors of cached visible rectangles
  • The layout of the display can be managed centrally
  • Gadget erasing code should be less cumbersome
  • Since splits are done in a strict top-down method, clipping to parents is automatically thrown in for free (this is currently bolted on)

I’ve got an implementation of the BSP drawing system done. It’s just over 200 lines of code for a tidy stand-alone class, compared with over 200 lines of code just for the “splitRectangles()” function embedded in the Gadget class in the current system. The only problem I’ve got is that the current system is so entrenched in the code it’s going to be a real pain to extract it. On the positive side, none of the current code should be affected, since this stuff is all hidden away deep in Woopsi’s guts.

I’m planning to get a very stripped-down version of Woopsi running in SDL, make the changes, get it running on the DS and then see if swapping to BSP trees really does make any difference.

Categories: Algorithms, Woopsi Tags:

The Long and Winding Post

March 31st, 2008 Ant 8 comments

In which I have diverse, barely connected ideas and fail to create a structured narrative that ties them together.

Support Classes

Woopsi includes two of the most ubiquitous container classes, dynamic arrays and linked lists, as part of the main library code. Of these, only the dynamic array is used. Linked lists are in there partly for other peoples’ use, but mainly because I wanted to write a linked list template class to complement the dynamic array.

There’s no reason why the linked list class and its associated iterator class should be shipped as part of the Woopsi library code. It would be better if Woopsi came with a separate directory of optional support classes that could be added into applications as required. The linked list would be one such class, and I could add the SimpleScreen and SimpleWindow back into the distribution too (updated to support everything in the current gadget set).

Coding for Noobs

I didn’t explain the rationale behind removing the Simple classes and it is appropriate to mention it here. A few weeks ago I was pondering on how Woopsi’s target audience had completely changed since I began working on it. The original plan was to create a GUI library that would be used in the same way that a web developer works with HTML. You just work with what you’re given and, should you really feel the urge to create new UI components, you do it all yourself from scratch. You can’t subclass an HTML button.

I set out to make a library that included enough UI components to allow developers to cobble together interfaces quickly and, more importantly, easily. Even beginners should be able to assemble GUIs using a tiny amount of code. This is where the SimpleWindow and SimpleScreen classes came in - they wrapped up child gadget creation in tidy methods and reduced the amount of code needed to get a UI working. They also protected the child gadget arrays from external interference and actively prevented subclassing. Subclass prevention was prevalent in the early Woopsi code.

Over time, Woopsi has become less focused on providing a system for newbies and more focused on being a useful tool for a whole range of programmers. As such, it struck me that coding for beginners is counter-productive. Microsoft don’t dumb down their APIs so that beginners can code for them; they write what they need to and layer simple, optional abstractions on top.

Hashmaps

I bring all of this up because I’ve been looking into the other main container class - the map/hashmap/hashtable/etc. I’ve never written one before and, although I use them all of the time, I didn’t have a clue how they worked.

Turns out to be fairly easy. A hashmap is essentially an array. Data is added to the hashmap in key/value pairs. When an attempt is made to add data to the hashmap, the key is turned into a “hash”, or integer representation of that key, and the key and value are inserted into the array at the hash index. To retrieve an item from the hashmap by its key, we can just re-hash the key and return the item from the array at the hashed index.

For example, assume we’re inserting an object into the array with the key “myobject”. We take the hash of “myobject” (assume that the hash function in this case gives us the integer value 34). We then insert the key and value into array element 34. To retrieve the data represented by the key “myobject”, we just take the hash of “myobject” again (gives us 34) and return the data at element 34.

It gets a little more complicated when you try to compensate for potential “hash collisions”, or cases where different keys produce the same hash value, and a lot more complicated when you try to judge the efficiency of your hashing function.

Anyhoo, I have a hashmap and a hashmap iterator class working in VC++. It’s not the most efficient example of a hashmap, but it follows the coding style laid down in the dynamic array and linked list classes and thus fits in with the rest of Woopsi’s code. As soon as I’ve converted it to use the PALib type names (u16 instead of “unsigned short”, for example) I’ll add it to the (proposed) supporting class folder.

Coding for Programmers, Coding in Public

Coding a toolkit for programmers is totally unlike any programming I’ve done before. Ordinarily I don’t mind too much if my object model isn’t completely logical, or if I take shortcuts in order to get things done - the nature of the industries I’ve worked in demand that I produce working code as quickly as possible. As long as the code is commented, structured, follows a consistent style and has descriptive names for everything (such that a junior programmer can maintain it or I can pick it up again in 12 months), I’m generally reasonably happy. Naturally I’d prefer it if I could revisit the code, tidy it up and optimise it, but that’s rarely an option.

Coding for other programmers is entirely different. Every piece of functionality needs to not only work, but it also needs to be designed in such a way that it will make sense to everyone else who wants to use the toolkit. The object model must make sense. As Woopsi is being developed in public (open-source is, I think, the closest programming gets to a performance art) all mistakes are easily scrutinised by anyone who cares to download the source, and all mistakes affect anyone using the toolkit. It could be very easy to get into the situation where I second-guess every decision and development stagnates.

Even little things that don’t seem like major design decisions can turn out to be. One example is the way that the Woopsi hierarchy is responsible for deleting gadgets. John - another Woopsi user - noted that this makes it impossible (or at the very least, inadvisable) to create gadgets on the stack. As soon as the gadget goes out of scope it gets deleted, which will leave the hierarchy with (potentially) a lot of trashed pointers. It seemed like a perfectly reasonable idea at the time, but John pointed out that FLTK leaves responsibility for deletion with the user. FLTK UI components can be created on the stack.

There’s a good quote over at Jeff Atwood’s blog about programming for programmers:

“Reuseable components are three times as hard to build”

No kidding.

Although it is arguably more difficult, coding in public gives huge benefits. This blog currently has 215 posts, but this number is vastly outweighed by the number of comments from users - 555 at the current count. The most surprising thing is the signal-to-noise ratio. There are virtually no useless comments. Okay, so most of the comments are by one guy, but when you’ve got one great programmer contributing to your blog you really don’t care.

If I’d decided to keep Woopsi closed-source until it was done, I’d still be coding for beginners and it wouldn’t be anything like it is today. The SimpleWindow class would still be in there.

There’s also the advantage of getting feedback when things go to plan. Here’s a quote from the DSTalk blog, regarding Woopsi 0.30:

Overall I am very happy with the update! Not only are whole host of bugs fixed and new features available to me but in the process of updating my code was vastly improved and tidied up.

Great stuff.

Returning to Academia

Whilst on the subject of university projects, I got an email from my old university at the weekend - I’ve got a place on the MSc. The course appears to be assessed mainly by exam, and I haven’t taken an exam in about 7 years.

Yikes.

Scrollbar Done?

March 9th, 2008 Ant 7 comments

The scrollbar now works. There were two hard-to-spot bugs conspiring to cause minor but annoying interface problems:

  • The Text::wrap() function was incorrectly increasing the line count by one in certain circumstances
  • The MultiLineTextBox was ignoring its padding value when telling the scrolling panel the size of its virtual canvas

These are both fixed.

I managed to come up with a solution to the rounding problem in the slider. It’s obvious, really - bitshift by different amounts to ensure that there’s always a fractional part in the equation, then round the fraction before removing the shift. To illustrate the solution:

// Calculate a ratio, ensuring that the result has a fractional byte
u32 ratio = (virtualHeight << 8) / gadgetHeight;
 
// Bitshift a value up by a byte and a half
u32 shiftedValue = someValue << 12;
 
// Dividing our values gives us a result with a half byte fraction
u32 result = shiftedValue / ratio;
 
// Round the value using a bitmask - if the fraction is equal to or
// greater than 8 (ie. a nibble halved, or 8/16, or 0.5 as a decimal)
// we increase the value by (1 << 4)
if (result & 0x8) {
    result += 0x10;
}
 
// Bitshift to remove the fraction before returning
return result >> 4;

In this code, we ensure that our ratio has a whole byte fraction (for accuracy). We then ensure that we shift any values working with that fraction up by a byte and a half, so when we perform the division we’ve still got a half byte fraction left to work with. I went with the 12-bit shift because the Woopsi code I was working with used signed values.

The bottom of the slider is still off by one in some situations, but it’s close enough.

I solved the infinite loop problem with a more generic, tidier solution. It’s now possible to tell a gadget to stop firing events by calling “Gadget::setRaisesEvents(false)”. If the textbox in the debug console fires an event, I disable event firing in the slider before calling any of its methods. Same applies the other way around.

Last updates today are to the SDL code. I’ve brought it up to date with the main codebase. I’ve also re-written the input routines in the VBL function so that they query the mouse/keyboard state directly without interfering with the event queue. The result of this is that events are now processed correctly. The mouse doesn’t get stuck down, and the program quits properly.

Whilst I’m on the topic of SDL, I got a version of Woopsi built for Windows the other day using the GP2X dev kit.

One last thing to do to the vertical slider. Should I integrate it into the MultiLineTextBox so that it’s always available (toggle on/off in the constructor)? Or should I integrate it into the ScrollingPanel? Or should I create a new MultiLineTextBoxWithScrollbars class? Or should I add a pointer to a new SliderBase class into the ScrollingPanel so that scrollbars can be enabled/disabled by setting the pointer? This would give me the option of having skinned sliders very easily. Or should I leave scrollbars out of the basic classes completely, and leave people to attach their own scrollbars? There’s a bit of work involved in getting it all working; not a great deal, but it’s the kind of thing I’d like the API to do for me.

Thoughts?

EDIT: Added a horizontal slider. There’s still a problem with the slider’s calculations somewhere. Back to the debugger…

Categories: Algorithms, Woopsi Tags:

More Vectors and Linked Lists

February 29th, 2008 Ant 20 comments

What’s better than having a vector or a linked list? Having a vector and a linked list, natch. I coded these up today and added them into the SVN repo.

The template classes have a virtually identical set of functions, so they should be entirely interchangeable. The functionality they present is more or less the same as the subset of the vector class that Woopsi uses (push_back(), clear(), insert(), etc) so it should be fairly easy to drop them in as vector replacements.

The only difference between the two, as far as the external API is concerned, is that the linked list has a “getIterator()’ factory method that will produce a class that can iterate through the linked list. It provides functions like “moveToFirst()” and “moveToNext()”, thus allowing the developer to take advantage of the sequential nature of the linked list without the problems I ran into in the first version (embedding the iterator into the list itself is a really bad idea).

Obviously, the classes don’t use the STL iterator functions. The whole point of writing these two classes is to remove the STL from Woopsi.

Internally, the two are very different. The linked list is great at working with sequential data (think loops) and adding/removing items quickly. The dynamic array is useful for data that needs to be accessed randomly and that has a fairly stable number of items. It’s not so great at performing inserts, but it’s much faster than the linked list at retrieving the nth item.

Neither container class is integrated into Woopsi’s code yet.

I came across a couple of oddities when trying to put this together. First of all, trying to split a template class into .h and .cpp files results in some very strange errors. “NULL is not defined” was one. The solution is to make sure that all of a template class’ code is contained within the .h file – the .cpp file is not required.

A second problem I ran into was trying to work with a struct within a template class from another class. I had this situation:

template <class T>
class LinkedList {
public:
    struct LinkedListItem {
        T data;
    }
};

There just doesn’t seem to be any way to declare an instance of this struct outside of the LinkedList class. The compiler goes bonkers when it encounters an attempt at doing so. Switching to this works:

template <class T>
struct LinkedListItem {
    T data;
}
 
template <class T>
class LinkedList {
};

It may be that I’m missing something obvious, but none of the usual syntax would allow me to instantiate a LinkedListItem struct anywhere but within the LinkedList class when I used the first version. Template structs let me get around the problem – a handy feature, that.

This is the first time that I’ve tried writing template classes and the first time I’ve made a private constructor/friend class/factory method pattern in C++. I was rather surprised to find that it was easy to set up and works well.

One more thing I should mention. The sc.nds and nds.gba versions of Woopsi no longer seem to work; they don’t work in No$GBA, anyway (although the .nds version does, despite No$GBA’s error message when it initially tries to boot the ROM). I think this must be related to the DMA changes made the other day. If the ROMs still work on slot-2 flash carts (I’ve got an M3 and Supercard I can test it on) I’ll just assume that it’s a strange emulator problem.

Categories: Algorithms, Development, Woopsi Tags:

Vectors and Linked Lists

February 28th, 2008 Ant 4 comments

I had a quick go at replacing the vector class this evening with a non-STL equivalent. The obvious route to go for is a linked list, so that’s what I created - a template class representing a doubly-linked list with both sequential access and simulated random access.

Random access is simulated by iterating along the list until the correct index is reached, which gives me bounds checking for free. Sequential access is wrapped up inside the class and hidden away, and is handled by an “iterator” member variable. It can be repositioned to the start or end of the list or moved to a random index, from where it can be moved up or down the list of items.

I got the class finished and plugged it into the code (after some tinkering). It didn’t work. One of the main problems with it is the internal iterator pointer - if one iterative function calls another, the iterator gets moved around and nothing works. I knew it wasn’t an ideal solution when I wrote it, and had the nagging feeling that it wasn’t going to work, but this obvious problem didn’t become apparent until I actually watched the code in a debugger.

I can see two ways out of this. The easiest way is to move the iterator code into a separate class, which will mean I can have as many iterators as I want (I’m thinking of something like .NET’s “GetEnumerator()” function). However, this doesn’t solve the basic problems of a linked list. Although I use a lot of sequential access in Woopsi, a significant amount of code uses random access of the gadget vectors. Linked lists aren’t designed for random access.

I’d love to have a C#/Javascript-style custom iterator…

What would be better is a standalone implementation of the vector that just supplies the features that Woopsi uses. First question, then - what is a vector? The Wikipedia article and the description on cplusplus.com suggest that the vector is really nothing more than a dynamic array, or an array that can resize itself. This is typically achieved by allocating a fixed size to start off with (16 elements, say), then allocating a larger array if the original array fills up (32 elements - the original array plus 16 more), copying from one to the other, and finally freeing the original memory. It offers less memory fragmentation than a linked list and true random access, but the resize operation is much more expensive. Insert and remove operations anywhere but at the end of the array are similarly more costly.

I think I’m going to go for the second option - a dynamic array class. It might even be possible to abuse the DMA hardware further and get it to help out with the resize.

Categories: Algorithms, Development, Woopsi Tags:

Emulation and Other Thoughts

January 7th, 2008 Ant 11 comments

Things are still going a little slowly here in Zombie Towers. The only thing worse than programming dreams for disturbing your sleep is having programming dreams whilst ill. They’re essentially the same, but instead of waking up with the solution to a particularly thorny problem you wake up exhausted with a head full of nonsensical junk and the slightly worrying feeling that you’ve lost the ability to add simple numbers together. Having had a couple of meaningless physics dreams after reading half of Stephen Hawking’s “Theory of Everything”, I’ve been avoiding any complex thought and have been absorbing anime instead.

I did sneak a small coding project in, though - a CHIP-8 emulator. I was introduced to emulation back in the days of Amiga Format when one of the issues came with a couple of Game Boy emulators on the coverdisk. Despite being an even more abysmal programmer then than I am now, there weren’t many programs around that were completely beyond my understanding. However, emulators were one of them and I was immediately fascinated. How on earth would someone go about writing an emulator? I hadn’t even the vaguest notion of where to start. Understanding and then writing an emulator has been on my “things to do” list ever since.

I got past the “understanding” part of the equation ages ago, but it’s taken until now for me to get around to writing an emulator. I’d intended to write a CHIP-8 emulator in Flash MX, but couldn’t be bothered in the end (I was switching to Java at the time) and someone else got there first.

The CHIP-8 is a curious machine, mainly because it’s not a machine at all. There never was a CHIP-8 CPU. It’s probably one of the earliest examples of a commercial virtual machine. With the proliferation of dozens of incompatible home consoles in the late 70s, console manufacturers decided that there should be a common hardware platform to write games for. The simplest way to achieve this was to specify a virtual hardware platform and have each physical platform implement an interpreter. The CHIP-8 was the result. It is an 8-bit CPU featuring 4KB of RAM (of which the bottom 512 bytes is reserved for the hardware platform’s BIOS), 16 multi-purpose registers (the last of which doubles as the carry flag), a program counter, stack pointer, index register (which I think is a 70s name for a dedicated address register), 35 opcodes, a 16-key keypad (hard-wired into the CPU via another set of boolean registers) and some weird design decisions that reflect the fact that it is a virtual machine, not a CPU. It has two opcodes to draw sprites, for example - I doubt that anything so high-level and specific exists in a physical general-purpose CPU (though the NES’ not-quite-6502 might have something similar).

My emulator has most of the opcodes implemented, but some of the features aren’t implemented yet - the sprites in particular are tricky, mainly because there’s a dearth of documentation describing how the sprites are supposed to work. There’s no documentation about how fonts are stored, either - I imagine that they’re supposed to be coded into the BIOS memory somewhere, but finding out what the fonts look like or what address they’re supposed to be stored at would seem to involve getting hold of an existing emulator’s sourcecode and digging around in that.

I might get around to finishing it eventually. It’s enough just to have got the bulk of the CPU emulated, and it’s a distinctly bizarre experience to watch code you’ve written execute a binary written by someone else.

Writing an emulator is actually very simple. A CPU can be modelled as a large switch statement. Memory can be emulated by allocating an array, and CPU registers can be simulated in the same way. The emulator program fetches each distinct unit of information (“opcode”, “operation code”, or CPU instruction) from the binary it is executing and feeds it into the CPU switch statement. The switch statement decides which instruction the opcode represents then runs that instruction. A simple CPU could look like this:

void emulate(unsigned short opcode) {
 
	// Extract instruction and data from opcode
	instruction = (opcode & 0xFF00) >> 8;
	x = (opcode & 0x00F0) >> 4;
	y = (opcode & 0x000F);
 
	// Parse opcode
	switch(instruction) {
		case 1:
			// Add register x to register y and store in register x
			add(x, y);
			break;
		case 2:
			// Subtract register x from register y and store in register x
			sub(x, y);
			break;
		case 3:
			// Multiply register x with register y and store in register x
			multiply(x, y);
			break;
	}
}

This simple CPU can perform three actions - addition, subtraction and multiplication. The opcode is a 16-bit value comprised of the instruction code to execute (highest byte) and two 4-bit data values (lowest two nibbles). We use bitmasks and bitshifts to extract the instruction and data from the opcode in order to parse it in the switch statement.

The CHIP-8 is more complex, naturally - it has 35 instructions, and there is no set format for the instruction and data portions of the opcode. The instruction can consist of just the highest nibble or it can use the whole highest byte. The data portion of the opcode can consist of a single 4-bit value, two 4-bit values, a byte and a nibble, or a single 12-bit value, depending on which instruction it is paired with.

The basic premise holds, though - there are other ways to achieve the same thing (a jump table instead of a switch statement, for example), but writing a CPU emulator is just a case of getting hold of a list of opcodes, working out how the opcodes are structured, what the opcodes do and how they affect the system’s memory and registers.

Now that I know how all of this works, it’s easy to see how assemblers and disassemblers work (I might get around to writing my own CHIP-8 assembler if I finish the emulator). It’s easy to see how a Z80 or 6502 emulator would work. The 6502 is particularly easy, assuming you’re only interested in the documented opcodes (of which there are 40-odd; there are around 200 undocumented opcodes) and aren’t worried about making it cycle-exact.

Other than learning all of this gubbins, I’ve been pondering some more Woopsi developments. First of all, I’ve got a plan for the scroll bars. Each scroll bar (horizontal and vertical) will be a gadget comprised of three sub-gadgets. They need up and down/left and right buttons and a slider gadget, which is itself comprised of a “gutter” and a “grip”.

I’ve also been pondering Jeff’s suggestion that gadgets pass requests around via events. At the moment, the only place this is particularly relevant is in the screen flip and depth sort buttons. Currently, the buttons call their parent’s “flip()” or “swapDepth()” functions. Jeff’s suggestion, to aid subclassers, is that the buttons just trigger flip or depth swap events in their event handlers (the event handler would be set to the parent for these decoration gadgets). It doesn’t seem like a particularly big change to swap from “_parent->flip()” to “_eventHandler->handleEvent(EVENT_FLIP)”, but it is actually a fundamental shift in how the system hangs together. At the moment, the system is very rigid. Each gadget orders the other gadgets to do something. The flip buttons order their screens to flip from one display to the other. Switching to event-driven interaction will mean that gadgets request operations instead. Gadgets will no longer say, “You must flip to the top display.” They will instead ask each other, “I say, old chap, do you mind awfully if you flip displays?” The whole system becomes much more fluid and, indeed, flexible.

Thinking about this has been like looking at a 2D representation of a cube - one moment it looks like the cube protrudes, the next it seems like the cube is inset. I’ve been swapping between thinking that it’s a good idea to rejecting it and back again, but I’ve finally decided that it’s a good idea. Why not make the system more flexible? Other than an extra function call, what are the downsides? There are several good reasons for the change, but no really good reasons not to implement it.

Categories: Algorithms, Assembly, Development, Woopsi Tags:

Scrolling Panels

December 19th, 2007 Ant 15 comments

The ScrollingPanel gadget now works. It can scroll its content in any direction using the DMA hardware.

Scrolling vertically was easy to implement, and it works in the same way as the TextViewer class:

  • Work out how far we want to scroll in pixels.
  • Copy all visible rows up or down that number of pixels.
  • Fill the region that has scrolled into view by calling the draw(Rect) routine.

The major difference between the ScrollingPanel and the TextViewer is that the ScrollingPanel scrolls individual visible regions. It’s got clipping built-in, whereas the TextViewer still doesn’t have clipping because the iteratively-developed code is a godawful mess.

Scrolling horizontally was more complicated. Back when I was playing with solid-window dragging, I decided that the only way to perform horizontal scrolling using the DMA hardware was to:

  • Work out how far we want to scroll in pixels.
  • Loop through all visible rows.
  • Copy chunks of data the width of the scroll distance left or right along the row.
  • Fill the new regions by calling draw(Rect).

You can’t just copy 10 pixels from (x = 1) to (x = 5) because they’re copied as individual u16s. By the time you’ve copied half of the pixels and reached (x = 10), you’ve overwritten the second-half of the data you wanted to copy.

I implemented the above routine and ran into a problem. If we try to scroll an area of 10x10 pixels by one pixel, we have to copy the whole area pixel by pixel. This results in 100 copy operations. I was aware of the issue before I started, but decided it was worth a go anyway. It wasn’t too bad - it was certainly fast enough under emulation.

However, the obvious solution finally presented itself. If the routine creates an off-screen array of u16s with a length equal to the width of a row, I can just copy the whole row to the array and then copy it back to its new horizontal position. One extra copy routine and a memory allocation is a lot faster than 100 copy operations and associated calculations. It would probably even be feasible to have solid window dragging using that technique.

I’m not certain at this stage how well the ScrollingPanel will function as a gadget container. It currently has a method to adjust the positions of any child gadgets when the panel is scrolled, but it does not redraw its children. The problem is that we only want a small portion of a gadget to redraw, but the existing Gadget::draw(rect) function assumes that the rectangle parameter represents a valid, visible region, and will draw it without any further checks. I’d need to either:

  • Implement a new draw routine that will accept a clipping rectangle, but clip that rectangle to the gadget’s internal region cache before drawing, or;
  • Just call draw() on all children and allow them to completely redraw themselves.

The first option is trickier than you might expect, to the point that it really isn’t justified. The second option is easy, but will result in flickering child gadgets as the panel scrolls.

On the positive side, I think I should be able to modify the MultiLineTextBox so that inherits from the ScrollingPanel and thus immediately equip it with omni-directional scrolling functionality.

In order for scrollbars to work, the ScrollingPanel needs to keep a note of the lowest pixel drawn within it (so that we know how tall the draw space is). Must remember to do that. Or possibly some other mechanism that achieves a similar result.

Categories: Algorithms, DS Coding, PALib, Woopsi Tags:

Clipping to Parents

December 6th, 2007 Ant 11 comments

Gadgets are now clipped to their parent’s dimensions.

(Cue round of applause and champagne corks.)

This was incredibly difficult to implement. First of all, I needed to write a routine that would return a clipped rect - if the x value of the gadget is less than the x value of the parent, use the parent x, and if the gadget’s (x + width) is greater than the parent’s (x + width), set the width of the rect to the difference between the two.

Once I’d got this, I could amend the visible rect caching function to use the clipped rect’s dimensions instead of the gadget’s. After that, I could amend the rectangle splitting function to use the same routine. The collision detection routines needed to be updated, too.

Doesn’t sound to difficult. In truth, it wasn’t. None of it worked, though, which was the problem. There seemed to be something very strange happening with the screen class - all of its children drew themselves OK, but the screen refused to co-operate. It just wouldn’t draw itself at all. Memory leak? Another deleted pointer? Maths problems? I went through everything over and over again, but couldn’t find anything wrong.

In the end, I modified the Debug class to output a non-transparent background so that I could see what it was writing out, and started printing values to the screen. Seems the Woopsi instance was reporting its width as 0 - aha! The clipping routine only goes up one level, so if the screen says it is 256 pixels wide, all gadgets will work on the assumption that they have 256 pixels to play with. When the screen tries to draw itself, though, it clips to Woopsi’s width of 0 pixels and nothing gets drawn.

It turns out that I’d got the parameters in the call to the base Gadget class the wrong way around in Woopsi’s constructor. Fignuts.

Other things I’ve fixed include marking the WindowDepthButton as a decoration, making a minor improvement to the Gadget::getClientRect function (missed out some pointless re-calculation of values), removing Woopsi::removeOverlappedRects (as it did exactly the same thing as the version in the base class), and re-writing the Gadget::eraseGadget function. This now handles gadget erasing using the gadget’s own visible rect cache, instead of bubbling the request up to the Woopsi instance and allowing that to send the redraw request back down through the hierarchy. As part of all of these changes, I’ve fiddled around with various functions that either invalidate the rect cache or call the cache function.

Hopefully I haven’t broken anything, but there’s a lot of fiddly changes here.

Pressing the “A” button in the latest SVN demo will cause the uppermost bitmap button tp move one pixel to the right. If everything has gone to plan, it gradually disappears into the edge of the containing window.

This should lay the groundwork for exciting new things such as scrolling panels (think SuperBitmap-style scrolling, but with scrollbars, that can have developer-defined “allowHorizontalScroll”, “allowVerticalScroll”, “maxHorizontalScroll” and “maxVerticalScroll” values, and that can contain child gadgets). If that works as I hope, I should be able to put together my AmigaGuide interface (scrolling text panel interspersed with buttons) without any real effort. (Aside from, y’know, making the gadgets and everything in the first place.)

Categories: Algorithms, Woopsi Tags: