ant.simianzombie.com
September 09, 2010, 04:05:17 PM *
Welcome, Guest. Please login or register.
Did you miss your activation email?

Login with username, password and session length
News:
 
   Home   Help Search Login Register  
Pages: [1]
  Print  
Author Topic: Code review  (Read 1253 times)
Lakedaemon
Newbie
*
Posts: 36


View Profile
« on: January 15, 2010, 07:27:25 PM »

Ok, when I think that I may slightly improve the code, I'll post it in this thread (not necessarilly in a specific order).
Beware, I may be wrong ^^


Edit : looking at the wrap algorithm...I hadn't realised that spaces got ignored at end and beginning of lines.
This may invalidate the following discussion


Let's begin with the Text Class

Line 320-330 : This condition looks like you always end up with index > _linePositions[top]) when the index isn't a start of line.
let's say that
0) bottom+1=mid=top-1
1) bottom=mid=top-1
2) bottom=mid=top
3) bottom=top+1=mid+1 index is in line top

You can't have combinations like bottom+1=mid=top

So the code paths are
1) =>  2) {index in top space}
0) => 2) {index in bottom space} or 2) {index in top space}
2) =>  3) {index in top space}
So, you allways end up with
index > _linePositions[top]
« Last Edit: January 15, 2010, 08:50:26 PM by Lakedaemon » Logged
Lakedaemon
Newbie
*
Posts: 36


View Profile
« Reply #1 on: January 15, 2010, 07:48:05 PM »

This method hasn't been converted to the new stringiterator class...
It is probably the source of the bugs in the multilinetextbox

// Calculate the length of an individual line sans right-hand spaces
const s16 Text::getLineTrimmedLength(const s32 lineNumber) const {
   s16 length = getLineLength(lineNumber);

   // Get char at the end of the line
   const char* currentChar = _text + _linePositions[lineNumber] + length - 1;
   u32 codePoint = 0;

   // Strip any trailing spaces, etc
   while (length > 0) {

      // Scan backwards to the next valid codepoint
      while (!(codePoint = getCodePoint(currentChar, NULL))) {
         currentChar--;
      }
      
      // Stop scanning if the current char is not blank
      if (!_font->isCharBlank(codePoint)) break;

      length--;
      currentChar--;
   }

   return length;
}

I'll try to convert it to the new iterator class and report if it fixed the bug

Edit : modifying it with the following code doesn't fix the bug (I would bet that the bug comes from a bad ascii/utf8 length though)

const s16 Text::getLineTrimmedLength(const s32 lineNumber) const {
   s16 length = getLineLength(lineNumber);

   // Loop through string until the end
   StringIterator* iterator = newStringIterator();
   
   // Get char at the end of the line
   iterator->moveTo(_linePositions[lineNumber] + length - 1);
   do{
      if (!_font->isCharBlank(iterator->getCodePoint())) break;
      length--;
   } while (iterator->moveToPrevious());
   return length;
}


B) This inline method smells fishy too as it still uses getCharArray() and obviously uses utf8 token numbers as an increment on a pointer
        inline const char* getLinePointer(const s32 lineNumber) const { return getCharArray() + _linePositions[lineNumber]; };

I replaced it by
 inline const char* getLinePointer(const s32 lineNumber) const {return getToken(_linePositions[lineNumber]);};

looks like it only ever gets used in
void MultiLineTextBox::drawCursor(Rect clipRect)
(this method uses char* too and looks fishy/not converted to-utf8 too)
So this isn't probably what is causing the weird bug.

C) I have done some testing with my little demo and  :
It looks like the length (in utf8 chars and width) of the lines are correct...
so that the wrapping algorithm looks ok

The bugs may happen because at some point while rendering the lines,
the whole string get translated backward (has a negativ offset applied)

-> instead of displaying

new font class with an ASCII
string first. Japanese test
ABCDEF
GH
some long text

It displays

new font class with an ASCII
with an ASCII string first.
Japane
se
test ABCDEFGH some long text

And this respect the line's widths...


The Bug happens when ABCDEF is a japanese string but doesn't when it is an ascii string -> it is utf-8 related

D) ok, I think I may have found the Bug in the woopsistring class


in the following method, getByteCount() should be used insted of getLength() as you are copying the chars...

void WoopsiString::setText(const WoopsiString& text) {

   // Ensure we've got enough memory available
   allocateMemory(text.getLength(), false);

   // Copy/filter the valid UTF-8 tokens into _text and cache the length
   u32 unicodeChars = 0;
   _dataLength = filterString(_text, text.getCharArray(), text.getLength(), &unicodeChars);
   _stringLength = unicodeChars;
}


The Good news : It fixes one bug ! (utf-8 text now displays ok if it is one line long)
The Bad news : The mysterious "text/multiline boxes and the strange offset" bug is still there

E) On a side note : the following method is wrong
void WoopsiString::setText(const u32 text)

as there is no way an u32 can represent a piece of an utf-8 string with a (char*) cast and computing a codepoint
of something that is already a codepoint is weird (and probably not intended ) through
   getCodePoint((char*)&text, &numBytes);

You might want to encode an u32 into 1 utf-8 token though (but that is not what is done in this method)
« Last Edit: January 16, 2010, 10:12:40 AM by Lakedaemon » Logged
Lakedaemon
Newbie
*
Posts: 36


View Profile
« Reply #2 on: January 16, 2010, 10:57:34 AM »

Another try at debugging...

1) in
void WoopsiString::append(const WoopsiString& text)
It should be
   _dataLength += filterString(_text + _dataLength, text.getCharArray(), text.getByteCount() , &unicodeChars);
instead of
   _dataLength += filterString(_text + _dataLength, text.getCharArray(), text.getLength(), &unicodeChars);

But if you want to append a (valid) woopsistring to another one, a memcopy operation should be enough, there is no need to filter the string again (it's slower).

If possible, The filter string method, should only be ever used once, in the setText method...
REgarding this, It appears that woopsistring are parsed multiple times when you set them with the multilinebox setText method
for example, the folowing method

WoopsiString::setText(const WoopsiString& text)

feels "wrong" as it reparses a valid woopsistring, the way it is implemented (shouldn't it just copy bytes if it is a copy method ?)

This behaviour appears in other woopsistring methods like
void WoopsiString::insert(const WoopsiString& text, u32 index)

I think that this is due to the fact that you changed the (const char *) inputs, that need to be filtered, in the methods 
by (const woopsistring&) that don't need to be filtered as they have already been at setText/initialisation time (if they aren't NULL ?).

The Override of the = operator (another very good idea, I didn't know this was possible), should be slightly altered too :

WoopsiString& WoopsiString::operator=(const WoopsiString& string) {
   if (&string != this) {
      setText(string); // don't setText there : use a memcopy
   }
   
   return *this;
}

WoopsiString& WoopsiString::operator=(const char* string) {
   setText(string); // this is right
   return *this;
}

WoopsiString& WoopsiString::operator=(const u32 letter) {
   setText(letter); // this is not what you really want to do. If you want to change 1 chor or 1 u32 to a valid utf8 token, you have to use another function. I''ll code it and post it here this afternoon
   return *this;
}
« Last Edit: January 16, 2010, 11:15:55 AM by Lakedaemon » Logged
Ant
Administrator
Full Member
*****
Posts: 138


View Profile WWW Email
« Reply #3 on: January 16, 2010, 11:22:15 AM »

This method hasn't been converted to the new stringiterator class...
It is probably the source of the bugs in the multilinetextbox

Implemented.  Well spotted, that was definitely a problem.

B) This inline method smells fishy too as it still uses getCharArray() and obviously uses utf8 token numbers as an increment on a pointer

Another good spot.  Have added the fix.

C) It displays

new font class with an ASCII
with an ASCII string first.
Japane
se
test ABCDEFGH some long text

And this respect the line's widths...

I've been trying to track this one down.  Weird thing is, I can't replicate it on my computer now.  It happened all the time then just stopped.

D) ok, I think I may have found the Bug in the woopsistring class

in the following method, getByteCount() should be used insted of getLength() as you are copying the chars...

I have actually fixed this one already.  Not sure when...

E) On a side note : the following method is wrong
void WoopsiString::setText(const u32 text)

as there is no way an u32 can represent a piece of an utf-8 string with a (char*) cast and computing a codepoint
of something that is already a codepoint is weird (and probably not intended ) through
   getCodePoint((char*)&text, &numBytes);

You might want to encode an u32 into 1 utf-8 token though (but that is not what is done in this method)

Hmm, yep.  You're right.  I might change that back to setText(char text) because I need a way of setting char values for the glyphs.
Logged
Ant
Administrator
Full Member
*****
Posts: 138


View Profile WWW Email
« Reply #4 on: January 16, 2010, 11:35:01 AM »

Another try at debugging...

1) in
void WoopsiString::append(const WoopsiString& text)
It should be
   _dataLength += filterString(_text + _dataLength, text.getCharArray(), text.getByteCount() , &unicodeChars);
instead of
   _dataLength += filterString(_text + _dataLength, text.getCharArray(), text.getLength(), &unicodeChars);

But if you want to append a (valid) woopsistring to another one, a memcopy operation should be enough, there is no need to filter the string again (it's slower).

Already seemed to have fixed the first but, but you're right - a memcpy() would be better.  Fixed!

This behaviour appears in other woopsistring methods like
void WoopsiString::insert(const WoopsiString& text, u32 index)

The Override of the = operator (another very good idea, I didn't know this was possible), should be slightly altered too :

WoopsiString& WoopsiString::operator=(const WoopsiString& string) {
   if (&string != this) {
      setText(string); // don't setText there : use a memcopy
   }
   
   return *this;
}
[/quote]

I've changed setText() so that is uses a memcpy(), so this function does not need to change.

WoopsiString& WoopsiString::operator=(const u32 letter) {
   setText(letter); // this is not what you really want to do. If you want to change 1 chor or 1 u32 to a valid utf8 token, you have to use another function. I''ll code it and post it here this afternoon
   return *this;
}

Yep, you're right.  All I need is to change its parameter to "const char letter" and it'll be fine.  I don't particularly need a method for appending a single utf8 token.

Thanks for all of the bugfixes; this is exactly the sort of help I wanted!   Grin
Logged
Lakedaemon
Newbie
*
Posts: 36


View Profile
« Reply #5 on: January 16, 2010, 01:41:18 PM »

C) It displays
new font class with an ASCII
with an ASCII string first.
Japane
se
test ABCDEFGH some long text

And this respect the line's widths...

I've been trying to track this one down.  Weird thing is, I can't replicate it on my computer now.  It happened all the time then just stopped.

My tests show that it (now?) happens only if you are using non-ascii chars in a woopsistring.
This makes my little demo the perfect tool to reproduce it.

The nasty little bug is still wrecking havock in r1237 (but we'll gut it in the end ^^).

Quote

Hmm, yep.  You're right.  I might change that back to setText(char text) because I need a way of setting char values for the glyphs.

You could still make it into a setText(u32 text) later so that it can eats either a char or an u32.
I'll post supporting code later.

Quote
Thanks for all of the bugfixes; this is exactly the sort of help I wanted!  

You are welcome.  I take the opportunity to send more of them nasty bug reports/and nice fixs ^^
Bug reports for 41237 :

in void StringIterator::moveToLast(),

_currentIndex = _string->getByteCount() - 1;
should be
_currentIndex = _string->getLength() - 1;


Edit : good news : it fixes the DAMN ANNOYING BUG
weeeeeeeeeeeee, heepeeeee, **does a little dance** 
(I got lucky !!!!!! ^^)

-> from now on, I'll focus a bit less on bug hunting/code review and a bit more on improving freetype/scrollingpane/richtextbox support
I'll still keep an eye open for bugs, as well as slight changes improving speed or simplicity and keep posting them in the forum.
« Last Edit: January 16, 2010, 01:51:07 PM by Lakedaemon » Logged
Lakedaemon
Newbie
*
Posts: 36


View Profile
« Reply #6 on: January 16, 2010, 02:57:33 PM »

Mmmhh.. got another bug when compiling my app...

This time, instead of displaying
ABC
def

it displays
ABC
ef?   where ? is a random char (maybee after f in memory).

So there is still a +1 offset ()

So, I'm still on the lookout.

in const s16 Text::getLineTrimmedLength(const s32 lineNumber) const

maybee
while (iterator->moveToPrevious());
should actually be
while (iterator->moveToPrevious() && (length>0));

in case you have a blank line full of space chars...maybee


By the way, why do you need to store/use longlines ?
What purpose do they serve ?
Logged
Lakedaemon
Newbie
*
Posts: 36


View Profile
« Reply #7 on: January 16, 2010, 07:16:54 PM »

If you want to encode 1 char or 1 u32 into a char * UTF-8 token, it's quite simple and you can do it this way :

const char* WoopsiString::encodeCodePoint(u32 codepoint, u8* numChars) const {
if (codepoint<0x80) {
    (*numChars)=1;
    buffer = new char[1];
    buffer[0]=codepoint;
    return buffer;
}

if (codepoint<0x0800) {
    (*numChars) = 2;
    buffer[0] = (codepoint>>6) + 0xC0;
    buffer[1] = (codepoint & 0x1F) + 0x80;
    return buffer;
}

if (codepoint<0x10000) {
    (*numChars) = 3;
    buffer[0] = (codepoint>>12) + 0xE0;
    buffer[1] = ((codepoint>>6) & 0x3F) + 0x80;
    buffer[2] = (codepoint & 0x3F) + 0x80;
    return buffer;
}
if (codepoint<0x10FFFF) {
    (*numChars) = 4;
    buffer[0] = (codepoint>>18) + 0xF0;
    buffer[1] = ((codepoint>>12) & 0x3F) + 0x80;
    buffer[2] = ((codepoint>>6) & 0x3F) + 0x80;
    buffer[3] = (codepoint & 0x3F) + 0x80;
    return buffer;
}
// after U+10FFFF, there aren't any "legal" unicode codepoints
// nothing prevents you from using the same algorithm to translate an u32 into a 6 bytes long string,
// but technically speaking these won't be Unicode
return NULL;
}
Logged
Ant
Administrator
Full Member
*****
Posts: 138


View Profile WWW Email
« Reply #8 on: January 18, 2010, 09:32:54 AM »

I have:

 - Added the WoopsiString::encodeCodePoint() method;
 - Altered WoopsiString::setText(char) so that it uses the encodeCodePoint() method;
 - Fixed StringIterator::moveToPrevious();
 - Added the fix to Text::getTrimmedLineLength().

I need to track the longest line in order to work out how wide the entire block of text is.  With that information, I know the preferred dimensions of the MultiLineTextBox.  It's useful if, for example, you create a textbox but want it to shrink so that it accommodates the wrapped text with no wasted space.

I'll check these changes in as soon as I can get it to compile.  I've switched to my hackintoshed netbook for development this week and it seems to have a broken installation of libfat.  Trying to upgrade to dkp r27 now, but downloading 44MB of devkit is taking a while on this net connection...
« Last Edit: January 18, 2010, 10:32:39 AM by Ant » Logged
Lakedaemon
Newbie
*
Posts: 36


View Profile
« Reply #9 on: January 18, 2010, 11:45:15 AM »

Thanks.

By the way, your fix for movetoLast() works fine AND fixed the nasty bug that my fix for MoveToLast (which was broken) introduced.

The MoveToPrevious() and MoveToNext() methods now work fine, but I'm still going to suggest speed improvements as these methods get called quite a lot.

on a completely different matter, I noticed that an undead monkey and a demon from the lake make quite an  outstanding and unheard of developing team. ^^

Edit : I thought a bit more about it and your fix may be broken too (as Woopsistring aren't 0 terminated so that _string->getCharArray() + _string->getByteCount(); points to a byte that isn't in the woopsistring, so you can't be sure moveto Previous will be used)

A fix to your fix of my fix of the evil bug might then be

void StringIterator::moveToLast() {

   // Position the iterator at the last byte in the string
   _currentChar = _string->getCharArray() + _string->getByteCount();
   _currentIndex = _string->getLength();

   // If the current byte does not represent a full character,
   // locate the start of the character
   moveToPrevious();
}



This raises (maybee) another concern : the iterator operations aren't atomic, that is currentchar and currentIndex might get corrupted and not point to a valid utf8 token, say if moveToPrevious exit earlier with a return false;

And last but not least, instead of
// Stop looking if we've reached the start of the string
      if (_currentIndex == 0) return false;
maybee it should be
// Stop looking if we've reached the start of the string
      if (_currentChar <= some value here) return false;


bool StringIterator::moveToPrevious() {
   
   // Abort if already at the start of the string
   if (_currentIndex == 0) return false;
   
   // Move back one char to ensure we're in the middle of a char sequence
   _currentChar--;
   
   while (!_string->getCodePoint(_currentChar, NULL)) {
      _currentChar--;
      
      // Stop looking if we've reached the start of the string
      if (_currentIndex == 0) return false;
   }
   
   // Loop has ended, so we must have found a valid codepoint; we know
   // that we're looking at the previous character index
   _currentIndex--;
   
   return true;
}
« Last Edit: January 18, 2010, 12:13:22 PM by Lakedaemon » Logged
Lakedaemon
Newbie
*
Posts: 36


View Profile
« Reply #10 on: January 18, 2010, 01:30:40 PM »

Ok, this speeds up things :

u8 StringIterator::getCodePointSize() {

   // Return 0 if string has no data
   if (_string->getLength() == 0) return 0;
   
  char value=*_currentChar;
  if (value<0x80) return 1; 
  if (value<0xC2) return 0; //can't be a leading char
  if (value<0xE0) return 2;
  if (value<0xF0) return 3;
  if (value<0xF4) return 4; // doesn't have legal unicode leading char after that
  return 0;

}

and fixes things !


void StringIterator::moveToLast() {

   if (_string->getLength()>0){  // don't we need to make sure the woopsistring isn't NULL  and return a bool ?
                _currentChar = _string->getCharArray() + _string->getByteCount()-1;
//woopsistring are not 0terminated strings
                while ((_currentChar*>=0x80) && (_currentChar*<0xC0)) _currentChar--;
// woopsistring have been filtered before : no need to check if value >=0xF4
      _currentIndex = _string->getLength()-1;
        }
}

bool StringIterator::moveToPrevious() {
   
   // Abort if already at the start of the string
   if (_currentIndex == 0) return false;
   
   // Move back one char to ensure we're in the middle of a char sequence
   do {
    _currentChar--;
  } while ((*_currentChar>=0x80) && (*_currentChar<0xC0));   
   
   // Loop has ended, so we must have found a valid codepoint; we know
   // that we're looking at the previous character index
   _currentIndex--;
   
   return true;
}
« Last Edit: January 18, 2010, 01:40:18 PM by Lakedaemon » Logged
Ant
Administrator
Full Member
*****
Posts: 138


View Profile WWW Email
« Reply #11 on: January 18, 2010, 04:23:19 PM »

Today I learned that a 10.1" Samsung netbook loaded with OSX makes a crap development machine.  Full source tree builds take minutes, the screen is too small to be able to comfortably navigate around the sourcecode or the file system, and I think I spent two hours updating a codebase that didn't reflect the latest changes.  Grrrrr...

However, here's some fixes:

 - The MultiLineTextBox::drawCursor() function should work properly.
 - StringIterator::moveToLast() should work properly (got that in before you posted the fix).
 - Text::wrap() should work properly.
 - Added in the encodeCodePoint() method to WoopsiString.

I'll take a look at your latest fixes later.  Thanks for posting them!
Logged
Ant
Administrator
Full Member
*****
Posts: 138


View Profile WWW Email
« Reply #12 on: January 18, 2010, 04:38:38 PM »

OK, those latest StringIterator fixes are in now.
Logged
Lakedaemon
Newbie
*
Posts: 36


View Profile
« Reply #13 on: January 18, 2010, 08:10:58 PM »

Thanks for the commits

Today I learned that a 10.1" Samsung netbook loaded with OSX makes a crap development machine.  Full source tree builds take minutes, the screen is too small to be able to comfortably navigate around the sourcecode or the file system, and I think I spent two hours updating a codebase that didn't reflect the latest changes.  Grrrrr...

Yayh ^^;
The cpu/gpu on these machines are badly underpowered.
Maybee it will be possible to developp on those machine once the new googgle language, designed for fast compile time, goes live.

I'm using a xeon quad core desktop (which should have been an 8-core beast if the damn "rock solid" asus main board had been working properly)
And a dual core notebook, while on the go.


Those new mini-pc look sexy but I guess doing real work on them (compiling with c++/TeX using Python) is near impossible.

(I'm Still impatiently waiting for tegra2 powered ebook reader/netbook...to play with ^^)

Now that the string/iterator classes are working ok, I'll pester you a bit less and at last focus on getting some kind of richbox/layout widget (I'll share the code, when it's operationnal) working.
Now that I have had a good look at the multilinetext box/text/graphicport/string/iterator classes I might have a chance.

By the way, have you considered implementing move to negative s32 indexes (to look from the end) ?
That's not really important but it may be a nice minor feature to have
Logged
Pages: [1]
  Print  
 
Jump to:  

Powered by MySQL Powered by PHP Powered by SMF 1.1.11 | SMF © 2006-2007, Simple Machines LLC Valid XHTML 1.0! Valid CSS!