RSS 2.0
Sign In
# Wednesday, August 25, 2010

Accidentally we have found that implementation of String and StringBuilder have been considerably revised, while public interface has remained the same.

public sealed class String
{
  private int m_arrayLength;
  private int m_stringLength;
  private char m_firstChar;
}

This layout is dated to .NET 1.0.

VM, in fact, allocates more memory than that defined in C# class, as &m_firstChar refers to an inline char buffer.

This way string's buffer length and string's length were two different values, thus StringBuilder used this fact and stored its content in a private string which it modified in place.

In .NET 4, string is different:

public sealed class String
{
  private int m_stringLength;
  private char m_firstChar;
}

Memory footprint of such structure is smaller, but string's length should always be the same as its buffer. In fact layout of string is now the same as layout of char[].

This modification leads to implementation redesign of the StringBuilder.

Earlier, StringBuilder looked like the following:

public sealed class StringBuilder
{
  internal IntPtr m_currentThread;
  internal int m_MaxCapacity;
  internal volatile string m_StringValue;
}

Notice that m_StringValue is used as a storage, and m_currentThread is used to preserve thread affinity of the internal string value.

Now, guys at Microsoft have decided to implement StringBuilder very differently:

public sealed class StringBuilder
{
  internal int m_MaxCapacity;
  internal int m_ChunkLength;
  internal int m_ChunkOffset;
  internal char[] m_ChunkChars;
  internal StringBuilder m_ChunkPrevious;
}

Inspection of this layout immediately reveals implementation technique. It's a list of chunks. Instance itself references the last chunk (most recently appended), and the previous chunks.

Characteristics of this design are:

  • while Length is small, performance almost the same as it was earlier;
  • there are no more thread affinity checks;
  • Append(), and ToString() works as fast a in the old version.
  • Insert() in the middle works faster, as only a chuck should be splitted and probably reallocated (copied), instead of the whole string;
  • Random access is fast at the end O(1) and slows when you approaching the start O(chunk-count).

Personally, we would select a slightly different design:

public sealed class StringBuilder
{
  private struct Chunk
  {
    public int length; // Chunk length.
    public int offset; // Chunk offset.
    public char[] buffer; 
  }

  private int m_MaxCapacity;

  // Alternatively, one can use
  // private List<Chunk> chunks;
  private int chunkCount; // Number of used chunks.
  private Chunk[] chunks; // Array of chunks except last.

  private Chunk last; // Last chunk.
  private bool nonHomogenous; // false if all chunks are of the same size.
}

This design has better memory footprint, and random access time is O(1) when there were no inserts in the middle (nonHomogenous=false), and O(log(chunkCount)) after such inserts. All other characteristics are the same.

Wednesday, August 25, 2010 9:36:55 AM UTC  #    Comments [0] -
Thinking aloud | Tips and tricks
All comments require the approval of the site owner before being displayed.
Name
E-mail
Home page

Comment (Some html is allowed: a@href@title, b, blockquote@cite, em, i, strike, strong, sub, super, u) where the @ means "attribute." For example, you can use <a href="" title=""> or <blockquote cite="Scott">.  

[Captcha]Enter the code shown (prevents robots):

Live Comment Preview
Archive
<August 2010>
SunMonTueWedThuFriSat
25262728293031
1234567
891011121314
15161718192021
22232425262728
2930311234
Statistics
Total Posts: 387
This Year: 3
This Month: 0
This Week: 0
Comments: 1552
Locations of visitors to this page
Disclaimer
The opinions expressed herein are our own personal opinions and do not represent our employer's view in anyway.

© 2024, Nesterovsky bros
All Content © 2024, Nesterovsky bros
DasBlog theme 'Business' created by Christoph De Baene (delarou)