RSS 2.0
Sign In
# Friday, 22 July 2011

We needed to track a stream position during creation of xml file. This is to allow random access to a huge xml file (the task is related to WindowsSearch).

This is a simplified form of the xml:

<data>
  <item>...</item>
   ...
  <item>...</item> 
</data>

The goal was to have stream position of each item element. With this in mind, we've decided to:

  • open a stream, and then xml writer over it;
  • write data into xml writer;
  • call Flush() method of the xml writer before measuring stream offset;

That's a code sample:

var stream = new MemoryStream();
var writer = XmlWriter.Create(stream);

writer.WriteStartDocument();
writer.WriteStartElement("data");

for(var i = 0; i < 10; ++i)
{
  writer.Flush();

  Console.WriteLine("Flush offset: {0}, char: {1}",
    stream.Position,
    (char)stream.GetBuffer()[stream.Position - 1]);
 
  writer.WriteStartElement("item");
  writer.WriteValue("item " + i);
  writer.WriteEndElement();
}

writer.WriteEndElement();
writer.WriteEndDocument();

That's the output:

Flush offset: 46, char: a
Flush offset: 66, char: >
Flush offset: 85, char: >
Flush offset: 104, char: >
Flush offset: 123, char: >
Flush offset: 142, char: >
Flush offset: 161, char: >
Flush offset: 180, char: >
Flush offset: 199, char: >
Flush offset: 218, char: >

Funny, isn't it?

After feeding the start tag <data>, and flushing xml writer we observe that only "<data" has been written down to the stream. Well, Flush() have never promissed anything particular about the content of the stream, so we cannot claim any violation, however we expected to see whole start tag.

Inspection of the implementation of xml writer reveals laziness during writting data down the stream. In particular start tag is closed when one starts the content. This is probably to implement empty tags: <data/>.

To do the trick we had to issue empty content, moreover, to call a particular method with particular parameters of the xml writer. So the code after the fix looks like this:

var stream = new MemoryStream();
var writer = XmlWriter.Create(stream);

writer.WriteStartDocument();
writer.WriteStartElement("data");

char[] empty = { ' ' };

for(var i = 0; i < 10; ++i)
{
  writer.WriteChars(empty, 0, 0);
  writer.Flush();

  Console.WriteLine("Flush offset: {0}, char: {1}",
    stream.Position,
    (char)stream.GetBuffer()[stream.Position - 1]);

  writer.WriteStartElement("item");
  writer.WriteValue("item " + i);
  writer.WriteEndElement();
}

writer.WriteEndElement();
writer.WriteEndDocument();

And output is:

Flush offset: 47, char: >
Flush offset: 66, char: >
Flush offset: 85, char: >
Flush offset: 104, char: >
Flush offset: 123, char: >
Flush offset: 142, char: >
Flush offset: 161, char: >
Flush offset: 180, char: >
Flush offset: 199, char: >
Flush offset: 218, char: >

While this code works, we feel uneasy with it.

What's the better way to solve the task?

Update: further analysis shows that it's only possible behaviour, as after the call to write srart element, you either can write attributes, content or end of element, so writer may write either space, '>' or '/>'. The only question is why it takes WriteChars(empty, 0, 0) into account and WriteValue("") it doesn't.

Friday, 22 July 2011 21:08:36 UTC  #    Comments [0] -
Thinking aloud | Tips and tricks | Window Search
All comments require the approval of the site owner before being displayed.
Name
E-mail
Home page

Comment (Some html is allowed: a@href@title, b, blockquote@cite, em, i, strike, strong, sub, super, u) where the @ means "attribute." For example, you can use <a href="" title=""> or <blockquote cite="Scott">.  

[Captcha]Enter the code shown (prevents robots):

Live Comment Preview
Archive
<2024 March>
SunMonTueWedThuFriSat
252627282912
3456789
10111213141516
17181920212223
24252627282930
31123456
Statistics
Total Posts: 386
This Year: 2
This Month: 0
This Week: 0
Comments: 938
Locations of visitors to this page
Disclaimer
The opinions expressed herein are our own personal opinions and do not represent our employer's view in anyway.

© 2024, Nesterovsky bros
All Content © 2024, Nesterovsky bros
DasBlog theme 'Business' created by Christoph De Baene (delarou)