XDocument vs Plain Object Model

Sunday, 08 January 2023

While doing a migration of some big xslt 3 project into plain C# we run into a case that was not obvious to resolve.

Documents we process can be from a tiny to a moderate size. Being stored in xml they might take from virtually zero to, say, 10-20 MB.

In C# we may rewrite Xslt code virtually in one-to-one manner using standard features like XDocument, LINQ, regular classes, built-in collections, and so on. Clearly C# has a reacher repertoire, so task is easily solved unless you run into multiple opportunities to solve it.

The simplest solution is to use XDocument API to represent data at runtime, and use LINQ to query it. All features like xslt keys, templates, functions, xpath sequences, arrays and maps and primitive types are natuarally mapped into C# language and its APIs.

Taking several xslt transformations we could see that xslt to C# rewrite is rather straightforward and produces recognizable functional programs that have close C# source code size to their original Xslt. As a bonus C# lets you write code in asynchronous way, so C# wins in a runtime scalability, and in a design-time support.

But can you do it better in C#, especially when some data has well defined xml schemas?

The natural step, in our opinion, would be to produce C# plain object model from xml schema and use it for runtime processing. Fortunately .NET has xml serialization attributes and tools to produce classes from xml schemas. With small efforts we have created a relevant class hierarchy for a rather big xml schema. XmlSerializer is used to convert object model to and from xml through XmlReader and XmlWriter. So, we get typed replacement of generic XDocument that still supports the same LINQ API over collections of objects, and takes less memory at runtime.

The next step would be to commit a simple test like:

read object model;
transform it;
write it back.

We have created such tests both for XDocument and for object model cases, and compared results from different perspectives.

Both solution produce very similar code, which is also similar to original xslt both in style and size.

Object model has static typing, which is much better to support.

But the most unexpected outcome is that object model was up to 20% slower due to serialization and deserialization even with pregenerated xmlserializer assemblies. Difference of transformation performance and memory consumption was so unnoticable that it can be neglected. These results were confirmed with multiple tests, with multiple cycles including heating up cycles.

Here we run into a case where static typing harms more than helps. Because of the nature of our processing pipeline, which is offline batch, this difference can be mapped into 10th of minutes or even more.

Thus in this particular case we decided to stay with runtime typing as a more performant way of processing in C#.

Sunday, 08 January 2023 13:28:14 UTC

Comments [0] -
.NET | xslt

Comments are closed.