In our project we're generating a lot of xml files, which are subjects of manual
changes, and repeated generations (often with slightly different generation
options). This way a life flow of such an xml can be described as following:
If it were a regular text files we could use diff utility to prepare
patch between versions 1 and 2, and apply it with patch utility to
a version 3. Unfortunately xml has additional semantics compared to a plain text. What's an
invariant or a simple modification in xml is often a drastic change in text.
diff/patch does not work well for us. We need xml diff
The first guess is to google it! Not so simple.
We have failed to find a tool or an API that can be used from ant. There are a
lot of GUIs to show xml differences and to perform manual merge, or doing
similar but different things to what we need
(like MS's xmldiffpatch).
Please point us to such a program!
Meantime, we need to proceed. We don't believe that such a tool can be
done on the knees, as it's a heuristical and mathematical at the same time
task requiring a careful design and good statistics for the use cases. Our idea
is to exploit
diff/patch. To achieve the goals we're going to
perform some normalization of xmls before diff to remove redundant
invariants, and normalization after the patch to return it into a readable form.
This way we expect to recieve files reacting to modifications similarly to text
a@href@title, b, blockquote@cite, em, i, strike, strong, sub, super, u