TL;DR: XmlDocument.Validate is not a read-only operation; it changes the document tree. XDocument.Load(XmlReader) adds attributes even if they are marked as unspecified.
Two interesting quirks of XmlDocument and XDocument worked together to take me by surprise last week.
I was refactoring a method that uses XmlDocument to build … well … an XML document. To make sure I did not break anything, I had set up a regression test comparing the resulting XML document to an expected XML document. Since XmlDocument offers no Equals support, I used XDocument's deep-equals capabilities (through FluentAssertions): I loaded the expected XML document into an XDocument and, this being the important part, converted the XmlDocument returned by the method into an XDocument using an XmlNodeReader:
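using System.Xml;
using System.Xml.Linq;
using FluentAssertions;

XmlDocument returnedXmlDoc = MyMethod();
XDocument expectedDoc = XDocument.Load(fileStream);
// Convert the XmlDocument into an XDocument by reading its node tree.
XDocument actualDoc = XDocument.Load(new XmlNodeReader(returnedXmlDoc));
actualDoc.Should().BeEquivalentTo(expectedDoc);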
This worked fine during the refactoring.
Then I made one more change to MyMethod: internally, MyMethod was validating the built XmlDocument against a schema. Before the change, it held the XML content in a MemoryStream and built an XmlDocument from that stream just for validation. After successful validation, it would return a fresh XmlDocument built from the same stream. After the change, it returned the successfully validated XmlDocument without building a fresh one. After all, why throw away a perfectly good XmlDocument just to build it again?
And suddenly, the regression test failed. It reported an unexpected attribute on actualDoc. What happened?
Comparing the XML content of the XDocuments actualDoc and expectedDoc revealed that some XML elements in actualDoc now contained additional attributes. These were attributes that are optional in the schema, and their values were set to their schema defaults.
It turns out that XmlDocument.Validate is not, as I had intuitively assumed, a read-only operation. This is non-obvious and is not mentioned in the method documentation itself. Had I read the online documentation, it would have told me, among other things, that
after successful validation, schema defaults are applied
https://docs.microsoft.com/en-us/dotnet/api/system.xml.xmldocument.validate?view=net-5.0#System_Xml_XmlDocument_Validate_System_Xml_Schema_ValidationEventHandler_ (see “Remarks”)
In this case, Validate added XML nodes for attributes that the schema declares as optional with a default value. It did this in a semantically correct way by setting their XmlAttribute.Specified property to false. The string rendered from the XmlDocument would not show those attributes, same as before.
This meant that the failing test was a false alarm. The method was still working fine, but the test wrongly reported a deviation.
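The effect of Validate described above can be reproduced in isolation. The following is a minimal sketch, not the code of MyMethod: the schema, the element name item, and the attribute unit with its default pcs are all made up for illustration. It validates a document that omits the optional attribute and then lists whatever attributes are present afterwards, together with their Specified flag.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class ValidateDemo
{
    // Hypothetical schema: <item> has an optional attribute "unit" with default "pcs".
    private const string Xsd = @"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='item'>
    <xs:complexType>
      <xs:attribute name='unit' type='xs:string' default='pcs' />
    </xs:complexType>
  </xs:element>
</xs:schema>";

    public static XmlDocument BuildAndValidate()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<item />"); // the optional attribute is deliberately omitted

        using (var schemaReader = XmlReader.Create(new StringReader(Xsd)))
        {
            doc.Schemas.Add(null, schemaReader);
        }

        // Fail loudly on validation errors; warnings are ignored in this sketch.
        doc.Validate((sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error)
                throw new XmlSchemaValidationException(e.Message);
        });
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = BuildAndValidate();

        // Inspect the tree after validation: which attributes exist now,
        // and are they flagged as specified?
        foreach (XmlAttribute attr in doc.DocumentElement.Attributes)
        {
            Console.WriteLine($"{attr.Name}={attr.Value} Specified={attr.Specified}");
        }
    }
}
```

If Validate behaves as described, the loop reports the unit attribute even though the input document never contained it.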
How then did those attributes find their way into the XDocument and make the regression test fail?
The moving parts are XmlNodeReader and XDocument.Load.
The reference source of XmlNodeReader (https://referencesource.microsoft.com/#System.Xml/System/Xml/Dom/XmlNodeReader.cs) tells us that the XmlNodeReaderNavigator's IsDefault property respects the Specified property of attributes and will return true for those newly added default attributes. XmlNodeReader delegates its IsDefault property to its internal XmlNodeReaderNavigator and thus also respects the Specified property.
The hunt continues with XDocument's Load method (see https://referencesource.microsoft.com/System.Xml.Linq/System/Xml/Linq/XLinq.cs.html). It turns out that the Load method completely ignores its XmlReader's IsDefault. This could be by design, or because the base XmlReader implementation of IsDefault always returns false (see https://referencesource.microsoft.com/System.Xml/System/Xml/Core/XmlReader.cs.html).
The XDocument will happily load the unspecified attribute and add it to itself, just as if it were specified. Since XElement and XAttribute themselves seem to have no way of marking an attribute as unspecified, the attribute is now there to stay and will find its way into the rendered XML string as well.
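To see both halves of the interaction side by side, here is a small sketch under the same made-up assumptions as before (a hypothetical item element with an optional unit attribute defaulting to pcs). It first walks the XmlNodeReader by hand and prints IsDefault for each attribute, then loads the same document through XDocument.Load and prints the attributes that ended up on the root element.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;
using System.Xml.Schema;

public static class NodeReaderDemo
{
    // Hypothetical schema, used only to obtain a validated document.
    private const string Xsd = @"
<xs:schema xmlns:xs='http://www.w3.org/2001/XMLSchema'>
  <xs:element name='item'>
    <xs:complexType>
      <xs:attribute name='unit' type='xs:string' default='pcs' />
    </xs:complexType>
  </xs:element>
</xs:schema>";

    public static XmlDocument BuildValidated()
    {
        var doc = new XmlDocument();
        doc.LoadXml("<item />");
        using (var schemaReader = XmlReader.Create(new StringReader(Xsd)))
        {
            doc.Schemas.Add(null, schemaReader);
        }
        doc.Validate((sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error)
                throw new XmlSchemaValidationException(e.Message);
        });
        return doc;
    }

    // The same conversion the regression test performs.
    public static XDocument ToXDocument(XmlDocument doc)
    {
        using (var reader = new XmlNodeReader(doc))
        {
            return XDocument.Load(reader);
        }
    }

    public static void Main()
    {
        XmlDocument validated = BuildValidated();

        // What the reader itself reports for each attribute of the root element.
        using (var reader = new XmlNodeReader(validated))
        {
            reader.MoveToContent();
            while (reader.MoveToNextAttribute())
            {
                Console.WriteLine($"reader: {reader.Name} IsDefault={reader.IsDefault}");
            }
        }

        // What ends up in the XDocument after loading through the same kind of reader.
        XDocument loaded = ToXDocument(validated);
        foreach (XAttribute attr in loaded.Root.Attributes())
        {
            Console.WriteLine($"XDocument: {attr.Name}={attr.Value}");
        }
    }
}
```

The reader still knows which attributes are defaults; that information is simply dropped on the way into the XDocument, where an XAttribute has nowhere to store it.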
And all this because we validated an XmlDocument.
The judgement call now is to decide what the goal of your test is and thus whether the test or the implementation is wrong.
I was able to take the easy way out: I let MyMethod validate a clone of the XmlDocument and throw that clone away after successful validation. Eventually, MyMethod will most likely build the XML document with XDocument or with generated POCOs, and the problem described here will then simply cease to exist.
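The clone-and-discard workaround could look roughly like the sketch below. The helper name ValidateCopy and its parameters are hypothetical, and it assumes the schemas are available as an XmlSchemaSet; the actual MyMethod may wire this up differently.

```csharp
using System.Xml;
using System.Xml.Schema;

public static class CloneValidation
{
    // Validates a deep copy of the document so that any schema defaults
    // applied during validation land on the copy, not on the caller's tree.
    public static void ValidateCopy(XmlDocument doc, XmlSchemaSet schemas)
    {
        var copy = (XmlDocument)doc.CloneNode(true);

        // Assign the schemas explicitly rather than relying on the clone to carry them.
        copy.Schemas = schemas;

        copy.Validate((sender, e) =>
        {
            if (e.Severity == XmlSeverityType.Error)
                throw new XmlSchemaValidationException(e.Message);
        });
        // The copy goes out of scope here and is discarded.
    }
}
```

The cost is one extra deep copy per validation, which was acceptable in my case because the document is built only once per call.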