Gotcha with XmlDocument.Validate and XDocument.Load

TL;DR: XmlDocument.Validate is not a read-only operation but changes the document tree. XDocument.Load(XmlReader) adds attributes even if they are marked as unspecified.

Two interesting quirks of XmlDocument and XDocument worked together to take me by surprise last week.

I was refactoring a method that uses XmlDocument to build … well … an XML document. To make sure I did not break anything, I had set up a regression test comparing the resulting XML document to an expected XML document. For lack of Equals support on XmlDocument, I used XDocument's deep-equals capabilities (through FluentAssertions). I loaded the expected XML document into an XDocument and (this is the important part) converted the XmlDocument returned by the method into an XDocument using an XmlNodeReader:

XmlDocument returnedXmlDoc = MyMethod();
XDocument expectedDoc = XDocument.Load(fileStream);
// Convert the returned XmlDocument into an XDocument via an XmlNodeReader
XDocument actualDoc = XDocument.Load(new XmlNodeReader(returnedXmlDoc));

actualDoc.Should().BeEquivalentTo(expectedDoc);

This worked fine during the refactoring.

Then I made one more change to MyMethod. Internally, MyMethod validated the built XmlDocument against a schema. Before the change, it had the XML content in a MemoryStream and built an XmlDocument from that stream just for validation. After successful validation, it returned a fresh XmlDocument built from the same stream. After the change, it returned the successfully validated XmlDocument instead of building a fresh one. After all, why throw away a perfectly good XmlDocument just to build it again?

And suddenly, the regression test failed. It reported an unexpected attribute on actualDoc. What happened?

Comparing the XML content of actualDoc and expectedDoc revealed that some elements in actualDoc now contained additional attributes: attributes that the schema declares as optional, set to their default values.

It turns out that XmlDocument.Validate is not, as I intuitively thought, a read-only operation. This is not obvious, and the method documentation itself does not mention it. Had I read the online documentation, it would have told me, among other things, that

after successful validation, schema defaults are applied

https://docs.microsoft.com/en-us/dotnet/api/system.xml.xmldocument.validate?view=net-5.0#System_Xml_XmlDocument_Validate_System_Xml_Schema_ValidationEventHandler_ (see “Remarks”)

In this case, Validate added attribute nodes for attributes that the schema declares as optional with a default value. It did this in a semantically correct way by setting their XmlAttribute.Specified property to false. The string rendered from the XmlDocument therefore did not show those attributes, just as before.
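A minimal repro sketch of this behavior, assuming a small hypothetical schema: an order element with a required id attribute and an optional priority attribute that defaults to "normal" (none of these names come from the original MyMethod). It loads a document without priority, validates it in place, and then inspects the attribute that Validate added:

```csharp
using System;
using System.IO;
using System.Xml;

// Hypothetical schema: "priority" is optional and has the default value "normal".
const string SchemaXsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""order"">
    <xs:complexType>
      <xs:attribute name=""id"" type=""xs:string"" use=""required""/>
      <xs:attribute name=""priority"" type=""xs:string"" default=""normal""/>
    </xs:complexType>
  </xs:element>
</xs:schema>";

var doc = new XmlDocument();
doc.LoadXml("<order id=\"42\"/>");
doc.Schemas.Add(null, XmlReader.Create(new StringReader(SchemaXsd)));

bool hadPriorityBefore = doc.DocumentElement!.HasAttribute("priority");

// Any validation error aborts; reaching the next line means validation succeeded.
doc.Validate((sender, e) => throw new InvalidOperationException(e.Message));

// Validate has applied the schema default: the attribute now exists but is marked unspecified.
var priority = doc.DocumentElement!.GetAttributeNode("priority");
Console.WriteLine($"priority before Validate: {hadPriorityBefore}");
Console.WriteLine($"priority after Validate: exists={priority != null}, Specified={priority?.Specified}, Value={priority?.Value}");
```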

This meant the test failure was a false alarm: the method was still working fine, but the test wrongly reported a deviation.

How then did those attributes find their way into the XDocument and make the regression test fail?
The moving parts are XmlNodeReader and XDocument.Load.

The reference source of XmlNodeReader (https://referencesource.microsoft.com/#System.Xml/System/Xml/Dom/XmlNodeReader.cs) shows that the IsDefault property of XmlNodeReaderNavigator respects the Specified property of attributes: for an attribute node it returns the negation of Specified, so it returns true for those newly added default attributes. XmlNodeReader delegates its IsDefault property to its internal XmlNodeReaderNavigator and thus also respects the Specified property.
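To see what XmlNodeReader reports, the following sketch (again assuming the hypothetical order/priority schema) validates the document and then walks the root element's attributes through an XmlNodeReader, recording IsDefault for each. Going by the reference source, id should report false and the added priority attribute true:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical schema: "priority" is optional and has the default value "normal".
const string SchemaXsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""order"">
    <xs:complexType>
      <xs:attribute name=""id"" type=""xs:string"" use=""required""/>
      <xs:attribute name=""priority"" type=""xs:string"" default=""normal""/>
    </xs:complexType>
  </xs:element>
</xs:schema>";

var doc = new XmlDocument();
doc.LoadXml("<order id=\"42\"/>");
doc.Schemas.Add(null, XmlReader.Create(new StringReader(SchemaXsd)));
doc.Validate((sender, e) => throw new InvalidOperationException(e.Message));

// Record IsDefault for every attribute of the root element as seen through XmlNodeReader.
var isDefaultByName = new Dictionary<string, bool>();
using (var reader = new XmlNodeReader(doc))
{
    reader.MoveToContent(); // moves from the initial state to the <order> element
    while (reader.MoveToNextAttribute())
    {
        isDefaultByName[reader.Name] = reader.IsDefault;
        Console.WriteLine($"{reader.Name}=\"{reader.Value}\" IsDefault={reader.IsDefault}");
    }
}
```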

The hunt continues with XDocument's Load method (see https://referencesource.microsoft.com/System.Xml.Linq/System/Xml/Linq/XLinq.cs.html). It turns out that Load completely ignores its XmlReader's IsDefault property. This could be by design, or because the base XmlReader implementation of IsDefault always returns false (see https://referencesource.microsoft.com/System.Xml/System/Xml/Core/XmlReader.cs.html).

The XDocument happily loads the unspecified attribute and adds it to itself just as if it were specified. Since XElement and XAttribute seem to have no way of marking an attribute as unspecified, the attribute is now there to stay and will show up in the rendered XML string as well.
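A sketch of the whole chain from the regression test, still assuming the hypothetical order/priority schema: validate the XmlDocument, convert it through an XmlNodeReader, and check what the XDocument ends up with. Per the observation above, the XmlDocument's own OuterXml is expected to omit priority, while the XDocument carries it as an ordinary attribute:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical schema: "priority" is optional and has the default value "normal".
const string SchemaXsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""order"">
    <xs:complexType>
      <xs:attribute name=""id"" type=""xs:string"" use=""required""/>
      <xs:attribute name=""priority"" type=""xs:string"" default=""normal""/>
    </xs:complexType>
  </xs:element>
</xs:schema>";

var doc = new XmlDocument();
doc.LoadXml("<order id=\"42\"/>");
doc.Schemas.Add(null, XmlReader.Create(new StringReader(SchemaXsd)));
doc.Validate((sender, e) => throw new InvalidOperationException(e.Message));

// Same conversion as in the regression test: XmlDocument -> XmlNodeReader -> XDocument.
XDocument converted;
using (var reader = new XmlNodeReader(doc))
{
    converted = XDocument.Load(reader);
}

// XAttribute has no "unspecified" flag, so the default value is now a regular attribute.
var priority = converted.Root!.Attribute("priority");
Console.WriteLine($"XmlDocument: {doc.OuterXml}");
Console.WriteLine($"XDocument:   {converted.ToString(SaveOptions.DisableFormatting)}");
```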

And all this because we validated an XmlDocument.

The judgement call now is to decide what the goal of your test is, and thus whether the test or the implementation is wrong.

I was able to take the easy way out: I let MyMethod validate a clone of the XmlDocument and throw the clone away after successful validation. MyMethod will most likely build the XML document with XDocument or with generated POCOs eventually, at which point the problem described here will simply cease to exist.
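A minimal sketch of that workaround. ValidateCopy is a hypothetical helper standing in for the validation step inside MyMethod, and the order/priority schema is again an assumption. The deep copy receives the schema defaults and is then discarded, so the document handed back to the caller (and to the regression test) keeps only the attributes that were actually written:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical schema: "priority" is optional and has the default value "normal".
const string SchemaXsd = @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""order"">
    <xs:complexType>
      <xs:attribute name=""id"" type=""xs:string"" use=""required""/>
      <xs:attribute name=""priority"" type=""xs:string"" default=""normal""/>
    </xs:complexType>
  </xs:element>
</xs:schema>";

// Hypothetical helper: validates a deep copy so schema defaults land on the copy only.
void ValidateCopy(XmlDocument document)
{
    var copy = (XmlDocument)document.Clone(); // deep copy of the whole document
    copy.Schemas.Add(null, XmlReader.Create(new StringReader(SchemaXsd)));
    copy.Validate((sender, e) => throw new InvalidOperationException(e.Message));
    // The copy, now carrying default attributes, simply goes out of scope here.
}

var doc = new XmlDocument();
doc.LoadXml("<order id=\"42\"/>");
ValidateCopy(doc);

// The conversion used by the regression test now sees only the attributes actually written.
XDocument converted;
using (var reader = new XmlNodeReader(doc))
{
    converted = XDocument.Load(reader);
}
Console.WriteLine(converted.ToString(SaveOptions.DisableFormatting));
```

This keeps the returned document free of schema defaults, as it was before the change, without parsing the MemoryStream a second time.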