C# Source Generators have been available for a while now. I thought people would immediately start using them like type providers in F#, in particular for generating POCOs from JSON or XML schemas. But, surprisingly, I could not find a source generator for XML Schema yet. So, let me be the first to try.
Why bother?
There are a couple of advantages of compile-time code generation over static code generation (generating the code once and checking it in) for this use case. A common workflow is validating some XML against an XSD and then deserializing the XML into POCOs for easier handling in C#. Statically generated code can drift away from the XSD used for validation. Generating those POCOs from the XSD on the fly via a Source Generator means that the XSD is the single source of truth for the entire workflow.
The code generation runs automatically, which saves you from having to remember to rerun it (and then actually rerunning it) whenever the schema changes. You also cannot accidentally edit generated code and then lose those edits on the next regeneration, because the code only exists at compile time (VS shows it to you as read-only).
Additionally, you don’t have to check thousands of lines of generated code into your version control system.
Easy coding, difficult tooling
It turned out to be easier than I thought and harder than I thought at the same time. Implementing the source generator itself was very easy, trivial actually; it does not compare to writing a code analyzer or diagnostic suppressor at all. The hard part was working through suboptimal tooling, outdated or non-existent documentation, weird VS behavior, and some issues with the class generation library I used.
So here we go.
Use the latest Tools
I know of only two XSD-to-POCO generators. One is good ol' xsd.exe, which is immediately disqualified on account of being a separate executable rather than a library. The other is XmlSchemaClassGenerator, which can be used directly as a library. This is the one I used.
Make sure you use the latest .NET SDK (for me, 5.0.302) and the latest Visual Studio (for me, 16.10.4), because Source Generator support in IntelliSense, and even the Source Generator API itself, is still a moving target. An early scaffold of a CSV source generator was my go-to guide for getting started with Source Generators, but it is already outdated because class names have changed.
Implementing the Generator is Easy
Create a generator project with a generator class just like the documentation tells you.
Since we are going to create completely new code for new classes, we do not have to bother with any kind of syntax or semantic tree. All we need is a string containing our code, which we pass to context.AddSource(hintName, generatedPocoCodeStr) in the generator's Execute method.
It is that simple.
XmlSchemaClassGenerator will hand us that string if we massage it a little by implementing its abstract OutputWriter. To do this, we can copy-paste its test-internal MemoryOutputWriter into our generator project and use it to write the generated code to a string. For simplicity, and to see this work at all first, we add an XSD as an embedded resource to the generator project and hard-code the resource path. Later, we will make this configurable from the consuming project.
```csharp
using System.CodeDom;
using System.IO;
using System.Xml;
using System.Xml.Schema;
using Microsoft.CodeAnalysis;
using XmlSchemaClassGenerator;

[Generator]
public class XsdSourceGenerator : ISourceGenerator
{
    // Adapted from XmlSchemaClassGenerator's test-internal MemoryOutputWriter:
    // captures the generated code in a string instead of writing files.
    internal class MemoryOutputWriter : OutputWriter
    {
        public string Content { get; set; }

        public override void Write(CodeNamespace cn)
        {
            var cu = new CodeCompileUnit();
            cu.Namespaces.Add(cn);
            using (var writer = new StringWriter())
            {
                Write(writer, cu);
                Content = writer.ToString();
            }
        }
    }

    public void Execute(GeneratorExecutionContext context)
    {
        // Hard-coded embedded schema for now; this becomes configurable later.
        var schemaSet = new XmlSchemaSet();
        schemaSet.Add(
            "mysamplenamespace",
            XmlReader.Create(typeof(XsdSourceGenerator).Assembly
                .GetManifestResourceStream("XsdToSource.sample_schema.xsd")));

        var generator = new Generator();
        var memoryOutputWriter = new MemoryOutputWriter();
        generator.OutputWriter = memoryOutputWriter;
        generator.Generate(schemaSet);

        // "pocos" is the hint name; the second argument is the generated source text.
        context.AddSource("pocos", memoryOutputWriter.Content);
    }

    public void Initialize(GeneratorInitializationContext context)
    {
        // do nothing
    }
}
```
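One failure mode worth guarding against here: if the name passed to GetManifestResourceStream does not match an embedded resource, it returns null, XmlReader.Create then throws an ArgumentNullException, and all you see is a CS8785 warning with the exception type and message. A minimal sketch of a hypothetical helper (OpenResource is my own name, not part of any library) that fails with a more useful message by listing the resource names that actually exist:

```csharp
using System;
using System.IO;
using System.Reflection;

public static class ResourceHelper
{
    // Opens an embedded resource, or throws with a message that lists the
    // resource names the assembly really contains. GetManifestResourceStream
    // returns null for an unknown name instead of throwing.
    public static Stream OpenResource(Assembly assembly, string resourceName)
    {
        var stream = assembly.GetManifestResourceStream(resourceName);
        if (stream == null)
        {
            string available = string.Join(", ", assembly.GetManifestResourceNames());
            throw new InvalidOperationException(
                $"Embedded resource '{resourceName}' not found. Available: [{available}]");
        }
        return stream;
    }
}
```

In Execute, XmlReader.Create(ResourceHelper.OpenResource(typeof(XsdSourceGenerator).Assembly, "XsdToSource.sample_schema.xsd")) would replace the direct GetManifestResourceStream call, so a misspelled resource name shows up in the CS8785 message together with the names that would have worked.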
So far, so good.
The simple sample XSD looks like this:
```xml
<?xml version="1.0" encoding="utf-8"?>
<xs:schema id="XMLSchema1"
    targetNamespace="mysamplenamespace"
    elementFormDefault="qualified"
    xmlns="http://tempuri.org/XMLSchema1.xsd"
    xmlns:mstns="http://tempuri.org/XMLSchema1.xsd"
    xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="MyRootElement">
    <xs:complexType id="RootType">
      <xs:all>
        <xs:element name="Child1">
          <xs:simpleType id="Child1Type">
            <xs:restriction base="xs:boolean"/>
          </xs:simpleType>
        </xs:element>
        <xs:element name="Child2">
          <xs:simpleType id="Child2Type">
            <xs:restriction base="xs:string"/>
          </xs:simpleType>
        </xs:element>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>
```
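Since the same XSD also drives validation in the workflow described at the beginning, here is a sketch of that validation step using only System.Xml's XmlSchemaSet and XmlReaderSettings. It inlines the sample schema (without the XML declaration) so it runs on its own; SampleSchemaValidation and Validate are illustrative names, not part of the generator.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SampleSchemaValidation
{
    // The sample schema from above, inlined (minus the XML declaration).
    private const string SampleXsd = @"<xs:schema id=""XMLSchema1"" targetNamespace=""mysamplenamespace""
    elementFormDefault=""qualified"" xmlns=""http://tempuri.org/XMLSchema1.xsd""
    xmlns:mstns=""http://tempuri.org/XMLSchema1.xsd"" xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""MyRootElement"">
    <xs:complexType id=""RootType"">
      <xs:all>
        <xs:element name=""Child1"">
          <xs:simpleType id=""Child1Type""><xs:restriction base=""xs:boolean""/></xs:simpleType>
        </xs:element>
        <xs:element name=""Child2"">
          <xs:simpleType id=""Child2Type""><xs:restriction base=""xs:string""/></xs:simpleType>
        </xs:element>
      </xs:all>
    </xs:complexType>
  </xs:element>
</xs:schema>";

    // Validates an XML document against the sample schema and returns every
    // validation message; an empty list means the document is valid.
    public static List<string> Validate(string xml)
    {
        var schemas = new XmlSchemaSet();
        using (var xsdReader = XmlReader.Create(new StringReader(SampleXsd)))
        {
            schemas.Add("mysamplenamespace", xsdReader);
        }

        var messages = new List<string>();
        var settings = new XmlReaderSettings
        {
            ValidationType = ValidationType.Schema,
            Schemas = schemas,
        };
        // With a handler attached, validation errors are collected instead of thrown.
        settings.ValidationEventHandler += (sender, e) => messages.Add(e.Message);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read()) { } // reading the whole document drives the validation
        }
        return messages;
    }
}
```

Validate returns an empty list for a document like <MyRootElement xmlns="mysamplenamespace"><Child1>true</Child1><Child2>hello</Child2></MyRootElement>, and at least one message if Child1 is not a valid xs:boolean or one of the children is missing.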
We now create a consuming project and reference the generator project like this:
```xml
<ProjectReference Include="..\XsdToSource\XsdToSource.csproj"
                  OutputItemType="Analyzer"
                  ReferenceOutputAssembly="false" />
```
And try using the generated class:
```csharp
var pocoInstance = new Mysamplenamespace.MyRootElement();
```
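Once the generated type exists, consuming code typically deserializes validated XML into it, which is the second half of the workflow from the introduction. The sketch below illustrates that step with XmlSerializer. Because the generated code only exists at compile time, it uses a hand-written, hypothetical stand-in class (MyRootElementStandIn) whose attributes I chose to match the sample schema; the class XmlSchemaClassGenerator actually emits will differ in detail.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical stand-in for the generated Mysamplenamespace.MyRootElement.
// Both the root and the child elements live in the schema's target namespace
// because the schema uses elementFormDefault="qualified".
[XmlRoot("MyRootElement", Namespace = "mysamplenamespace")]
public class MyRootElementStandIn
{
    [XmlElement("Child1", Namespace = "mysamplenamespace")]
    public bool Child1 { get; set; }

    [XmlElement("Child2", Namespace = "mysamplenamespace")]
    public string Child2 { get; set; }
}

public static class PocoDeserialization
{
    // Deserializes an XML document, ideally one that already passed schema validation.
    public static MyRootElementStandIn Deserialize(string xml)
    {
        var serializer = new XmlSerializer(typeof(MyRootElementStandIn));
        using (var reader = new StringReader(xml))
        {
            return (MyRootElementStandIn)serializer.Deserialize(reader);
        }
    }
}
```

With the real generator in place, typeof(Mysamplenamespace.MyRootElement) would take the stand-in's place and the attributes would come from the generated code.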
Getting it to work is hard
We now get a build error:
```
2>CSC : warning CS8785: Generator 'XsdSourceGenerator' failed to generate source. It will not contribute to the output and compilation errors may occur as a result. Exception was of type 'FileNotFoundException' with message 'Could not load file or assembly 'XmlSchemaClassGenerator, Version=2.0.0.0, Culture=neutral, PublicKeyToken=4ac343b5e343bf8c' or one of its dependencies. The system cannot find the file specified.'
2>C:\Users\justme\code\XsdToSource\Sample\PocoUsage.cs(9,36,9,53): error CS0246: The type or namespace name 'Mysamplenamespace' could not be found (are you missing a using directive or an assembly reference?)
```
This is one of the better error messages during generator development. In general, the errors for a non-functional generator are not very helpful and often require guesswork and internet research. Sometimes we only get an error from the consuming code that the generated classes do not exist (like the second error message above) without any hint as to why the generator does not do its job. To try out the generator, I found it most useful to not only use Visual Studio, but also the dotnet CLI build command. Both tend to show different error messages at different times and will together give you more information about what’s wrong.
The Source Generators Cookbook tells you to package all dependencies (the entire hierarchy!) together with the actual generator in a single NuGet package. Unfortunately, it does not tell you how to use such a generator within the same solution. Luckily, this comment and this discussion do. For using the XmlSchemaClassGenerator NuGet package in your Source Generator, this means:
```xml
<ItemGroup>
  <PackageReference Include="XmlSchemaClassGenerator-beta" Version="2.0.560" PrivateAssets="all" GeneratePathProperty="true" />
  <PackageReference Include="System.CodeDom" Version="5.0.0" PrivateAssets="all" GeneratePathProperty="true" />
  <PackageReference Include="System.ComponentModel.Annotations" Version="5.0.0" PrivateAssets="all" GeneratePathProperty="true" />
  <PackageReference Include="System.Text.Encoding.CodePages" Version="5.0.0" PrivateAssets="all" GeneratePathProperty="true" />
  <PackageReference Include="System.Runtime.CompilerServices.Unsafe" Version="5.0.0" PrivateAssets="all" GeneratePathProperty="true" />
  <PackageReference Include="System.ValueTuple" Version="4.5.0" PrivateAssets="all" GeneratePathProperty="true" />
</ItemGroup>

<PropertyGroup>
  <GetTargetPathDependsOn>$(GetTargetPathDependsOn);GetDependencyTargetPaths</GetTargetPathDependsOn>
</PropertyGroup>

<Target Name="GetDependencyTargetPaths">
  <ItemGroup>
    <TargetPathWithTargetPlatformMoniker Include="$(PKGXmlSchemaClassGenerator-beta)\lib\netstandard2.0\XmlSchemaClassGenerator.dll" IncludeRuntimeDependency="false" />
    <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_CodeDom)\lib\netstandard2.0\System.CodeDom.dll" IncludeRuntimeDependency="false" />
    <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_ComponentModel_Annotations)\lib\netstandard2.0\System.ComponentModel.Annotations.dll" IncludeRuntimeDependency="false" />
    <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Text_Encoding_CodePages)\lib\netstandard2.0\System.Text.Encoding.CodePages.dll" IncludeRuntimeDependency="false" />
    <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_ValueTuple)\lib\netstandard1.0\System.ValueTuple.dll" IncludeRuntimeDependency="false" />
    <TargetPathWithTargetPlatformMoniker Include="$(PKGSystem_Runtime_CompilerServices_Unsafe)\lib\netstandard2.0\System.Runtime.CompilerServices.Unsafe.dll" IncludeRuntimeDependency="false" />
  </ItemGroup>
</Target>
```
Note the “netstandard1.0” for value tuples.
When we use a generator project from within the same solution, we have to restart Visual Studio every time we change the generator for it to pick up the latest changes and regenerate the code. There is one more annoyance with using analyzers from within the same solution: if the generator project references NuGet packages and includes their DLLs via the mechanism above, then, on a clean working directory, we need to start VS, build the generator project, close VS, start VS again, and only then build the projects referencing the generator project. If we do not restart VS after building the generator, we get errors like these.
The external DLLs are picked up correctly only if the generator project has already been built by the time VS starts.
Working around System.ComponentModel.Annotations
Now we can successfully build the consuming project with the dotnet CLI. In Visual Studio however, we get this build error:
```
2>CSC : warning CS8785: Generator 'XsdSourceGenerator' failed to generate source. It will not contribute to the output and compilation errors may occur as a result. Exception was of type 'FileNotFoundException' with message 'Could not load file or assembly 'System.ComponentModel.Annotations, Version=4.2.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' or one of its dependencies.
```
Dealing with this problem is the topic of a different post. The workaround we are going to use here is including the right dll in our generator project.
We manually add the actual binary of System.ComponentModel.Annotations with assembly version 4.2.0.0, both to the GetDependencyTargetPaths target (hooked in via GetTargetPathDependsOn) and to the NuGet package. This binary can be found in version 4.4.0 of the NuGet package. It means checking the DLL into the source generator repo, because we cannot have NuGet resolve the 5.0.0 package and the 4.4.0 package at the same time in order to reference the DLL through $(PKG…). The corresponding System.ComponentModel.Annotations entries in the project file become:
```xml
<!-- Keep the public PackageReference so that consuming projects can use the generated code,
     which depends on System.ComponentModel.Annotations. We don't need GeneratePathProperty
     because the generator does not use it itself, but uses the manually included binary from above -->
<PackageReference Include="System.ComponentModel.Annotations" Version="5.0.0" />

<!-- For referencing within the same solution, point to the dll within the source generator's project folder -->
<TargetPathWithTargetPlatformMoniker Include="$(MSBuildProjectDirectory)/System.ComponentModel.Annotations.dll" IncludeRuntimeDependency="false" />
```
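Because the NuGet package version (4.4.0) and the assembly version (4.2.0.0) differ, it is easy to copy the wrong binary into the repo. A small sketch for double-checking which assembly version a DLL carries, using AssemblyName.GetAssemblyName, which reads the version from the file's metadata without loading the assembly for execution; AssemblyVersionCheck is an illustrative name:

```csharp
using System;
using System.Reflection;

public static class AssemblyVersionCheck
{
    // Reads the assembly version (not the NuGet package version or file version)
    // from a DLL's metadata.
    public static Version ReadAssemblyVersion(string dllPath) =>
        AssemblyName.GetAssemblyName(dllPath).Version;

    // True if the DLL at dllPath has exactly the expected assembly version.
    public static bool HasExpectedVersion(string dllPath, Version expected) =>
        ReadAssemblyVersion(dllPath) == expected;
}
```

For the workaround above, AssemblyVersionCheck.HasExpectedVersion("System.ComponentModel.Annotations.dll", new Version(4, 2, 0, 0)) should return true for the checked-in file.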
Building a Generator nuget package
Let's build a generator NuGet package, since this is the usual way for a generator to be consumed. For the package, we again have to add all the transitive dependencies manually, this time to the package's analyzers folder:
```xml
<ItemGroup>
  <None Include="$(OutputPath)\$(AssemblyName).dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false" />
  <None Include="$(PKGXmlSchemaClassGenerator-beta)\lib\netstandard2.0\*.dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false" />
  <None Include="$(PKGSystem_CodeDom)\lib\netstandard2.0\*.dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false" />
  <!-- put the manually included binary in the generator package instead of referencing from within the package through $(PKG -->
  <None Include="System.ComponentModel.Annotations.dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false" />
  <None Include="$(PKGSystem_Text_Encoding_CodePages)\lib\netstandard2.0\*.dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false" />
  <None Include="$(PKGSystem_ValueTuple)\lib\netstandard1.0\*.dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false" />
  <None Include="$(PKGSystem_Runtime_CompilerServices_Unsafe)\lib\netstandard2.0\*.dll" Pack="true" PackagePath="analyzers/dotnet/cs" Visible="false" />
</ItemGroup>
```
Now Visual Studio is able to build the project consuming the NuGet package.
Making the Source Generator configurable
We still have our hard-coded XSD file in the generator project as an embedded resource. What we really want is for the consuming project to configure the input file used for code generation. We would also like to configure some aspects of the generation. Since this can become a topic in itself, we will only configure the namespace the generator should use, as a proof of concept.
The mechanism for handing analyzers data that is not source code is to declare it as AdditionalFiles in the consuming project's csproj, like this:
```xml
<AdditionalFiles Include="sample_schema.xsd" />
```
And then pick it up in the generator like this:
```csharp
AdditionalText schema = context.AdditionalFiles
    .First(additionalText => additionalText.Path.EndsWith(".xsd"));
```
Obviously, this is just a simplified example. You would want the generator to be able to deal with multiple xsd files.
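A sketch of the multi-file case, under the assumption that each schema is generated into its own source file. Hint names passed to AddSource have to be unique within a generator, so schemas that share a file name in different folders need a disambiguating suffix. The helper works on plain paths (in the generator they would come from each AdditionalText's Path); SchemaOutputPlanner and the .g.cs suffix are my own choices, not an established API:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class SchemaOutputPlanner
{
    // Picks the .xsd files out of all additional file paths and assigns each a
    // unique hint name. Names are compared case-insensitively to be safe.
    public static List<(string SchemaPath, string HintName)> Plan(IEnumerable<string> additionalFilePaths)
    {
        var plan = new List<(string SchemaPath, string HintName)>();
        var usedNames = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (string path in additionalFilePaths)
        {
            if (!Path.GetExtension(path).Equals(".xsd", StringComparison.OrdinalIgnoreCase))
                continue; // not a schema, probably meant for another analyzer

            string baseName = Path.GetFileNameWithoutExtension(path);
            string hintName = baseName;
            // HashSet.Add returns false if the name is taken; append _2, _3, ... until it is free.
            for (int suffix = 2; !usedNames.Add(hintName); suffix++)
                hintName = $"{baseName}_{suffix}";

            plan.Add((path, hintName + ".g.cs"));
        }
        return plan;
    }
}
```

Execute would then loop over the plan, run XmlSchemaClassGenerator once per schema, and call context.AddSource(hintName, generatedCode) for each entry.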
At this point, our generator does not read any source code or analyze any trees, and we would like it to stay that way for simplicity's sake. That is why, for configuring the generated POCOs' namespace, we shamelessly borrow the approach of the CSV source generator example mentioned above: configuring the generator through the consuming project's csproj.
We make up a metadata name and set it on our AdditionalFiles item. We probably want some kind of namespacing for the made-up name, so that our generator only uses additional files destined for itself and other generators or analyzers do not accidentally pick up a file not meant for them. What the CSV post won't tell you, but what is actually included in its sample project, is that you need a CompilerVisibleItemMetadata element to tell MSBuild to include your made-up metadata in the compiler information your generator has access to. This is also described in the cookbook under “Consume MSBuild properties and metadata”. The result could look like this:
```xml
<!-- in the consuming project -->
<AdditionalFiles Include="sample_schema.xsd" XsdToSource_RootNamespace="Sample.Generated" />

<!-- in a props file packaged with your generator
     (can be declared in the consuming project as well for debugging purposes) -->
<CompilerVisibleItemMetadata Include="AdditionalFiles" MetadataName="XsdToSource_RootNamespace" />
```
Then, we extract the value of the XsdToSource_RootNamespace metadata by inspecting the AnalyzerConfigOptions:
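```csharp
using System;
using System.Collections.Generic;
using System.IO;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Diagnostics;

// Member of the XsdSourceGenerator class:
static IEnumerable<(AdditionalText SchemaFile, string Namespace)> GetConfigurations(
    GeneratorExecutionContext context)
{
    foreach (AdditionalText file in context.AdditionalFiles)
    {
        if (Path.GetExtension(file.Path).Equals(".xsd", StringComparison.OrdinalIgnoreCase))
        {
            // Per-file options contain the item metadata made visible via CompilerVisibleItemMetadata.
            AnalyzerConfigOptions analyzerConfigOptions = context.AnalyzerConfigOptions.GetOptions(file);
            if (analyzerConfigOptions.TryGetValue(
                "build_metadata.additionalfiles.XsdToSource_RootNamespace",
                out string @namespace))
            {
                yield return (file, @namespace);
            }
        }
    }
}
```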
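GetConfigurations passes whatever string the consuming project wrote straight through as the namespace, so a typo would most likely only surface as broken generated code. A sketch of an extra check, assuming a plain dictionary as a stand-in for AnalyzerConfigOptions (as far as I know, the real type compares keys case-insensitively, which callers of this sketch can mimic with StringComparer.OrdinalIgnoreCase). NamespaceOption and its deliberately simplified, ASCII-only pattern are hypothetical:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class NamespaceOption
{
    // Same metadata key as in GetConfigurations above.
    public const string Key = "build_metadata.additionalfiles.XsdToSource_RootNamespace";

    // Simplified dotted-identifier check: ASCII letters, digits and underscores only;
    // C# keywords, @-escapes and Unicode identifiers are not handled.
    private static readonly Regex DottedIdentifier =
        new Regex(@"^[A-Za-z_][A-Za-z0-9_]*(\.[A-Za-z_][A-Za-z0-9_]*)*$");

    // Returns the configured namespace, or null if it is missing, empty or malformed.
    // 'options' stands in for AnalyzerConfigOptions.
    public static string GetValidatedNamespace(IReadOnlyDictionary<string, string> options)
    {
        if (!options.TryGetValue(Key, out var value))
            return null;
        value = value.Trim();
        return DottedIdentifier.IsMatch(value) ? value : null;
    }
}
```

A null result is a natural place to report a diagnostic pointing at the offending AdditionalFiles item instead of generating code with a broken namespace.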
Not relevant for the simple proof of concept, but maybe for our viewers at home: there seem to be some problems when globbing additional files, see here and here.
Summary
All the pieces of the puzzle are now put together:
- implementing the generator
- using a custom OutputWriter to get a string from XmlSchemaClassGenerator
- working around the System.ComponentModel.Annotations issue
- reading custom properties from AdditionalFiles to configure the generator
You can see the whole generator sample on my GitHub. Thanks again to Michael Ganss of XmlSchemaClassGenerator for doing the actual work and letting me piggyback on it with this source generator.