Nick Harrison
A Gentle Introduction to .NET Code Generation
28 May 2009

Code-generation has been used throughout the age of the digital computer. The use of code to generate code might, at first glance, seem an odd thing to want to do, but the technique is alive and well, and is widely used in .NET.  Nick Harrison explains, and introduces the CodeDom....

Why Generate Code?

I have a dream that one day the computer will write its own code.  Sure, this may be far-fetched, but why not?

There are, however, certain types of code that are more sensibly generated by the computer.  We should be able to explain what the code should look like and how it should be written, but leave its generation to the computer.  Code that is tedious and error-prone is better left to computers, which thrive on tedious tasks.  Code that requires creative, innovative thought is better left to creative programmers.  By letting the computer handle the tedious code, we are free to focus on the more creative aspects of coding.

There is also a great deal of code where there aren't yet any established best practices.  This is also typically very tedious code, but worse still, this code may have to be updated to reflect new best practices.  Several examples quickly come to mind, such as data access logic, web service wrappers, and XML proxies.  They all have rather tedious implementations that easily conform to a well-defined pattern.  This makes them ideal candidates to be generated instead of hand-written.

Code Generation: an Overview

In the DotNet world, there are two basic schools of thought for generating code.  One is template driven.  The other relies on the objects in the CodeDom namespace.  Each approach has its own strengths and weaknesses.

Products like Tier Builder, CodeSmith, MyGeneration, etc. are based on the template approach.  This is similar to XSLT.  You pass metadata about the code that you want to generate through your templates, and the various engines will produce code based on the template.  You end up having to learn a new language for the templates, but the templates make it relatively easy to describe the code that you want to have generated.

Tools like the WSDL code generator, which produces web service proxies, and the strongly typed DataSet generator rely on the objects in the CodeDom namespace to generate their code.  The biggest advantage of the CodeDom is its language independence.  Once you have expressed your program logic in terms of the CodeDom objects, CodeDom can output code in any DotNet language that supplies a provider.

Regardless of the approach you choose, there is a golden rule that must always be followed when dealing with generated code:

Never directly edit generated code.  Modify a derived class or modify a separate file in a partial class.

The rest of this article will focus on the CodeDom approach to code generation.

CodeDom: an Overview

The CodeDom namespace includes objects to represent nearly every language construct in a language-independent fashion; I say nearly every language construct because there are key areas missing.  We will go over these areas a little later.

To use the CodeDom, we need to create a language tree populated by these objects.  Each DotNet language provides a CodeProvider, which can generate code from this language tree.

The process of creating such a language tree is similar to diagramming sentences in grammar class.  Do you remember diagramming sentences?

Every word in a sentence was assigned its part of speech and placed appropriately on a diagram.  I loved these exercises.  CodeDom takes you through a similar exercise mapping the components of a language structure back to CodeDom objects.

Some statements are easy to map:

   match = false;

 This statement easily maps back to a simple CodeAssignStatement. The Left is a simple CodeVariableReferenceExpression and the Right is a CodePrimitiveExpression.
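As a minimal sketch of that mapping, the snippet below builds the graph for the statement and renders it back as C#.  The class and method names (AssignDemo, Render) are made up for illustration, and I am assuming the System.CodeDom types are available to the project (they ship with the .NET Framework; newer .NET versions need the System.CodeDom package).

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class AssignDemo   // hypothetical name, for illustration only
{
    // Builds the CodeDom graph for "match = false;" and renders it as C#.
    public static string Render()
    {
        CodeAssignStatement assign = new CodeAssignStatement(
            new CodeVariableReferenceExpression("match"),  // Left
            new CodePrimitiveExpression(false));           // Right

        using (CSharpCodeProvider provider = new CSharpCodeProvider())
        using (StringWriter writer = new StringWriter())
        {
            provider.GenerateCodeFromStatement(
                assign, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

Calling AssignDemo.Render() should hand back the original one-line statement as text.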

Other statements may be more difficult to pull apart:

  returnValue = DateTime.Parse (viewState[index].ToString());

 

This statement is also a CodeAssignStatement.  The Left is still a simple CodeVariableReferenceExpression, but the Right gets a bit more complicated.

  DateTime.Parse (viewState[index].ToString());

Here we have a CodeMethodInvokeExpression.  DateTime.Parse is a straightforward CodeMethodReferenceExpression, but the “Parameters” collection is a little harder to pull apart.  We have only one parameter:

 viewState[index].ToString()

This is a CodeMethodInvokeExpression.  This time the “Parameters” is simple.  There are none, but the CodeMethodReferenceExpression has some complications.   The TargetObject property for the CodeMethodReferenceExpression is more complicated than we typically see:

 viewState[index]

The TargetObject is a CodeIndexerExpression (for a true array you would use a CodeArrayIndexerExpression instead).  The TargetObject for the indexer is a CodeVariableReferenceExpression to the viewState variable.  The Indices collection will also hold a CodeVariableReferenceExpression, this time to the index variable.

Pulling it all together, the CodeDom to produce a statement like:

  returnValue = DateTime.Parse (viewState[index].ToString());

Might look like this:

    CodeAssignStatement assignToReturnValue = new CodeAssignStatement();
    assignToReturnValue.Left =
        new CodeVariableReferenceExpression("returnValue");

    // DateTime.Parse: DateTime is a type, so reference it as a type
    // rather than as a variable.
    CodeMethodInvokeExpression invokeDateTimeParse =
        new CodeMethodInvokeExpression();
    CodeMethodReferenceExpression referenceDateTimeParse =
        new CodeMethodReferenceExpression();
    referenceDateTimeParse.TargetObject =
        new CodeTypeReferenceExpression("DateTime");
    referenceDateTimeParse.MethodName = "Parse";
    invokeDateTimeParse.Method = referenceDateTimeParse;

    // viewState[index]
    CodeIndexerExpression indexViewState =
        new CodeIndexerExpression();
    indexViewState.TargetObject =
        new CodeVariableReferenceExpression("viewState");
    indexViewState.Indices.Add(
        new CodeVariableReferenceExpression("index"));

    // viewState[index].ToString()
    CodeMethodInvokeExpression callToString =
        new CodeMethodInvokeExpression();
    callToString.Method =
        new CodeMethodReferenceExpression(indexViewState, "ToString");

    // Pass the ToString() call as the only parameter to DateTime.Parse
    invokeDateTimeParse.Parameters.Add(callToString);
    assignToReturnValue.Right = invokeDateTimeParse;

Why Use the CodeDom?

Many people may look at the above code sample and conclude that it looks like too much work.  It is rather verbose, taking 12 lines of code to generate one line.  From the outside, it hardly looks like a good return on investment.

But there are some great benefits that make the steep learning curve worth the effort.

Breaking your logic up into its base parts is a great way to learn about what you are writing.  The same way that diagramming sentences in school helped you understand English grammar, expressing your logic in terms of CodeDom objects helps you understand the logic you write.  You become more attuned to duplicated code.  You gain greater insight into the patterns in your code.

Sometimes the code that you want to generate cannot be expressed in a template.  This may require you to use CodeDom.  Consider a code generator to build regular expressions based on the format of a fixed length record.  Such a code generator is rather easy to specify in CodeDom but surprisingly difficult as a template.

CodeDom is also a self-contained solution.  With template code generation, you need to version the templates and the template engine, as well as your metadata, to be able to regenerate the code when needed.  Because CodeDom is self-contained, this versioning is simplified.

Finally the CodeDom is language independent.  Once you have defined your logic in terms of a CodeDom graph, you can then output your logic in any DotNet language.
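To make the language independence concrete, here is a small sketch that renders one statement graph through whichever provider it is given.  The names (TwoLanguagesDemo, Render, counter) are illustrative assumptions, not part of any library.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;
using Microsoft.VisualBasic;

public static class TwoLanguagesDemo   // hypothetical name
{
    // Renders the statement "counter = counter + 1" with the given provider.
    public static string Render(CodeDomProvider provider)
    {
        CodeAssignStatement increment = new CodeAssignStatement(
            new CodeVariableReferenceExpression("counter"),
            new CodeBinaryOperatorExpression(
                new CodeVariableReferenceExpression("counter"),
                CodeBinaryOperatorType.Add,
                new CodePrimitiveExpression(1)));

        using (StringWriter writer = new StringWriter())
        {
            provider.GenerateCodeFromStatement(
                increment, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }

    public static string RenderCSharp()
    {
        using (CSharpCodeProvider provider = new CSharpCodeProvider())
        {
            return Render(provider);
        }
    }

    public static string RenderVisualBasic()
    {
        using (VBCodeProvider provider = new VBCodeProvider())
        {
            return Render(provider);
        }
    }
}
```

The graph is built once; only the provider changes.  The C# text ends in a semicolon and the VB text does not, which is exactly the kind of detail the providers take care of.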

What’s wrong with the CodeDom?

All of this is not to say that there is nothing wrong with the CodeDom.  There are many critics and they raise some valid concerns.

We have already seen that the code needed can be very verbose.  This scares many people away, but there are parsers and libraries to help lessen this impact.  Refly is one such library.  Refly lets you operate at a slightly higher level of abstraction and can tremendously lessen the learning effort.

There are certain language constructs that are missing.  Some of these you could arguably say should never be used in the first place; others are truly frustrating.  The syntax for expressing a foreach-type loop is klutzy at best.

There are a handful of operators that are annoyingly missing.  Most of the binary bit operators are missing: LeftShift (<<), RightShift (>>), UnsignedRightShift (>>>) and ExclusiveOr (^).  All unary operators are missing as well.  So much for ++ or --.

The missing operators are annoying, but this can be worked around fairly easily by calling a helper method in a library class.
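A sketch of that workaround, assuming a hand-written helper class named BitHelper (a hypothetical name): the generated code calls BitHelper.LeftShift instead of using the << operator that CodeDom cannot express.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

// Hand-written library class (hypothetical) that the generated code calls.
public static class BitHelper
{
    public static int LeftShift(int value, int count)
    {
        return value << count;
    }
}

public static class ShiftDemo   // hypothetical name
{
    // Generates "flags = BitHelper.LeftShift(flags, 2);"
    public static string Render()
    {
        CodeMethodInvokeExpression shift = new CodeMethodInvokeExpression(
            new CodeTypeReferenceExpression("BitHelper"),
            "LeftShift",
            new CodeVariableReferenceExpression("flags"),
            new CodePrimitiveExpression(2));

        CodeAssignStatement assign = new CodeAssignStatement(
            new CodeVariableReferenceExpression("flags"), shift);

        using (CSharpCodeProvider provider = new CSharpCodeProvider())
        using (StringWriter writer = new StringWriter())
        {
            provider.GenerateCodeFromStatement(
                assign, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

The helper has to be available to whatever project compiles the generated code, so it belongs in a shared library rather than in the generator itself.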

One annoyance that I have yet to find a reasonable workaround for is that you cannot attach custom attributes everywhere that you would like.  Most problematic is not being able to attach custom attributes to the get and set of a property.  I want to attach attributes to generated properties directing the debugger not to step into them.

It used to be said that the Laser was a tool looking for a use.  Many people feel the same way about Code Generation in general.  Many developers may view a code generator as a threat to their job.  Others have had bad experiences with code generation in the past and are now apprehensive about using one.

Code generators will not take work away from developers.  Every compiler is essentially a code generator keeping us from having to write in binary.  Effective use of code generation frees developers from getting bogged down in tedious error prone code and allows us to focus on more innovative ways to solve real world problems.  Code generation done properly will not box you into a corner where you are not able to make any changes to the system.  Code generation done properly should never force you to maintain “ugly” generated code.  Code generation done properly should reinforce the need for solid object oriented design, not detract from it.

The key to effective code generation is the golden rule mentioned earlier:

Never directly edit generated code.  Modify a derived class or modify a separate file in a partial class.

This requires effective object oriented design.  Not modifying the generated code means that we really don’t care about what the code looks like.  We should never look at it.  If we want to change what the generated code does, we need to change the metadata that was used to generate the code or modify a derived class.  In fact, I will often take the generation a step further and compile the generated code so that there is no opportunity to modify it.  It can be viewed only through tools like Reflector.
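One way to follow the rule with partial classes is to mark the generated type as partial, so that hand-written members can live in a separate file that the generator never touches.  This is only a sketch; the type name Customer and the class PartialDemo are illustrative assumptions.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class PartialDemo   // hypothetical name
{
    // Declares the generated half of a partial class.
    public static CodeTypeDeclaration CreatePartialType(string name)
    {
        CodeTypeDeclaration type = new CodeTypeDeclaration(name);
        type.IsClass = true;
        type.IsPartial = true;   // hand-written members go in another file
        return type;
    }

    public static string Render()
    {
        using (CSharpCodeProvider provider = new CSharpCodeProvider())
        using (StringWriter writer = new StringWriter())
        {
            provider.GenerateCodeFromType(
                CreatePartialType("Customer"), writer,
                new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

The hand-written file then declares "public partial class Customer" as well, and regenerating the first file leaves the second untouched.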

That being said, the language providers in general produce fairly readable, well-formatted code.

Basic Tricks

Almost any code generation exercise will employ some common tasks.  Let’s review how to create the skeleton for a class, how to create the language provider, and how to render generated code.

Class Skeletons out of the Closet

At a high level, a class skeleton can be thought of as:

- Namespace
    - Type
        - Properties
        - Methods


public CodeNamespace CreateNameSpace (string nameSpaceName)
{
    CodeNamespace returnValue =
        new CodeNamespace(nameSpaceName);
    returnValue.Imports.Add
        (new CodeNamespaceImport("System"));
    returnValue.Imports.Add
        (new CodeNamespaceImport("System.Text"));
    returnValue.Imports.Add
        (new CodeNamespaceImport("CustomNameSpace"));
    return returnValue;
}

Note the CodeNamespaceImport objects.  We simply list the namespaces that we want to use (or import, in VB terms).  Note that we give only the name of the namespace.  The individual languages handle getting the syntax right.

public CodeTypeDeclaration CreateType (string name)
{
    CodeTypeDeclaration returnValue = new CodeTypeDeclaration(name);
    returnValue.IsClass = true;
    // The first entry is the base class, later entries are interfaces
    returnValue.BaseTypes.Add(new CodeTypeReference(typeof(Object)));
    returnValue.BaseTypes.Add(new CodeTypeReference("IInterface"));
    return returnValue;
}

 

BaseTypes is a collection even though every DotNet language supports only a single base class.  The first entry in the collection is the base class, and the subsequent entries are the interfaces implemented.  VB and CSharp use different syntax here: CSharp lists the base class and the interfaces together, while VB uses Inherits for the base class and Implements for each interface.  The VB language provider therefore treats the first entry as the base class, which causes problems if that first entry is really an interface.  If you are implementing an interface, you should specify a base class in case the code needs to be rendered in VB.  If you do not have a natural base class, explicitly specify System.Object.  With this simple workaround, VB and CSharp will both generate correct code.

public CodeMemberMethod CreateMethod(string methodName)
{
    // Declare a new CodeMemberMethod
    CodeMemberMethod returnValue = new CodeMemberMethod();

    // Specify that this method will be public
    returnValue.Attributes = MemberAttributes.Public;
    returnValue.Name = methodName;
    returnValue.ReturnType = new CodeTypeReference(typeof(void));
    // The attribute name must be a valid identifier
    returnValue.CustomAttributes.
      Add(new CodeAttributeDeclaration("AnyAttribute"));

    // Return the freshly created method
    return returnValue;
}

 

The logic for the method will be housed in the CodeStatementCollection exposed as Statements.  Any logic not expressly dependent on the metadata should be defined in a base class.
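As an illustration of filling the Statements collection, the sketch below gives a method a one-line body.  The names (MethodBodyDemo, SayHello) are assumptions made for the example.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class MethodBodyDemo   // hypothetical name
{
    // Creates a public method whose body is: System.Console.WriteLine("Hello");
    public static CodeMemberMethod CreateHelloMethod()
    {
        CodeMemberMethod method = new CodeMemberMethod();
        method.Attributes = MemberAttributes.Public;
        method.Name = "SayHello";
        method.ReturnType = new CodeTypeReference(typeof(void));

        CodeMethodInvokeExpression writeLine = new CodeMethodInvokeExpression(
            new CodeTypeReferenceExpression(typeof(Console)),
            "WriteLine",
            new CodePrimitiveExpression("Hello"));

        // An expression becomes a statement by wrapping it.
        method.Statements.Add(new CodeExpressionStatement(writeLine));
        return method;
    }

    public static string Render()
    {
        using (CSharpCodeProvider provider = new CSharpCodeProvider())
        using (StringWriter writer = new StringWriter())
        {
            provider.GenerateCodeFromMember(
                CreateHelloMethod(), writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

Each further line of logic is one more entry added to Statements, in the order it should appear in the generated method.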

public CodeMemberProperty CreateProperty(string Name, Type propertyType)
{
    CodeMemberProperty returnValue = new CodeMemberProperty();
    returnValue.Attributes = MemberAttributes.Public;
    returnValue.Name = Name;
    returnValue.Type = new CodeTypeReference(propertyType);
    returnValue.GetStatements.Add(
        new CodeMethodReturnStatement(new
        CodeVariableReferenceExpression("m" + Name)));
    returnValue.SetStatements.Add(new CodeAssignStatement(
           new CodeVariableReferenceExpression("m" + Name),
           new CodeVariableReferenceExpression("value")));
    return returnValue;
}

Properties represent one of the most annoying problems that I have with the CodeDom.  You cannot attach custom attributes to the Get and Set methods of a property.   I would like to attach DebuggerHidden attributes here.

Putting it all together

We pull it all together by adding our generated type to the Types collection of the namespace, and adding our generated method, property and backing field to the type.

public CodeNamespace GenerateSkeleton ()
{
    CodeNamespace returnvalue = CreateNameSpace("GeneratedCode");
    CodeTypeDeclaration generatedType = CreateType("SimpleType");
    returnvalue.Types.Add(generatedType);
    generatedType.Members.Add(CreateMethod("SimpleMethod"));
    generatedType.Members.Add(CreateProperty("BasicProperty", typeof(int)));
    generatedType.Members.Add(
        new CodeMemberField(typeof(int), "mBasicProperty"));
    return returnvalue;
}

Where’s My Provider

Every DotNet language defines a language provider that gives us access to the Generator and Compiler for that language.  Each of these derives from System.CodeDom.Compiler.CodeDomProvider.

 

// Static so that it can be called from the static GenerateCode method
public static CodeDomProvider CreateProvider(string Language)
{
    if (Language == "VB")
    {
        return new Microsoft.VisualBasic.VBCodeProvider();
    }
    else
    {
        return new Microsoft.CSharp.CSharpCodeProvider();
    }
}

 

You can easily specify your favorite DotNet language here.
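As an alternative to hard-coding the provider classes, the framework can look a provider up by language name through the static CodeDomProvider.CreateProvider method.  The wrapper below is only a sketch; the class name ProviderLookup is an assumption, and which language names are registered depends on the machine configuration.

```csharp
using System;
using System.CodeDom.Compiler;

public static class ProviderLookup   // hypothetical name
{
    // Returns the provider registered for a language name such as
    // "csharp" or "vb", or throws if none is registered.
    public static CodeDomProvider Create(string language)
    {
        if (!CodeDomProvider.IsDefinedLanguage(language))
        {
            throw new ArgumentException(
                "No CodeDom provider is registered for: " + language);
        }
        return CodeDomProvider.CreateProvider(language);
    }
}
```

With this approach a third-party language that registers its own provider can be targeted without changing the generator.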

Rendering the Fruits of Our Labor

Once we have the appropriate provider, we are ready to generate code.  My preferred approach is to attach a StringBuilder to a StringWriter and pass that into the generate method.  Once the code is generated, we can retrieve the generated code from the StringBuilder.  You could also use a StreamWriter to write the code directly to a file on disk.
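The StreamWriter variant might look like the following sketch.  The class and method names (FileOutputDemo, WriteToFile) and the choice of the C# provider are assumptions for the example.

```csharp
using System.CodeDom;
using System.CodeDom.Compiler;
using System.IO;
using Microsoft.CSharp;

public static class FileOutputDemo   // hypothetical name
{
    // Renders the namespace as C# straight into the file at 'path'.
    public static void WriteToFile(CodeNamespace sourceCode, string path)
    {
        using (CSharpCodeProvider provider = new CSharpCodeProvider())
        using (StreamWriter writer = new StreamWriter(path))
        {
            provider.GenerateCodeFromNamespace(
                sourceCode, writer, new CodeGeneratorOptions());
        }   // disposing the writer flushes and closes the file
    }
}
```

Writing to a file suits a build step, while the StringBuilder approach is handier when the generated text is displayed or post-processed first.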

 

// Assumes using directives for System.CodeDom, System.CodeDom.Compiler,
// System.IO and System.Text
public static string GenerateCode(string language,
    CodeNamespace sourceCode)
{
    CodeDomProvider provider = CreateProvider(language);

    CodeGeneratorOptions codeGeneratorOptions =
        new CodeGeneratorOptions();
    codeGeneratorOptions.BlankLinesBetweenMembers = false;
    codeGeneratorOptions.BracingStyle = "C";
    codeGeneratorOptions.IndentString = "   ";    // 3 spaces.

    StringBuilder code = new StringBuilder();
    StringWriter stringWriter = new StringWriter(code);

    // The provider exposes the generate methods directly;
    // CreateGenerator() has been obsolete since .NET 2.0.
    provider.GenerateCodeFromNamespace(sourceCode,
        stringWriter, codeGeneratorOptions);
    return code.ToString();
}

 

Output

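With such a generator, our skeleton class will look like this.  The listings show a base class named BaseClass; with CreateType exactly as written above, the base type would be System.Object instead.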

For VB:

Imports System
Imports System.Text
Imports CustomNameSpace

Namespace GeneratedCode

    Public Class SimpleType
        Inherits BaseClass
        Implements IInterface

        Private mBasicProperty As Integer

        Public Overridable Property BasicProperty() As Integer
            Get
                Return mBasicProperty
            End Get
            Set(ByVal value As Integer)
                mBasicProperty = value
            End Set
        End Property

        <AnyAttribute()> _
        Public Overridable Sub SimpleMethod()
        End Sub
    End Class
End Namespace

For C#:

public class SimpleType : BaseClass, IInterface
{

    private int mBasicProperty;

    public virtual int BasicProperty
    {
        get
        {
            return mBasicProperty;
        }
        set
        {
            mBasicProperty = value;
        }
    }

    [AnyAttribute()]
    public virtual void SimpleMethod()
    {
    }
}

This skeleton is the pattern that I often follow while generating code.

You can use this skeleton, adding properties for every column in a table and you will have the beginnings of a very useful Business Entity Generator.
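A rough sketch of that idea follows.  The column metadata is passed in as a simple name-to-type dictionary, which is an assumption made for the example; a real generator would read it from the database schema.  The names EntitySketch, BuildEntity and Render are also made up.

```csharp
using System;
using System.CodeDom;
using System.CodeDom.Compiler;
using System.Collections.Generic;
using System.IO;
using Microsoft.CSharp;

public static class EntitySketch   // hypothetical name
{
    // columns: hypothetical metadata mapping column name to .NET type.
    public static CodeTypeDeclaration BuildEntity(
        string className, IDictionary<string, Type> columns)
    {
        CodeTypeDeclaration entity = new CodeTypeDeclaration(className);
        entity.IsClass = true;

        foreach (KeyValuePair<string, Type> column in columns)
        {
            // Backing field, following the article's "m" + Name convention
            string fieldName = "m" + column.Key;
            entity.Members.Add(new CodeMemberField(column.Value, fieldName));

            CodeMemberProperty property = new CodeMemberProperty();
            property.Attributes = MemberAttributes.Public;
            property.Name = column.Key;
            property.Type = new CodeTypeReference(column.Value);
            property.GetStatements.Add(new CodeMethodReturnStatement(
                new CodeVariableReferenceExpression(fieldName)));
            property.SetStatements.Add(new CodeAssignStatement(
                new CodeVariableReferenceExpression(fieldName),
                new CodePropertySetValueReferenceExpression()));
            entity.Members.Add(property);
        }
        return entity;
    }

    public static string Render(CodeTypeDeclaration type)
    {
        using (CSharpCodeProvider provider = new CSharpCodeProvider())
        using (StringWriter writer = new StringWriter())
        {
            provider.GenerateCodeFromType(
                type, writer, new CodeGeneratorOptions());
            return writer.ToString();
        }
    }
}
```

Passing a dictionary with, say, CustomerId (int) and Name (string) yields a class with two backing fields and two read/write properties.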

Conclusion

Code generation takes us a step closer to being able to describe to the computer what the code should look like and letting the computer write its own code.  In essence, every compiler is a code generator that generates machine code.

Code generation is a powerful technique that allows us to focus on the more interesting, exciting components and lets the computer handle the tedious details.  With DotNet, you have two options for generating your code.  Depending on your needs, template-driven generation may be ideal, or the CodeDom may provide the better solution.  This is just another tool in your toolbox.



Author profile: Nick Harrison

Nick Harrison is a Software Architect and .NET advocate in Columbia, SC. Nick has over 14 years' experience in software development, starting with Unix system programming and then progressing to the DotNet platform. You can read his blog at www.geekswithblogs.net/nharrison


Subject: Sentence Diagramming
Posted by: Anonymous (not signed in)
Posted on: Friday, May 29, 2009 at 7:36 AM
Message: Yeah for diagramming sentences. You sure know your stuff!

www.english-grammar-revolution.com

Subject: Imports don't seem to affect the full names of types
Posted by: Dewy (view profile)
Posted on: Monday, June 01, 2009 at 10:16 AM
Message: I have written a Code Generation utility and it all works fine. Because I have attributes that use Enum values in them, the code is quite verbose when generated. I tried adding an Import statement in the hope that the generator would be smart enough to then only use as much of the full name that was necessary. I guess that it doesn't because there maybe name clashes, but is there anyway to force this? Its no big deal if not as it is a generated class.

Cheers
Dewy

Subject: Nice!
Posted by: KeithFletcher (view profile)
Posted on: Monday, June 01, 2009 at 6:45 PM
Message: Good article! This is a very under-utilised aspect of programming, and one that I'm very interested in. I hope you're going to follow this up with a few 'less gentle' posts?

Subject: *sigh*
Posted by: Mak Wallace (not signed in)
Posted on: Tuesday, June 02, 2009 at 1:17 AM
Message: What's really worth remarking is that I ended up skimming through the code blocks and ignoring the text, because the text is so badly written that its annoyance factor greatly outweighs any benefit that might be derived from it (note, for example, the lack of apostrophes, anywhere near the word "its" -- which you somehow managed to *Pluralise*!)

Here's a tip: Don't use a computer to diagram sentences; buy a grammar book.

Subject: Re: Imports don't seem to affect the full names of types
Posted by: Anonymous (not signed in)
Posted on: Tuesday, June 02, 2009 at 10:41 AM
Message: When doing a CodeTypeReference try using the overloaded constructor that accepts a string instead of a Type. If all you pass in is a string, it cannot expand that out to the full namepsace for the type. You should be able to specify how ever much you want included and not have the full namespace included.


Subject: Don't touch CodeDom
Posted by: Bugblatter (not signed in)
Posted on: Wednesday, June 10, 2009 at 5:31 PM
Message: Thanks for the article but I feel I have to weigh-in here.

I've written two very large code generation systems. The first used CodeDom and progress was so slow that I ended up using it as little more than a StringBuilder. Append append append...

Couldn't even do while loops; had to do a for that'd never end and then have a conditional break inside it. I don't know if they ever added while loops.

And don't even think about nesting unless you enjoy manually separating grains of salt from grains of pepper.

I got there in the end, and it was doing more than Linq2Sql before that was even announced, but by god it was painful and maintenance was a nightmare.

The system was a great success and is still used daily by several companies. By god it's ugly behind the scenes though.

For the second system I wrote a template-based system using Regexes and delegates (the code gen system didn't have to know what to replace tags etc. with, it got passed a delegate that knew how to do it). That was orders of magnitude faster to develop and maintenance was far far simpler.

The biggest problem with CodeDom is that it's from Microsoft. They have a habit of massively over-engineering everything. It's so abstracted that it's almost unusable. Yes it can generate any .NET language, but who cares? You won't be editing it, and there's no problem mixing .NET languages anyway.

Take the advice of someone who's been through it and written major code generation systems using both methods. Unless you absolutely positively have to be able to generate in multiple languages don't touch CodeDom. Microsoft may have thousands of man hours to throw at these things but I'm guessing you don't.

Subject: Takes it to a whole new level.
Posted by: Tom Brown (view profile)
Posted on: Friday, June 19, 2009 at 7:10 AM
Message: I've written code generators, first in shell script & awk, then in VBScript - finally transferred to .NET Macros (VBScript), so I'm interested in this CodeDom. However my previous efforts were designed to just get you started with a usable skeleton template to then customize. Its fast, easy to modify, and doesn't have to create perfect code. You need to spend so much more time and effort to create a code-generator that works without some tweaks to the output. I'm interested, so I'm going to try the CodeDom anyway, despite the previous guy's comments, but I'm grateful for the warning.

Subject: There are two sides to code-generation...
Posted by: Sean Fowler (view profile)
Posted on: Tuesday, June 23, 2009 at 6:27 PM
Message: There are two sides to code-generation. Firstly there's how you generate the code; secondly there's the code you generate.

CodeDom won't help you to generate good code anymore than CodeSmith etc will (unless you use one of their templates).

In my experience it was far faster to write my own code generator than to use CodeDom. Alternatively I could have used CodeSmith but I don't think I'd have had enough flexibility. You do get flexibility with CodeDom, I'll give it that.

If you're interested you can download the trial version of my non-CodeDom system (Foundation) here: http://www.theita-team.co.uk/Foundation.html

It includes text files which are the templates for the code it generates. I won't pretend that it's the most 'pure' or functional code generator I could have written; the goal was the generation of specific code rather than the creation of a general code generation system.

I haven't done any work on it for a couple of years so the generated code may have some compatibility issues on VS2008, but it works fine on VS2005.

Anyway the choice is yours. Writing Foundation was the most challenging project of my career, and writing it single-handedly was very satisfying. I can now show that to any potential employer and if they know their stuff they'll be damned impressed.

It took me 5-6 months to write Foundation; I don't know if you're planning something on that scale but if you are then I wish you every success, and remember to enjoy the challenge :o)

Subject: Oh I forgot to mention...
Posted by: Sean Fowler (view profile)
Posted on: Tuesday, June 23, 2009 at 6:28 PM
Message: I forgot to mention that I'm Bugblatter, in case you hadn't guessed :o)

Subject: What about T4?
Posted by: skelly (view profile)
Posted on: Monday, September 14, 2009 at 1:03 PM
Message: If you're talking about .NET Code Generation -- investigate T4 - Text Template Transformation. This is the engine that is built into the VS 2008 IDE, and is also friendly with MVC.

T4 Code Generation in .NET: http://www.aspapp.com/Content.aspx?ContentId=622

Subject: Other tools
Posted by: icode.cs (view profile)
Posted on: Sunday, November 29, 2009 at 8:34 PM
Message: If anyone is interested they should check out MyGeneration. It's a great tool. The scripts are much more straight forward and they provide a customizable UI per each script so you can input your variables. Also I use snippets a lot for the redundant stuff.

Just some thoughts.


 





