Wednesday, May 17, 2006

Code Generation

A post on code generation, in this case via MyGeneration, and some of the clever things that can be done with it :)
I listen to several podcasts. For a complete list see my sidebar. One of these is HanselMinutes:
A weekly audio talk show with noted web developer and technologist Scott Hanselman, hosted by Carl Franklin.

In reality, what this means is that Scott has spent a great deal of time dealing with technology of varying kinds and brings us this weekly show with Carl Franklin (of DotNetRocks fame), sharing his knowledge of tools, utilities, tips and tricks relating to .. well .. almost anything.

Scott recently did a podcast (ok so it was a month or more ago) on Code Generation in which he mentioned several code generation options and rated their capabilities and effectiveness.

My Generation
As Scott himself points out during the course of his show, there are far more Code-Generation products available than he could possibly have covered.

One that was missed by Scott, but found by my good friend Marc Croom, was MyGeneration, a freeware application/framework built around the dotnet framework.

MyGeneration claims to be capable of everything that CodeSmith is and has the added bonus that it's free (although not open-source).

Naturally (given a free framework and some spare time to play with) we have begun to delve into the realms of Code-Generation and have been trying to decide just how we can take advantage of it.

The idea seems to be that you code up a script/template (in VB.Net/C#/VBA or JavaScript (I think)) and then this script is made to run, generating the code you requested.

MyGeneration comes complete with a framework for reflecting over an existing database structure. So you can create a class per Table/View/StoredProc which can contain constants derived from the Table names, Field names and parameter names, plus code derived from a combination of these and code pre-built into the template. (Of course these are ideas, not limits.)
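To make the idea concrete, here is a minimal sketch of template-driven generation. It is written in Python rather than in a MyGeneration template, and it does not use MyGeneration's API. The `TABLES` dictionary is a hypothetical stand-in for metadata that a real tool would read from the database, and all class and constant names are invented for illustration.

```python
from string import Template

# Hypothetical metadata; a real generator would read this from the database.
TABLES = {
    "Customer": ["CustomerId", "Name", "Email"],
    "Product": ["ProductId", "Title", "Price"],
}

# The template holds the fixed code; placeholders are filled from metadata.
CLASS_TEMPLATE = Template('''class ${table}Table:
    TABLE_NAME = "${table}"
${constants}
''')


def generate_class(table, fields):
    """Render one class of name constants for a single table."""
    constants = "\n".join(
        f'    FIELD_{name.upper()} = "{name}"' for name in fields
    )
    return CLASS_TEMPLATE.substitute(table=table, constants=constants)


def generate_module(tables):
    """Render one class per table and join them into a module's source."""
    return "\n".join(generate_class(t, f) for t, f in tables.items())


generated_source = generate_module(TABLES)
print(generated_source)

# Load the generated text to show that it is valid code.
namespace = {}
exec(generated_source, namespace)
print(namespace["CustomerTable"].FIELD_EMAIL)
```

The point of the sketch is only the division of labour: the template carries the hand-written part, and the metadata supplies everything that varies per table.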

Inside a couple of hours we (Marc and myself) have managed to create a few simple templates which we were then able to use to completely regenerate the existing DAL (Data Access Layer) of one of our company's main applications.

The application in question has, at last count, 122 tables, and 45 views.

Classes for all 167 of these database objects, including Select/Insert/Update/Delete methods (obviously where appropriate), relevant fieldname constants and several lazily loaded properties, were then generated in less than 30 seconds.

And that was just from the metadata gathered from the database itself.

I can insert this generation of code into my build procedure to ensure that I get compile errors if I change the database to the point where old code cannot work against it.

For example
If I delete a field, the code generation will regenerate without that field constant and any code that references it will fail to compile.

In the old days (lol I'm not that old :)) we would have "hand coded" constants to represent the fields of a database. These constants would have been compiled into the code, so a mismatch with the database would not have manifested until runtime.
So here we fail as early as possible, moving as many errors as we can from runtime to compile-time.

Hey, here's an interesting idea. How about we generate assertions for the existence of each and every field and table in our entire data structure, which we can run upon application startup in order to determine for certain that we are running against the correct version of the database?
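A rough sketch of what such a startup check could look like follows. It uses Python with an in-memory SQLite database purely for illustration. The `EXPECTED_SCHEMA` dictionary is hypothetical and would itself be generated in practice, and the function names are my own invention.

```python
import sqlite3

# Hypothetical expected schema; in practice this would be generated code.
EXPECTED_SCHEMA = {
    "Customer": {"CustomerId", "Name", "Email"},
    "Product": {"ProductId", "Title", "Price"},
}


def find_schema_problems(conn, expected):
    """Return a list of mismatches between the expected and the live schema."""
    problems = []
    for table, fields in expected.items():
        # PRAGMA table_info yields one row per column; row[1] is its name.
        rows = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
        if not rows:
            problems.append(f"missing table {table}")
            continue
        actual = {row[1] for row in rows}
        for field in sorted(fields - actual):
            problems.append(f"missing field {table}.{field}")
    return problems


def assert_schema(conn, expected):
    """Raise at startup if the database does not match the expected schema."""
    problems = find_schema_problems(conn, expected)
    if problems:
        raise RuntimeError("database schema mismatch: " + "; ".join(problems))


# Demo: a database in which the Product.Price field has been dropped.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customer (CustomerId INTEGER, Name TEXT, Email TEXT)")
conn.execute("CREATE TABLE Product (ProductId INTEGER, Title TEXT)")
problems = find_schema_problems(conn, EXPECTED_SCHEMA)
print(problems)
```

Calling `assert_schema` as the first step of application startup would turn a silent version mismatch into an immediate, descriptive failure.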

The Next Step
In the current situation, "The Database" is the domain language. In other words, the database contains all the information used to perform the generation of code. This is great for simple needs, like mine currently are, but what if later I need to express exceptional circumstances to my templates?

I plan on taking a leaf out of Scott's book and trying the following.

I'm going to try to use MyGeneration to generate, not code, but an XML representation of the schema of my database.

I'm then going to see if I can't knock up an assembly or two to help MyGeneration reflect in a strongly typed fashion over the generated XML, and see if it can't be made to generate the SQL Schema from the XML.
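The round trip described above (database to XML, then XML back to SQL) can be sketched in a few lines. This is only an illustration in Python against SQLite, not MyGeneration code. The XML element and attribute names (`database`, `table`, `field`, `name`, `type`) are assumptions of mine, and the sketch ignores keys, constraints, views and stored procedures.

```python
import sqlite3
import xml.etree.ElementTree as ET


def schema_to_xml(conn):
    """Describe every table and column of a SQLite database as XML."""
    root = ET.Element("database")
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        table_el = ET.SubElement(root, "table", name=table_name)
        # Each table_info row starts with (cid, name, type, ...).
        for _cid, col_name, col_type, *_rest in conn.execute(
            f'PRAGMA table_info("{table_name}")'
        ):
            ET.SubElement(table_el, "field", name=col_name, type=col_type)
    return root


def xml_to_sql(root):
    """Turn the XML description back into CREATE TABLE statements."""
    statements = []
    for table_el in root.findall("table"):
        columns = ", ".join(
            f'{f.get("name")} {f.get("type")}' for f in table_el.findall("field")
        )
        statements.append(f'CREATE TABLE {table_el.get("name")} ({columns});')
    return statements


source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE Customer (CustomerId INTEGER, Name TEXT)")
schema_xml = schema_to_xml(source)
ddl = xml_to_sql(schema_xml)
print(ET.tostring(schema_xml, encoding="unicode"))
print(ddl)

# Rebuild an empty database from the XML alone.
rebuilt = sqlite3.connect(":memory:")
for statement in ddl:
    rebuilt.execute(statement)
```

Once the second half works, the XML rather than the live database can serve as the starting point for everything else.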

Initially this might seem like a strange idea, but there is method to my madness.

You see, in doing this I will have shifted my Domain Language from the Database to the XML.

The reason for this is that XML is capable of having meta-bits added to express things that the database cannot. So I can mark up my database XML definition with business information, which should enable my code generation to make sensible decisions.

For example
I would like to generate a DAL for my database. But there are several tables for which I would like to generate further logic.

I have a Customer table and a Product table and I would like to indicate to the generation system that it should generate Domain Objects for these tables.

So I would actually like the system to generate Customer and Product objects which would naturally have IntelliSense access to properties which represent the fields in the table. Further, I would like to generate collection classes for these new Domain Objects, and finally I would like the DAL for these objects to return collections of these objects rather than Rows, DataSets or DataReaders. So now my results are strongly typed, iterable (not sure that's a word) and I can inherit/amend these to add business logic.

There is no easy way to mark up the database itself so that the templates can understand that these tables are special, but I should be able to mark up the XML in any way I see fit, making this an almost trivial task.
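As a sketch of how marked-up XML might drive that decision, the Python below generates a domain class and a collection class only for tables that carry a marker attribute. The `domainObject` attribute, the XML layout and the generated class shapes are all hypothetical choices of mine. The field types are written as Python type names to keep the example short.

```python
import xml.etree.ElementTree as ET

# Hypothetical schema XML; domainObject="true" is an invented business marker.
SCHEMA_XML = """
<database>
  <table name="Customer" domainObject="true">
    <field name="CustomerId" type="int"/>
    <field name="Name" type="str"/>
  </table>
  <table name="AuditLog">
    <field name="Entry" type="str"/>
  </table>
</database>
"""


def generate_domain_objects(xml_text):
    """Emit a dataclass and a collection class for each marked table."""
    root = ET.fromstring(xml_text)
    lines = ["from dataclasses import dataclass", ""]
    for table in root.findall("table"):
        if table.get("domainObject") != "true":
            continue  # unmarked tables get no domain object
        name = table.get("name")
        lines += ["@dataclass", f"class {name}:"]
        lines += [
            f"    {f.get('name')}: {f.get('type')}" for f in table.findall("field")
        ]
        lines += [
            "",
            f"class {name}Collection(list):",
            f'    """A list intended to hold {name} objects."""',
            "",
        ]
    return "\n".join(lines)


generated = generate_domain_objects(SCHEMA_XML)
print(generated)

# Load the generated code and build a typed collection from it.
namespace = {}
exec(generated, namespace)
customers = namespace["CustomerCollection"]([namespace["Customer"](1, "Ann")])
print(customers[0].Name)
```

Here the AuditLog table is skipped simply because it lacks the marker, which is exactly the kind of decision the database alone gives the templates no way to express.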

And finally...
There would seem to be many potential uses for Code-Generation. A few more that I have either thought of myself or have had suggested by others include:

Unit Tests for generated code
XML Documentation
AJAX Style JavaScript Functions that map to existing server-side functions

If you can think of any more, then please feel free to let me know in the comments. I'm very interested to know what people are doing in this space.


Justin Greenwood said...

Just a note about MyGeneration's ability to store meta-data with entities in a database's schema. Check out the User and Global Meta-Data features of MyMeta. You can basically associate name-value pairs with any MyGeneration entity, set up aliases for fields and tables, etc. All of that info is easily accessible through the MyMeta intrinsic object in a template. I use this all the time to tell my templates to behave in one way or another when generating the code for that object. You could have an "IsDomainObject" key, or "GenerateHierarchicalChildren", or whatever you want. These settings are stored in an XML file that is loaded by the MyMeta object during initialization.

Rory said...

Sorry about taking so long to reply.

The User and Global Meta-Data sound very interesting. I will definitely take a look at this.

However doesn't that mean Dual Domain Languages? (Database and XML addendum)

Doesn't that make things inherently harder to maintain?

I thought the idea behind a domain language was to have "one specification to rule them all" :)

Kinda like a case tool.

Of course such a language would need to be able to express everything about the domain in a sensible way. Already I see that my XML theory will probably need to express the Body of either a StoredProc or a View as a CData section, and this is probably imperfect.

I can see that what you suggest might get someone up and running very quickly, but isn't it cleaner to be able to generate your database, fill it with imported data and perhaps be able to run unit tests against it as well?
All as a part of a nightly build.

Can you give an example (other than initial speed of development) of where using the DB (with XML) as a domain language gives an advantage over the suggested method?

Justin Greenwood said...

I guess I don't see the point in creating yet another layer between the database, the Generic Meta-Data API (MyMeta), and the Generated Code. I am not really a big believer in having huge XML files describing the database, or describing specific application logic. It can work, but I'd much rather have it in compiled code. It just adds yet another place where runtime errors can be introduced. Configuration is a good place to use XML, but not business logic or database structure. If it's something that can only be done by developers and is part of the development process, put it in the code. If it's something that is configurable after deployment by the user, then it should be in the registry or an XML/INI file. The database and source code are so tightly related in almost every case that a structural database change almost always means a code change and recompilation (even with XML-based Hibernate).
I think a better way to implement a domain design would be through code. I think that's sort of what generating a DAL is all about. The domain is really the front-end application code that is abstracted away from the complexities of database access and business logic provided by the partially generated and manually tweaked DAL. I guess our main disagreement is the practicality of XML and not the concept of Domains.