Tuesday, October 09, 2007

Aliases - The art of readable regular expressions

Here's a simple question: which of these is more readable?

  • (\.?((\[(([A-Z]|[a-z]|([À-Ö]|[Ø-ö]|[ø-ə]))|_)
    ((([A-Z]|[a-z]|([À-Ö]|[Ø-ö]|[ø-ə]))|_)|[0-9])*
    (\u0025|&|@|!|#|\$)?\])|(([A-Z]|[a-z]|([À-Ö]|[Ø-ö]
    |[ø-ə]))|_)((([A-Z]|[a-z]|([À-Ö]|[Ø-ö]|[ø-ə]))|_)
    |[0-9])*(\u0025|&|@|!|#|\$)?)((\s*)\.(\s*)((\[(([A-Z]|[a-z]|([À-Ö]
    |[Ø-ö]|[ø-ə]))|_)((([A-Z]|[a-z]|([À-Ö]|[Ø-ö]
    |[ø-ə]))|_)|[0-9])*(\u0025|&|@|!|#|\$)?\])|(([A-Z]|[a-z]|([À-Ö]
    |[Ø-ö]|[ø-ə]))|_)((([A-Z]|[a-z]|([À-Ö]|[Ø-ö]
    |[ø-ə]))|_)|[0-9])*(\u0025|&|@|!|#|\$)?))*)


  • %Identifier%


I'll give you a minute to let that sink in......


The two expressions above are both valid and equivalent in DXCore's "Alias" subsystem. The "Alias" subsystem backs the "Duplicate Line", "Selection Inversion" and "Intelligent Paste" functions of CodeRush.


Which would you rather type?


The "Alias" subsystem is a very simple but very powerful recursive substitution system. It starts with a simple phrase (in this case %Identifier%) and expands it according to one or more rules. The resulting phrase is then analyzed and, if it still contains aliases, expanded again... and again... and again.


%Identifier%

...becomes...

(\.?%SimpleIdentifier% (%ws%\.%ws%%SimpleIdentifier%)*)

...becomes...

(\.?(%EscapedIdentifier%|%NonEscapedIdentifier%) (%ws%\.%ws%(%EscapedIdentifier%|%NonEscapedIdentifier%))*)

...becomes...

(\.?((\[%AlphaOrUnderline%%PartialIdentifier%%TypeCharacter%?\]) |%AlphaOrUnderline%%PartialIdentifier%%TypeCharacter%?) (%ws%\.%ws%((\[%AlphaOrUnderline%%PartialIdentifier%%TypeCharacter%?\]) |%AlphaOrUnderline%%PartialIdentifier%%TypeCharacter%?))*)

A few rounds later, we have our full expansion...


(\.?((\[(([A-Z]|[a-z]|([À-Ö]|[Ø-ö]|[ø-ə]))|_)
((([A-Z]|[a-z]|([À-Ö]|[Ø-ö]|[ø-ə]))|_)|[0-9])*
(\u0025|&|@|!|#|\$)?\])|(([A-Z]|[a-z]|([À-Ö]|[Ø-ö]
|[ø-ə]))|_)((([A-Z]|[a-z]|([À-Ö]|[Ø-ö]|[ø-ə]))|_)
|[0-9])*(\u0025|&|@|!|#|\$)?)((\s*)\.(\s*)((\[(([A-Z]|[a-z]|([À-Ö]
|[Ø-ö]|[ø-ə]))|_)((([A-Z]|[a-z]|([À-Ö]|[Ø-ö]
|[ø-ə]))|_)|[0-9])*(\u0025|&|@|!|#|\$)?\])|(([A-Z]|[a-z]|([À-Ö]
|[Ø-ö]|[ø-ə]))|_)((([A-Z]|[a-z]|([À-Ö]|[Ø-ö]
|[ø-ə]))|_)|[0-9])*(\u0025|&|@|!|#|\$)?))*)


At each round of the substitution you can see the power of aliasing (not to mention the pre-built list of aliases) at work.
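
To make the rounds above concrete, here is a rough sketch of that kind of recursive substitution in Python. This is not DXCore's code. The alias names are borrowed from this post, but the definitions in the table are simplified stand-ins I made up for illustration, and the `expand` function and its `max_rounds` guard are my own assumptions about how such a loop could be written.

```python
import re

# Hypothetical alias table. The names echo the post, but these definitions
# are simplified stand-ins, not the real DXCore aliases.
ALIASES = {
    "Identifier": r"(\.?%SimpleIdentifier%(%ws%\.%ws%%SimpleIdentifier%)*)",
    "SimpleIdentifier": r"(%Alpha%(%Alpha%|%Digit%)*)",
    "Alpha": r"[A-Za-z_]",
    "Digit": r"[0-9]",
    "ws": r"(\s*)",
}

# Matches one alias reference such as %Identifier%.
ALIAS_REF = re.compile(r"%([A-Za-z]+)%")


def expand(pattern, aliases=ALIASES, max_rounds=20):
    """Expand %Alias% references round by round until nothing changes."""
    for _ in range(max_rounds):
        # A function is used as the replacement so that backslashes in the
        # definitions are inserted literally. Unknown aliases are left alone.
        expanded = ALIAS_REF.sub(
            lambda m: aliases.get(m.group(1), m.group(0)), pattern
        )
        if expanded == pattern:
            return expanded
        pattern = expanded
    raise ValueError("alias expansion did not settle; possible circular alias")


regex = expand("%Identifier%")
print(regex)
print(bool(re.fullmatch(regex, "System.IO.Path")))
```

Each pass through the loop corresponds to one "...becomes..." step: every alias in the current phrase is swapped for its definition, and the loop stops once a pass leaves the phrase unchanged.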


And what a list it is: 93 separate pre-built aliases, found in the "Core\Aliases\Regular Expressions\System" section of the DXCore options.


If you somehow find that these aren't enough, you can create your own. DevExpress have even allocated a place for you to do just this, in "Core\Aliases\Regular Expressions\User".


But it doesn't stop there. Each defined alias can be referenced either on its own, or with a numeric suffix from 1 to 9. Thus you can have %Identifier%, %Identifier1%, %Identifier2% .. all the way through to .. %Identifier9% all used in the same expression.
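
One way to picture the numbered forms is sketched below in Python. Treat it as a guess, not a description of DXCore: I am assuming that a suffixed reference resolves to the same definition as the base alias, and that each reference is wrapped in its own named group so the individual matches can be told apart. The `expand_numbered` function and the one-line alias definition are hypothetical.

```python
import re

# Hypothetical, already-expanded definition; not the real DXCore alias.
ALIASES = {"Identifier": r"[A-Za-z_][A-Za-z_0-9]*"}

# An alias name optionally followed by a single digit from 1 to 9.
NUMBERED_REF = re.compile(r"%([A-Za-z]+)([1-9]?)%")


def expand_numbered(pattern, aliases=ALIASES):
    """Replace %Name% and %Name1%..%Name9% with named groups."""

    def replace(match):
        name, suffix = match.group(1), match.group(2)
        if name not in aliases:
            return match.group(0)  # leave unknown aliases untouched
        # Assumption: the suffix only distinguishes one use from another,
        # so every numbered form shares the base alias's definition.
        return "(?P<{}{}>{})".format(name, suffix, aliases[name])

    return NUMBERED_REF.sub(replace, pattern)


regex = expand_numbered(r"%Identifier1%\s*=\s*%Identifier2%")
match = re.fullmatch(regex, "total = count")
print(match.group("Identifier1"), match.group("Identifier2"))
```

Under that assumption, the two identifiers on either side of the "=" can be picked out separately, even though both were written with the same alias.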


This paves the way for some pretty powerful features.


In my next couple of posts I'll show you how we take advantage of aliases in our use of "Duplicate Line", "Smart Paste" and "Selection Inversion".

