Using character classes

Within a character set in a regular expression, you can include a character class. Because the class name, such as [:space:], goes inside the square brackets of the set, the result has doubled brackets, as the following example shows:

REReplace("Macromedia Web Site", "[[:space:]]", "*", "ALL")

This code replaces every whitespace character with an asterisk (*), producing this string:

Macromedia*Web*Site

You can combine character classes with other expressions within a character set. For example, the regular expression [[:space:]123] searches for a space, 1, 2, or 3. The following example also uses a character class in a regular expression:

<cfset IndexOfOccurrence=REFind("[[:space:]][A-Z]+[[:space:]]", 
   "Some BIG string")>
<!--- The value of IndexOfOccurrence is 5 --->
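
The combined set [[:space:]123] described above can be tried the same way. The following is a minimal sketch rather than an official sample; the variable name FirstMatch and the test string are assumptions chosen for illustration:

```cfm
<!--- Sketch: a character set that mixes a class with literal characters. --->
<!--- FirstMatch and the sample string are hypothetical. --->
<cfset FirstMatch=REFind("[[:space:]123]", "abc2 def")>
<!--- The value of FirstMatch is 4, the position of the "2", --->
<!--- which occurs before the space at position 5 --->
```

REFind returns the position of the first character that satisfies any member of the set, so the literal 2 wins here because it occurs earlier in the string than the space.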

The following table shows the character classes that ColdFusion supports. Regular expressions using these classes match any Unicode character in the class, not just ASCII or ISO-8859 characters.

Character class    Matches

:alpha:            Any alphabetic character.
:upper:            Any uppercase alphabetic character.
:lower:            Any lowercase alphabetic character.
:digit:            Any digit. Same as \d.
:alnum:            Any alphanumeric character. Does not match the underscore (_), so it is narrower than \w.
:xdigit:           Any hexadecimal digit. Same as [0-9A-Fa-f].
:blank:            A space or a tab.
:space:            Any whitespace character. Same as \s.
:print:            Any alphanumeric, punctuation, or space character.
:punct:            Any punctuation character.
:graph:            Any alphanumeric or punctuation character.
:cntrl:            Any character not part of the character classes [:upper:], [:lower:], [:alpha:], [:digit:], [:punct:], [:graph:], [:print:], or [:xdigit:].
:word:             Any alphanumeric character, plus the underscore (_). Same as \w.
:ascii:            Any ASCII character, in the hexadecimal range 00 - 7F.
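
Because :alnum:, :word:, and \w are easy to confuse, it can help to check how each one treats the underscore on your own server. The following is a minimal sketch, not an official sample; TestString and the three result variable names are assumptions chosen for illustration:

```cfm
<!--- Sketch: compare how three patterns treat the underscore. --->
<!--- TestString and the result variable names are hypothetical. --->
<cfset TestString="first_name">
<!--- Remove every character that each pattern matches. --->
<cfset AlnumResult=REReplace(TestString, "[[:alnum:]]", "", "ALL")>
<cfset WordResult=REReplace(TestString, "[[:word:]]", "", "ALL")>
<cfset EscapeResult=REReplace(TestString, "\w", "", "ALL")>
<!--- Whatever remains inside the brackets was not matched. --->
<cfoutput>
alnum: [#AlnumResult#] word: [#WordResult#] \w: [#EscapeResult#]
</cfoutput>
```

Any character left between the brackets is one the pattern did not match. If [[:alnum:]] excludes the underscore while [[:word:]] and \w include it, as the reader comments on this page report, only the alnum result would still contain the underscore.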



Version 7

Comments


campem said on Jul 2, 2007 at 12:29 PM :
The docs for the character class :alnum: define it as:
:alnum: = Any alphanumeric character. Same as \w.

The docs for the escape sequence \w define it as:
\w = Any alphanumeric character, similar to [[:alnum:]]

Nice circular reference. Plus, is "similar" exactly the "same" or not? On top of that, there's a character class :word: defined as:
:word: = Any alphanumeric character, plus the underscore (_)

I would expect \w and :word: to be equivalent since that's how most regex implementations behave... which is not what the definitions imply. Thought I'd bring this up for clarity's sake, thanks for taking some time to address this issue.

While I'm at it, will it ever be possible to specify the greediness of a pattern, as in "\S*?"?

halL said on Jul 2, 2007 at 1:23 PM :
The documentation is incorrect.
I tested this in a version later than 7, but I believe this code is unchanged:

\w and [[:word:]] have the same effect; they match a-zA-Z0-9_.

[[:alnum:]], as indicated by its name, does not match the underscore, only a-zA-Z0-9.

 


Current page: http://livedocs.adobe.com/coldfusion/7/htmldocs/00000988.htm