Regular Expression usage within VBScript and Application Center Test (ACT)



Article Purpose

The purpose of this article is to explain the use of VBScript Regular Expressions within ACT, in order to facilitate URL Rewriting (including Session handling) and HTTP body (page) verification. These two objectives are defined below; worked examples of each appear later in the article.

HTTP body (page) verification

HTTP body verification is a text check made on the returned Body of a Request in order to verify that the content is correct. Although the primary usage of ACT is Load Testing, ACT can also be used for functional testing. Verifying the HTTP Response is essential for functional testing, and is useful in Load Testing for checking the integrity of the Web Server under load.

URL Rewriting

The HTTP protocol is stateless, so any state information (about a given User Session) needs to be handled by either Cookies or URL Rewriting. The term URL Rewriting refers to a technique (to manage state over HTTP) in which the Web Server Application dynamically creates a Web Page that contains a variable within the links of the Page. When a user clicks the (dynamically) created link, the Server Application can retrieve the variable information. Two common uses of URL Rewriting are Session Handling (where the variable is a key to server-side information, e.g. shopping carts) and the ASP.NET __VIEWSTATE variable, which stores the state of the controls on the user's Page.

The issue URL Rewriting gives ACT (and other HTTP drivers) is that the variable (Session ID or __VIEWSTATE) can change every time a given Page is retrieved by a given user. This means the URL Requests (which simulate the user clicking a link) cannot be hard coded or parameter driven via a fixed table; instead the variable has to be extracted from the Main Page and then used to reconstruct the appropriate URL Request. This processing can be achieved via Regular Expressions within ACT.





Introduction to Regular Expressions

This section describes the overall concept and syntax of Regular Expressions, whilst a later section describes how VBScript handles Regular Expressions. Regular Expressions (within ACT VBScript) are text-processing functions which allow you to:-

1 Test for a pattern within a string.
For example, you can test whether a Web Page contains a given Credit Card number. This is called data validation.

2 Extract a substring from a string based upon a pattern match.
You can find specific text within a document or input field. We will use this function to extract Session IDs into variables (which can later be used to build the required dynamic URL Requests).

A regular expression is a pattern of text that consists of ordinary characters (for example, letters a through z) and special characters, known as metacharacters. The pattern describes one or more strings to match when searching a body of text. The regular expression serves as a template for matching a character pattern to the string being searched.

Here are some examples of regular expressions you might encounter:

VBScript pattern    Matches
"^\s*$"             A blank line.
"\d{2}-\d{5}"       An ID number consisting of 2 digits, a hyphen, and another 5 digits.

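The two patterns above can be tried outside ACT with a short script. This is a minimal sketch, assuming it is saved as a .vbs file and run with the Windows Script Host; the test strings are made up for illustration.

```vbscript
Option Explicit

Dim oRegExp
Set oRegExp = New RegExp

' "^\s*$" matches a line that is empty or holds only whitespace.
oRegExp.Pattern = "^\s*$"
WScript.Echo oRegExp.Test("   ")        ' True: whitespace only
WScript.Echo oRegExp.Test("text")       ' False: contains non-whitespace

' "\d{2}-\d{5}" finds 2 digits, a hyphen, then 5 digits.
oRegExp.Pattern = "\d{2}-\d{5}"
WScript.Echo oRegExp.Test("12-34567")   ' True
WScript.Echo oRegExp.Test("1-34567")    ' False: only one digit before the hyphen
```

Note that the second pattern is not anchored, so it also succeeds when the ID appears inside a longer string. For strict validation of a whole input, wrapping the pattern as "^\d{2}-\d{5}$" is one option.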

The following table contains the complete list of metacharacters and their behavior in the context of regular expressions:

This Table is taken directly from Microsoft VBScript MSDN and is repeated here for easy reference.

Character
Description
\ Marks the next character as either a special character, a literal, a back reference, or an octal escape. For example, 'n' matches the character "n". '\n' matches a newline character. The sequence '\\' matches "\" and "\(" matches "(".
^ Matches the position at the beginning of the input string. If the RegExp object's Multiline property is set, ^ also matches the position following '\n' or '\r'.
$ Matches the position at the end of the input string. If the RegExp object's Multiline property is set, $ also matches the position preceding '\n' or '\r'.
* Matches the preceding character or subexpression zero or more times. For example, zo* matches "z" and "zoo". * is equivalent to {0,}.
+ Matches the preceding character or subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
? Matches the preceding character or subexpression zero or one time. For example, "do(es)?" matches the "do" in "do" or "does". ? is equivalent to {0,1}
{n} n is a nonnegative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob," but matches the two o's in "food".
{n,} n is a nonnegative integer. Matches at least n times. For example, 'o{2,}' does not match the "o" in "Bob" and matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+'. 'o{0,}' is equivalent to 'o*'.
{n,m} m and n are nonnegative integers, where n <= m. Matches at least n and at most m times. For example, "o{1,3}" matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that you cannot put a space between the comma and the numbers.
? When this character immediately follows any of the other quantifiers (*, +, ?, {n}, {n,}, {n,m}), the matching pattern is non-greedy. A non-greedy pattern matches as little of the searched string as possible, whereas the default greedy pattern matches as much of the searched string as possible. For example, in the string "oooo", 'o+?' matches a single "o", while 'o+' matches all 'o's.
. Matches any single character except "\n". To match any character including the '\n', use a pattern such as '[\s\S]'.
(pattern) Matches pattern and captures the match. The captured match can be retrieved from the resulting Matches collection, using the SubMatches collection in VBScript or the $0…$9 properties in JScript. To match parentheses characters ( ), use '\(' or '\)'.
(?:pattern) Matches pattern but does not capture the match, that is, it is a non-capturing match that is not stored for possible later use. This is useful for combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more economical expression than 'industry|industries'.
(?=pattern) Positive lookahead matches the search string at any point where a string matching pattern begins. This is a non-capturing match, that is, the match is not captured for possible later use. For example 'Windows (?=95|98|NT|2000)' matches "Windows" in "Windows 2000" but not "Windows" in "Windows 3.1". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that comprised the lookahead.
(?!pattern) Negative lookahead matches the search string at any point where a string not matching pattern begins. This is a non-capturing match, that is, the match is not captured for possible later use. For example 'Windows (?!95|98|NT|2000)' matches "Windows" in "Windows 3.1" but does not match "Windows" in "Windows 2000". Lookaheads do not consume characters, that is, after a match occurs, the search for the next match begins immediately following the last match, not after the characters that comprised the lookahead.
x|y Matches either x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz] A character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz] A negative character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z] A range of characters. Matches any character in the specified range. For example, '[a-z]' matches any lowercase alphabetic character in the range 'a' through 'z'.
[^a-z] A negative range of characters. Matches any character not in the specified range. For example, '[^a-z]' matches any character not in the range 'a' through 'z'.
\b Matches a word boundary, that is, the position between a word and a space. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B Matches a nonword boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx Matches the control character indicated by x. For example, \cM matches a Control-M or carriage return character. The value of x must be in the range of A-Z or a-z. If not, c is assumed to be a literal 'c' character.
\d Matches a digit character. Equivalent to [0-9].
\D Matches a nondigit character. Equivalent to [^0-9].
\f Matches a form-feed character. Equivalent to \x0c and \cL.
\n Matches a newline character. Equivalent to \x0a and \cJ.
\r Matches a carriage return character. Equivalent to \x0d and \cM.
\s Matches any whitespace character including space, tab, form-feed, etc. Equivalent to [ \f\n\r\t\v].
\S Matches any non-white space character. Equivalent to [^ \f\n\r\t\v].
\t Matches a tab character. Equivalent to \x09 and \cI.
\v Matches a vertical tab character. Equivalent to \x0b and \cK.
\w Matches any word character including underscore. Equivalent to '[A-Za-z0-9_]'.
\W Matches any nonword character. Equivalent to '[^A-Za-z0-9_]'.
\xn Matches n, where n is a hexadecimal escape value. Hexadecimal escape values must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' & "1". Allows ASCII codes to be used in regular expressions.
\num Matches num, where num is a positive integer. A reference back to captured matches. For example, '(.)\1' matches two consecutive identical characters.
\n Identifies either an octal escape value or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, n is an octal escape value if n is an octal digit (0-7).
\nm Identifies either an octal escape value or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by literal m. If neither of the preceding conditions exists, \nm matches octal escape value nm when n and m are octal digits (0-7).
\nml Matches octal escape value nml when n is an octal digit (0-3) and m and l are octal digits (0-7).
\un Matches n, where n is a Unicode character expressed as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).
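A few of the table rows are easier to follow when run. The sketch below reuses the table's own examples for greedy and non-greedy quantifiers, a backreference and a positive lookahead; it assumes a standalone .vbs file run under the Windows Script Host, not an ACT test.

```vbscript
Option Explicit

Dim oRegExp, colMatches
Set oRegExp = New RegExp

' Greedy: o+ takes as many o's as it can.
oRegExp.Pattern = "o+"
Set colMatches = oRegExp.Execute("oooo")
WScript.Echo colMatches(0).Value            ' oooo

' Non-greedy: o+? stops after the first o.
oRegExp.Pattern = "o+?"
Set colMatches = oRegExp.Execute("oooo")
WScript.Echo colMatches(0).Value            ' o

' Backreference: (.)\1 finds two identical characters in a row.
oRegExp.Pattern = "(.)\1"
Set colMatches = oRegExp.Execute("food")
WScript.Echo colMatches(0).Value            ' oo

' Positive lookahead: the version is checked but is not part of the match.
oRegExp.Pattern = "Windows (?=95|98|NT|2000)"
WScript.Echo oRegExp.Test("Windows 2000")   ' True
WScript.Echo oRegExp.Test("Windows 3.1")    ' False
```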




Constructing a Regular Expression

A Regular Expression consists of an Expression and a Target; together these give a Result (also known as a Match). For example, if I wanted to find "dog" (the expression) in the sentence (the target) "Jack was the biggest dog of all the dogs in the park." I would apply the expression dog to the target sentence, and the Match "dog" would be returned as it exists in the target. Note that I can also retrieve a count of the number of times the Match is found. The VBScript syntax for this is discussed in the next section, but the concept is that an expression is applied to a target to retrieve the result (Match) and a Count. The special characters noted above can also be used, so the . (which matches any single character) could go into the expression, giving dog followed by a period as the expression. If that expression is applied to the same target, two Matches are returned and the count is two: "dog " (dog followed by a space, since . also matches a space) and "dogs".

Note: If I wanted to find dog followed by a literal period, I would have to escape the period (which is another way of stating: do not treat the next character as a special character). The backslash \ is the escape character, so to find "dog." I would use dog\. as the expression. In this case nothing would be returned from our target sentence and the count would be zero, as there is no dog followed by a period in the target.
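The dog example can be checked with a few lines of script. This is a sketch only: ShowMatches is a hypothetical helper written for this illustration, and Global is set to True so that every match is counted, not just the first.

```vbscript
Option Explicit

Const TARGET = "Jack was the biggest dog of all the dogs in the park."

' Hypothetical helper: list every match of sPattern in TARGET.
Sub ShowMatches(sPattern)
    Dim oRegExp, colMatches, oMatch
    Set oRegExp = New RegExp
    oRegExp.Global = True          ' find all matches, not just the first
    oRegExp.Pattern = sPattern
    Set colMatches = oRegExp.Execute(TARGET)
    WScript.Echo sPattern & " -> count " & colMatches.Count
    For Each oMatch In colMatches
        ' Brackets make a trailing space in the match visible.
        WScript.Echo "  [" & oMatch.Value & "]"
    Next
End Sub

ShowMatches "dog"     ' count 2: [dog] and [dog] (the second is inside "dogs")
ShowMatches "dog."    ' count 2: [dog ] and [dogs]
ShowMatches "dog\."   ' count 0: no dog is followed by a literal period
```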

The expression can become complicated, using sub-expressions and operators such as alternation (OR). The important point to note however, as a concept, is that I can check for and extract (as the returned Match) any text that matches a given pattern. In this article only those Regular Expressions useful to the objectives of ACT usage are explored in detail. If you wish to get a deeper understanding of Regular Expressions there are many references on the Web (Google search Regular Expression).

Regular Expression usage in VBScript

Having briefly looked at the concept of Regular Expressions, this section documents the specific usage of Regular Expressions in VBScript. In VBScript a Regular Expression Object (RegExp) implements the regular expression functions. VBScript Objects are outside the scope of this document, but in essence they provide a storage place for data as well as a reference point to functions/routines that can operate on that data.

The Properties of the VBScript RegExp object are:-

* Pattern - A string that is used to define the regular expression.
This must be set before use of the regular expression object. Patterns are described in more detail below.

* IgnoreCase - A Boolean property that indicates if the match should ignore case (i.e. Case insensitive)
By default, IgnoreCase is set to False.

* Global - A Boolean property that indicates if the regular expression should be tested against all possible matches in a string.
By default, Global is set to False.
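The effect of the IgnoreCase and Global properties can be seen in a small standalone script. This is a sketch under the assumption that it runs under the Windows Script Host; the strings are invented for the illustration.

```vbscript
Option Explicit

Dim oRegExp
Set oRegExp = New RegExp
oRegExp.Pattern = "session"

' Defaults: IgnoreCase = False, Global = False.
WScript.Echo oRegExp.Test("Session=1; SESSION=2")            ' False: case differs
WScript.Echo oRegExp.Execute("session=1; session=2").Count   ' 1: first match only

oRegExp.IgnoreCase = True
WScript.Echo oRegExp.Test("Session=1; SESSION=2")            ' True

oRegExp.Global = True
WScript.Echo oRegExp.Execute("Session=1; SESSION=2").Count   ' 2: every match
```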

The Methods of the VBScript RegExp object are:-

* Test (string) - The Test method takes a string as its argument and returns True if the regular expression can successfully be matched against the string, otherwise False is returned.

* Replace (search-string, replace-string) - The Replace method takes 2 strings as its arguments. If it is able to successfully match the regular expression in the search-string, then it replaces that match (or every match, if Global is True) with the replace-string, and the new string is returned. If no matches were found, then the original search-string is returned.

* Execute (search-string) - The Execute method searches in the same way as Replace, but instead of substituting text it returns a Matches collection object, containing a Match object for each successful match. It doesn't modify the original string.
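The three methods can be compared side by side on one string. The following is a minimal sketch, assuming a standalone .vbs file; the sample text is made up.

```vbscript
Option Explicit

Dim oRegExp, colMatches, oMatch, sText
Set oRegExp = New RegExp
oRegExp.Global = True
oRegExp.Pattern = "\d+"
sText = "cart 12 has 3 items"

' Test: is there at least one match?
WScript.Echo oRegExp.Test(sText)                ' True

' Replace: returns a new string; sText itself is unchanged.
WScript.Echo oRegExp.Replace(sText, "#")        ' cart # has # items

' Execute: returns a Matches collection, one Match per hit.
Set colMatches = oRegExp.Execute(sText)
For Each oMatch In colMatches
    ' FirstIndex is the zero-based position of the match in sText.
    WScript.Echo oMatch.Value & " at " & oMatch.FirstIndex   ' 12 at 5, then 3 at 12
Next
```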

What is interesting, at least for using the VBScript RegExp object in ACT, is the Matches collection that is returned after the Execute method has been invoked. Using this Object we can parse out substrings and place them into variables for future use (Session IDs etc.). An example using the properties and Matches/SubMatches is now given.

An example VBScript Regular Expression, using the Matches object

If you copy and paste the following into Notepad, save it as anything.vbs and then double click the file, you should see the two extracted values echoed (I know people like to see things that work). Note the use of the escape character \ in the pattern to pick out a literal ( bracket. The line of data is from a TCPDUMP, and the script extracts the two sequence numbers using SubMatches, which correspond to the parenthesised groups in the pattern. Notice that both the Matches and SubMatches collections are indexed from 0.

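Dim mydata, oRegExp, colMatches, s1, s3

' One line of TCPDUMP output; the two sequence numbers are separated by a colon.
mydata = "14:10:48.019591 IP MyServer.2912 > MyClient.1494: P 1659928455:1659928474(19) ack 2614425189 win 16456 (DF)"

Set oRegExp = New RegExp
oRegExp.Global = True
oRegExp.IgnoreCase = True
' Groups by SubMatches index: 0 = TCP flag, 1 = first sequence number, 2 = colon,
' 3 = second sequence number, 4 = "(" plus a digit, 5 = rest of the line.
oRegExp.Pattern = "(: P |: R |: F |: FP )(\d.*)(:)(\d.*)(\(\d)(.*)"

Set colMatches = oRegExp.Execute(mydata)

If colMatches.Count > 0 Then
    s1 = colMatches(0).SubMatches(1)
    s3 = colMatches(0).SubMatches(3)
End If

WScript.Echo s1
WScript.Echo s3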

Regular Expression (VBScript) usage in ACT

Using the concept of Matches and SubMatches we can extract and verify text and thereby implement solutions for URL Rewriting and Verification using Regular Expressions in Application Center Test (ACT).

Text Check example.


In order to check for the existence of a given piece of text, the following code is called, which includes a Regular Expression.

Function CheckBody(sTarget)

    Dim oRegExp
    Dim oMatches
    Dim sBody
    Dim bFoundTarget

    bFoundTarget = False

    sBody = g_oResponse.Body

    Set oRegExp = New RegExp
    oRegExp.Pattern = sTarget

    ' run the search
    Set oMatches = oRegExp.Execute(sBody)

    If oMatches.Count > 0 Then
        bFoundTarget = True
    Else
        ActTrace L_ErrPageNotFound_Text & " " & sTarget
    End If

    CheckBody = bFoundTarget

End Function

Notes on CheckBody


This sample code is taken directly from the Duwamish 7.0 Sample in the ACTSamples (zip) that comes with ACT.
The string to be searched for is passed to this function in the sTarget variable. g_oResponse is a global variable which holds the HTTP response; its Body property is the text to be searched. For this simple expression only the number of matches returned by the Execute method is needed, and this is expected to be greater than zero. Note that sTarget is used directly as the Pattern, so any metacharacters it contains are interpreted as such. In the sample, if the text is not found an error is logged to the trace but the test continues; in practice you may want to terminate the test or take other action. The above function is invoked using the code segment:-

Set g_oResponse = g_oConnection.Send(g_oRequest)

If (g_oResponse Is Nothing) Then
    ActTrace L_ErrRequest_Text
Else
    If IsSuccessful(g_oResponse) Then
        Call SetViewState(g_oResponse)
        CheckBody(sPageString)
    End If
End If

This code retrieves a page on an open connection, extracts the __VIEWSTATE variable (this is explained next in the URL Rewriting example), then validates that the correct page has been retrieved by calling the CheckBody function.
The Duwamish 7.0 Sample in the ACTSamples has many useful structures (extensive use of subroutines and constants); it is repeated here only to reinforce the Regular Expression usage.
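Because CheckBody assigns sTarget straight to the Pattern property, a search string that contains metacharacters (a price such as $9.99, for instance) will not be matched literally. One way to handle this is to escape the string first. The sketch below is an assumption-laden illustration: EscapePattern is a hypothetical helper and is not part of the ACT samples.

```vbscript
Option Explicit

' Hypothetical helper (not part of the ACT samples): put a backslash in
' front of every regular expression metacharacter so that sText is
' matched literally when it is used as a Pattern.
Function EscapePattern(sText)
    Dim oEsc
    Set oEsc = New RegExp
    oEsc.Global = True
    oEsc.Pattern = "([\\\^\$\.\|\?\*\+\(\)\[\]\{\}])"
    ' In the replacement string $1 is the captured character.
    EscapePattern = oEsc.Replace(sText, "\$1")
End Function

Dim oRegExp
Set oRegExp = New RegExp

WScript.Echo EscapePattern("$9.99")         ' \$9\.99

' Unescaped, $ means "end of input", so the text is never found.
oRegExp.Pattern = "$9.99"
WScript.Echo oRegExp.Test("Total: $9.99")   ' False

oRegExp.Pattern = EscapePattern("$9.99")
WScript.Echo oRegExp.Test("Total: $9.99")   ' True
```

In an ACT script the same idea would amount to calling CheckBody(EscapePattern(sPageString)) whenever the text to verify is a literal string and not a pattern.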

URL Rewriting example.


The following example is also taken from the Duwamish 7.0 Sample in the ACTSamples; however, the code here has been changed to implement the same functionality using Regular Expressions. In the Duwamish 7.0 Sample the __VIEWSTATE is extracted using the InStr function, which returns the position of a given string; that position is then used to extract a substring with the Mid function. Using Regular Expressions for this URL Rewriting offers greater flexibility than the InStr usage, as the variable to be extracted is often delimited by other text which can only be described using pattern matching. The original code from the Duwamish 7.0 Sample is commented out, to show the original implementation.

SetViewState example.


Sub SetViewState(g_oResponse)

    Dim Pos, PosStart, PosEnd
    Dim res, vState
    Dim oRegExp, colMatches

    Set oRegExp = New RegExp
    oRegExp.Global = True
    oRegExp.IgnoreCase = True
    ' SubMatches(0) = __VIEWSTATE" value="   SubMatches(1) = the value   SubMatches(2) = closing quote
    ' .*? is non-greedy, so the value stops at the first closing quote.
    oRegExp.Pattern = "(__VIEWSTATE\"" value=\"")(.*?)(\"")"

    If (g_oResponse Is Nothing) Then
        ActTrace L_ErrRequest_Text
    Else
        Set colMatches = oRegExp.Execute(g_oResponse.Body)
        If colMatches.Count > 0 Then
            res = colMatches(0).SubMatches(1)

            'COMMENT Pos = InStr(g_oResponse.Body, "__VIEWSTATE")
            'COMMENT If Pos > 0 Then
            'COMMENT PosStart = InStr(Pos, g_oResponse.Body, "value=""")
            'COMMENT PosStart = PosStart + Len ("value=""")
            'COMMENT PosEnd = InStr(PosStart, g_oResponse.Body, """")
            'COMMENT res = Mid(g_oResponse.Body, PosStart, PosEnd - PosStart)

            Test.SetGlobalVariable "vState", res
            g_ViewState = Test.GetGlobalVariable("vState")
        End If
    End If

End Sub

Notes on SetViewState


The actual g_oResponse.Body contains the following line:-

input type="hidden" name="__VIEWSTATE" value="dDwtMTQ2NjQ3MjkwNjt0PHkD3PMCY"

In this example the SubMatches collection is used to get to the variable portion: SubMatches(1), the second parenthesised group in the pattern, holds the value between the quotes.
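When writing a pattern of this kind it is worth checking where the captured value ends. A greedy (.*) runs on to the last quote on the line, which matters if further attributes follow the value. The sketch below compares it with a negated character set; the shortened value and the trailing id attribute are hypothetical, added only to show the difference, and the script is assumed to run standalone and not inside ACT.

```vbscript
Option Explicit

Dim oRegExp, colMatches, sLine
' Hypothetical line: shortened value plus an extra id attribute.
' Inside a VBScript string, "" stands for one literal quote.
sLine = "input type=""hidden"" name=""__VIEWSTATE"" value=""dDwtMTQ2"" id=""vs"""

Set oRegExp = New RegExp
oRegExp.IgnoreCase = True

' Greedy: .* runs on to the LAST quote on the line.
oRegExp.Pattern = "(__VIEWSTATE"" value="")(.*)("")"
Set colMatches = oRegExp.Execute(sLine)
If colMatches.Count > 0 Then
    WScript.Echo colMatches(0).SubMatches(1)    ' dDwtMTQ2" id="vs
End If

' Negated set: [^"]* stops at the FIRST quote after the value.
oRegExp.Pattern = "(__VIEWSTATE"" value="")([^""]*)("")"
Set colMatches = oRegExp.Execute(sLine)
If colMatches.Count > 0 Then
    WScript.Echo colMatches(0).SubMatches(1)    ' dDwtMTQ2
End If
```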

The g_ViewState is a global variable which is later used in the URL Request, which is constructed using:-

g_oRequest.Verb = "POST"
g_oRequest.Body = "__VIEWSTATE=" & g_ViewState & "&LogonEmailTextBox=" & oUser.Name & _
"&LogonPasswordTextBox=" & oUser.Password & "&LogonButton=Logon"

Set g_oResponse = g_oConnection.Send(g_oRequest)

Notice here that the oUser.Name and oUser.Password variables are taken from the Users Table to supply the variable parameters; that is the subject of a different article. The important variable to note here is g_ViewState, which was previously parsed out (using Regular Expressions) from the last HTTP Response.

Using the flexibility of Regular Expressions ANY variable can be parsed out for URL Rewriting, although this example (the __VIEWSTATE) is a simple case.
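As an illustration of a different variable, the sketch below pulls a Session ID out of a link and builds the next URL from it. Everything specific here is an assumption made for the example: the page fragment, the sid parameter name and the /checkout.asp path are hypothetical, and a real application will name and place its session variable differently.

```vbscript
Option Explicit

Dim sBody, oRegExp, colMatches, sSessionId, sNextUrl

' Hypothetical page fragment; in ACT this text would come from g_oResponse.Body.
sBody = "<a href=""/cart.asp?sid=A1B2C3&item=42"">View cart</a>"

Set oRegExp = New RegExp
oRegExp.IgnoreCase = True
' Capture everything after "sid=" up to the next & or quote.
oRegExp.Pattern = "[?&]sid=([^&""]+)"

Set colMatches = oRegExp.Execute(sBody)
If colMatches.Count > 0 Then
    sSessionId = colMatches(0).SubMatches(0)
    ' Rebuild the URL for the next Request using the extracted value.
    sNextUrl = "/checkout.asp?sid=" & sSessionId
    WScript.Echo sNextUrl      ' /checkout.asp?sid=A1B2C3
End If
```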

Conclusions

Application Center Test provides a framework (Objects) that allows a Windows Script Host (WSH) language, such as VBScript, to manipulate and control HTTP streams.

The construction of streams of data for HTTP request and response handling can be achieved using Regular Expressions. A good (if not expert) understanding of VBScript Regular Expressions is essential in order to fully utilize ACT.

This article serves only as a Primer on the subject. Examination of the Samples shipped with ACT, as well as other research into VBScript and Regular Expressions, is strongly advised in order to master this subject.

No guarantee (or claim) is made regarding the accuracy of this information.