Friday, 29 November 2024

Alternation and Subexpressions (Scripting)

Windows Scripting 5.8

Updated: January 2010

Alternation in a regular expression enables you to group choices between two or more alternatives. You can essentially specify "this OR that" in a pattern.

Subexpressions enable you to match a pattern in searched text and divide the match into separate submatches. The resulting submatches can be retrieved by the program. Subexpressions also enable you to reformat text, as described in Backreferences (Scripting).

For more information about regular expressions, see Creating a Regular Expression (Scripting) and Regular Expression Syntax (Scripting).

Alternation

You can use the pipe (|) character to specify a choice between two or more alternatives. This is known as alternation. The largest possible expression on either side of the pipe character is matched. You might think that the following expressions for JScript and Visual Basic Scripting Edition (VBScript) match either "Chapter" or "Section" followed by one or two digits.

/Chapter|Section [1-9][0-9]{0,1}/
"Chapter|Section [1-9][0-9]{0,1}"

Instead, the regular expressions match either the word "Chapter" by itself or the word "Section" followed by one or two digits. If the searched string is "Section 22", the expressions match "Section 22". However, if the searched string is "Chapter 22", the expressions match only the word "Chapter" instead of "Chapter 22".
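To see how much the grouping matters, the following sketch runs the ungrouped and grouped patterns side by side. It is written in modern JavaScript (for example, Node.js) rather than JScript 5.8, and the helper name firstMatch is hypothetical.

```javascript
// Return the first match of `re` in `text`, or null when there is none.
// firstMatch is a hypothetical helper used only for illustration.
function firstMatch(re, text) {
  const m = text.match(re);
  return m === null ? null : m[0];
}

// Alternation has the lowest precedence: this means
// "Chapter" OR "Section followed by one or two digits".
const ungrouped = /Chapter|Section [1-9][0-9]{0,1}/;

// Parentheses limit the alternation to the two words.
const grouped = /(Chapter|Section) [1-9][0-9]{0,1}/;

console.log(firstMatch(ungrouped, "Section 22")); // → Section 22
console.log(firstMatch(ungrouped, "Chapter 22")); // → Chapter
console.log(firstMatch(grouped, "Chapter 22"));   // → Chapter 22
```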

Alternation with Parentheses

You can use parentheses to limit the scope of the alternation, that is, to make sure that it applies only to the two words, "Chapter" and "Section". By adding parentheses, you can make the regular expression match either "Chapter 1" or "Section 3".

Parentheses, however, are also used to create a subexpression. The resulting submatch can be retrieved by the program.

The following JScript regular expression uses parentheses to group "Chapter" and "Section". Possible matches then include either "Chapter" or "Section" followed by a number.

/(Chapter|Section) [1-9][0-9]{0,1}/

The parentheses around "Chapter|Section" also cause either of the two matching words to be saved for future use.

The following example shows how the matches and submatches can be retrieved in code. Because there is only one set of parentheses in the expression, there is only one saved submatch.

var re = /(Chapter|Section) [1-9][0-9]{0,1}/g;
var src = "Chapter 50  Section 85";
ShowMatches(src, re);

// Output:
//  Chapter 50
//  submatch 1: Chapter

//  Section 85
//  submatch 1: Section

// Perform a search on a string by using a regular expression,
// and display the matches and submatches.
// The regular expression must have the global (g) flag. Without it,
// exec always returns the first match and the loop never ends.
function ShowMatches(src, re)
{
    var newLine = "<br />";
    var result;
    var s = "";

    // Get the first match.
    result = re.exec(src);

    while (result != null)
    {
        // Show the entire match.
        s += newLine + result[0] + newLine;

        // Show the submatches.
        // You can also obtain the submatches from RegExp.$1,
        // RegExp.$2, and so on.
        for (var index = 1; index < result.length; index++)
        {
            s += "submatch " + index + ": ";
            s += result[index];
            s += newLine;
        }

        // Get the next match.
        result = re.exec(src);
    }
    document.write(s);
}

' VBScript
Dim src, ptrn
ptrn = "(Chapter|Section) [1-9][0-9]{0,1}"
src = "Chapter 50  Section 85"
ShowMatches src, ptrn
'
' Output:
'  Chapter 50
'  submatch 0: Chapter

'  Section 85
'  submatch 0: Section

' Perform a search on a string by using a regular expression,
' and display the matches and submatches.
Sub ShowMatches(src, ptrn)
    Dim re, Match, Matches, NewLine, Index, s
    NewLine = "<br />"

    ' Create the regular expression.
    Set re = New RegExp
    re.Pattern = ptrn
    re.Global = True
    re.IgnoreCase = True
    
    ' Get the Matches collection.
    Set Matches = re.Execute(src)
    
    s = ""
    For Each Match in Matches
        ' Show the entire match.
        s = s & NewLine & Match.Value & NewLine

        ' Show the submatches.
        For Index = 0 to Match.SubMatches.Count - 1
            s = s & "submatch " & Index & ": "
            s = s & Match.Submatches(Index)
            s = s & NewLine
        Next
    Next
    
    document.write(s)
End Sub
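In current JavaScript engines (but not in JScript 5.8, which predates it), String.prototype.matchAll, added in ES2020, returns an iterator over the same match arrays that the exec loop in ShowMatches walks. The following sketch uses a hypothetical helper, collectMatches, that returns plain data instead of writing HTML, which makes the result easier to inspect and test.

```javascript
// Collect every match and its submatches as plain objects.
// collectMatches is a hypothetical helper. It assumes `re` has the g flag;
// String.prototype.matchAll throws a TypeError for a non-global regex.
function collectMatches(src, re) {
  const results = [];
  for (const m of src.matchAll(re)) {
    results.push({
      match: m[0],            // the entire match
      submatches: m.slice(1), // captured groups; submatch 1 is at position 0 here
      index: m.index,         // where the match starts in src
    });
  }
  return results;
}

const found = collectMatches("Chapter 50  Section 85", /(Chapter|Section) [1-9][0-9]{0,1}/g);
for (const { match, submatches } of found) {
  console.log(match, submatches);
}
```

Passing a pattern without the g flag fails immediately with a TypeError, instead of looping forever as a non-global exec loop would.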

Alternation Without a Saved Submatch

In the previous example, the parentheses are needed only to group the choice between the words "Chapter" and "Section"; the saved submatch is never used.

To prevent the submatch from being saved, use a non-capturing subexpression, (?:pattern). The following example does the same thing as the previous example, but it does not save the submatch.

var re = /(?:Chapter|Section) [1-9][0-9]{0,1}/g;
var src = "Chapter 50  Section 85";
ShowMatches(src, re);
// Output:
//  Chapter 50
//  Section 85

' VBScript
Dim src, ptrn
ptrn = "(?:Chapter|Section) [1-9][0-9]{0,1}"
src = "Chapter 50  Section 85"
ShowMatches src, ptrn
' Output:
'  Chapter 50
'  Section 85
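The effect of (?:pattern) is visible in the length of the match array. The following sketch, in modern JavaScript, runs the capturing and non-capturing versions of the pattern once each on the same string.

```javascript
// Compare a capturing group with a non-capturing group on the same input.
const src = "Chapter 50";

const capturing = /(Chapter|Section) [1-9][0-9]{0,1}/.exec(src);
const nonCapturing = /(?:Chapter|Section) [1-9][0-9]{0,1}/.exec(src);

// Element 0 is always the entire match; later elements are saved submatches.
console.log(capturing.length);    // → 2  (entire match plus "Chapter")
console.log(nonCapturing.length); // → 1  (entire match only)
console.log(capturing[0] === nonCapturing[0]); // → true
```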

Subexpressions

Placing parentheses in a regular expression creates a subexpression. The resulting submatch can be retrieved by the program.

In the following example, the regular expression contains three subexpressions. The submatch strings are displayed together with each match.

var re = /(\w+)@(\w+)\.(\w+)/g;
var src = "Please send mail to george@contoso.com and someone@example.com. Thanks!";
ShowMatches(src, re);
// The ShowMatches function is provided earlier.

// Output:
//  george@contoso.com
//  submatch 1: george
//  submatch 2: contoso
//  submatch 3: com

//  someone@example.com
//  submatch 1: someone
//  submatch 2: example
//  submatch 3: com

' VBScript
Dim src, ptrn
ptrn = "(\w+)@(\w+)\.(\w+)"
src = "Please send mail to george@contoso.com and someone@example.com. Thanks!"
ShowMatches src, ptrn
' The ShowMatches subroutine is provided earlier.

' Output:
'  george@contoso.com
'  submatch 0: george
'  submatch 1: contoso
'  submatch 2: com

'  someone@example.com
'  submatch 0: someone
'  submatch 1: example
'  submatch 2: com
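Because each submatch lands at a fixed position in the match array, the three subexpressions can be unpacked directly into named variables. The following is a minimal sketch in modern JavaScript; the helper name splitAddresses and the property names user, domain, and tld are hypothetical.

```javascript
// Split each e-mail address into the parts captured by the three
// subexpressions of the example pattern.
// splitAddresses is a hypothetical helper name.
function splitAddresses(text) {
  const re = /(\w+)@(\w+)\.(\w+)/g;
  const parts = [];
  // Skip element 0 (the entire match) and name the three submatches.
  for (const [, user, domain, tld] of text.matchAll(re)) {
    parts.push({ user, domain, tld });
  }
  return parts;
}

const text = "Please send mail to george@contoso.com and someone@example.com. Thanks!";
console.log(splitAddresses(text));
```

The pattern is deliberately simple. For an address such as a.b@mail.contoso.com it captures only "b", "mail", and "contoso", so it is not a general address validator.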

The following example separates a Uniform Resource Identifier (URI) into its component parts.

The first parenthetical subexpression saves the protocol part of the Web address. It matches any word that comes before a colon and two forward slashes. The second parenthetical subexpression saves the domain address part of the address. It matches one or more characters other than the slash (/) and colon (:) characters. The third parenthetical subexpression saves the port number, including its leading colon, if one is specified. It matches a colon followed by zero or more digits. The fourth parenthetical subexpression saves the path and/or page information specified by the Web address. It matches zero or more characters other than the number sign (#) and the space character.

var re = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/gi;
var src = "http://msdn.microsoft.com:80/scripting/default.htm";
ShowMatches(src, re);

// Output:
//  http://msdn.microsoft.com:80/scripting/default.htm
//  submatch 1: http
//  submatch 2: msdn.microsoft.com
//  submatch 3: :80
//  submatch 4: /scripting/default.htm

' VBScript
Dim src, ptrn
ptrn = "(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)"
src = "http://msdn.microsoft.com:80/scripting/default.htm"
ShowMatches src, ptrn

' Output:
'  http://msdn.microsoft.com:80/scripting/default.htm
'  submatch 0: http
'  submatch 1: msdn.microsoft.com
'  submatch 2: :80
'  submatch 3: /scripting/default.htm
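In a program, the four submatches are usually copied into named fields. The following sketch, in modern JavaScript, reuses the example pattern without the g flag; the function name parseUri and its property names are hypothetical. Because the third subexpression is optional, it does not participate when the URI has no port, and standard ECMAScript engines then report that submatch as undefined.

```javascript
// Break a URI into the four parts captured by the example pattern.
// parseUri and the property names below are hypothetical.
function parseUri(uri) {
  const m = /(\w+):\/\/([^/:]+)(:\d*)?([^# ]*)/i.exec(uri);
  if (m === null) {
    return null; // not in protocol://host form
  }
  return {
    protocol: m[1],
    host: m[2],
    // Submatch 3 includes the leading colon, so drop it.
    // When there is no port, m[3] is undefined and port becomes null.
    port: m[3] ? m[3].slice(1) : null,
    path: m[4],
  };
}

console.log(parseUri("http://msdn.microsoft.com:80/scripting/default.htm"));
console.log(parseUri("http://example.com/index.htm"));
```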

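Positive Lookahead

A positive lookahead checks that a pattern matches at the current position without consuming any characters. After the lookahead succeeds, matching resumes at the position where the lookahead started, so the characters it examined can be matched again by the rest of the expression. The lookahead itself is not saved for later use. To specify a positive lookahead, use the syntax (?=pattern).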

In the following example, a search is performed to determine whether a password is 4 to 8 characters long and contains at least one digit.

In the regular expression, .*\d finds any number of characters followed by a digit. For the searched string "abc3qr", this matches "abc3". Because the lookahead consumes no characters, matching then resumes at the start of the string instead of after "abc3", and .{4,8} matches a string of 4 to 8 characters. This matches "abc3qr".

The ^ and $ match the positions at the start and end of the searched string. They prevent a match if the searched string contains any characters beyond the 4 to 8 matched characters.

var re = /^(?=.*\d).{4,8}$/gi;
var src = "abc3qr";
ShowMatches(src, re);
// The ShowMatches function is provided earlier.
// Output:
//  abc3qr

' VBScript
Dim src, ptrn
ptrn = "^(?=.*\d).{4,8}$"
src = "abc3qr"
ShowMatches src, ptrn
' The ShowMatches subroutine is provided earlier.
' Output:
'  abc3qr
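For a yes/no validation, RegExp.prototype.test is usually enough. The following sketch, in modern JavaScript, wraps the example pattern in a hypothetical isValidPassword helper. It drops the g flag on purpose: test on a global regular expression updates lastIndex, so reusing one global pattern across calls can make the next call fail unexpectedly. The last lines show that the lookahead on its own matches an empty string, which is what "consumes no characters" means.

```javascript
// Check that a password is 4 to 8 characters long and contains a digit.
// isValidPassword is a hypothetical helper; the pattern has no g flag
// so that repeated calls do not depend on a leftover lastIndex.
const passwordPattern = /^(?=.*\d).{4,8}$/;

function isValidPassword(pw) {
  return passwordPattern.test(pw);
}

console.log(isValidPassword("abc3qr"));    // → true
console.log(isValidPassword("abcdef"));    // → false (no digit)
console.log(isValidPassword("a1"));        // → false (too short)
console.log(isValidPassword("abc3qrstu")); // → false (9 characters)

// The lookahead alone matches an empty string at position 0.
const probe = /(?=.*\d)/.exec("abc3qr");
console.log(probe.index, JSON.stringify(probe[0])); // → 0 ""
```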

Negative Lookahead

A negative lookahead checks that a pattern does not match at the current position. Like a positive lookahead, it consumes no characters, so matching resumes at the position where the lookahead started, and nothing is saved for later use. To specify a negative lookahead, use the syntax (?!pattern).

The following example matches words that do not start with "th".

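In the regular expression, \b matches a word boundary. For the searched string " quick ", this matches the position between the first space and "q". (?!th) then checks that the next two characters are not "th"; here they are "qu", so the check succeeds without consuming them. Starting from that same position, \w+ matches the word "quick", and the final \b matches the boundary after it. Because both examples use case-insensitive matching, "The" and "the" are both excluded.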

var re = /\b(?!th)\w+\b/gi;
var src = "The quick brown fox jumps over the lazy dog.";
ShowMatches(src, re);
// Output:
//  quick
//  brown
//  fox
//  jumps
//  over
//  lazy
//  dog

' VBScript
Dim src, ptrn
ptrn = "\b(?!th)\w+\b"
src = "The quick brown fox jumps over the lazy dog."
ShowMatches src, ptrn

' Output:
'  quick
'  brown
'  fox
'  jumps
'  over
'  lazy
'  dog
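When only the matched words are needed, String.prototype.match with a global pattern returns them all at once. The following sketch, in modern JavaScript, wraps the example pattern in a hypothetical helper, wordsNotStartingWithTh, and contrasts it with the same pattern minus the lookahead.

```javascript
// List the words in `text` that do not start with "th" (case-insensitive),
// using the negative lookahead pattern from the example.
// wordsNotStartingWithTh is a hypothetical helper name.
function wordsNotStartingWithTh(text) {
  // With the g flag, match returns every matched string, or null if none.
  return text.match(/\b(?!th)\w+\b/gi) ?? [];
}

const sentence = "The quick brown fox jumps over the lazy dog.";
console.log(wordsNotStartingWithTh(sentence).join(" "));
// → quick brown fox jumps over lazy dog

// Without the lookahead, every word matches.
console.log("The quick brown fox".match(/\b\w+\b/gi).length); // → 4
```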
