Regular Expression Pocket Reference [Paperback]

Tony Stubblebine
4.1 out of 5 stars (11 customer reviews)


Product Description

Book Description

Regular Expressions for Perl, Ruby, PHP, Python, C, Java, and .NET --This text refers to an alternate Paperback edition.

Product Description

Regular expressions are such a powerful tool for manipulating text and data that anyone who uses a computer can benefit from them. Composed of a mixture of symbols and text, regular expressions can be an outlet for creativity, for brilliant programming, and for the elegant solution. While a command of regular expressions is an invaluable skill, all there is to know about them fills a very large volume, and you don't always have time to thumb through hundreds of pages each time a question arises. The answer is the Regular Expression Pocket Reference. Concise and easy to use, this little book is the portable companion to Mastering Regular Expressions.

This handy guide offers programmers a complete overview of the syntax and semantics of regular expressions that are at the heart of every text-processing application. Ideal as an introduction for beginners and a quick reference for advanced programmers, Regular Expression Pocket Reference is a comprehensive guide to regular expression APIs for C, Perl, PHP, Java, .NET, Python, vi, and the POSIX regular expression libraries.

O'Reilly's Pocket References have become a favorite among programmers everywhere. By providing a wealth of important details in a concise, well-organized format, these handy books deliver just what you need to complete the task at hand. When you've reached a sticking point and need to get to a solution quickly, the new Regular Expression Pocket Reference is the book you'll want to have.

From the Publisher

Ideal as an introduction for beginners and a quick reference for advanced programmers, Regular Expression Pocket Reference is a comprehensive guide to regular expression APIs for C, Perl, PHP, Java, .NET, Python, vi, and the POSIX regular expression libraries. This handy book offers programmers a complete overview of the syntax and semantics of regular expressions, which are at the heart of every text-processing application. When you've reached a sticking point and need to get to a solution quickly, the new Regular Expression Pocket Reference is the book you'll want to have.

About the Author

Tony Stubblebine writes Perl and regular expressions for O'Reilly. Previously he held the titles of Web Peasant and Rogue Developer while hacking Perl for MasterCard International. He is also the Social Director and Senior Nightlife Correspondent for the O'Reilly Network.

Excerpted from Regular Expression Pocket Reference by Tony Stubblebine. Copyright © 2003. Reprinted by permission. All rights reserved.

Introduction to Regexes and Pattern Matching

A regular expression is a string containing a combination of normal characters and special metacharacters or metasequences. The normal characters match themselves. Metacharacters and metasequences are characters or sequences of characters that represent ideas such as quantity, locations, or types of characters. The list in the section "Regex Metacharacters, Modes, and Constructs" shows the most common metacharacters and metasequences in the regular expression world. Later sections list the availability of and syntax for supported metacharacters for particular implementations of regular expressions.
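
As a small illustration of the distinction between normal characters and metacharacters, here is a sketch using Python's standard `re` module. The pattern and the sample sentence are invented for this example and do not come from the book.

```python
import re

# "order " consists of normal characters that match themselves.
# "\d" is a metasequence (any digit) and "+" is a metacharacter
# expressing quantity (one or more of the preceding item).
pattern = re.compile(r"order \d+")

match = pattern.search("Your order 66 has shipped")
found = match.group(0)
print(found)  # order 66
```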

Pattern matching consists of finding a section of text that is described (matched) by a regular expression. The underlying code that searches the text is the regular expression engine. You can guess the results of most matches by keeping two rules in mind:

1. The earliest (leftmost) match wins
Regular expressions are applied to the input starting at the first character and proceeding toward the last. As soon as the regular expression engine finds a match, it returns. (See MRE 148–149, 177–179.)

2. Standard quantifiers are greedy
Quantifiers specify how many times something can be repeated. The standard quantifiers attempt to match as many times as possible. They settle for less than the maximum only if this is necessary for the success of the match. The process of giving up characters and trying less-greedy matches is called backtracking. (See MRE 151–153.)
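
The two rules above can be observed with Python's `re` module, which is a backtracking engine. This is only a sketch; the sample strings are made up for illustration.

```python
import re

text = "hot dog, cool cat"

# Rule 1: "dog" begins earlier in the text, so it wins even though
# "cat" is listed first in the alternation.
leftmost = re.search(r"cat|dog", text).group(0)

# Rule 2: the greedy ".+" first consumes everything up to the end of the
# string, then backtracks just far enough for the final ">" to match.
greedy = re.search(r"<.+>", "<b>bold</b>").group(0)

# (\d+) initially takes all five digits, then gives one back so that
# the trailing (\d) can match and the overall match can succeed.
groups = re.match(r"(\d+)(\d)", "12345").groups()

print(leftmost)  # dog
print(greedy)    # <b>bold</b>
print(groups)    # ('1234', '5')
```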

Regular expression engines have subtle differences based on their type. There are two classes of engines: Deterministic Finite Automaton (DFA) and Nondeterministic Finite Automaton (NFA). DFAs are faster but lack many of the features of an NFA, such as capturing, lookaround, and non-greedy quantifiers. In the NFA world there are two types: Traditional and POSIX.
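
For a sense of the features named above, the following sketch exercises capturing, a non-greedy quantifier, and lookahead in Python's `re` module, which supports all three. The sample text is invented for this example.

```python
import re

text = "<b>bold</b> and <i>italic</i>"

# Capturing: the parentheses remember the tag name.
tag = re.search(r"<(\w+)>", text).group(1)

# Non-greedy quantifier: ".+?" stops at the first ">" it can.
lazy = re.search(r"<.+?>", text).group(0)

# Lookahead: match a word only if "</i>" follows it, without
# including "</i>" in the match itself.
before_close = re.search(r"\w+(?=</i>)", text).group(0)

print(tag)           # b
print(lazy)          # <b>
print(before_close)  # italic
```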

DFA engines
DFAs compare each character of the input string to the regular expression, keeping track of all matches in progress. Since each character is examined at most once, the DFA engine is the fastest. One additional rule to remember with DFAs is that the alternation metasequence is greedy. When more than one option in an alternation (foo|foobar) matches, the longest one is selected. So, rule #1 can be amended to read "the longest leftmost match wins." (See MRE 155–156.)
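
To make the "longest leftmost match" behavior concrete, here is a toy sketch of a hand-built DFA for the alternation foo|foobar. The transition table, the state numbers, and the function names are all assumptions made for this illustration; a real DFA engine compiles the table from the pattern and does not restart at each position the way this simplified `search` does.

```python
# Hand-built transition table for foo|foobar.
# State 0 is the start state; states 3 ("foo") and 6 ("foobar") accept.
TRANSITIONS = {
    (0, "f"): 1, (1, "o"): 2, (2, "o"): 3,
    (3, "b"): 4, (4, "a"): 5, (5, "r"): 6,
}
ACCEPTING = {3, 6}


def longest_match_at(text, start):
    """Return the end index of the longest match beginning at start, or None."""
    state = 0
    last_accept = None
    for i in range(start, len(text)):
        state = TRANSITIONS.get((state, text[i]))
        if state is None:        # no transition: the automaton is stuck
            break
        if state in ACCEPTING:   # remember the longest match seen so far
            last_accept = i + 1
    return last_accept


def search(text):
    """Leftmost-longest search; restarts at each position for simplicity."""
    for start in range(len(text)):
        end = longest_match_at(text, start)
        if end is not None:
            return text[start:end]
    return None


print(search("a foobar b"))  # foobar
print(search("food"))        # foo
```

Within a single attempt, each input character is looked at once and no backtracking occurs; the automaton simply remembers the last accepting position it passed.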

Traditional NFA engines
Traditional NFA engines compare each element of the regex to the input string, keeping track of positions where it chose between two options in the regex. If an option fails, the engine backtracks to the most recently saved position. For standard quantifiers, the engine chooses the greedy option of matching more text; however, if that option leads to the failure of the match, the engine returns to a saved position and tries a less greedy path. The traditional NFA engine uses ordered alternation, where each option in the alternation is tried sequentially. A longer match may be ignored if an earlier option leads to a successful match. So, rule #1 can be amended to read "the first leftmost match after greedy quantifiers have had their fill." (See MRE 153–154.)
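
Python's `re` module behaves like a traditional NFA, so it can be used to sketch ordered alternation. The patterns below reuse the foo|foobar example; the variable names are only for illustration.

```python
import re

# The first alternative that leads to an overall match wins,
# even though a longer alternative would also match.
first = re.search(r"foo|foobar", "foobar").group(0)

# Listing the longer option first changes the result.
reordered = re.search(r"foobar|foo", "foobar").group(0)

# If the rest of the pattern fails after "foo", the engine backtracks
# to the saved position and tries the next alternative.
forced = re.search(r"(?:foo|foobar)$", "foobar").group(0)

print(first)      # foo
print(reordered)  # foobar
print(forced)     # foobar
```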

POSIX NFA engines
POSIX NFA engines work similarly to traditional NFAs with one exception: a POSIX engine always picks the longest of the leftmost matches. For example, the alternation cat|category would match the full word "category" whenever possible, even if the first alternative ("cat") matched and appeared earlier in the alternation. (See MRE 153–154.)
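
Python's standard library has no POSIX-style engine, so the following is only an emulation of the leftmost-longest rule for literal alternatives, contrasted with the result from `re`. The helper name `posix_style_search` is hypothetical and not part of any library.

```python
import re


def posix_style_search(alternatives, text):
    """Emulate leftmost-longest matching for a list of literal alternatives."""
    for start in range(len(text)):
        # Collect every alternative that matches at this position...
        candidates = [alt for alt in alternatives if text.startswith(alt, start)]
        if candidates:
            # ...and keep the longest one, regardless of its order in the list.
            return max(candidates, key=len)
    return None


text = "a category list"

traditional = re.search(r"cat|category", text).group(0)
posix_like = posix_style_search(["cat", "category"], text)

print(traditional)  # cat
print(posix_like)   # category
```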
