
Expressions

Regular Expression Constructs

A regular expression is a pattern of characters that describes a set of strings. You can use the java.util.regex package to find, display, or modify some or all of the occurrences of a pattern in an input sequence.
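As a minimal sketch of finding and modifying occurrences with java.util.regex, the class below uses Pattern and Matcher. The class name, method names, and sample sentence are illustrative assumptions, not part of the package.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindAndReplaceDemo {
    // Returns the start index of each occurrence of the pattern in the input.
    static List<Integer> findOffsets(String regex, String input) {
        Matcher matcher = Pattern.compile(regex).matcher(input);
        List<Integer> offsets = new ArrayList<>();
        while (matcher.find()) {
            offsets.add(matcher.start());
        }
        return offsets;
    }

    // Returns a copy of the input with every occurrence replaced.
    static String replaceEvery(String regex, String input, String replacement) {
        return Pattern.compile(regex).matcher(input).replaceAll(replacement);
    }

    public static void main(String[] args) {
        String input = "Java is fun; Java is portable.";
        System.out.println(findOffsets("Java", input));           // [0, 13]
        System.out.println(replaceEvery("Java", input, "JAVA"));  // JAVA is fun; JAVA is portable.
    }
}
```

Compiling the pattern once and reusing it is usually preferable when the same expression is applied to many inputs; the sketch recompiles on each call only to stay short.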

The simplest form of a regular expression is a literal string, such as "Java" or "programming". Regular expression matching also allows you to test whether a string fits into a specific syntactic form, such as an email address.
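The two uses just described can be sketched as follows. The email pattern is a deliberately simplified assumption (local part, @, dotted domain) chosen for illustration; it does not cover every valid address. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

class SyntacticFormDemo {
    // Simplified, illustrative email shape: local part, '@', then a dotted domain.
    private static final Pattern SIMPLE_EMAIL =
            Pattern.compile("[\\w.+-]+@[\\w-]+(\\.[\\w-]+)+");

    // A literal pattern: true if the text "Java" occurs anywhere in the input.
    static boolean containsLiteralJava(String input) {
        return Pattern.compile("Java").matcher(input).find();
    }

    // A syntactic-form test: true only if the whole input has the email shape.
    static boolean looksLikeEmail(String input) {
        return SIMPLE_EMAIL.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteralJava("Learning Java programming")); // true
        System.out.println(looksLikeEmail("user@example.com"));               // true
        System.out.println(looksLikeEmail("user@example"));                   // false
    }
}
```

Note the difference between find(), which looks for the pattern anywhere in the input, and matches(), which requires the entire input to fit the pattern.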

To develop regular expressions, ordinary and special characters are used:

\   $   ^   .   *   +   ?   [   ]

In java.util.regex, the characters ( ) { } and | are also special. Any other character appearing in a regular expression is ordinary, unless a \ precedes it.
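The effect of the backslash can be sketched as below: escaping a special character makes it match literally. Pattern.quote is a standard java.util.regex method that treats a whole string literally; the class and method names here are assumptions made for the example.

```java
import java.util.regex.Pattern;

class EscapingDemo {
    // True when the whole input matches the regular expression.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    // True when the whole input equals the text, with every character taken literally.
    static boolean literalMatch(String text, String input) {
        return Pattern.matches(Pattern.quote(text), input);
    }

    public static void main(String[] args) {
        // In Java source, a regex backslash is written as two backslashes.
        System.out.println(fullMatch("a.b", "axb"));        // true: unescaped dot is special
        System.out.println(fullMatch("a\\.b", "axb"));      // false: escaped dot is a literal dot
        System.out.println(fullMatch("a\\.b", "a.b"));      // true
        System.out.println(fullMatch("1+1=2", "1+1=2"));    // false: + is a quantifier here
        System.out.println(literalMatch("1+1=2", "1+1=2")); // true: quoted text is literal
    }
}
```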

Special characters serve a special purpose. For instance, the . (dot) matches any character except a line terminator. A regular expression like s.n matches any three-character string that begins with s and ends with n, including sun and son.
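A short sketch of the s.n example follows. It also shows the standard Pattern.DOTALL flag, which makes the dot match line terminators as well; the class and method names are illustrative assumptions.

```java
import java.util.regex.Pattern;

class DotDemo {
    // Default behavior: the dot matches any single character except a line terminator.
    static boolean matchesSDotN(String input) {
        return Pattern.matches("s.n", input);
    }

    // With DOTALL, the dot also matches line terminators.
    static boolean matchesSDotNDotAll(String input) {
        return Pattern.compile("s.n", Pattern.DOTALL).matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(matchesSDotN("sun"));          // true
        System.out.println(matchesSDotN("son"));          // true
        System.out.println(matchesSDotN("soon"));         // false: four characters
        System.out.println(matchesSDotN("Sun"));          // false: matching is case-sensitive
        System.out.println(matchesSDotN("s\nn"));         // false: newline is a line terminator
        System.out.println(matchesSDotNDotAll("s\nn"));   // true
    }
}
```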

Other special characters and constructs let you find words at the beginning of a line, match words with or without regard to case, and specify a range such as a-e, meaning any letter from a to e.
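These three ideas might look like this in java.util.regex. The sketch assumes the standard Pattern.MULTILINE and Pattern.CASE_INSENSITIVE flags; the class name, method names, and sample text are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AnchorCaseRangeDemo {
    // Counts lines that begin with "Java"; MULTILINE lets ^ match at each line start.
    static int countLinesStartingWithJava(String text) {
        Matcher matcher = Pattern.compile("^Java", Pattern.MULTILINE).matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Ignores case: accepts "java", "JAVA", "Java", and so on.
    static boolean isJavaAnyCase(String word) {
        return Pattern.compile("java", Pattern.CASE_INSENSITIVE).matcher(word).matches();
    }

    // Case-specific: accepts only "Java".
    static boolean isJavaExactCase(String word) {
        return Pattern.matches("Java", word);
    }

    // Range: accepts exactly one lowercase letter from a to e.
    static boolean isLetterAToE(String s) {
        return Pattern.matches("[a-e]", s);
    }

    public static void main(String[] args) {
        System.out.println(countLinesStartingWithJava("Java rocks\nI like Java\nJava again")); // 2
        System.out.println(isJavaAnyCase("JAVA"));   // true
        System.out.println(isJavaExactCase("JAVA")); // false
        System.out.println(isLetterAToE("c"));       // true
        System.out.println(isLetterAToE("f"));       // false
    }
}
```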

Regular expression syntax in the java.util.regex package is Perl-like, so if you are familiar with regular expressions in Perl, you can use largely the same expression syntax in the Java programming language. If you're not familiar with regular expressions, here are a few constructs to get you started:

Characters

Construct   Matches
x           The character x
\\          The backslash character
\0n         The character with octal value 0n (0 <= n <= 7)
\0nn        The character with octal value 0nn (0 <= n <= 7)
\0mnn       The character with octal value 0mnn (0 <= m <= 3, 0 <= n <= 7)
\xhh        The character with hexadecimal value 0xhh
\uhhhh      The character with hexadecimal value 0xhhhh
\t          The tab character ('\u0009')
\n          The newline (line feed) character ('\u000A')
\r          The carriage-return character ('\u000D')
\f          The form-feed character ('\u000C')
\a          The alert (bell) character ('\u0007')
\e          The escape character ('\u001B')
\cx         The control character corresponding to x
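A few of the character escapes listed above can be tried out with a sketch like this one. The class and method names are assumptions made for the example. Because each pattern is written as a Java string literal, every backslash intended for the regex engine is doubled.

```java
import java.util.regex.Pattern;

class CharacterEscapeDemo {
    // True when the whole input matches the regular expression.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Every line below prints true.
        System.out.println(fullMatch("\\\\", "\\"));      // the backslash character
        System.out.println(fullMatch("\\0101", "A"));     // octal 101 is 'A'
        System.out.println(fullMatch("\\x41", "A"));      // hexadecimal 41 is 'A'
        System.out.println(fullMatch("\\u0041", "A"));    // four hexadecimal digits
        System.out.println(fullMatch("a\\tb", "a\tb"));   // tab between a and b
        System.out.println(fullMatch("\\cI", "\t"));      // control-I is the tab character
        System.out.println(fullMatch("\\e", String.valueOf((char) 0x1B))); // escape character
    }
}
```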

 

 

Character Classes

Construct        Matches
[abc]            a, b, or c (simple class)
[^abc]           Any character except a, b, or c (negation)
[a-zA-Z]         a through z or A through Z, inclusive (range)
[a-z&&[^bc]]     a through z, except for b and c: [ad-z] (subtraction)
[a-z&&[^m-p]]    a through z, except for m through p: [a-lq-z] (subtraction)
[a-z&&[def]]     d, e, or f (intersection)
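In java.util.regex, intersection is written with &&, and subtraction is expressed as intersection with a negated class. The sketch below filters a sample string through several character classes to show which characters each one accepts; the class name, method name, and sample inputs are illustrative assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharacterClassDemo {
    // Returns, in order, the characters of the input that the character class matches.
    static String matchingChars(String characterClass, String input) {
        Matcher matcher = Pattern.compile(characterClass).matcher(input);
        StringBuilder kept = new StringBuilder();
        while (matcher.find()) {
            kept.append(matcher.group());
        }
        return kept.toString();
    }

    public static void main(String[] args) {
        System.out.println(matchingChars("[abc]", "abcdef"));          // abc
        System.out.println(matchingChars("[^abc]", "abcdef"));         // def
        System.out.println(matchingChars("[a-zA-Z]", "a1B2"));         // aB
        System.out.println(matchingChars("[a-z&&[^bc]]", "abcdef"));   // adef
        System.out.println(matchingChars("[a-z&&[^m-p]]", "lmnopq"));  // lq
        System.out.println(matchingChars("[a-z&&[def]]", "abcdefg"));  // def
    }
}
```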

 

 

Predefined Character Classes

Construct   Matches
.           Any character (may or may not match line terminators)
\d          A digit: [0-9]
\D          A non-digit: [^0-9]
\s          A whitespace character: [ \t\n\x0B\f\r]
\S          A non-whitespace character: [^\s]
\w          A word character: [a-zA-Z_0-9]
\W          A non-word character: [^\w]
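The predefined classes are handy for small clean-up and validation tasks. The following is a sketch under stated assumptions: the helper names and sample inputs are invented for the example, and the + quantifier (one or more) comes from the list of special characters earlier on this page.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    private static final Pattern NON_DIGIT = Pattern.compile("\\D");
    private static final Pattern WHITESPACE_RUN = Pattern.compile("\\s+");
    private static final Pattern WORD_CHARS = Pattern.compile("\\w+");

    // Removes every non-digit character, leaving only the digits.
    static String digitsOnly(String input) {
        return NON_DIGIT.matcher(input).replaceAll("");
    }

    // Trims the input and replaces each run of whitespace with a single space.
    static String collapseWhitespace(String input) {
        return WHITESPACE_RUN.matcher(input.trim()).replaceAll(" ");
    }

    // True when the input is non-empty and consists only of word characters.
    static boolean isWordCharsOnly(String input) {
        return WORD_CHARS.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(digitsOnly("Tel: 555-0199"));        // 5550199
        System.out.println(collapseWhitespace("a \t\n b"));     // a b
        System.out.println(isWordCharsOnly("user_42"));         // true
        System.out.println(isWordCharsOnly("user-42"));         // false
    }
}
```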
