Regular Expressions in Java
Now they're faster!
I have written a package named "pat" that implements regular expressions
in Java. It supports most of the Perl 5 syntax and is documented
in pages generated by javadoc. It works by treating each pattern
element as a class that knows how to match itself and how to ask the
next element to match itself. Because of this design, you can extend
class Regex to match new syntax and pattern types.
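The design described above can be sketched roughly as follows. This is only an illustration of the idea, not the pat source: the names ChainSketch, Element, OneOrMore, match and matchNext are all hypothetical. Each element tries to match at a position and then asks the next element to match the rest, so new syntax could be supported by adding another subclass.

```java
// Sketch only: every class here is hypothetical and is not part of pat.
class ChainSketch {
    /** Builds the chain for [a-c]+[x-z]+ and returns the index where a
     *  match starting at index 0 ends, or -1 if there is no such match. */
    static int matchAtStart(String text) {
        Element first = new OneOrMore('a', 'c');
        first.next = new OneOrMore('x', 'z');
        return first.match(text, 0);
    }

    public static void main(String[] args) {
        System.out.println(matchAtStart("abcxyz"));
        System.out.println(matchAtStart("abc"));
    }
}

/** One element of a pattern, linked to the element that follows it. */
abstract class Element {
    Element next; // null marks the end of the pattern

    /** Match at pos, then let the rest of the chain match.
     *  Returns the end index of the overall match, or -1 on failure. */
    abstract int match(String text, int pos);

    /** Ask the next element to match, or succeed if the chain has ended. */
    protected int matchNext(String text, int pos) {
        return next == null ? pos : next.match(text, pos);
    }
}

/** Greedy one-or-more repetition of a character range, with backtracking. */
class OneOrMore extends Element {
    private final char lo, hi;

    OneOrMore(char lo, char hi) { this.lo = lo; this.hi = hi; }

    private boolean accepts(char c) { return c >= lo && c <= hi; }

    @Override
    int match(String text, int pos) {
        // Consume as many characters in the range as possible.
        int end = pos;
        while (end < text.length() && accepts(text.charAt(end))) end++;
        // Try the longest run first, giving back one character at a time
        // until the rest of the chain matches.
        for (int i = end; i > pos; i--) {
            int result = matchNext(text, i);
            if (result >= 0) return result;
        }
        return -1;
    }
}
```

A new kind of element, such as one matching a literal string, would be one more subclass of the hypothetical Element class that overrides match and calls matchNext when it succeeds.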
Example of use
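    import pat.Regex;

    public class tstRegex {
        public static void main(String[] notused) {
            // One or more of a-c, followed by a group of one or more of x-z.
            Regex r = new Regex("[a-c]+([x-z]+)");
            r.search("abcxyz");
            // With no argument, substring() refers to the whole match.
            System.out.println("match => "+r.substring());
            // With an int argument, substring() refers to a backreference.
            System.out.println("backreference 0 => "+r.substring(0));
        }
    }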
Which produces the output:
match => abcxyz
backreference 0 => xyz
You can get a good idea of how to use this package by seeing my
quick start guide.
To install this software simply download this file
pat10.zip.
Documentation can be found in the directory pat/doc if you install,
or it may be read online. The best place to start is
pat.Regex.html. It has everything
you need to get started.
Some source files for examples can be found in:
- deriv.java which is an example
of how to derive your own pattern class from my base class Pattern.
- guigrep.java which is a Java
program for searching files in the current directory. See
guigrep.html for more info.
- tokenTest.java which is just
a program to test the RegexTokenizer.
If you wish to track new developments with this software,
or if you would like to give me a suggestion, please send
me email. I am interested in
hearing about whatever would make you more likely to use this package,
and am interested in hearing about whether you currently use it.
But, if you don't want to go to the trouble of downloading it, if you
would rather just type in perverse patterns to try and break my library,
then you can just do that below. Simply type a pattern in, then some
text, then hit the return key to see the results of the match.
Or, if you are a really perverse individual, you could play my new
regular expression game.
Differences from beta version
This is now release 1.0.
- The compile method of Regex now throws a
RegSyntax exception, and the constructor Regex(String) never
does. Under the previous version of the JDK it was possible to throw
an exception in a way that did not require the user to catch it, and
both the constructor and the compile method used that mechanism.
- There is now an optimize() method. Once a pattern
is optimized, you should not change the contents of the ignoreCase or
dontMatchInQuotes variables. There is also an optimized() method
to determine whether the optimize() method has been called.
- I am now asking a very small fee for using the package for
software development (after a customary trial period). I hope this is not
a problem for anyone, and that it will enable my family to go out for
burgers once in a while :-)
- A few bug fixes.
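The split between a compile method that throws and a constructor that never does can be illustrated with a small stand-in. This sketch does not use pat: CompileSketch, BadSyntax and isCompiled are hypothetical names, java.util.regex is used only so that the example is self-contained, and how the real Regex(String) constructor treats a malformed pattern is not stated here, so leaving the object uncompiled is purely an assumption.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Sketch only: a stand-in built on java.util.regex, not the pat API.
class CompileSketch {
    /** Hypothetical checked exception, playing the role of RegSyntax. */
    static class BadSyntax extends Exception {
        BadSyntax(String message) { super(message); }
    }

    private Pattern compiled; // stays null until compile() succeeds

    CompileSketch() {}

    /** Never throws. ASSUMPTION: a bad pattern leaves the object uncompiled. */
    CompileSketch(String pattern) {
        try {
            compile(pattern);
        } catch (BadSyntax e) {
            compiled = null;
        }
    }

    /** Declares a checked exception, so callers must catch or rethrow it. */
    void compile(String pattern) throws BadSyntax {
        try {
            compiled = Pattern.compile(pattern);
        } catch (PatternSyntaxException e) {
            throw new BadSyntax(e.getDescription());
        }
    }

    boolean isCompiled() { return compiled != null; }

    public static void main(String[] args) {
        // The constructor needs no try/catch, even for a malformed pattern.
        System.out.println(new CompileSketch("[a-c").isCompiled());
        // The compiler insists that calls to compile() handle BadSyntax.
        try {
            new CompileSketch().compile("[a-c");
        } catch (BadSyntax e) {
            System.out.println("syntax error: " + e.getMessage());
        }
    }
}
```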
Differences from the alpha version
The new version differs in the following ways:
- It contains a few bug fixes (though I have received very few bug reports)
- My naming convention has changed to
use capitals for class names and lower case for methods. The class
name "regex" is now "Regex".
- There is a new class RegRes, from which Regex is now derived.
RegRes is short for "Regular expression match result", and a RegRes
object containing info about the last successful match can be obtained
from Regex's result() method.
- Backreferences (things in ()'s) are now treated more like patterns.
In other words, Regex.left() returns what's left of the match,
and Regex.left(1) returns what's left of backreference 1.
The only difference is that a function that takes no arguments refers
to the match, and a function that takes an int refers to a backreference.
This convention applies to: left() right() substring() matchFrom()
charsMatched().
- Return values make more sense: an unmatched pattern will give null
from left(), and -1 from charsMatched().
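The overload convention and return values described above can be imitated in a few lines. This is a sketch on top of java.util.regex, not the pat implementation: ResultSketch and its helper has are hypothetical, only some of the listed methods are shown, and group numbers follow java.util.regex (the first parenthesized group is 1), which is an assumption and may not be how pat numbers backreferences.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: mimics the naming convention using java.util.regex.
class ResultSketch {
    private final String text;
    private final Matcher m;
    private final boolean matched;

    ResultSketch(String pattern, String text) {
        this.text = text;
        this.m = Pattern.compile(pattern).matcher(text);
        this.matched = m.find(); // first match anywhere in the text
    }

    // No-argument forms refer to the whole match.
    String substring()  { return matched ? m.group() : null; }
    String left()       { return matched ? text.substring(0, m.start()) : null; }
    String right()      { return matched ? text.substring(m.end()) : null; }
    int charsMatched()  { return matched ? m.end() - m.start() : -1; }

    // int forms refer to one parenthesized group (numbered from 1 here).
    String substring(int g) { return has(g) ? m.group(g) : null; }
    String left(int g)      { return has(g) ? text.substring(0, m.start(g)) : null; }
    String right(int g)     { return has(g) ? text.substring(m.end(g)) : null; }
    int charsMatched(int g) { return has(g) ? m.end(g) - m.start(g) : -1; }

    /** True if the pattern matched and group g took part in the match. */
    private boolean has(int g) {
        return matched && g >= 1 && g <= m.groupCount() && m.start(g) >= 0;
    }

    public static void main(String[] args) {
        ResultSketch r = new ResultSketch("[a-c]+([x-z]+)", "..abcxyz!");
        System.out.println(r.left() + " | " + r.left(1));
        ResultSketch none = new ResultSketch("q+", "abc");
        System.out.println(none.left() + " | " + none.charsMatched());
    }
}
```

Returning null and -1 for an unmatched pattern lets the caller test the result directly instead of catching an exception.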