Group patterns in Java

We saw in an earlier example on matchers that we can find where a match of a regular expression starts and ends using the matcher's `.start()` and `.end()` methods.
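As a quick refresher, here is a minimal sketch of `start()` and `end()` in a `find()` loop. The sample text, the class name `StartEndDemo` and the helper `offsets` are made up for illustration and are not part of the earlier example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StartEndDemo {

    /** Returns a "start-end" string for every match of regex in text. */
    static List<String> offsets(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            // start() is the index of the first matched character,
            // end() is the index just past the last matched character.
            result.add(matcher.start() + "-" + matcher.end());
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical sample text: two opening <h2> tags, at index 0 and index 12.
        System.out.println(offsets("<h2>", "<h2>One</h2><h2>Two</h2>")); // prints [0-4, 12-16]
    }
}
```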

If we want to find multiple occurrences of a pattern, an easy way to do it is with a group pattern. Please note that a group captures only the part of the regular expression that sits inside the parentheses ().

So, to print everything between `<h2>` and `</h2>`, a first attempt might be:

    String h2GroupPattern = "(<h2.*</h2>)";

The issue with the above expression is that it uses a greedy quantifier. A greedy quantifier won't stop at the first occurrence of `</h2>`; it keeps looking for further `</h2>` tags and stops only once the last one is found.

So it will match everything between the first `<h2>` and the very last `</h2>`.

The solution to this is a lazy quantifier. It looks for the first `<h2>` and stops as soon as the first `</h2>` after it is found. To convert a greedy quantifier into a lazy one, all we have to do is add a `?` after the `*`, i.e. write `.*?` instead of `.*`.

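Also note that `*` matches 0 or more occurrences while `+` matches one or more. So if we are not interested in empty h2 tags, i.e. the ones that contain nothing between `<h2>` and `</h2>`, the first idea is to use `+` instead of `*`, i.e. `.+`. Two caveats apply. The `+` has to sit between the complete tags, as in `<h2>(.+?)</h2>`; in `(<h2.+?</h2>)` the `>` of the opening tag already satisfies the "one or more" requirement, so an empty tag still matches. And because `.` also matches `<`, `.+?` can run past an empty tag into the next heading; a negated class such as `[^<]+` is a safer way to insist on real text.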
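To see what `+` changes around an empty tag, here is a minimal sketch. The sample input, the class name `EmptyTagDemo` and the helper `captures` are made up for illustration, and the third pattern with `[^<]+` is an alternative that this post does not otherwise use.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmptyTagDemo {

    /** Collects group 1 of every match of regex in text. */
    static List<String> captures(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input: one empty heading followed by a real one.
        String html = "<h2></h2><h2>Title</h2>";

        // *? allows zero characters: two matches, the first capture is empty.
        System.out.println(captures("<h2>(.*?)</h2>", html));

        // +? needs at least one character, and '.' happily takes '<',
        // so there is a single match whose capture is "</h2><h2>Title".
        System.out.println(captures("<h2>(.+?)</h2>", html));

        // [^<]+ needs at least one character that is not '<':
        // the empty heading is skipped and only "Title" is captured.
        System.out.println(captures("<h2>([^<]+)</h2>", html));
    }
}
```

The middle case is the one to watch: `.+?` does not skip the empty heading, it merges it with the next one.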

    String h2GroupPattern = "(<h2.*?</h2>)";
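The difference between the greedy and the lazy version is easiest to see side by side. The sketch below is only an illustration: the sample HTML, the class name `GreedyVsLazyDemo` and the helper `findAll` are assumptions, not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazyDemo {

    /** Collects group 1 of every match of regex in text. */
    static List<String> findAll(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            result.add(matcher.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical input with two headings and a paragraph in between.
        String html = "<h2>First</h2><p>text</p><h2>Second</h2>";

        // Greedy: one match that runs from the first <h2 to the last </h2>,
        // i.e. the whole input string.
        System.out.println(findAll("(<h2.*</h2>)", html));

        // Lazy: two matches, "<h2>First</h2>" and "<h2>Second</h2>".
        System.out.println(findAll("(<h2.*?</h2>)", html));
    }
}
```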

If we use the group pattern below instead, it will print every opening `<h2>` tag in our String:

    String h2GroupPattern = "(<h2>)";

So, in a nutshell, these group patterns just find the occurrences of the expression we pass to the pattern and group them:

`"(<h2>)"` finds all occurrences of `<h2>` in the String

and

`"(<h2.*?</h2>)"` finds all occurrences of everything from `<h2` up to the next closing `</h2>` tag.

So let's see the code below:

    /* Create an instance of the Pattern class */
    Pattern groupPattern = Pattern.compile(h2GroupPattern);
    Matcher groupMatcher = groupPattern.matcher(htmlText);
    /* matches() is true only if the entire input matches the pattern */
    System.out.println(groupMatcher.matches());
    /* Go back to the start of the input before searching with find() */
    groupMatcher.reset();
    while (groupMatcher.find()) {
        /* group 0 is the entire match and group 1 is the first parenthesized group, i.e. the h2 tag */
        System.out.println("Occurrence: " + groupMatcher.group(1));
    }

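For anyone who wants to run the loop above on its own, here is a self-contained sketch. The value of `htmlText`, the class name `MatchesThenFindDemo` and the helper `run` are assumptions made for the example. It also shows why `matches()` and `find()` give different answers: `matches()` wants the whole input to fit the pattern, while `find()` scans for the next occurrence.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchesThenFindDemo {

    /** Records the result of matches(), then group 1 of every find(). */
    static List<String> run(String regex, String text) {
        List<String> log = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);

        // True only if the pattern covers the input from start to end.
        log.add("matches=" + matcher.matches());

        // reset() discards the matcher's state, so the loop below
        // starts at the beginning of the input whatever matches() did.
        matcher.reset();
        while (matcher.find()) {
            log.add(matcher.group(1));
        }
        return log;
    }

    public static void main(String[] args) {
        // Hypothetical page fragment that does not start with <h2.
        String htmlText = "<p>intro</p><h2>First</h2><h2>Second</h2>";
        for (String line : run("(<h2.*?</h2>)", htmlText)) {
            System.out.println(line);
        }
        // Prints matches=false, then <h2>First</h2>, then <h2>Second</h2>.
    }
}
```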
If we just want to print the text within the h2 tags and not the tags themselves, we use the pattern below, i.e. we define 3 groups: `<h2>`, the text after the `<h2>` tag, and `</h2>`.

    String h2TextGroup = "(<h2>)(.*?)(</h2>)";

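To make the group numbering concrete, the sketch below lists every group of the first match of the three-group pattern. The sample input, the class name `ThreeGroupsDemo` and the helper `groupsOfFirstMatch` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreeGroupsDemo {

    /** Returns group 0..groupCount() of the first match, or an empty list. */
    static List<String> groupsOfFirstMatch(String regex, String text) {
        List<String> result = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        if (matcher.find()) {
            // groupCount() counts the parenthesized groups; group 0 is extra.
            for (int i = 0; i <= matcher.groupCount(); i++) {
                result.add(matcher.group(i));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> groups = groupsOfFirstMatch("(<h2>)(.*?)(</h2>)", "<h2>Title</h2>");
        // group 0: <h2>Title</h2>, group 1: <h2>, group 2: Title, group 3: </h2>
        for (int i = 0; i < groups.size(); i++) {
            System.out.println("group " + i + ": " + groups.get(i));
        }
    }
}
```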
Another interesting point: if we remove the parentheses from around `<h2>` and `</h2>`, we are looking for a pattern that has `<h2>`, then a group `.*?`, and then `</h2>` at the end, so there is just one group, `(.*?)`. For that we only need group number 1, with group 0 still being the entire match:

    String h2TextGroup = "<h2>(.*?)</h2>";

The text is then read with h2TextMatcher.group(1).
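A minimal sketch of the single-group variant, collecting the heading texts into a list. The class name `H2TextExtractor`, the method `h2Texts` and the sample input are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class H2TextExtractor {

    // Only the text between the tags is parenthesized, so it is group 1.
    private static final Pattern H2_TEXT = Pattern.compile("<h2>(.*?)</h2>");

    /** Returns the text of every <h2>...</h2> pair found in html. */
    static List<String> h2Texts(String html) {
        List<String> texts = new ArrayList<>();
        Matcher matcher = H2_TEXT.matcher(html);
        while (matcher.find()) {
            texts.add(matcher.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical input with two headings.
        System.out.println(h2Texts("<h2>First</h2><p>x</p><h2>Second</h2>")); // prints [First, Second]
    }
}
```

With this pattern `groupCount()` is 1, so asking for `group(2)` would throw an `IndexOutOfBoundsException`.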

Complete code for the three-group version:

    Pattern h2TextPattern = Pattern.compile(h2TextGroup);
    Matcher h2TextMatcher = h2TextPattern.matcher(htmlText);
    while (h2TextMatcher.find()) {
        /* Here we print group 2, i.e. not group 1 (<h2>) and not group 3 (</h2>) */
        System.out.println("Occurrence:----------> " + h2TextMatcher.group(2));
    }