In an earlier matcher example we saw that we can find where a regular expression match starts or ends using start() and end().
If we want to find multiple occurrences of a pattern, an easy way to do it is with a group pattern. Please note that a group captures only the part of the regular expression that is inside the parentheses ().
So, to print everything between the <h2> and </h2> tags, a first attempt might be:
// String h2GroupPattern = "(<h2.*</h2>)";
The issue with the above expression is that .* is a greedy quantifier. A greedy quantifier won't stop at the first occurrence of </h2>; it keeps consuming input and only stops at the last </h2> it can find.
So it will match everything between the first <h2> and the last </h2> as a single occurrence.
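The solution to this is a lazy (reluctant) quantifier, .*?, which stops at the first </h2> that follows each <h2>.
Also note that * allows zero or more characters while + requires one or more. So if we are not interested in empty h2 tags, i.e. the ones that contain nothing between <h2> and </h2>, we use + instead of *, placed after the complete opening tag: <h2>.+?</h2>. In <h2.+?</h2> the dot can match the > itself, so empty tags would still match. Be aware that .+? can also run past an empty tag's own </h2> up to a later </h2>; when the headings contain plain text only, [^<]+ avoids that.
String h2GroupPattern = "(<h2.*?</h2>)";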
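To make the difference between the greedy and the lazy quantifier concrete, here is a minimal sketch. The class name GreedyVsLazy, the helper findAll and the sample htmlText are assumptions made for illustration and are not part of the original example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyVsLazy {
    // Collects group 1 of every match of regex in input (hypothetical helper).
    static List<String> findAll(String regex, String input) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            found.add(matcher.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the article.
        String htmlText = "<h2>First</h2><p>body</p><h2>Last</h2>";

        // Greedy: one match running from the first <h2> to the last </h2>.
        System.out.println(findAll("(<h2.*</h2>)", htmlText));

        // Lazy: one match per heading.
        System.out.println(findAll("(<h2.*?</h2>)", htmlText));
    }
}
```

On this sample the greedy pattern yields a single occurrence covering the whole input, while the lazy pattern yields the two headings separately.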
If we use the following group pattern, it will print every opening <h2> tag in our String:
String h2GroupPattern = "(<h2>)";
So, in a nutshell, what these group patterns do is find the occurrences of the expression we pass to the pattern and capture them as groups, for example
"(<h2>)"
and
"(<h2.*?</h2>)"
So let's see the code below.
/* Create an instance of the Pattern class */
Pattern groupPattern = Pattern.compile(h2GroupPattern);
Matcher groupMatcher = groupPattern.matcher(htmlText);
/* matches() is true only if the entire input matches the pattern */
System.out.println(groupMatcher.matches());
/* reset() so that find() starts again from the beginning of the input */
groupMatcher.reset();
while (groupMatcher.find()) {
    /* group 0 is the entire match; group 1 is the first parenthesized group, which here holds the h2 tag */
    System.out.println("Occurrence: " + groupMatcher.group(1));
}
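The snippet above relies on htmlText and h2GroupPattern being defined elsewhere. As a rough, self-contained sketch of the same steps, the version below returns the lines it would print. The class name H2GroupWalkthrough, the method run and the sample input are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class H2GroupWalkthrough {
    // Mirrors the steps above: matches(), reset(), then a find() loop.
    static List<String> run(String regex, String htmlText) {
        List<String> lines = new ArrayList<>();
        Matcher groupMatcher = Pattern.compile(regex).matcher(htmlText);

        // matches() succeeds only when the whole input fits the pattern.
        lines.add("matches: " + groupMatcher.matches());

        // Start over so find() scans from the beginning of the input.
        groupMatcher.reset();
        while (groupMatcher.find()) {
            lines.add("Occurrence: " + groupMatcher.group(1));
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the article.
        String htmlText = "<h1>Title</h1><h2>First</h2><p>body</p><h2>Second</h2>";
        run("(<h2.*?</h2>)", htmlText).forEach(System.out::println);
    }
}
```

Because the sample input starts with an h1 tag, matches() reports false here, while find() still locates both h2 tags.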
If we just want to print the text within the h2 tags and not the tags themselves, we use the pattern below, i.e. we define 3 groups: <h2>, the text after the h2 tag, and </h2>.
String h2TextGroup = "(<h2>)(.*?)(</h2>)";
Another interesting point: if we remove the parentheses from around <h2> and </h2>, we are looking for a pattern that starts with <h2>, then has a group .*?, and ends with </h2>, so we are looking for just one group, .*?.
For that we only need group 1; group 0 is still the entire match.
String h2TextGroup = "<h2>(.*?)</h2>";
h2TextMatcher.group(1)
Complete code (using the three-group pattern):
Pattern h2TextPattern = Pattern.compile(h2TextGroup);
Matcher h2TextMatcher = h2TextPattern.matcher(htmlText);
while (h2TextMatcher.find()) {
    /* Here we print group 2, i.e. the text; not group 1, i.e. <h2>, and not group 3, i.e. </h2> */
    System.out.println("Occurrence:----------> " + h2TextMatcher.group(2));
}
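To compare the three-group pattern with the one-group pattern side by side, a small sketch follows. The class name H2TextGroups, the helper extract and the sample input are assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class H2TextGroups {
    // Returns the given group number from every match (hypothetical helper).
    static List<String> extract(String regex, int group, String input) {
        List<String> texts = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(input);
        while (matcher.find()) {
            texts.add(matcher.group(group));
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical sample input, not taken from the article.
        String htmlText = "<h2>First</h2><p>body</p><h2>Second</h2>";

        // Three groups: the heading text is group 2.
        System.out.println(extract("(<h2>)(.*?)(</h2>)", 2, htmlText));

        // One group: the same text is now group 1.
        System.out.println(extract("<h2>(.*?)</h2>", 1, htmlText));
    }
}
```

Both calls give the same heading texts on this sample. The only difference is the group number, which follows the order of the opening parentheses in the pattern.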