0

I have a string:

Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>

It is all on a single line, so how would I extract only the information about the balls, i.e. the output should be . . . . . . 3 . . 1b 4 . 1 1 1 . . 4 . . . 4 . .

The closest I got was with [^(Recent overs|<b>|<tt>|</b>|</tt>|</p>)]+, but it matches the 1 and not 1b.

Nightfirecat
ravi
  • Balls? What balls? What does that have to do with your question? – Justin Morgan Aug 02 '11 at 19:26
  • What regex engine or language do you use? Also, within a character class, alternation has no meaning... – nEAnnam Aug 02 '11 at 19:28
  • Sample in Ruby: `x = 'Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>'; result = x.gsub(/<[^>]+>/, '').gsub('|', '').match(/\..*\./)[0]` – taro Aug 02 '11 at 19:33

3 Answers

0

Try \s[\d\.][\w]* to match every digit (optionally followed by word characters) or period that is preceded by a space!

Vlad
  • The first group of `.`'s doesn't have a space before it. – Justin Morgan Aug 02 '11 at 19:35
  • Also, this will match `overs` in `Recent overs`. – Justin Morgan Aug 02 '11 at 19:36
  • But in either case, each data point will be in a separate match; i.e. each `.` will be a single match, each `1` will be a single match...each match will consist of one character, except for `1b`. He seems to want them grouped together according to which tag pair they're in. – Justin Morgan Aug 02 '11 at 19:42
  • @Justin: According to the regex he provided he can live with separate matches. – Vlad Aug 02 '11 at 19:45
0

Based solely on the example you gave, you could try something like:

/(?<=>)[a-z\d\s\.]+/g

Alternative, in case your regex engine doesn't support lookbehinds:

/>([a-z\d\s\.]+)/g     #Matches will be in the first capture group.

However, it's a little hard to infer the rules of what should/should not be allowed based on the small sample you gave, and your output sample doesn't make much sense to me as a data structure. It seems like you might be better off using an HTML parser for this, since using regex to process HTML is frequently a bad idea.
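As a rough illustration of the capture-group variant above, here is a minimal JavaScript sketch. It assumes an engine that provides `String.prototype.matchAll` (ES2020); the variable names `html`, `pattern`, and `overs` are made up for this example. Each match's first group is the text following a `>`, so the balls come out grouped per over.

```javascript
// Sample string from the question.
const html = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// Capture-group variant: group 1 holds the run of allowed characters after each ">".
const pattern = />([a-z\d\s.]+)/g;

const overs = [...html.matchAll(pattern)]
  .map((m) => m[1].trim())                 // drop padding around each over
  .filter((text) => text.length > 0);      // skip whitespace-only matches between tags

console.log(overs);            // one entry per over
console.log(overs.join(" "));  // . . . . . . 3 . . 1b 4 . 1 1 1 . . 4 . . . 4 . .
```

The `|` separators are skipped automatically because `|` is not in the character class, so no match can start right after `<b>`.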

Community
Justin Morgan
0

First, square brackets [] create what is called a "character class", which matches a single character from a set. Your pattern effectively says "match anything except these characters": ( ) R e c n t o v r s b p | < > / and the space. Because b is in that set, 1b can only ever match as 1.
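To see this concretely, here is a small JavaScript sketch comparing the pattern from the question with a class that just lists the same characters once. The names `attempt`, `equivalent`, and `sample` are only for illustration, and `sample` is a shortened piece of the original string.

```javascript
// The pattern from the question, built from a string so the "/" needs no escaping.
const attempt = new RegExp("[^(Recent overs|<b>|<tt>|</b>|</tt>|</p>)]+", "g");

// The same negated class with duplicates removed (note the space in the set).
const equivalent = new RegExp("[^()Recntovrsbp|<>/ ]+", "g");

const sample = "<tt>3 . . 1b 4 .</tt>";

const fromAttempt = sample.match(attempt);
const fromEquivalent = sample.match(equivalent);

console.log(fromAttempt);     // [ '3', '.', '.', '1', '4', '.' ]
console.log(fromEquivalent);  // same result: the "b" of "1b" is excluded by the class
```

Both patterns return the same matches, which shows that the `|` and parentheses inside the brackets are treated as literal characters, not as alternation or grouping.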

You'd be better off using a regex to remove the unwanted strings; the result is then easier to parse, like this:

JavaScript, because you didn't specify the language

var s = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";
s = s.replace(/(Recent overs|<[^>]+>|\|)/ig, '');

jsfiddle example

The resulting 's' is much easier to parse.
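For instance, once the unwanted pieces are stripped, the remaining text can be split on whitespace to get one entry per ball. This is only a sketch of one possible follow-up step; the names `raw`, `cleaned`, and `balls` are assumptions for the example, not part of the code above.

```javascript
const raw = "Recent overs</b> <tt>. . . . . . <b>|</b> 3 . . 1b 4 .<b>|</b> 1 1 1 . . 4 <b>|</b> . . . 4 . .</tt></p>";

// Same replacement as above: drop the label, every tag, and the "|" separators.
const cleaned = raw.replace(/(Recent overs|<[^>]+>|\|)/ig, "");

// Removing tags leaves uneven spacing, so trim and split on runs of whitespace.
const balls = cleaned.trim().split(/\s+/);

console.log(balls.length);     // 24
console.log(balls.join(" "));  // . . . . . . 3 . . 1b 4 . 1 1 1 . . 4 . . . 4 . .
```

Splitting on `\s+` rather than a single space avoids empty entries where a removed tag left two spaces side by side.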

OverZealous