
I want to match these strings with a regex and capture everything between `<div>` and `</div>`. I have tried everything but still can't get it to work.

Here is my attempt on regex101: https://regex101.com/r/w7et0M/2

Here is the HTML:

<div> Sentinel. Winged Guardian cannot have restricted attachments.
Forced: After an attack in which Winged Guardian defends resolves, pay 1 Tactics resource or discard Winged Guardian from play.
</div>

<div>Attach to a hero.
Attached hero gains +1 Attack.
Action: Pay 1 resource from attached hero&#39;s pool to attach Dunedain Mark to another hero.
</div>

<div>Action: Search the top 5 cards of your deck for any number of Eagle cards and add them to your hand. Shuffle the other cards back into your deck.
</div>

<div>Attach to a hero.
Attached hero gains +1 Attack.
Action: Pay 1 resource from attached hero&#39;s pool to attach Dunedain Mark to another hero.
</div>

<div>Guarded.
Response: After the players quest successfully, the players may claim Signs of Gollum if it has no attached encounters. When claimed, attach Signs of Gollum to any hero committed to the quest. (Counts as a Condition attachment with: &#39;Forced: After attached hero is damaged or leaves play, return this card to the top of the encounter deck.&#39;)
</div>

Can anyone help me find the solution?

j3ff
ThunderBoy
  • https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags – Yuriy Faktorovich Apr 11 '20 at 00:37
  • _"can anyone help me"_ Sure, [do not use regex for this](https://stackoverflow.com/a/1732454/8967612). Did you try using an HTML parser? [More about why regex isn't the right tool to parse HTML](https://stackoverflow.com/q/6751105/8967612). – 41686d6564 stands w. Palestine Apr 11 '20 at 00:37
  • Well Ahmed, here is the issue: the `div` markup above comes from `http://hallofbeorn.com/LotR?CardSet=The+Hunt+for+Gollum`. I have also tried simple.html.dom, but still got nothing. I think the page is simply hard to crawl. – ThunderBoy Apr 11 '20 at 00:40
  • You may extract the text that matches the regular expression `(?s)(?<=<div>).*?(?=<\/div>)`. [Demo](https://regex101.com/r/qo2mhr/1/) `(?s)` specifies *single line mode*, which causes `.` to match newlines. Not specifying single line mode is one of the problems with your regex. `(?<=<div>)` is a *positive lookbehind*; `(?=<\/div>)` is a *positive lookahead*. There's no need for a capture group.
    – Cary Swoveland Apr 11 '20 at 01:24
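
The lookaround approach from the last comment can be tried out with a short Python sketch. Python is a choice made here for illustration (the thread does not name a language), the sample text is a shortened stand-in for the question's data, and the pattern assumes `<div>` and `</div>` are the delimiters the comment refers to.

```python
import re

# Shortened sample shaped like the markup in the question (illustrative only).
html = """<div> Sentinel. Winged Guardian cannot have restricted attachments.
Forced: After an attack resolves, pay 1 Tactics resource.
</div>

<div>Guarded.
Response: After the players quest successfully, claim Signs of Gollum.
</div>"""

# (?s) makes "." match newlines as well.
# (?<=<div>) and (?=</div>) assert the tags without including them in the match.
pattern = re.compile(r"(?s)(?<=<div>).*?(?=</div>)")

# With no capture groups, findall returns the full matches as strings.
blocks = [m.strip() for m in pattern.findall(html)]

for block in blocks:
    print(repr(block))
```

Because the tags sit inside lookarounds, each match is only the text between them, so no capture group is needed. This only holds for flat, attribute-free `<div>` elements like the ones shown in the question.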

1 Answer

<div>((.|\n)*?)<\/div>
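
As a rough illustration of how this pattern behaves, here is a minimal Python sketch (Python and the variable names are assumptions; the thread does not say which language is in use). It reads the wanted text from group 1 and decodes entities such as `&#39;` with the standard `html.unescape`.

```python
import html
import re

# One of the blocks from the question, used as sample input.
page = """<div>Attach to a hero.
Attached hero gains +1 Attack.
Action: Pay 1 resource from attached hero&#39;s pool to attach Dunedain Mark to another hero.
</div>"""

# Group 1 holds the text between the tags.
# Group 2, created by (.|\n), only holds the last character matched and is a by-product.
pattern = re.compile(r"<div>((.|\n)*?)<\/div>")

# Take group 1 only, decode HTML entities, and trim surrounding whitespace.
texts = [html.unescape(m.group(1)).strip() for m in pattern.finditer(page)]

print(texts[0])
```

The inner group means each match carries a second captured value alongside the wanted text, which may be the kind of extra data mentioned in the comment below. Writing the inner group as `(?:.|\n)` would avoid it.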
Vivek Roy
  • Vivek, thank you for your time, it's correct. It also brings back some other data, but that is manageable with the arrays. THANK YOU – ThunderBoy Apr 11 '20 at 00:45
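
For comparison, the comments under the question recommend an HTML parser over a regex. A minimal sketch of that route, assuming Python and its standard-library `html.parser` (the class name `DivTextExtractor` and the sample string are hypothetical, made up for this example), could look like this:

```python
from html.parser import HTMLParser


class DivTextExtractor(HTMLParser):
    """Collects the text of each outermost <div>, tracking nesting depth."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &#39; before handle_data.
        super().__init__(convert_charrefs=True)
        self.depth = 0    # how many <div> elements are currently open
        self.parts = []   # text pieces of the <div> being read
        self.blocks = []  # finished text blocks, one per outermost <div>

    def handle_starttag(self, tag, attrs):
        if tag == "div":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.blocks.append("".join(self.parts).strip())
                self.parts = []

    def handle_data(self, data):
        # Keep text only while inside at least one <div>.
        if self.depth > 0:
            self.parts.append(data)


sample = (
    "<div>Guarded.\nForced: return this card.</div>\n"
    "<div>Attach to a hero.\nhero&#39;s pool</div>"
)
parser = DivTextExtractor()
parser.feed(sample)
parser.close()

print(parser.blocks)
```

Unlike the regex, this keeps working when a `<div>` carries attributes or contains nested `<div>` elements, although nested text is merged into the outermost block. Whether it copes with the real page linked in the comments has not been checked here.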