Ruby -- trying to grab <title>this here</title> even if on multiple lines

Question

Currently, I am grabbing titles using the following method:

title = html_response[/<title[^>]*>(.*?)<\/title>/,1]

This does a great job at catching "This is a title" from <title>This is a title</title>. However, there are some web pages that open the title tag on one line, print the title on the next line, and then close the title tag.

The Ruby line I presented above doesn't catch titles such as those, so I'm just trying to find a fix for that.

score 4 · Answer 1 · edited May 23 '17 at 11:43

This famous Stack Overflow post explains why it's a bad idea to use regular expressions to parse HTML. A better approach is to use a gem like Nokogiri to parse the document and pull out the title tag.

answered Mar 21 '14 at 15:21

Mori

Thanks! Much appreciated. – LewlSauce Mar 21 '14 at 15:24

score 1 · Accepted Answer · answered Mar 21 '14 at 15:21

Obligatory "don't use regex with HTML" sentence.

title = html_response[/<title[^>]*>(.*?)<\/title>/m,1]

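The m flag enables multiline mode, in which . also matches newline characters, so (.*?) can span line breaks. The captured text will include any surrounding newlines and indentation, so you may want to call strip on the result.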
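A quick way to see the difference, using a made-up html_response where the title sits on its own line (the sample markup is an assumption, not taken from a real page):

```ruby
# Hypothetical sample page: the title text sits on its own line.
html_response = <<~HTML
  <html>
    <head>
      <title>
        This is a title
      </title>
    </head>
  </html>
HTML

# Without /m, . does not match "\n", so the pattern cannot reach </title>.
single_line = html_response[/<title[^>]*>(.*?)<\/title>/, 1]

# With /m, . matches newlines too, so the capture spans the line breaks.
multi_line = html_response[/<title[^>]*>(.*?)<\/title>/m, 1]

# The capture still carries the surrounding whitespace, so strip it.
title = multi_line&.strip

p single_line # => nil
p title       # => "This is a title"
```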

answered Mar 21 '14 at 15:21

cfeduke
