I am trying to pick out only the URL that starts with /~/ and ends with .ashx, which appears within the quotes, from the complete HTML source that I have scraped. I tried the function below to get the list of href matches.

library(XML)

# Parse the HTML and return the href attribute of every <a> element.
processHTML <- function(html) {
  doc <- htmlTreeParse(html, useInternalNodes = TRUE)
  xpathSApply(doc, "//a/@href")
}

From the snippet below, I need only the URL, without the href label and the quotation marks, i.e. /~/media/McKinsey/Business Functions/Marketing and Sales/Our Insights/Discussions in digital Whats a marketing ecosystem/Discussions-in-digital-Marketings-ecosystem.ashx:

href   "/~/media/McKinsey/Business Functions/Marketing and Sales/Our Insights/Discussions in digital Whats a marketing ecosystem/Discussions-in-digital-Marketings-ecosystem.ashx"

Please help me with a regular expression for the above problem.

Wiktor Stribiżew
vivek goud

1 Answer

If I understood the question correctly, this might help:

txt[grepl("\\.ashx$", txt)][["href"]]

Output is:

[1] "/~/media/McKinsey/Business Functions/Marketing and Sales/Our Insights/Discussions in digital Whats a marketing ecosystem/Discussions-in-digital-Marketings-ecosystem.ashx"

Sample data:

txt <- structure(c("mailto:?subject=From%20mckinsey.com%3a%20Discussions%20in%20digital%3a%20What%e2%80%99s%20a%20marketing%20ecosystem%20and%20what%20does%20it%20mean%20for%20marketers%3f&body=I%20recommend%20you%20visit%20mckinsey.com%20to%20read%3a%0d%0a%0d%0aDiscussions%20in%20digital%3a%20What%e2%80%99s%20a%20marketing%20ecosystem%20and%20what%20does%20it%20mean%20for%20marketers%3f%0d%0ahttp%3a%2f%2fwww.mckinsey.com%2fbusiness-functions%2fmarketing-and-sales%2four-insights%2fdiscussions-in-digital-whats-a-marketing-ecosystem%3fcid%3deml-web", 
"/~/media/McKinsey/Business Functions/Marketing and Sales/Our Insights/Discussions in digital Whats a marketing ecosystem/Discussions-in-digital-Marketings-ecosystem.ashx"
), .Names = c("href", "href"))
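Since the question asks for a regular expression, here is a minimal base R sketch of that route. It assumes the scraped values are available as plain strings in which the URL sits inside double quotes; the `raw` vector and the names `pattern` and `ashx_urls` are hypothetical and only serve as an illustration.

```r
# Hypothetical input: lines as they might look when the href attributes are printed.
raw <- c(
  'href "mailto:?subject=From%20mckinsey.com"',
  'href "/~/media/McKinsey/Business Functions/Marketing and Sales/Our Insights/Discussions in digital Whats a marketing ecosystem/Discussions-in-digital-Marketings-ecosystem.ashx"'
)

# Start at "/~/", take any characters except a double quote, and stop at ".ashx".
# The dot is escaped so that it matches a literal "." only.
pattern <- '/~/[^"]*\\.ashx'

# regexpr() locates the first match in each element; regmatches() then returns
# the matched text, skipping elements that contain no match.
ashx_urls <- regmatches(raw, regexpr(pattern, raw))
print(ashx_urls)
```

Excluding the double quote inside the character class keeps the match from running past the closing quote, so the result contains neither the href label nor the quotation marks.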
1.618
  • Thanks for the help, but I need to pick the complete URL from /~/ through the final .ashx: "/~/media/McKinsey/Business Functions/Marketing and Sales/Our Insights/Discussions in digital Whats a marketing ecosystem/Discussions-in-digital-Marketings-ecosystem.ashx" – vivek goud Mar 27 '18 at 06:29
  • Isn't the output shown above exactly the same as what you expect? I would suggest updating your post with the exact input (using `dput`) and the desired output; otherwise I am afraid this question will be closed soon ([this](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) link might help). – 1.618 Mar 27 '18 at 06:33
  • It worked. Thank you so much! – vivek goud Mar 27 '18 at 06:44
  • Glad that it helped! Please edit your question to add sample data (you may copy it from my answer) and I'll vote to reopen your post. Once this is done, you may [accept the answer](https://stackoverflow.com/help/someone-answers) so that the question can be closed properly. – 1.618 Mar 27 '18 at 07:25