
I would like to select all of the text inside a link, including the text in nested tags. The result I need is:

Bold normal Italist

The html is:

<a href=""><strong>Bold</strong> normal <i>Italist</i></a>

However, a/text() yields only

 normal

Does anyone know a fix? I'm testing a crawl of Bing results, and the bold text is in a different position depending on the query.

GRS
    You need to understand [**the difference between text nodes and string values in XPath**](https://stackoverflow.com/a/41077106/290085) – kjhughes Jun 02 '17 at 16:20

2 Answers


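You can use a//text() instead of a/text(). a/text() selects only the text nodes that are direct children of the link, while a//text() selects every descendant text node, including those inside nested tags.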

# -*- coding: utf-8 -*-
from scrapy.selector import Selector

doc = """
<a href=""><strong>Bold</strong> normal <i>Italist</i></a>
"""

sel = Selector(text=doc, type="html")

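# Direct child text nodes only: the text inside <strong> and <i> is skipped.
result = sel.xpath('//a/text()').extract()
print(result)
# >>> [' normal ']

# All descendant text nodes, joined in document order.
result = ''.join(sel.xpath('//a//text()').extract())
print(result)
# >>> Bold normal Italist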
Frank Martin

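You can try

string(a)

or

normalize-space(a)

Both return Bold normal Italist. Scrapy evaluates XPath 1.0 (through lxml), where string() has to be written as a function call around the path; the step form a/string() is only valid in XPath 2.0 and later. normalize-space() additionally trims leading and trailing whitespace and collapses internal runs of whitespace into single spaces.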
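As a rough illustration of what these two functions compute, here is a standard-library sketch that needs neither Scrapy nor lxml. It is not how Scrapy evaluates XPath: ElementTree's itertext() stands in for the XPath string value, the helper names string_value and normalize_space are made up for this example, and the "messy" variant of the markup is invented to show the whitespace handling.

```python
import xml.etree.ElementTree as ET


def string_value(element):
    """Concatenate all descendant text in document order, like XPath string()."""
    return "".join(element.itertext())


def normalize_space(text):
    """Trim the ends and collapse whitespace runs, roughly like normalize-space().

    Approximation: str.split() treats more characters as whitespace than
    XPath does (XPath only counts space, tab, CR and LF).
    """
    return " ".join(text.split())


# The markup from the question, parsed here as XML.
link = ET.fromstring('<a href=""><strong>Bold</strong> normal <i>Italist</i></a>')

# Hypothetical variant with extra indentation and line breaks.
messy = ET.fromstring(
    '<a href="">\n  <strong>Bold</strong>   normal\n  <i>Italist</i>\n</a>'
)

print(repr(string_value(link)))
print(repr(string_value(messy)))
print(repr(normalize_space(string_value(messy))))
```

The first and third lines print 'Bold normal Italist'; the second keeps the original line breaks and indentation, which is why normalize-space() is usually the safer choice on scraped pages.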

Andersson