Get HTML source of WebElement in Selenium WebDriver using Python

Question

I'm using the Python bindings to run Selenium WebDriver:

from selenium import webdriver
wd = webdriver.Firefox()

I know I can grab a webelement like so:

elem = wd.find_element_by_css_selector('#my-id')

And I know I can get the full page source with...

wd.page_source

But is there a way to get the "element source"?

elem.source   # <-- returns the HTML as a string

The Selenium WebDriver documentation for Python is basically non-existent, and I don't see anything in the code that seems to enable that functionality.

What is the best way to access the HTML of an element (and its children)?

You also could just parse all the `wd.page_source` with beautifulsoup — eLRuLL, Mar 01 '13 at 13:59

score 926 · Accepted Answer · edited Dec 09 '20 at 17:21

You can read the innerHTML attribute to get the source of the element's content, or outerHTML to get the source including the element itself.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

C#:

element.GetAttribute("innerHTML");

Ruby:

element.attribute("innerHTML")

JavaScript:

element.getAttribute('innerHTML');

PHP:

$element->getAttribute('innerHTML');

It was tested and worked with the ChromeDriver.
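
As a small sketch of the Python case, the helper below wraps the two attribute names. The names `element_html` and `include_self` are illustrative and not part of Selenium; `element` is assumed to be a WebElement that the driver has already located.

```python
def element_html(element, include_self=False):
    """Return the markup of a Selenium WebElement as a string.

    include_self=False -> innerHTML (children only)
    include_self=True  -> outerHTML (the element's own tag plus children)
    """
    attribute = "outerHTML" if include_self else "innerHTML"
    return element.get_attribute(attribute)
```

With an element in hand, `element_html(elem)` would give the inner markup and `element_html(elem, include_self=True)` the outer markup.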

edited Dec 09 '20 at 17:21

Peter Mortensen

answered Dec 20 '11 at 12:49

Nerijus

12

innerHTML is not a DOM attribute, so the above answer wouldn't work. innerHTML is a JavaScript value. Doing the above would return null. The answer by nilesh is the proper answer. – Bibek Shrestha Mar 22 '12 at 13:53
7

This works great for me, and is much more elegant than the accepted answer. I'm using Selenium 2.24.1. – Ryan Shillington Jul 10 '12 at 02:04
25

Though innerHTML is not a DOM attribute, it is well supported by all major browsers (http://www.quirksmode.org/dom/w3c_html.html). It works also well for me. – CuongHuyTo Jul 23 '12 at 10:57
3

+1 This appears to work in ruby also. I have a feeling that the `getAttribute` method (or equivalent in other languages) just calls the js method whose name is the arg. However the documentation doesn't explicitly say this, so nilesh's solution should be a fallback. – Kelvin Aug 20 '12 at 19:45
I'm getting this: content.get_attribute('innerHTML') == u'
...
' – Andrew B. Aug 30 '12 at 20:00
29

**This fails for `HtmlUnitDriver`.** Works for `ChromeDriver`, `FirefoxDriver`, `InternetExplorerDriver` (IE10) and `PhantomJSDriver` (I haven't tested others). – acdcjunior May 22 '14 at 20:54
@acdcjunior - HtmlUnit's javascript support is pretty weak; I'd imagine by extension they haven't supported this. More info [at this thread](http://htmlunit.10904.n7.nabble.com/How-to-get-inner-HTML-of-an-html-element-td25936.html) – Momer Jun 17 '14 at 20:12
In Ruby it's `element.attribute("innerHTML")` if anyone needs it. – mvndaai Nov 07 '14 at 20:36
nice- elem.get_attribute("innerHTML") It works by using this. – Bharat Mane Apr 26 '16 at 12:43
2

in Node: `element.getAttribute('innerHTML')` – ShaBANG Oct 10 '16 at 19:36
I have a situation where an alert text is displayed and I want to catch. Neither `driver.page_source` nor `element.get_attribute('innerHTML')` can return contents containing that alert text. In Firefox,only "`inspect elements |right click |copy|inner HTML`" give me what I need. How can I mimic that action chain in Selenium Python? – Heinz Mar 26 '20 at 22:10

score 96 · Answer 2 · edited Dec 09 '20 at 17:24

There is not really a straightforward way of getting the HTML source code of a web element. You will have to use JavaScript. I am not too sure about the Python bindings, but you can easily do it like this in Java. I am sure there must be something similar to the JavascriptExecutor class in Python.

 WebElement element = driver.findElement(By.id("foo"));
 String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element);
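
In the Python bindings the counterpart of `executeScript` is `driver.execute_script`. A minimal sketch, assuming `driver` is a WebDriver and `element` a WebElement it located, could try the attribute lookup first and only fall back to JavaScript when that returns nothing. The function name `inner_html` is hypothetical.

```python
def inner_html(driver, element):
    """Return an element's inner markup, falling back to JavaScript if needed."""
    # First try the plain attribute lookup, which most drivers support.
    html = element.get_attribute("innerHTML")
    if html is None:
        # Some drivers may not expose innerHTML this way, so ask the browser directly.
        html = driver.execute_script("return arguments[0].innerHTML;", element)
    return html
```

Whether the fallback is ever needed depends on the driver in use; with drivers that have no JavaScript support at all, neither path may work.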

edited Dec 09 '20 at 17:24

Peter Mortensen

answered Sep 03 '11 at 03:29

nilesh

1

This is essentially what I ended up doing, albeit with the Python equivalent. – Chris W. Sep 07 '11 at 21:15
8

I think the answer below, using element.getAttribute("innerHTML") is a lot easier to read. I don't understand why people are voting it down. – Ryan Shillington Jul 10 '12 at 02:05
1

No need to call javascript at all. In Python just use element.get_attribute('innerHTML') – Anthon Apr 30 '14 at 08:15
6

@Anthon `innerHTML` is not a DOM attribute. When I answered this question in 2011, it did not work for me; it looks like some browsers now support it. If it works for you, then using `innerHTML` is cleaner. However, there is no guarantee it will work on all browsers. – nilesh Apr 30 '14 at 13:25
2

Apparently, this is the only way to get innerHTML while using RemoteWebDriver – Illidan Jun 06 '15 at 14:29

score 85 · Answer 3 · edited Jan 10 '20 at 19:40

Sure, we can get the entire HTML source code with the script below in Selenium Python:

elem = driver.find_element_by_xpath("//*")
source_code = elem.get_attribute("outerHTML")

If you want to save it to a file:

with open('c:/html_source_code.html', 'w', encoding='utf-8') as f:
    f.write(source_code)

I suggest saving to a file because the source code can be very long.

edited Jan 10 '20 at 19:40

Samuel RIGAUD

answered Mar 20 '13 at 18:08

Mark

2

Can I set a delay and get the latest source? There are dynamic contents loaded using javascript. – CodeGuru Oct 17 '13 at 23:41
Does this work even if the page is not fully loaded? Also, is there any way to set a delay like @FlyingAtom mentioned? – TheRookierLearner Oct 20 '14 at 16:01
If the web page contains dynamic content, it depends on the behavior of that page, but about 90% of the time you have to set a delay before getting the raw HTML from the page. The simplest way to set a delay is `time.sleep(x)  # where x is seconds`. – Parampreet Rai Jan 04 '21 at 12:25
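
A fixed sleep can be either too short or needlessly long. One alternative, sketched here as a heuristic and not as a Selenium feature, is to poll the element until two consecutive reads of its markup match. The function name `wait_for_stable_html` and its `timeout` and `interval` parameters are assumptions made for illustration; `element` is assumed to be a WebElement.

```python
import time


def wait_for_stable_html(element, timeout=10.0, interval=0.5, attribute="outerHTML"):
    """Poll the element's markup until two consecutive reads are identical."""
    deadline = time.monotonic() + timeout
    previous = element.get_attribute(attribute)
    while time.monotonic() < deadline:
        time.sleep(interval)
        current = element.get_attribute(attribute)
        if current == previous:
            # Nothing changed during the last interval, so treat it as rendered.
            return current
        previous = current
    raise TimeoutError("element HTML did not stabilise within %s seconds" % timeout)
```

This only detects that the markup stopped changing for one interval; content that loads after a longer pause would still be missed, so an explicit wait on a known condition is preferable when one exists.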

score 14 · Answer 4 · edited Oct 24 '14 at 19:11

In Ruby, using selenium-webdriver (2.32.1), there is a page_source method that contains the entire page source.

edited Oct 24 '14 at 19:11

Ajinkya

answered Apr 15 '13 at 20:59

John Alberts

undetected Selenium · Answer 5 · 2020-12-09T19:12:20.230

The other answers cover how to retrieve the markup of a WebElement. However, modern websites increasingly rely on JavaScript and frameworks such as ReactJS, jQuery, Ajax, Vue.js, Ember.js, and GWT to render elements dynamically within the DOM tree. You therefore need to wait for the element and its children to finish rendering before retrieving the markup.

Python

Ideally, you should use WebDriverWait with visibility_of_element_located() and then use either of the following approaches:

Using get_attribute("outerHTML"):

element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#my-id")))
print(element.get_attribute("outerHTML"))

Using execute_script():

element = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "#my-id")))
print(driver.execute_script("return arguments[0].outerHTML;", element))

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

score 6 · Answer 6 · edited Dec 09 '20 at 17:26

Using the attribute method is, in fact, easier and more straightforward.

Using Ruby with the Selenium and PageObject gems, to get the class associated with a certain element, the line would be element.attribute("class").

The same concept applies if you wanted to get other attributes tied to the element. For example, if I wanted the string of an element, element.attribute(String).

edited Dec 09 '20 at 17:26

Peter Mortensen

answered Mar 22 '13 at 15:46

Tiffany G

score 6 · Answer 7 · edited Dec 09 '20 at 17:28

It looks outdated, but let it be here anyway. The correct way to do it in your case:

elem = wd.find_element_by_css_selector('#my-id')
html = wd.execute_script("return arguments[0].innerHTML;", elem)

or

html = elem.get_attribute('innerHTML')

Both are working for me (selenium-server-standalone-2.35.0).

edited Dec 09 '20 at 17:28

Peter Mortensen

answered Mar 06 '14 at 14:52

nefski

score 4 · Answer 8 · answered Mar 29 '16 at 21:25

Java with Selenium 2.53.0

driver.getPageSource();

answered Mar 29 '16 at 21:25

WltrRpo

that's not what the question asked for – Corey Goldberg May 31 '17 at 02:18
Depending on the webdriver, the `getPageSource` method may not return the actual page source (i.e. including any changes made by JavaScript). The returned source may be the raw source sent by the server. Check the webdriver documentation to confirm this point. – Stephan Jul 25 '17 at 06:23
Also works for php - `$driver->getPageSource()` – wowandy Dec 21 '21 at 18:45

score 3 · Answer 9 · edited Dec 09 '20 at 17:31

innerHTML returns the markup inside the selected element; outerHTML returns that same markup together with the selected element's own tag.

Example:

Suppose your element is as follows:

<tr id="myRow"><td>A</td><td>B</td></tr>

innerHTML element output

<td>A</td><td>B</td>

outerHTML element output

<tr id="myRow"><td>A</td><td>B</td></tr>
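
The same distinction can be reproduced offline, without a browser, as a rough illustration. The sketch below uses Python's standard `xml.etree.ElementTree` on the row above; it mimics what innerHTML and outerHTML return and is not how Selenium computes them. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

row = ET.fromstring('<tr id="myRow"><td>A</td><td>B</td></tr>')

# outerHTML equivalent: the element's own tag plus everything inside it
outer = ET.tostring(row, encoding="unicode")

# innerHTML equivalent: leading text (if any) plus each serialized child
inner = (row.text or "") + "".join(
    ET.tostring(child, encoding="unicode") for child in row
)

print(outer)
print(inner)
```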

Live Example:

http://www.java2s.com/Tutorials/JavascriptDemo/f/find_out_the_difference_between_innerhtml_and_outerhtml_in_javascript_example.htm

Below is the syntax required for the different language bindings. Change innerHTML to outerHTML as required.

Python:

element.get_attribute('innerHTML')

Java:

elem.getAttribute("innerHTML");

If you want whole page HTML, use the below code:

driver.getPageSource();

score 2 · Answer 10 · edited Dec 09 '20 at 17:39

The method to get the rendered HTML I prefer is the following:

driver.get("http://www.google.com")
body_html = driver.find_element_by_xpath("/html/body")
print(body_html.text)

However, the above method removes all the tags (yes, the nested tags as well) and returns only the text content. If you are interested in getting the HTML markup as well, use the method below.

print(body_html.get_attribute("innerHTML"))

edited Dec 09 '20 at 17:39

Peter Mortensen

answered Feb 04 '18 at 17:32

Rusty

1

You can also use driver.find_element_by_tag_name("body") to reach the body content of the page. – Rusty Feb 05 '18 at 04:58

score 2 · Answer 11 · edited Sep 22 '19 at 16:05

This works seamlessly for me.

element.get_attribute('innerHTML')

edited Sep 22 '19 at 16:05

MaartenDev

answered Sep 22 '19 at 15:26

Jitendra Pisal

score 2 · Answer 12 · edited Apr 07 '16 at 23:09

I hope this could help: http://selenium.googlecode.com/svn/trunk/docs/api/java/org/openqa/selenium/WebElement.html

The Java method described there is:

java.lang.String    getText()

Unfortunately, it's not available under that name in Python (the bindings expose a `text` attribute instead). You can translate the method names from Java to Python and try different logic using the existing methods, without getting the whole page source.

E.g.

 my_id = elem[0].get_attribute('my-id')

edited Apr 07 '16 at 23:09

Phillip

answered Sep 07 '11 at 14:23

oleksii.burdin

7

Python actually does have a "gettext" equivalent (I think it's just the "text" attribute?) but that just returns the "plaintext" between HTML tags and won't return the full HTML source. – Chris W. Sep 07 '11 at 21:17
2

This returns only the plain text (not the html) in Java too. – Ryan Shillington Jul 10 '12 at 02:06
you must reference it like you said elem[0] otherwise it doesn't work – HelloW Sep 12 '13 at 18:17

score 0 · Answer 13 · edited Dec 09 '20 at 17:27

If you are interested in a solution for Selenium Remote Control in Python, here is how to get innerHTML:

innerHTML = sel.get_eval("window.document.getElementById('prodid').innerHTML")

edited Dec 09 '20 at 17:27

Peter Mortensen

answered Jul 09 '13 at 14:18

StanleyD

Thanks for the help, I have used this. I also find `innerHTML = {selenium selector code}.text` works just the same. – Shane Aug 04 '13 at 00:01

score 0 · Answer 14 · edited Dec 09 '20 at 17:29

And in PHPUnit Selenium test it's like this:

$text = $this->byCssSelector('.some-class-name')->attribute('innerHTML');

edited Dec 09 '20 at 17:29

Peter Mortensen

answered May 30 '14 at 10:25

Zorgijs

score 0 · Answer 15 · answered Sep 11 '21 at 02:49

Use execute_script to get the HTML.

bs4 (BeautifulSoup) can then access HTML tags quickly.

from bs4 import BeautifulSoup
html = adriver.execute_script("return document.documentElement.outerHTML")
bs4_onepage_object = BeautifulSoup(html, "html.parser")
bs4_div_object = bs4_onepage_object.find_all("atag", class_="attribute")

score 0 · Answer 16 · answered Oct 25 '21 at 12:10

In current versions of php-webdriver (1.12.0+) you need to use

$element->getDomProperty('innerHTML');

as pointed out in this issue: https://github.com/php-webdriver/php-webdriver/issues/929
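
For the Python bindings from the question, Selenium 4 offers, to my knowledge, a similar `get_dom_property` method on WebElement. A hedged sketch that prefers it when present and otherwise falls back to `get_attribute` might look like this; the helper name `read_inner_html` is hypothetical.

```python
def read_inner_html(element):
    """Read innerHTML as a DOM property when the binding supports it."""
    getter = getattr(element, "get_dom_property", None)
    if callable(getter):
        # Newer bindings: innerHTML is a DOM property, so ask for it as one.
        return getter("innerHTML")
    # Older bindings: fall back to the attribute lookup used in the other answers.
    return element.get_attribute("innerHTML")
```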

answered Oct 25 '21 at 12:10

christian

score 0 · Answer 17 · answered Dec 22 '21 at 07:51

In PHP Selenium WebDriver you can get page source like this:

$html = $driver->getPageSource();

Or get HTML of the element like this:

// innerHTML if you need HTML of the element content
$html = $element->getDomProperty('outerHTML');

answered Dec 22 '21 at 07:51

wowandy

score -1 · Answer 18 · edited Jul 09 '20 at 05:03

WebElement element = driver.findElement(By.id("foo"));
String contents = (String)((JavascriptExecutor)driver).executeScript("return arguments[0].innerHTML;", element);

This code really works to get JavaScript from source as well!

edited Jul 09 '20 at 05:03

Dima Tisnek

answered Aug 31 '12 at 04:04

Ilya
