How to extract a specific tag using calibre

Question

I want to use only one specific <div> from each HTML file to create an EPUB; everything else in the files is useless.

Each of the HTML files is about 20 KB and contains a lot of junk: scripts, ads, and so on.

All I need is the "main-content" div:

<html>
....
<div id="main-content">
....
....
....
</div>
...

</html>

I know its XPath is:

//*[@id="main-content"]

How can I use it in calibre or ebook-convert?

You need to show some understanding of html to understand what has to happen here. Try editing in calibre editor to see if their search is enough. — mmmmmm, Feb 28 '19 at 13:44

score 1 · Accepted Answer · 2019-03-03T11:40:36.333

AFAIK, this can't be done automatically with Calibre when converting existing HTML files. However, it is possible to download only parts of a website with a custom Recipe.
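As a rough illustration of the Recipe route, here is a minimal sketch that relies on the `keep_only_tags` attribute of calibre's `BasicNewsRecipe`. The class name, title, and URLs are hypothetical placeholders, and the code only runs inside calibre (it is not a standalone script), so treat it as a starting point rather than a tested solution.

```python
# Recipe sketch: runs only inside calibre, not as a standalone Python script.
from calibre.web.feeds.news import BasicNewsRecipe


class MainContentOnly(BasicNewsRecipe):
    # Hypothetical title; it becomes the title of the generated book.
    title = 'Main content only'

    # Keep just the div from the question and discard everything else.
    keep_only_tags = [dict(name='div', attrs={'id': 'main-content'})]

    def parse_index(self):
        # Hypothetical page list; replace with the real URLs to fetch.
        articles = [
            {'title': 'Page 1', 'url': 'https://example.com/page1.html'},
            {'title': 'Page 2', 'url': 'https://example.com/page2.html'},
        ]
        return [('Pages', articles)]
```

Saved as, say, main_content.recipe, this could be converted with `ebook-convert main_content.recipe output.epub`.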

If you're familiar with Python, you could also use BeautifulSoup to scrape websites.
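A minimal sketch of the BeautifulSoup approach, assuming the `bs4` package is installed and that the wanted div has `id="main-content"` as in the question. The function name `extract_main_content` and the sample HTML are made up for illustration.

```python
from html import escape

from bs4 import BeautifulSoup


def extract_main_content(html, div_id="main-content"):
    """Return a small HTML page that contains only the div with the given id."""
    soup = BeautifulSoup(html, "html.parser")
    main = soup.find("div", id=div_id)
    if main is None:
        raise ValueError(f"no <div id={div_id!r}> found")
    # Reuse the original page title if there is one.
    title = soup.title.get_text(strip=True) if soup.title else "Untitled"
    return (
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>"
        f"<body>{main}</body></html>"
    )


# Made-up sample page: ads and scripts around the div we want to keep.
sample = """<html><head><title>Demo</title><script>junk()</script></head>
<body><div class="ad">Buy now</div>
<div id="main-content"><p>Keep me</p></div>
<div id="footer">junk</div></body></html>"""

cleaned = extract_main_content(sample)
print(cleaned)
```

The cleaned string can be written to a file (for example cleaned.html) and then passed to `ebook-convert cleaned.html book.epub`.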

edited Mar 03 '19 at 11:40

answered Mar 03 '19 at 11:02
