I only want to use one specific <div> from each HTML file to create the EPUB; everything else in the files is useless.

Each HTML file is about 20k and contains a lot of junk: scripts, ads, ...

All I need is the "main-content" div:

<html>
....
<div id="main-content">
....
....
....
</div>
...

</html>

I know its XPath is:

//*[@id="main-content"] 

How can I use it in calibre or ebook-convert?

camino
  • You need to show some understanding of html to understand what has to happen here. Try editing in calibre editor to see if their search is enough. – mmmmmm Feb 28 '19 at 13:44
  • @Mark Thanks for looking into it. I updated the post. – camino Mar 02 '19 at 16:28

1 Answer

AFAIK, this can't be done automatically with Calibre when converting local HTML files. However, it is possible to download only parts of a website with a custom recipe.

If you're familiar with Python, you could also use BeautifulSoup to pull the div you need out of each page before converting it.

For more information, see How to scrape websites with Python and BeautifulSoup.
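
As a rough illustration of the BeautifulSoup route, here is a minimal sketch that keeps only the "main-content" div from the question and wraps it in a bare HTML page. The function name `extract_main_content` and the sample markup are made up for this example, and it assumes the `beautifulsoup4` package is installed; real pages may need extra handling (encodings, relative image links, and so on).

```python
from html import escape

from bs4 import BeautifulSoup  # third-party: pip install beautifulsoup4


def extract_main_content(html: str, div_id: str = "main-content") -> str:
    """Return a minimal HTML page that holds only the div with the given id.

    `extract_main_content` is a hypothetical helper written for this example.
    """
    soup = BeautifulSoup(html, "html.parser")
    main = soup.find("div", id=div_id)
    if main is None:
        raise ValueError(f"no <div id={div_id!r}> found")
    # Reuse the page title if there is one, so the book keeps a sensible name.
    title = soup.title.get_text(strip=True) if soup.title else "Untitled"
    return (
        '<html><head><meta charset="utf-8">'
        f"<title>{escape(title)}</title></head>"
        f"<body>{main}</body></html>"
    )


# Invented sample page: one ad div, one script, and the div we want to keep.
sample = """<html><head><title>Chapter 1</title><script>ads()</script></head>
<body><div class="ad">Buy now</div>
<div id="main-content"><p>Real text.</p></div>
</body></html>"""

cleaned = extract_main_content(sample)
print(cleaned)
```

If you write `cleaned` to a file such as `cleaned.html` (an arbitrary name), you should then be able to convert it with `ebook-convert cleaned.html book.epub`.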