Hello, I want to compare two webpages using a Python script. How can I achieve this? Thanks in advance!
3
What do you want to compare? Do you just want to know whether they are exactly the same? Or whether they look the same? – Sören Mar 08 '11 at 16:50
2 Answers
4
First, you want to retrieve both webpages. You can use wget, urlretrieve, etc. See the related question:
"wget vs. urlretrieve of Python"
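A minimal sketch of the retrieval step in Python 3, assuming both pages are reachable by URL. `fetch_page` is a hypothetical helper name, and it uses `urllib.request.urlopen` rather than `urlretrieve` so the page text stays in memory instead of being written to a file:

```python
from urllib.request import urlopen


def fetch_page(url, timeout=10):
    """Return the body of `url` as text (hypothetical helper, not from the answer)."""
    with urlopen(url, timeout=timeout) as response:
        # Use the charset declared by the server if there is one;
        # falling back to UTF-8 is an assumption.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

You would call it once per page, e.g. `page_a = fetch_page(url_a)` and `page_b = fetch_page(url_b)`, where `url_a` and `url_b` are whatever two addresses you want to compare.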
Second, you want to "compare" the pages. You can use a "diff" tool, as Chinmay noted. You can also do a keyword analysis of the two pages:
- Parse all keywords from each page, e.g. "How do I extract keywords used in text?"
- Optionally take the "stem" of the words with something like: http://pypi.python.org/pypi/stemming/1.0
- Use some math to compare the two pages' keywords, e.g. term frequency–inverse document frequency (http://en.wikipedia.org/wiki/Tf%E2%80%93idf), with some of the Python tools out there, like these: http://wiki.python.org/moin/InformationRetrieval
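The keyword steps above can be roughed out with only the standard library. Note the simplifications: this sketch skips stemming, and it compares plain term-frequency vectors with cosine similarity instead of full tf-idf, since the idf part needs a larger collection of documents than two pages. The names `page_keywords` and `cosine_similarity` are made up for illustration:

```python
import math
import re
from collections import Counter
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text, ignoring the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chunks.append(data)


def page_keywords(html):
    """Return a Counter mapping each lowercase word in the page to its count."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    # Treating runs of ASCII letters/digits as "keywords" is an assumption.
    return Counter(re.findall(r"[a-z0-9]+", text))


def cosine_similarity(counts_a, counts_b):
    """Cosine of the angle between two term-frequency vectors (0.0 to 1.0)."""
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(v * v for v in counts_a.values()))
    norm_b = math.sqrt(sum(v * v for v in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

A score near 1.0 means the two pages use roughly the same words in the same proportions, and a score near 0.0 means they share almost no vocabulary. Where to draw the line for "similar enough" depends on your pages.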
Dolan Antenucci
2
What do you mean by "compare"? If you just want to find the differences between two files, try difflib, which is part of the Python standard library.
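For example, something along these lines, assuming you already have the two pages as strings. `diff_pages` and `similarity_ratio` are illustrative names, not part of difflib; `unified_diff` and `SequenceMatcher` are:

```python
import difflib


def diff_pages(old_html, new_html, old_name="page_a", new_name="page_b"):
    """Return the unified diff between two pages as a list of lines."""
    old_lines = old_html.splitlines(keepends=True)
    new_lines = new_html.splitlines(keepends=True)
    return list(
        difflib.unified_diff(old_lines, new_lines, fromfile=old_name, tofile=new_name)
    )


def similarity_ratio(old_html, new_html):
    """Return a float in [0, 1]; 1.0 means the two strings are identical."""
    return difflib.SequenceMatcher(None, old_html, new_html).ratio()
```

An empty list from `diff_pages` means the pages are identical line for line. Otherwise `print("".join(diff_pages(a, b)))` shows the changes in the usual `-`/`+` format. `similarity_ratio` compares character by character, so it can be slow on very large pages.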
Chinmay Kanchi