
Can someone explain why some people use these kinds of symbols in their code? They are confusing to me, and probably to other newcomers too.

I am learning Python and have reached a lesson where I have to build a web crawler (spider). The example only shows how to grab data from one specific URL in one specific way, so I searched the internet for a more general approach and found this code. It uses a lot of symbols; I know some of them, but I have no clue about the rest. Here is the chunk of the code with the symbols:

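import sys, thread, Queue, re, urllib, urlparse, time, os
dupcheck = set()          # URLs that have already been seen
q = Queue.Queue(100)      # URLs waiting to be crawled (at most 100 at a time)
q.put(sys.argv[1])        # start from the URL given on the command line

def queueURLs(html, originalink):
    for url in re.findall('''<a[^>]+href=["'](.[^"']+)["']''', html, re.I):
        pass  # the rest of the loop body was not included in the question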

What do symbols like ^> (as in [^>]) mean in this code? I know that a stands for anchor and what href means, but the other symbols are confusing.

Grzegorz Piwowarek
rn0rdin

3 Answers

They are regular expressions and you should probably not be parsing HTML using them.
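A minimal sketch of the alternative, assuming Python 3, where the standard-library parser lives in `html.parser` (Python 2 called the module `HTMLParser`). `LinkCollector`, `extract_links` and the sample HTML are hypothetical names and data for illustration. The pattern from the question (quoting repaired, assuming it was meant to read `href=`) is included for comparison: it misses a link whose tag has a `>` inside a quoted attribute value, because `[^>]+` cannot cross that character, while the parser handles it.

```python
import re
from html.parser import HTMLParser

# The pattern from the question, with its quoting repaired (assumed intent).
LINK_RE = re.compile(r"""<a[^>]+href=["'](.[^"']+)["']""", re.I)


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> start tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lower-cases tag and attribute names and strips the quotes
        # around attribute values; attrs is a list of (name, value) pairs.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return the href values of all <a> tags in html, in document order."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    sample = (
        "<p><A class=\"x\" HREF='/about'>About</A> "
        "<a title=\"a > b\" href=\"https://example.com/\">Home</a></p>"
    )
    print(LINK_RE.findall(sample))  # ['/about']  (second link missed)
    print(extract_links(sample))    # ['/about', 'https://example.com/']
```

The parser does more work than a one-line regex, but it follows the HTML rules for quoting and case, which is exactly where hand-written patterns tend to break.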

Noufal Ibrahim

These symbols have no special meaning inside an ordinary Python string.

They only mean something when the string is passed to a module that handles regular expressions, such as re.
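A quick way to see the difference: to Python the pattern is just a string of characters, and only `re` interprets it. The sample text `'href="x">rest'` is made up for illustration. (The backslash is the one character that does mean something inside a normal string literal, which is why regular expressions are usually written as raw strings, `r"..."`.)

```python
import re

pattern = "[^>]+"  # to Python itself this is just a 5-character string

# Nothing special happens at the string level: it is the same as gluing the
# individual characters together.
print(len(pattern))                               # 5
print(pattern == "[" + "^" + ">" + "]" + "+")     # True

# Only the re module gives the characters a meaning:
#   [^>]  one character that is anything except ">"
#   +     repeat the previous item one or more times
print(re.match(pattern, 'href="x">rest').group())  # href="x"
```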

Pierre Barre

Other answers have already pointed out that these 'symbols' (read: operators) are used to define regular expressions. For the line in question:

for url in re.findall('''<a[^>]+href=["'](.[^"']+)["']''', html, re.I):

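In a regular expression, a ^ placed at the start of a character class (a set in square brackets), e.g. [^abcd], makes the class match any single character that is NOT 'a', 'b', 'c' or 'd'. So [^>]+ in this pattern means "one or more characters that are not >", which keeps the match inside the <a ...> tag.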
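To see the negated class and the rest of the pattern in action, here is a sketch that runs `[^abcd]` on a short string and then rewrites the pattern from the question with `re.VERBOSE`, a flag that lets each piece carry its own comment. It assumes the intended pattern reads `href=` (the `=` appears to have been lost in the garbled version) and uses a made-up HTML snippet.

```python
import re

# [^abcd] matches exactly one character that is NOT a, b, c or d
print(re.findall(r"[^abcd]", "abcxyd"))   # ['x', 'y']

# The pattern from the question, spelled out with re.VERBOSE
# (whitespace and # comments inside the pattern are ignored in this mode).
LINK_RE = re.compile(r"""
    <a            # literal "<a": the start of an anchor tag
    [^>]+         # one or more characters that are not ">", so we stay inside the tag
    href=         # the literal text href=
    ["']          # an opening double or single quote
    (             # start of capture group 1: the URL that findall returns
      .           #   any one character
      [^"']+      #   then one or more characters that are not quotes
    )             # end of capture group 1
    ["']          # the closing quote
""", re.I | re.VERBOSE)

html = '<a class="nav" href="/docs/">Docs</a> <A HREF=\'/faq\'>FAQ</A>'
print(LINK_RE.findall(html))   # ['/docs/', '/faq']
```

Because the pattern has one capture group, `findall` returns just the captured URLs rather than the whole matched tag text.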

See https://docs.python.org/2/library/re.html for more information on regular expressions and their usage in Python.

theorifice