How can I parse text and find all instances of hyperlinks within a string? The hyperlinks will not be in the HTML format <a href="http://test.com">test</a> but plain URLs such as http://test.com

Secondly, I would then like to convert the original string, replacing every such URL with a clickable HTML hyperlink.

I found an example in this thread:

Easiest way to convert a URL to a hyperlink in a C# string?

but was unable to reproduce it in Python :(

TimLeung

4 Answers


Here's a Python port of Easiest way to convert a URL to a hyperlink in a C# string?:

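import re

myString = "This is my tweet check it out http://tinyurl.com/blah"

# Match "http://" followed by a run of non-space characters.
r = re.compile(r"(http://[^ ]+)")
# Wrap each match in an anchor tag; \1 is the captured URL.
print(r.sub(r'<a href="\1">\1</a>', myString))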

Output:

This is my tweet check it out <a href="http://tinyurl.com/blah">http://tinyurl.com/blah</a>
maxyfc
  • It can be improved by adding support for https or ftp URLs. Also, I believe the scheme (http) is case-insensitive. – bortzmeyer Apr 06 '09 at 08:38
  • See http://stackoverflow.com/questions/1986059/gruber-s-url-regular-expression-in-python for hopefully a better regular expression. – tripleee Oct 10 '14 at 11:27
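
Following up on the comments above, here is a minimal sketch that accepts http, https, and ftp schemes case-insensitively and HTML-escapes the surrounding text. The helper name `linkify` and the character class used to end a URL are assumptions made for illustration; they are not part of the original answer.

```python
import html
import re

# Assumed pattern: scheme matched case-insensitively; the URL runs until
# whitespace or one of < > ".
URL_RE = re.compile(r'\b(?:https?|ftp)://[^\s<>"]+', re.IGNORECASE)

def linkify(text):
    """Return text with bare URLs wrapped in <a> tags (hypothetical helper)."""
    parts = []
    last = 0
    for match in URL_RE.finditer(text):
        # Escape the plain text between URLs so stray <, >, & stay inert.
        parts.append(html.escape(text[last:match.start()]))
        safe = html.escape(match.group(0), quote=True)
        parts.append('<a href="{0}">{0}</a>'.format(safe))
        last = match.end()
    parts.append(html.escape(text[last:]))
    return "".join(parts)

print(linkify("See HTTPS://example.com/a?b=1&c=2 and ftp://example.org/file"))
```

Note that this simple pattern still swallows trailing punctuation, so a URL at the end of a sentence would include the final period. The more elaborate expressions mentioned in this thread handle that case.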

Here is a much more sophisticated regexp from 2002.

@yoniLavi minified this to:

re.compile(r'\b(?:https?|telnet|gopher|file|wais|ftp):[\w/#~:.?+=&%@!\-.:?\\-]+?(?=[.:?\-]*(?:[^\w/#~:.?+=&%@!\-.:?\-]|$))')
dfrankow
  • I found it very useful too, and minified it to: `re.compile(r'\b(?:https?|telnet|gopher|file|wais|ftp):[\w/#~:.?+=&%@!\-.:?\\-]+?(?=[.:?\-]*(?:[^\w/#~:.?+=&%@!\-.:?\-]|$))')` – yoniLavi Apr 29 '13 at 12:43
  • Great stuff, but what if the URL does not have the http:// prefix? Usually we don't specify that part any more in emails and social media. – dlink Jan 09 '16 at 18:43
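
For the two tasks in the question, the minified pattern from this answer can be used with `findall` to list the URLs and with `sub` to wrap them in anchor tags. This is only a sketch: the sample text and variable names are made up, and the replacement does no HTML escaping.

```python
import re

# The minified pattern quoted in this answer, unchanged.
URL_RE = re.compile(r'\b(?:https?|telnet|gopher|file|wais|ftp):[\w/#~:.?+=&%@!\-.:?\\-]+?(?=[.:?\-]*(?:[^\w/#~:.?+=&%@!\-.:?\-]|$))')

# Made-up sample text; note the period directly after the first URL.
text = "Docs at http://test.com/page. Mirror: ftp://test.org/pub"

# 1. Find every URL. The pattern has no capturing groups, so findall
#    returns the full matches.
urls = URL_RE.findall(text)
print(urls)  # ['http://test.com/page', 'ftp://test.org/pub']

# 2. Replace each URL with a clickable link.
linked = URL_RE.sub(lambda m: '<a href="{0}">{0}</a>'.format(m.group(0)), text)
print(linked)
```

The lookahead at the end of the pattern is what keeps the sentence-ending period out of the first match.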

Django also has a solution that doesn't rely on a regex alone: django.utils.html.urlize(). I found it very helpful, especially if you are already using Django.

You can also extract the code to use in your own project.

Erock
Kekoa

Jinja2 (which Flask uses) has a urlize filter that does the same.

jmoz