How do I remove all email addresses and links from a string and replace them with "[removed]"?
7 Answers
You can use preg_replace to do it.
for emails:
$pattern = "/[^@\s]*@[^@\s]*\.[^@\s]*/";
$replacement = "[removed]";
// preg_replace returns the new string; it does not modify $string in place
$string = preg_replace($pattern, $replacement, $string);
for urls:
$pattern = "/[a-zA-Z]*[:\/\/]*[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+/i";
$replacement = "[removed]";
// again, assign the result back
$string = preg_replace($pattern, $replacement, $string);
Resources
PHP manual entry: http://php.net/manual/en/function.preg-replace.php
Credit where credit is due: email regex taken from preg_match manpage, and URL regex taken from: http://www.weberdev.com/get_example-4227.html
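A minimal sketch of how the two calls above might be combined, assuming the patterns are used exactly as posted. The function name `remove_emails_and_links` and the sample sentence are made up for illustration. The order looks like it matters: the URL pattern also matches the domain part of an email address, so the email pattern is applied first.

```php
<?php
// Hypothetical helper name; the two patterns are the ones from this answer.
function remove_emails_and_links(string $text): string
{
    $email = '/[^@\s]*@[^@\s]*\.[^@\s]*/';
    $url   = '/[a-zA-Z]*[:\/\/]*[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+/i';

    // Emails first: the URL pattern would otherwise take "example.com"
    // out of "joe@example.com" and leave "joe@[removed]" behind.
    $text = preg_replace($email, '[removed]', $text);
    return preg_replace($url, '[removed]', $text);
}

$result = remove_emails_and_links('Mail me at joe@example.com or see http://example.org/page now');
echo $result, "\n";
// prints: Mail me at [removed] or see [removed] now
```

Both patterns are loose, so this removes things that merely look like addresses or links; it does not validate them.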
Can you post a small sample of the text? – Josiah Jul 21 '10 at 20:00
It was just some random text I had. Nothing specific, just some email addresses and some links – JEagle Jul 21 '10 at 20:08
That's not right. The regex for emails would not remove punctuation like :?#$% which is not allowed in valid email addresses. The regex must remove all characters except alphanumerics and the period (.). Everything else (some other characters might also be allowed, but not all!) must be removed. – ZurabWeb Nov 08 '13 at 17:18
Thanks, it's working. Can you suggest a pattern to remove 10-digit mobile numbers? – Deepak Jul 13 '16 at 20:46
Not working properly. Need to be improved. – Mutatos Mar 30 '21 at 16:47
@Deepak try this: preg_match_all('/[+027][0-9]{10}/', $string, $output) finds any phone number that starts with 0, 2, 7 or + followed by ten digits; pass the same pattern to preg_replace to remove them. – stanley mbote Oct 06 '21 at 13:42
Try this:
$patterns = array('<[\w.]+@[\w.]+>', '<\w{3,6}:(?:(?://)|(?:\\\\))[^\s]+>');
$matches = array('[email removed]', '[link removed]');
$newString = preg_replace($patterns, $matches, $stringToBeMatched);
Note: preg_replace accepts an array of patterns and an array of replacements, so you don't need to run it twice.
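To illustrate the array form, here is a small sketch that runs the two patterns above over made-up sample sentences (the variable names `$replacements`, `$input`, `$output` and `$bare` are mine, not the answer's). It also shows one limit of the link pattern: it requires a scheme followed by a colon, so a bare host name is left untouched.

```php
<?php
// Patterns and replacement texts copied from this answer; the sample
// sentences are made up for illustration.
$patterns = array('<[\w.]+@[\w.]+>', '<\w{3,6}:(?:(?://)|(?:\\\\))[^\s]+>');
$replacements = array('[email removed]', '[link removed]');

$input  = 'Write to joe.bloggs@example.com or visit https://example.org/docs today';
$output = preg_replace($patterns, $replacements, $input);
echo $output, "\n";
// prints: Write to [email removed] or visit [link removed] today

// A bare host name has no scheme, so the link pattern leaves it alone.
$bare = preg_replace($patterns, $replacements, 'see www.site.com');
echo $bare, "\n";
// prints: see www.site.com
```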
Yes, it does not remove www.site.com :-( To remove www.site.com as well, we can use the pattern `$pattern = "/[a-zA-Z]*[:\/\/]*[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+/i";` – Junior Mayhé Dec 12 '12 at 18:44
It is also useful to sanitize the input first by collapsing whitespace around dots, so that spaced-out domains are caught: `preg_replace('!\s*\.\s*!', '.', 'visit today stackoverflow . com and enjoy.');` – Junior Mayhé Dec 12 '12 at 19:22
The answer I was going to upvote was deleted. It linked to a Linux Journal article, "Validate an E-Mail Address with PHP, the Right Way", which points out what's wrong with almost every email regex anyone proposes.
The range of valid forms of an email address is much broader than most people think.
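As a small illustration of that point, the sketch below runs two of the simple patterns proposed on this page against a made-up address whose local part contains an apostrophe and a plus sign (both are legal there). The variable names are hypothetical. The narrow pattern clips the address, while the permissive one catches it but also swallows surrounding punctuation.

```php
<?php
// Made-up sample address; plus signs and apostrophes are legal in the local part.
$address = "o'brien+news@example.com";

// Narrow pattern (word characters and dots only) clips the local part.
$narrow = preg_replace('/[\w.]+@[\w.]+/', '[removed]', $address);
echo $narrow, "\n";
// prints: o'brien+[removed]

// Permissive pattern (anything but "@" and whitespace) catches all of it...
$loose = '/[^@\s]*@[^@\s]*\.[^@\s]*/';
$wide  = preg_replace($loose, '[removed]', $address);
echo $wide, "\n";
// prints: [removed]

// ...but it also takes the brackets and comma around an address.
$greedy = preg_replace($loose, '[removed]', '(joe@example.com),');
echo $greedy, "\n";
// prints: [removed]
```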
Many characters are valid in the local part of an email address (see What characters are allowed in an email address?), so these lines replace most valid email addresses:
<?php
$c='a-zA-Z-_0-9'; // allowed characters in domainpart
$la=preg_quote('!#$%&\'*+-/=?^_`{|}~', "/"); // additional allowed in first localpart
$email="[$c$la][$c$la\.]*[^.]@[$c]+\.[$c]+";
$t = preg_replace("/\b($email)\b/", '[removed]', $t);
// or with a link:
$t = preg_replace("/\b($email)\b/", '<a href="mailto:\1">\1</a>', $t);
# replace urls:
$a='A-Za-z0-9\-_'; // allowed characters in host names
$t = preg_replace("/[htpsftp]+[:\/\/]+[$a]+\.+[$a\.\/%&;+~=\?#]+/i", '[removed]', $t);
This covers most valid email addresses. Be aware that removing all valid email addresses, and only those, is more complex (see How can I validate an email address using a regular expression?).
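One possible way to get closer to "only valid addresses" without a larger regex is to let a loose pattern find candidates and then ask PHP's built-in validator about each one. This is a different technique from the code above, sketched under assumptions: the helper name `remove_validated_emails` is made up, only trailing sentence punctuation is split off, and `FILTER_VALIDATE_EMAIL` is not a complete RFC 5322 check, so some unusual but valid addresses would be left in place.

```php
<?php
// Hypothetical helper: replace only those "@" tokens that PHP's own
// validator accepts. FILTER_VALIDATE_EMAIL is not a full RFC 5322 check.
function remove_validated_emails(string $text): string
{
    return preg_replace_callback(
        '/[^\s@]+@[^\s@]+/',
        function (array $m): string {
            // Split off trailing sentence punctuation before validating.
            $core = rtrim($m[0], '.,;:!?)');
            $tail = substr($m[0], strlen($core));
            if (filter_var($core, FILTER_VALIDATE_EMAIL) === false) {
                return $m[0]; // not an address the validator accepts
            }
            return '[removed]' . $tail;
        },
        $text
    );
}

$cleaned = remove_validated_emails('Mail joe@example.com, thanks');
echo $cleaned, "\n";
// prints: Mail [removed], thanks
```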
Pattern for Email (thanks to @bromelio)
"/[^@\s]*@[^@\s\.]*\.[^@\s\.,!?]*/"
Pattern for Url
"#((?:https?|ftp)://\S+[[:alnum:]]/?)#si"
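A short usage sketch for these two patterns, with a made-up sample sentence and hypothetical variable names. It is meant to show the apparent design intent: both patterns stop before trailing sentence punctuation, so the comma and the full stop survive.

```php
<?php
// The two patterns from this answer, written as single-quoted PHP strings.
$emailPattern = '/[^@\s]*@[^@\s\.]*\.[^@\s\.,!?]*/';
$urlPattern   = '#((?:https?|ftp)://\S+[[:alnum:]]/?)#si';

// Made-up sample sentence.
$text = 'Ask bob@example.com, or read https://example.org/faq.';

$text = preg_replace($emailPattern, '[removed]', $text);
$text = preg_replace($urlPattern, '[removed]', $text);
echo $text, "\n";
// prints: Ask [removed], or read [removed].
```

Note that the email pattern allows only one dot after the "@", so an address on a subdomain would be removed only up to the second label.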
My answer is a slight improvement on Josiah's code. It combines the two code segments into one, because preg_replace() accepts the pattern either as a string or as an array.
$patterns = array();
$patterns[0] = "/[^@\s]*@[^@\s]*\.[^@\s]*/"; // removes emails
$patterns[1] = "/[a-zA-Z]*[:\/\/]*[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+/i"; // removes any link
$replacement = "[removed]";
$string = "Follow the link below https://stackoverlow.com/testing/preg-match-replace-in-php or email me a sample code in my email test@mail.com";
// preg_replace returns the new string, so assign the result
$string = preg_replace($patterns, $replacement, $string);
If you want a different replacement text for each kind of match, for instance [Email has been removed] for emails and [Link has been removed] for links, turn $replacement into an array, as shown below:
$replacements = array();
// replacement message for emails
$replacements[0] = "[Email has been removed]";
// replacement message for links
$replacements[1] = "[Link has been removed]";
Then pass $replacements as the second argument to preg_replace() in place of the single replacement string; the rest of the code stays the same.
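Putting the pieces together, a complete version might look like the sketch below. It reuses the patterns, messages and sample string from this answer; the variable name `$result` is mine. One detail worth knowing: preg_replace pairs patterns with replacements by their order in the arrays, not by their numeric keys, so both arrays are built here in the same order.

```php
<?php
// Patterns and messages from this answer, built in matching order
// (email first, link second).
$patterns = array(
    '/[^@\s]*@[^@\s]*\.[^@\s]*/',
    '/[a-zA-Z]*[:\/\/]*[A-Za-z0-9\-_]+\.+[A-Za-z0-9\.\/%&=\?\-_]+/i',
);
$replacements = array(
    '[Email has been removed]',
    '[Link has been removed]',
);

$string = 'Follow the link below https://stackoverlow.com/testing/preg-match-replace-in-php'
        . ' or email me a sample code in my email test@mail.com';

$result = preg_replace($patterns, $replacements, $string);
echo $result, "\n";
// prints: Follow the link below [Link has been removed] or email me a sample code in my email [Email has been removed]
```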