I'm trying to match all anchors on a page that link to files with certain extensions, but my code also matches URLs that merely end with the extension text, without the period that would make it an actual extension.

For example, it should (and does) match http://example.com/image.jpg, but it's also matching http://example.com/imagejpg (which I don't want it to).

The code used is:

var imageExtensions = "jpg|jpeg|gif|png|svg";

var allAnchors = document.getElementsByTagName("a");
for(var i = 0; i < allAnchors.length; i++) {
    var anchorWithImage = allAnchors[i];

    var matcher = new RegExp(".*\.(" + imageExtensions + ")$");
    if(matcher.test(anchorWithImage.href)) {
        alert(anchorWithImage.href);
    }
}

I'm under the impression that this pattern should require the extension text to be at the end of the string, a literal period to precede the extension, and allow anything before that. I don't see why the literal period is being ignored.

For real test data, I was running this script against http://www.reddit.com/r/gifs/comments/2bis0x/holy_shit_greg/ and it matches http://www.reddit.com/r/makemeagif, which doesn't have a literal period. Running these links against this Regex tester has the expected results.

Kat

2 Answers

Because you're using the RegExp constructor, you're building the regex from a string. The backslash meant to escape the dot is consumed by the string literal as an escape sequence, so it never reaches the regex and the dot ends up matching any character. The solution is to escape the dot with a double backslash:

var matcher = new RegExp(".*\\.(" + imageExtensions + ")$");

Now the backslash itself is escaped in the string, so it survives string parsing and reaches the RegExp constructor, where it escapes the dot.
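
To make the difference visible, here is a small sketch that builds both versions and prints the pattern each one actually compiles to. The variable names `broken` and `fixed` are only illustrative; the URLs are the ones from the question.

```javascript
var imageExtensions = "jpg|jpeg|gif|png|svg";

// In a string literal, "\." is not a recognized escape sequence,
// so the backslash is dropped and the regex receives a bare dot.
var broken = new RegExp(".*\.(" + imageExtensions + ")$");

// "\\." is a literal backslash followed by a dot, so the regex receives \.
var fixed = new RegExp(".*\\.(" + imageExtensions + ")$");

console.log(broken.source); // .*.(jpg|jpeg|gif|png|svg)$
console.log(fixed.source);  // .*\.(jpg|jpeg|gif|png|svg)$

console.log(broken.test("http://www.reddit.com/r/makemeagif")); // true
console.log(fixed.test("http://www.reddit.com/r/makemeagif"));  // false
console.log(fixed.test("http://example.com/image.jpg"));        // true
```

Checking `.source` is a quick way to confirm what a pattern built from a string really contains.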

JAAulde
  • That's indeed the issue. Good catch. The slash has to be escaped. I incorrectly presumed that invalid escape sequences (like `\.`) would be treated as literal. – Kat Jul 24 '14 at 03:02

Another version of your code :)

var a = document.getElementsByTagName("A");
var pattern = /^(http:\/\/)?(.*)\.(jpg|jpeg|png|gif|svg)$/gi;

for(var i = 0; i < a.length; i++){
    if(a[i].href.match(pattern))
        alert(a[i].href);
}
hex494D49
  • Although this one won't work with local files, which is a bit of an artificial limitation. Also, I wanted to extract the list of extensions into its own string, as I plan to make it configurable later. – Kat Jul 24 '14 at 03:25
  • @Mike I've updated the snippet above so it will match local files as well :) – hex494D49 Jul 24 '14 at 09:57
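
For the configurable extension list mentioned in the comments, one possible approach is to build the pattern from an array. This is only a sketch: `buildExtensionMatcher` is a hypothetical helper, not something from either answer. It assumes the extensions are given without a leading period and that a case-insensitive match is wanted. It omits the leading `.*`, since `test` does not need the pattern to start at the beginning of the string.

```javascript
// Hypothetical helper: builds a matcher from a configurable list of extensions.
function buildExtensionMatcher(extensions) {
    // Escape regex metacharacters so each extension is matched literally.
    var escaped = extensions.map(function (ext) {
        return ext.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    });
    // "\\." in the string literal yields \. in the pattern: a literal period.
    // The "i" flag makes the match case-insensitive (so .PNG also matches).
    return new RegExp("\\.(" + escaped.join("|") + ")$", "i");
}

var matcher = buildExtensionMatcher(["jpg", "jpeg", "gif", "png", "svg"]);

console.log(matcher.test("http://example.com/image.jpg")); // true
console.log(matcher.test("http://example.com/imagejpg"));  // false
console.log(matcher.test("file:///home/user/photo.PNG"));  // true
```

Because the pattern has no `g` flag, repeated calls to `test` on the same RegExp object do not carry state between calls.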