
I know there are easier ways to get file extensions with JavaScript, but partly to practice my regexp skills I wanted to try and use a regular expression to split a filename into two strings, before and after the final dot (. character).

Here's what I have so far:

const myRegex = /^((?:[^.]+(?:\.)*)+?)(\w+)?$/;
// match() returns [fullMatch, group1, group2], so skip the first element
const [, filename1, extension1] = 'foo.baz.bing.bong'.match(myRegex);
// filename1 = 'foo.baz.bing.'
// extension1 = 'bong'
const [, filename2, extension2] = 'one.two'.match(myRegex);
// filename2 = 'one.'
// extension2 = 'two'
const [, filename3, extension3] = 'noextension'.match(myRegex);
// filename3 = 'noextension'
// extension3 = undefined (the optional group did not participate)

I've tried to use a positive lookahead to say 'only match a literal . if it's followed by a word that ends in a .', by changing (?:\.)* to (?:\.(?=\w+.))*:

/^((?:[^.]+(?:\.(?=(\w+\.))))*)(\w+)$/gm

But I want to exclude that final period using just the regexp, and preferably still have 'noextension' matched by the first group. How can I do that with a regular expression alone?

Here is my regexp scratch file: https://regex101.com/r/RTPRNU/1

Peter Seliger
AncientSwordRage

3 Answers

1

For the first capture group, you could start the match with 1 or more word characters. Then optionally repeat a . and again 1 or more word characters.

Then you can use an optional non capture group matching a . and capturing 1 or more word characters in group 2.

As the second non capture group is optional, the first repetition should be non-greedy. Otherwise the first group would also consume the final . and extension.

^(\w+(?:\.\w+)*?)(?:\.(\w+))?$

The pattern matches

  • ^ Start of string
  • ( Capture group 1
    • \w+(?:\.\w+)*? Match 1+ word characters, and optionally repeat (as few times as possible) a . and 1+ word characters
  • ) Close group 1
  • (?: Non capture group to match as a whole
    • \.(\w+) Match a . and capture 1+ word chars in capture group 2
  • )? Close non capture group and make it optional
  • $ End of string

Regex demo

const regex = /^(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;
[
  "foo.baz.bing.bong",
  "one.two",
  "noextension"
].forEach(s => {
  const m = s.match(regex);
  if (m) {
    console.log(m[1]);
    console.log(m[2]);
    console.log("----");
  }
});

Another option, as @Wiktor Stribiżew posted in the comments, is to use a non-greedy dot to match any character for the filename:

^(.*?)(?:\.(\w+))?$

Regex demo
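
As a rough sketch of how the non-greedy dot variant behaves, the snippet below wraps it in a small helper. The names lazyDotRegex and splitWithLazyDot are made up for illustration, and defaulting a missing extension to an empty string is an assumption, not part of the pattern itself.

```javascript
// Hypothetical helper; the pattern is the non-greedy dot variant quoted above.
const lazyDotRegex = /^(.*?)(?:\.(\w+))?$/;

function splitWithLazyDot(name) {
  const m = name.match(lazyDotRegex);
  if (!m) return null; // e.g. the name contains a line break
  // Group 2 is undefined when there is no extension; defaulting to '' is an assumption.
  return { filename: m[1], extension: m[2] ?? '' };
}

console.log(splitWithLazyDot('foo.baz.bing.bong'));
console.log(splitWithLazyDot('noextension'));
console.log(splitWithLazyDot('my file.txt'));
```

Because the dot matches any character except a line break, this variant also accepts names such as 'my file.txt', which the \w+ based pattern rejects since a space is not a word character.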

The fourth bird
0

If you really want to use regex, I would suggest using two regexes:

// example with 'foo.baz.bing.bong'

const firstString = /^.+(?=\.\w+)./g // match 'foo.baz.bing.' 
const secondString = /\w+$/g   // match 'bong'
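
One possible way to combine the two patterns is sketched below. The helper name splitWithTwoRegexes, the constant names, the removal of the trailing dot, and the fallback for names without a dot are all assumptions added for illustration. The g flag is left off because only the first match is needed.

```javascript
// Same two patterns as above, without the g flag (hypothetical constant names).
const beforeExtension = /^.+(?=\.\w+)./; // up to and including the last dot that is followed by word characters
const afterLastDot = /\w+$/;             // trailing run of word characters

function splitWithTwoRegexes(name) {
  const head = name.match(beforeExtension);
  const tail = name.match(afterLastDot);
  // Without a dot, the first pattern does not match while the second would
  // match the whole name, so treat that case as "no extension" (assumption).
  if (!head) return { filename: name, extension: '' };
  return {
    filename: head[0].slice(0, -1), // drop the trailing dot
    extension: tail ? tail[0] : '',
  };
}

console.log(splitWithTwoRegexes('foo.baz.bing.bong'));
console.log(splitWithTwoRegexes('noextension'));
```

Checking the first match before trusting the second matters here: used on its own, \w+$ returns 'noextension' for a name that has no extension at all.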
0

How about something more explicit and accurate, without lookarounds ...

  1. named groups variant ... /^(?<noextension>\w+)$|(?<filename>\w+(?:\.\w+)*)\.(?<extension>\w+)$/

  2. without named groups ... /^(\w+)$|(\w+(?:\.\w+)*)\.(\w+)$/

Both variants shown above can be shortened to 2 capture groups instead of 3, which in my opinion makes the regex easier to work with at the cost of being less readable ...

  1. named groups variant ... /(?<filename>\w+(?:\.\w+)*?)(?:\.(?<extension>\w+))?$/

  2. without named groups ... /(\w+(?:\.\w+)*?)(?:\.(\w+))?$/

const testData = [
  'foo.baz.bing.bong',
  'one.two',
  'noextension',
];
// https://regex101.com/r/RTPRNU/5
const regXTwoNamedFileNameCaptures = /(?<filename>\w+(?:\.\w+)*?)(?:\.(?<extension>\w+))?$/;
// https://regex101.com/r/RTPRNU/4
const regXTwoFileNameCaptures = /(\w+(?:\.\w+)*?)(?:\.(\w+))?$/;

// https://regex101.com/r/RTPRNU/3
const regXThreeNamedFileNameCaptures = /^(?<noextension>\w+)$|(?<filename>\w+(?:\.\w+)*)\.(?<extension>\w+)$/
// https://regex101.com/r/RTPRNU/3
const regXThreeFileNameCaptures = /^(\w+)$|(\w+(?:\.\w+)*)\.(\w+)$/

console.log(
  'based on 2 named file name captures ...\n',
  testData, ' =>',
  testData.map(str =>
    regXTwoNamedFileNameCaptures.exec(str)?.groups ?? {}
  )
);
console.log(
  'based on 2 unnamed file name captures ...\n',
  testData, ' =>',
  testData.map(str => {
    const [
      match,
      filename,
      extension,
    ] = str.match(regXTwoFileNameCaptures) ?? [];
  //] = regXTwoFileNameCaptures.exec(str) ?? [];

    return {
      filename,
      extension,
    }
  })
);

console.log(
  'based on 3 named file name captures ...\n',
  testData, ' =>',
  testData.map(str => {
    const {
      filename = '',
      extension = '',
      noextension = '',
    } = regXThreeNamedFileNameCaptures.exec(str)?.groups ?? {};

    return {
      filename: filename || noextension,
      extension,
    }
  })
);
console.log(
  'based on 3 unnamed file name captures ...\n',
  testData, ' =>',
  testData.map(str => {
    const [
      match,
      noextension = '',
      filename = '',
      extension = '',
    ] = str.match(regXThreeFileNameCaptures) ?? [];
  //] = regXThreeFileNameCaptures.exec(str) ?? [];

    return {
      filename: filename || noextension,
      extension,
    }
  })
);
Peter Seliger