411

I need to extract from a string a set of characters which are included between two delimiters, without returning the delimiters themselves.

A simple example should be helpful:

Target: extract the substring between square brackets, without returning the brackets themselves.

Base string: This is a test string [more or less]

If I use the following regular expression:

\[.*?\]

The match is [more or less]. I need to get only more or less (without the brackets).

Is it possible to do it?

Tim
Diego

13 Answers

625

Easy done:

(?<=\[)(.*?)(?=\])

Technically, that uses lookaheads and lookbehinds. See Lookahead and Lookbehind Zero-Width Assertions. The pattern consists of:

  • a lookbehind, (?<=\[), which requires the match to be preceded by a [ without capturing it;
  • a non-greedy captured group, (.*?), which is non-greedy so that it stops at the first ]; and
  • a lookahead, (?=\]), which requires the match to be followed by a ] without capturing it.

Alternatively you can just capture what's between the square brackets:

\[(.*?)\]

and return the first captured group instead of the entire match.
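
As a minimal sketch of the difference between the two patterns, assuming a JavaScript engine with lookbehind support (ES2018 or later); the variable names are illustrative only:

```javascript
const text = "This is a test string [more or less]";

// Lookaround version: the whole match already excludes the brackets.
const viaLookaround = text.match(/(?<=\[)(.*?)(?=\])/);

// Capture-group version: the whole match includes the brackets, group 1 does not.
const viaGroup = text.match(/\[(.*?)\]/);

console.log(viaLookaround[0]); // more or less
console.log(viaGroup[0]);      // [more or less]
console.log(viaGroup[1]);      // more or less
```

Reading index 0 of the capture-group match is what produces the "it keeps including the brackets" symptom; index 1 holds the text without them.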

jottr
cletus
  • 216
    "Easy done", LOL! :) Regular expressions always give me headache, I tend to forget them as soon as I find the ones that solve my problems. About your solutions: the first works as expected, the second doesn't, it keeps including the brackets. I'm using C#, maybe the RegEx object has its own "flavour" of regex engine... – Diego Sep 21 '09 at 15:15
  • 6
    It's doing that because you're looking at the whole match rather than the first matched group. – cletus Sep 21 '09 at 15:35
  • 2
    Does this work if the substring also contains the delimiters? For example in `This is a test string [more [or] less]` would this return `more [or] less` ? – gnzlbg Feb 22 '13 at 18:49
  • 1
    @gnzlbg no, it would return "more [or" – MerickOWA Jul 10 '13 at 21:32
  • This is returning the string along with the begin and end string – rajibdotnet Jan 30 '14 at 22:06
  • I needed something like this to find where class="border" was used in my code.. but even if preceded or followed by other classes. This helped get me where I needed, thanks! – SgtPooki Jul 23 '14 at 21:39
  • The second code `\[(.*?)\]` worked like a charm. Pretty cool this regular expression. – Francisco Quintero May 28 '15 at 16:04
  • `Invalid regular expression: /(?<=[)(.*?)(?=])/: Invalid group` – Yeats Dec 02 '16 at 05:08
  • And the second one catches regular parentheses not square brackets. Sad! – Yeats Dec 02 '16 at 05:08
  • @cletus if you have multiple placeholders, how would you get all of them in one shot, or does one have to iterate over the string using this regex? `while(found = regex.exec("This is a [test] string [more or less]"))` – Legends Feb 17 '17 at 16:15
  • @cletus I like your alternate method of returning the capture group. It helped me big time [thank you]. – nam Mar 23 '20 at 20:02
  • Be careful, lookbehinds are not supported by safari, so this can crash your app on safari. https://caniuse.com/?search=lookbehind – Maxstgt Jul 22 '21 at 15:21
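
The nested-bracket question raised in the comments can be checked directly. This is a small sketch in JavaScript (variable names are illustrative) showing what the lazy pattern returns for nested input, next to the character-class pattern from elsewhere in this thread:

```javascript
const nested = "This is a test string [more [or] less]";

// Lazy capture stops at the first closing bracket.
const lazy = nested.match(/\[(.*?)\]/)[1];

// Excluding both brackets from the character class finds the innermost pair.
const innermost = nested.match(/[^\[\]]+(?=\])/)[0];

console.log(lazy);      // more [or
console.log(innermost); // or
```

Neither pattern tracks nesting depth, so neither returns the outer "more [or] less".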
72

If you are using JavaScript, the first solution provided by cletus, (?<=\[)(.*?)(?=\]), won't work in engines that predate ES2018, because they don't support lookbehind.

Edit: as of ES2018, JavaScript does support lookbehind. Write the pattern as a regex literal between slashes, like this:

var regex = /(?<=\[)(.*?)(?=\])/;
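
Because a regex literal with unsupported syntax fails when the script is parsed, one possible way to stay compatible with older engines is to build the lookbehind pattern at run time and fall back to the capture-group form. The helper name extractBracketed is hypothetical, and this is only a sketch of that idea:

```javascript
// Hypothetical helper: returns the text between the first [ and the next ], or null.
function extractBracketed(text) {
  let lookaround = null;
  try {
    // Built from a string so an unsupported lookbehind throws here at run time
    // instead of failing when the whole script is parsed.
    lookaround = new RegExp("(?<=\\[)(.*?)(?=\\])");
  } catch (err) {
    lookaround = null; // engine without lookbehind support
  }
  if (lookaround) {
    const m = text.match(lookaround);
    return m ? m[0] : null;
  }
  // Fallback: capture group, available in every engine.
  const m = text.match(/\[(.*?)\]/);
  return m ? m[1] : null;
}

console.log(extractBracketed("This is a test string [more or less]")); // more or less
console.log(extractBracketed("no brackets here")); // null
```

In practice the fallback branch alone is enough for this task, which is why the capture-group form is often the simpler choice.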

Old answer:

Solution:

var regex = /\[(.*?)\]/;
var strToMatch = "This is a test string [more or less]";
var matched = regex.exec(strToMatch);

It will return:

["[more or less]", "more or less"]

So, what you need is the second value. Use:

var matched = regex.exec(strToMatch)[1];

To return:

"more or less"
Zanon
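
One caveat with the one-liner above: exec returns null when nothing matches, so indexing it directly throws a TypeError. A minimal guarded sketch (the function name firstBracketed is made up for illustration):

```javascript
const regex = /\[(.*?)\]/;

// Hypothetical helper: returns null instead of throwing when nothing matches.
function firstBracketed(str) {
  const matched = regex.exec(str);
  return matched === null ? null : matched[1];
}

console.log(firstBracketed("This is a test string [more or less]")); // more or less
console.log(firstBracketed("This is a test string")); // null
```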
  • 6
    what if there are multiple matches of [more or less] in the string? –  Feb 18 '19 at 06:09
  • Lookbehind assertions have been [**added to RegExp in ES2018**](http://2ality.com/2017/05/regexp-lookbehind-assertions.html) – Chunky Chunk May 23 '19 at 17:12
23

You just need to 'capture' the bit between the brackets.

\[(.*?)\]

To capture you put it inside parentheses. You do not say which language this is using. In Perl for example, you would access this using the $1 variable.

my $string ='This is the match [more or less]';
$string =~ /\[(.*?)\]/;
print "match:$1\n";

Other languages have different mechanisms. In C#, for example, the captured text is available through the Groups property of the Match object.

cletus
Xetius
  • Thanks, but this solution didn't work, it keeps including the square brackets. As I wrote in my comment to Cletus' solution, it could be that C# RegEx object interprets it differently. I'm not expert on C# though, so it's just a conjecture, maybe it's just my lack of knowledge. :) – Diego Sep 21 '09 at 15:17
20

Here's a general example with obvious delimiters (X and Y):

(?<=X)(.*?)(?=Y)

Here it's used to find the string between X and Y.
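
For arbitrary delimiters, the delimiters themselves may need escaping before they go into a pattern. The following is a rough sketch in JavaScript; the helper names escapeRegExp and between are assumptions for illustration, and it uses the capture-group form rather than the lookaround form above so that it does not depend on lookbehind support:

```javascript
// Escape characters that have a special meaning in a regular expression.
function escapeRegExp(literal) {
  return literal.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: text between the first `open` and the next `close`, or null.
function between(text, open, close) {
  const pattern = new RegExp(escapeRegExp(open) + "(.*?)" + escapeRegExp(close));
  const m = text.match(pattern);
  return m ? m[1] : null;
}

console.log(between("aaaXbbbYccc", "X", "Y")); // bbb
console.log(between("This is a test string [more or less]", "[", "]")); // more or less
console.log(between("f(x) = {1, 2}", "{", "}")); // 1, 2
```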

stevec
17

[^\[] Match any character that is not [.

+ Match 1 or more of the characters that are not [, as many as possible.

(?=\]) Positive lookahead for ]. Requires the matched run to be followed by ], without including the ] in the result.

Done.

[^\[]+(?=\])

Proof.

http://regexr.com/3gobr

Similar to the solution proposed by null, but the additional \] inside the character class is not required. As an additional note, it appears the \ is not required to escape the [ after the ^. For readability, I would leave it in.

This does not work when the opening and closing delimiters are identical, for example when "more or less" is delimited by double quotes.
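
The identical-delimiter limitation can be seen in a short JavaScript sketch. The sample sentence is made up, and the second pattern is one possible alternative rather than part of the answer above:

```javascript
const quoted = 'He said "more or less" and left';

// The character-class pattern adapted to quotes grabs the text BEFORE the first quote.
const viaClass = quoted.match(/[^"]+(?=")/)[0];

// Matching both delimiters and capturing the middle works for identical delimiters.
const viaGroup = quoted.match(/"([^"]*)"/)[1];

console.log(viaClass === "He said "); // true
console.log(viaGroup);                // more or less
```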

Stieneee
  • 1
    This is a good solution, however I have made a tweak so that it ignores an extra ']' at the end as well: `[^\[\]]+(?=\])` – SteveEng Feb 18 '21 at 19:01
8

PHP:

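$string = 'This is the match [more or less]';
// Non-greedy (.*?) so the match stops at the first closing bracket
preg_match('#\[(.*?)\]#', $string, $match);
var_dump($match[1]); // $match[1] is the first captured group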
realloc
powtac
8

Most updated solution

If you are using JavaScript, the best solution that I came up with is to use the match method instead of exec. Then iterate over the matches and strip the delimiters by replacing each match with its first captured group, $1:

const text = "This is a test string [more or less], [more] and [less]";
const regex = /\[(.*?)\]/gi;
const resultMatchGroup = text.match(regex); // [ '[more or less]', '[more]', '[less]' ]
const desiredRes = resultMatchGroup.map(match => match.replace(regex, "$1"))
console.log("desiredRes", desiredRes); // [ 'more or less', 'more', 'less' ]

As you can see, this also works when the text contains several delimited substrings.
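
As an alternative sketch, assuming an engine that supports String.prototype.matchAll (ES2020), the captured groups can be collected directly without a second replace pass:

```javascript
const text = "This is a test string [more or less], [more] and [less]";

// matchAll requires the g flag and yields one match object per occurrence.
const results = Array.from(text.matchAll(/\[(.*?)\]/g), (m) => m[1]);

console.log(results); // [ 'more or less', 'more', 'less' ]
```

When nothing matches, this yields an empty array, whereas match with the g flag returns null.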

Luis Febro
6

This one works specifically with JavaScript's regular expression parser: /[^[\]]+(?=])/g

Just run this in the console:

var regex = /[^[\]]+(?=])/g;
var str = "This is a test string [more or less]";
var match = regex.exec(str);
match;
null
5

If you also want to remove the [] themselves (that is, match the brackets together with their contents), use:

\[.+\]
Cătălin Rădoi
4

I had the same problem using regex with bash scripting. I used a 2-step solution using pipes with grep -o applying

 '\[(.*?)\]'  

first, then

'\b.*\b'

Obviously not as efficient as the other answers, but an alternative.

A. Jesús
4

I wanted to find a string between / and #, but the # is sometimes absent. Here is the regex I use:

  (?<=\/)([^#]+)(?=#*)
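
A quick check of this pattern in JavaScript, using made-up inputs and assuming lookbehind support. Since (?=#*) can match the empty string, it never rejects a candidate, so the sketch also compares it with the same pattern without the lookahead:

```javascript
const withLookahead = /(?<=\/)([^#]+)(?=#*)/;
const withoutLookahead = /(?<=\/)[^#]+/;

// Both inputs are illustrative: one with a trailing # part, one without.
for (const input of ["docs/section#intro", "docs/section"]) {
  console.log(input.match(withLookahead)[0], input.match(withoutLookahead)[0]);
}
// prints "section section" for both inputs
```

The part that does the work is [^#]+, which stops at the first # or at the end of the string.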
techguy2000
1

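Here is how I got the text without the '[' and ']' in C#:

using System;
using System.Text.RegularExpressions;

var text = "This is a test string [more or less]";

// Capture only the text between '[' and ']' (group 1 excludes the brackets)
Regex regex = new Regex(@"\[(.+?)\]");
var matches = regex.Matches(text);

for (int i = 0; i < matches.Count; i++)
{
    // Groups[0] is the whole match; Groups[1] is the first captured group
    Console.WriteLine(matches[i].Groups[1].Value);
}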

The output is:

more or less
Audwin Oyong
Jamaxack
-1

If you need to extract the text without the brackets, you can use awk from bash:

echo " [hola mundo] " | awk -F'[][]' '{print $2}'

result:

hola mundo
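
The same field-splitting idea can be sketched in JavaScript by splitting on either bracket and taking the second piece; this is an illustrative equivalent, not part of the awk answer:

```javascript
// Split on either ] or [, mirroring awk's -F'[][]' field separator.
const parts = " [hola mundo] ".split(/[\][]/);

console.log(parts[1]); // hola mundo
```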

Nico