Regex capitalize first letter every word, also after a special character like a dash

Question

I use #(\s|^)([a-z0-9-_]+)#i to capitalize the first letter of every word. I also want it to capitalize a letter that follows a special character such as a dash (-).

Now it shows:

This Is A Test For-stackoverflow

And I want this:

This Is A Test For-Stackoverflow

Any suggestions/samples for me?

I'm not a pro, so please keep it simple enough for me to understand.

Do you also need to capitalize non-ASCII letters (`à`, `ü` etc.)? What language are you using? — Tim Pietzcker, Jun 06 '11 at 13:10

score 33 · Answer 1 · answered Apr 24 '15 at 23:00

+1 for word boundaries, and here is a comparable JavaScript solution. It also accounts for possessives:

var re = /(\b[a-z](?!\s))/g;
var s = "fort collins, croton-on-hudson, harper's ferry, coeur d'alene, o'fallon"; 
s = s.replace(re, function(x){return x.toUpperCase();});
console.log(s); // "Fort Collins, Croton-On-Hudson, Harper's Ferry, Coeur D'Alene, O'Fallon"

NotNedLudd

toUpperCase is capitalizing the whole word. Here is the solution: s.replace(re, function(x){return x.charAt(0).toUpperCase() + x.slice(1);}); – Polopollo May 09 '16 at 20:26
@Polopollo, in this case the regex is only returning one letter if it matches but globally. So there is no need for that extra coding and it should work as is. – adam-beck Apr 26 '17 at 19:51
This will not work as OP has asked since a single character would not get capitalized. Just for anybody who comes to this question like I did. – adam-beck Apr 26 '17 at 19:51
I fear this doesn't work: word boundaries include things like '. So `don't` becomes `Don'T` – Anderas Apr 13 '18 at 05:28
@Anderas that's what the negative lookahead is for: `(?!\s)` checks if it's not a character before whitespace. On the other hand, this fails when a word like `don't` is followed by a non-whitespace, non-alphanumeric character like a comma, period or exclamation mark. It would be better to use a word boundary in the lookahead: `/(\b[a-z](?!\b))/g;` – Guido Bouman May 03 '18 at 12:22
@GuidoBouman: Your suggested regex fails for Coeur D'Alene and O'Fallon though. – davemyron May 23 '19 at 00:56
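
The trade-off discussed in these comments can be checked with a small sketch. The helper name `capitalizeWith` and the sample strings are assumptions made for illustration; the two patterns are the ones quoted above.

```javascript
// Upper-case every single-letter match of the given (global) regex.
const capitalizeWith = (re, s) => s.replace(re, (ch) => ch.toUpperCase());

const notBeforeSpace = /\b[a-z](?!\s)/g;    // pattern from the answer
const notBeforeBoundary = /\b[a-z](?!\b)/g; // variant suggested in the comments

const sample = "don't, o'fallon";
console.log(capitalizeWith(notBeforeSpace, sample));    // "Don'T, O'Fallon"
console.log(capitalizeWith(notBeforeBoundary, sample)); // "Don't, o'Fallon"

// Both patterns skip one-letter words, because the letter is followed by a space.
console.log(capitalizeWith(notBeforeSpace, "this is a test")); // "This Is a Test"
```

Neither lookahead handles every case: `(?!\s)` trips on a contraction followed by punctuation, while `(?!\b)` leaves the `o` in `o'fallon` in lower case.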

score 19 · Answer 2 · answered Jun 06 '11 at 11:42

A simple solution is to use word boundaries:

#\b[a-z0-9-_]+#i

Alternatively, you can match for just a few characters:

#([\s\-_]|^)([a-z0-9-_]+)#i
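
One thing that may be worth checking: both patterns keep `-` inside the word class, so a match that starts at `for` can continue through the dash. The JavaScript sketch below assumes the replacement upper-cases only the first character of each match (the question does not show its callback), and `upperFirstOfMatches` is a hypothetical helper name.

```javascript
// Hypothetical helper: upper-case the first character of every regex match.
const upperFirstOfMatches = (re, s) =>
  s.replace(re, (word) => word.charAt(0).toUpperCase() + word.slice(1));

const input = "this is a test for-stackoverflow";

// Hyphen inside the class: "for-stackoverflow" is matched as one word.
console.log(upperFirstOfMatches(/\b[a-z0-9-_]+/gi, input)); // "This Is A Test For-stackoverflow"

// Hyphen removed from the class: "for" and "stackoverflow" match separately.
console.log(upperFirstOfMatches(/\b[a-z0-9_]+/gi, input));  // "This Is A Test For-Stackoverflow"
```

Under that assumption, dropping the hyphen from the class is what lets the word boundary after the dash start a new match.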

Kobi

Thank you! Works like a charm! – Simmer Jun 06 '11 at 11:56
@Tim - I took artistic freedom and didn't change the way the OP matches letters - It's *possible* Simmer wants the letter as output, change their colors or whatnot. Also, I didn't give it that much thought, I only had 4 minutes `:P` – Kobi Jun 06 '11 at 14:35
Can someone please add a jsfiddle example? It would be helpful. – Pravin W Jun 09 '16 at 10:33
Which language's regex is this for? – JohnK Jun 22 '17 at 15:32
@JohnK - Both of these are simple enough and should work in all languages. `#` is a separator here, so your language may need `"\\b[a-z0-9-_]+"` and an `IgnoreCase` flag. – Kobi Jun 22 '17 at 15:44

score 7 · Answer 3 · answered Jun 06 '11 at 11:59

Actually, you don't need to match the full string. Just match the first lowercase letter at each word boundary, like this:

'~\b([a-z])~'
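
As a rough JavaScript illustration of this pattern (the function name `capitalizeAtBoundaries` is made up for the example, and the `g` flag is needed so that every word is handled):

```javascript
// Upper-case each lowercase ASCII letter that directly follows a word boundary.
const capitalizeAtBoundaries = (s) =>
  s.replace(/\b([a-z])/g, (letter) => letter.toUpperCase());

console.log(capitalizeAtBoundaries("this is a test for-stackoverflow"));
// "This Is A Test For-Stackoverflow"
```

Because an apostrophe also creates a word boundary, `don't` becomes `Don'T` with this pattern, which may or may not matter for your input.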

anubhava

In JS, I added `g`, like `/\b([a-z])/g`, to capitalize each word. – Stalin Gino Dec 06 '14 at 07:53
I like your lovely answer @StalinGino, I must say this is the only one I was able to understand. – Danish Feb 08 '16 at 11:38
That is as per the requirements. Check all other answers as well. – anubhava May 24 '20 at 04:08

score 6 · Answer 4 · answered Dec 17 '20 at 16:34

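If you want to do this with a regular-expression replacement alone, you can use \u in the replacement string. It upper-cases the next character and is supported by tools such as GNU sed and Perl, but not by every regex engine.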

To transform this string:

This Is A Test For-stackoverflow

into

This Is A Test For-Stackoverflow

Match with (.+)-(.+) to capture the text before and after the "-", then use this as the replacement:

$1-\u$2

In bash, with GNU sed:

echo "This Is A Test For-stackoverflow" | sed 's/\(.\)-\(.\)/\1-\u\2/'
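
JavaScript replacement strings have no `\u` escape, so there the same idea needs a callback. This is only a sketch of an equivalent, using the `(.+)-(.+)` pattern from above; the function name `capitalizeAfterDash` is an assumption for the example.

```javascript
// Rebuild the string around the dash, upper-casing the first character after it.
const capitalizeAfterDash = (s) =>
  s.replace(/(.+)-(.+)/, (_, before, after) =>
    before + "-" + after.charAt(0).toUpperCase() + after.slice(1));

console.log(capitalizeAfterDash("This Is A Test For-stackoverflow"));
// "This Is A Test For-Stackoverflow"
console.log(capitalizeAfterDash("croton-on-hudson"));
// "croton-on-Hudson"
```

Since the first `.+` is greedy, only the text after the last dash is changed, as the second call shows.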

score 4 · Answer 5 · answered May 23 '20 at 18:46

For JavaScript, here’s a solution that works across different languages and alphabets:

const originalString = "this is a test for-stackoverflow"
const processedString = originalString.replace(/(?:^|\s|[-"'([{])+\S/g, (c) => c.toUpperCase())

It matches any non-whitespace character \S that is preceded by the start of the string ^, whitespace \s, or any of the characters -"'([{, and replaces the match with its uppercase variant.
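
To see how this behaves on other alphabets, here is the same expression wrapped in a function and run on a few sample strings. The name `titleCaseAny` and the sample inputs are assumptions for the example.

```javascript
// Same regex as above: upper-case the first non-whitespace character after
// the start of the string, whitespace, or one of the characters - " ' ( [ {
const titleCaseAny = (s) =>
  s.replace(/(?:^|\s|[-"'([{])+\S/g, (c) => c.toUpperCase());

console.log(titleCaseAny("this is a test for-stackoverflow")); // "This Is A Test For-Stackoverflow"
console.log(titleCaseAny("école (été)"));                      // "École (Été)"
console.log(titleCaseAny("привет мир"));                       // "Привет Мир"
```

It works for accented and Cyrillic letters because `toUpperCase()` does the case mapping and the regex never names specific letters. Note that `'` is in the list of preceding characters, so `don't` would come out as `Don'T`.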

score 2 · Answer 6 · answered Jan 22 '21 at 22:35

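My solution using JavaScript:

function capitalize(str) {
  // Words of 3+ letters (a-z plus accented letters in the Á-ú range) that start at a word boundary
  var reg = /\b([a-zÁ-ú]{3,})/g;
  // Upper-case the first character of each matched word
  return str.replace(reg, (w) => w.charAt(0).toUpperCase() + w.slice(1));
}

With ES6+ JavaScript: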

const capitalize = str => 
    str.replace(/\b([a-zÁ-ú]{3,})/g, (w) => w.charAt(0).toUpperCase() + w.slice(1));



How the expression in /<expression-here>/g is built:

[a-zÁ-ú] matches the lowercase letters a-z plus the accented letters (upper and lower case) in the Á-ú range.
ex: sábado de Janeiro às 19h. sexta-feira de janeiro às 21 e horas
[a-zÁ-ú]{3,} skips runs of letters that are shorter than three characters.
ex: sábado de Janeiro às 19h. sexta-feira de janeiro às 21 e horas
\b([a-zÁ-ú]{3,}) lastly, \b keeps only the runs that start at a word boundary. The () wraps the expression in a group.
ex: sábado de Janeiro às 19h. sexta-feira de janeiro às 21 e horas

After that, I apply the change only to the matched words, which are in lower case (w is the matched word):

w.charAt(0).toUpperCase() + w.slice(1); // output -> Output

joining the two

str.replace(/\b(([a-zÁ-ú]){3,})/g, (w) => w.charAt(0).toUpperCase() + w.slice(1));

result:
Sábado de Janeiro às 19h. Sexta-Feira de Janeiro às 21 e Horas
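
One caveat: in JavaScript, `\b` only treats ASCII letters, digits and `_` as word characters, so a word that starts with an accented letter is not matched from its first character. The sketch below contrasts the pattern above with a possible alternative that uses Unicode property escapes and a lookbehind (ES2018+). The alternative is a swapped-in technique, not part of this answer, and the sample sentence is made up.

```javascript
// Pattern from the answer: \b only knows ASCII word characters.
const asciiBoundary = /\b([a-zÁ-ú]{3,})/g;
// Assumed alternative: a word starts where no letter or digit precedes it.
const unicodeStart = /(?<![\p{L}\p{N}])\p{L}{3,}/gu;

const upperFirst = (w) => w.charAt(0).toUpperCase() + w.slice(1);

const text = "água de março às 19h";
console.log(text.replace(asciiBoundary, upperFirst)); // "áGua de Março às 19h"
console.log(text.replace(unicodeStart, upperFirst));  // "Água de Março às 19h"
```

The `u` flag is required for `\p{L}` and `\p{N}`, and older runtimes may not support lookbehind.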

score 1 · Answer 7 · answered Jan 18 '19 at 21:02

Here's my Python solution

>>> import re
>>> the_string = 'this is a test for stack-overflow'
>>> re.sub(r'(((?<=\s)|^|-)[a-z])', lambda x: x.group().upper(), the_string)
'This Is A Test For Stack-Overflow'

Read about the "positive lookbehind" here: https://www.regular-expressions.info/lookaround.html

Sedecimdies · Answer 8 · 2013-09-17T10:11:04.520

this will make

R.E.A.C De Boeremeakers

from

r.e.a.c de boeremeakers

(?<=\A|[ .])(?<up>[a-z])(?=[a-z. ])

using

    ' Requires: Imports System.Text and Imports System.Text.RegularExpressions
    Dim matches As MatchCollection = Regex.Matches(inputText, "(?<=\A|[ .])(?<up>[a-z])(?=[a-z. ])")
    Dim outputText As New StringBuilder
    If matches.Count = 0 Then
        outputText.Append(inputText) ' nothing to capitalize
    ElseIf matches(0).Index > 0 Then
        outputText.Append(inputText.Substring(0, matches(0).Index)) ' text before the first match
    End If
    For Each m As Match In matches
        outputText.Append(UCase(m.Value)) ' the capitalized letter
        Dim nextMatch As Match = m.NextMatch()
        ' Copy the text up to the next match, or to the end of the input after the last match
        Dim endIndex As Integer = If(nextMatch.Success, nextMatch.Index, inputText.Length)
        outputText.Append(inputText.Substring(m.Index + 1, endIndex - m.Index - 1))
    Next
