How to remove a word prefix using grep?

Question

How can I remove the beginning of a word using grep? For example, I have a file that contains:

www.abc.com

I only need the part

abc.com

Sorry for the basic question, but I have no experience with Linux.

Use [sed](http://www.grymoire.com/Unix/Sed.html#uh-5) instead of grep. — Piotr Praszmo, Jul 26 '12 at 16:00

sastanin · Accepted Answer · 2012-07-27T12:30:24.460

15

You don't edit strings with grep in a Unix shell; grep is usually used to find or filter out lines of text. Use sed instead:

$ echo www.example.com | sed 's/^[^\.]\+\.//'
example.com

You'll need to learn regular expressions to use it effectively.
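As a rough illustration of how that expression reads, here is the same substitution in extended-regex form, piece by piece. The function name strip_first_label is made up for this sketch. The -E option is accepted by both GNU and BSD sed, while \+ in a basic regex is a GNU extension that other sed implementations may not accept.

```shell
# Hypothetical helper: drop everything up to and including the first dot.
#   ^      anchor at the start of the line
#   [^.]+  one or more characters that are not a dot
#   \.     the literal dot that ends the first label
strip_first_label() {
    sed -E 's/^[^.]+\.//'
}

printf '%s\n' www.example.com | strip_first_label
```

Note that this removes the first label whatever it is, not only www.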

Sed can also edit a file in place (modify the file itself) if you pass the -i option. Be careful: you can easily lose data if you combine a wrong sed command with -i.
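One way to reduce that risk is to ask sed to keep a backup copy. A minimal sketch, assuming a throwaway file named domains.txt (the name and contents are invented for the example); the -i.bak form, with the suffix attached directly to -i, is understood by both GNU and BSD sed:

```shell
# Work on a throwaway copy so nothing important is at risk.
tmpdir=$(mktemp -d)
printf '%s\n' www.example.com www.abc.com > "$tmpdir/domains.txt"

# -i.bak edits the file in place and keeps the original as domains.txt.bak.
sed -i.bak 's/^www\.//' "$tmpdir/domains.txt"

cat "$tmpdir/domains.txt"       # edited content
cat "$tmpdir/domains.txt.bak"   # untouched original
```

If the result looks wrong, the .bak file can simply be moved back over the edited one.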

An example

From your comments, I guess you have a TeX document and you want to remove the first part of all .com domain names. Suppose this is your document test.tex:

\documentclass{article}
\begin{document}
www.example.com
example.com www.another.domain.com
\end{document}

then you can transform it with this sed command (redirect output to file or edit in-place with -i):

$ sed 's/\([a-z0-9-]\+\.\)\(\([a-z0-9-]\+\.\)\+com\)/\2/gi' test.tex 
\documentclass{article}
\begin{document}
example.com
example.com another.domain.com
\end{document}

Please note that:

A common sequence of allowed symbols followed by a dot is matched by [a-z0-9-]\+\.
I used groups in the regular expression (the parts between \( and \)) to mark the first and the second part of the URL, and I replace the entire match with its second group (\2 in the substitution pattern)
The domain must be at least a third-level .com domain (every \+ repetition means at least one match)
The search is case insensitive (i flag at the end)
It can do more than one match per line (g flag at the end)
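For comparison, the same idea can be sketched in extended-regex syntax, where groups are written with plain parentheses. This is only one possible variant: the function name drop_first_label_of_com is hypothetical, and the character class spells out both letter cases instead of relying on the i flag, which is a GNU extension.

```shell
# Hypothetical helper: the same substitution in extended-regex syntax.
# Group 1 is the leading label, group 2 is the remainder ending in "com".
drop_first_label_of_com() {
    sed -E 's/([A-Za-z0-9-]+\.)(([A-Za-z0-9-]+\.)+com)/\2/g'
}

printf '%s\n' 'www.example.com' 'example.com WWW.Another.Domain.com' |
    drop_first_label_of_com
```

A plain second-level name such as example.com is left alone, because the pattern needs at least one label between the first one and com.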

edited Jul 27 '12 at 12:30

answered Jul 26 '12 at 16:01

sastanin

The URLs are saved in a file. So my command will be: grep'\.com$' source.text >dest.tex | sed 's/^[^\.]\+\.//' ?? It gives me error ?? – Jury A Jul 26 '12 at 17:32
I also need to write the names (they are many lines not one) in another text file after removing www. – Jury A Jul 26 '12 at 17:37
I tried to guess what's your task and wrote an example of a `sed` regex to edit domain names in the document, without touching the rest of the lines. If your problem is different you may need a different regex, but overall the idea is the same. – sastanin Jul 27 '12 at 12:33
Normally you either redirect to file (`> dest.tex`), or just use pipe (`| sed ...`), but not both. You don't need `grep` if you want to change some lines but keep the rest. A carefully written regex and `sed` is probably all you need. – sastanin Jul 27 '12 at 12:34
On macOS, the `sed` command does not work the same as the Linux version. But you could use this simpler version on the Mac: `echo www.example.com | sed "s/www\.//"` -- It will replace `"www."` with the empty string `""`.
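To make the pipe-versus-redirect point from the comments concrete, here is a rough sketch of one pipeline that filters the .com lines, strips a leading www., and writes the result to a second file. The function name extract_com_domains and the sample data are assumptions made for the example.

```shell
# Hypothetical helper: keep lines ending in ".com", strip a leading "www.".
# Usage: extract_com_domains SOURCE DEST
extract_com_domains() {
    # grep feeds sed through the pipe; only the end of the pipeline is redirected.
    grep '\.com$' "$1" | sed 's/^www\.//' > "$2"
}

src=$(mktemp); dst=$(mktemp)
printf '%s\n' www.abc.com www.example.org shop.example.com > "$src"
extract_com_domains "$src" "$dst"
cat "$dst"
```

The redirect appears once, at the very end, so the output of sed (not of grep) is what lands in the destination file.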

score 7 · Answer 2 · answered Jul 26 '12 at 16:34

As the others have noted, grep is not well suited for this task. sed is a good option, or, if the text is well ordered, a simple cut might be easier to type:

echo www.abc.com | cut -d. -f2-

-d. tells cut to use . as a delimiter.
-f2- tells cut to return every field from the second one onward.
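A quick check of what "well ordered" means in practice; the sample lines are invented for this sketch. cut removes the first dot-separated field whatever it contains, and passes lines without any dot through unchanged:

```shell
# cut drops the first dot-separated field, whether or not it is "www".
printf '%s\n' www.abc.com abc.com localhost | cut -d. -f2-
```

The second line comes out as just com, so this approach is only safe when every line really starts with the prefix you want to drop.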

score 7 · Answer 3 · answered Jul 26 '12 at 18:42

You can do this using grep easily:

$ echo www.google.com | grep -o '[^.]*\.com'
google.com

Instead of using echo, pass your file to grep:

$ grep -o '[^.]*\.com$' < file

Here I used the regular expression '[^.]*\.com'. It means: find a word with no . in it ([^.]*), followed by .com (\.com in the regex). The -o option tells grep to print only the part that matched.
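A small sketch of how this behaves on several lines at once (the sample names are made up). Lines that do not end in .com are dropped, and only the last label before .com is kept:

```shell
# -o prints only the matched part; non-matching lines produce no output.
printf '%s\n' www.google.com www.another.domain.com www.example.org |
    grep -o '[^.]*\.com$'
```

Note that www.another.domain.com is reduced to domain.com here, which may or may not be what you want.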

Matthias Braun · Answer 4 · 2021-08-05T17:54:18.363

with grep's `--only-matching` and `\K`

You can do this with grep's --only-matching flag:

echo "www.abc.com" | grep --perl-regexp --only-matching 'www.\K.*'

which can be shortened to

echo "www.abc.com" | grep -Po 'www.\K.*'

Both commands produce

abc.com

with grep (GNU grep) 3.3.

Instead of echo, I'll use a here string to shorten the command further:

grep -Po 'www.\K.*' <<< "www.abc.com"

\K resets the starting point of the match, essentially forgetting the matched "www.".

with grep's positive lookbehind

You can also do this with a positive lookbehind:

grep -Po '(?<=www.).*' <<< "www.abc.com"
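One detail worth checking in the patterns above: an unescaped . matches any character, not only a dot. A slightly stricter sketch, assuming a grep built with PCRE support (the -P option of GNU grep, which is not available everywhere), escapes the dot and anchors the prefix to the start of the line:

```shell
# Requires a grep built with PCRE support (GNU grep -P); not portable.
#   ^www\.  only a literal "www." at the start of the line
#   \K      discard what has matched so far from the reported match
printf '%s\n' www.abc.com wwwxabc.com abc.com | grep -Po '^www\.\K.*'
```

Only the first input line matches; with the unescaped form, wwwxabc.com would match as well.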

with awk's field separator `-F`

awk -F 'www.' '$2{print $2}' <<< "www.abc.com"

This prints

abc.com

The $2{print $2} part will print the second field if it's defined. This is necessary in case of multi-line input to avoid outputting blank lines for input lines that don't contain the field separator.
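If you would rather keep the lines that lack the prefix instead of dropping them, a different awk technique (sub() rather than the field separator) is one option. This is a sketch under that assumption; the function name strip_www_awk is hypothetical.

```shell
# Hypothetical helper: remove a literal leading "www." and print every line,
# including lines that never had the prefix.
strip_www_awk() {
    awk '{ sub(/^www\./, ""); print }'
}

printf '%s\n' www.abc.com example.org | strip_www_awk
```

Here the dot is escaped and the pattern is anchored, so only a real www. prefix is removed.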

score 3 · Answer 5 · answered Jul 26 '12 at 16:00

grep is not used to manipulate or change text, only to search for text or patterns within text.

You should look into something like sed or awk or cut if you want a command line tool to do it. Or write a script in Python/Perl/Ruby/whatever.

Daniel DiPaolo

score 1 · Answer 6 · edited Jan 02 '18 at 07:59

You can actually do this without invoking other programs, by using a builtin parameter expansion in bash:

while IFS= read -r line; do echo "${line#*.}"; done < file

Here, #*. tells the shell to remove the shortest prefix that consists of zero or more characters followed by a dot.
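If only a literal www. should go, the pattern can name it directly. A minimal sketch, with the hypothetical function name strip_www_lines; in ${var#pattern} the pattern is a shell glob, so the dot is an ordinary character:

```shell
# Hypothetical helper: strip a literal "www." prefix from each input line.
# In ${var#pattern} the pattern is a glob, so "." is a plain dot here.
strip_www_lines() {
    while IFS= read -r line; do
        printf '%s\n' "${line#www.}"
    done
}

printf '%s\n' www.abc.com shop.example.com | strip_www_lines
```

Lines without the prefix are passed through unchanged, unlike with #*. which always removes the first label.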

You can view a cheatsheet with the different parameter expansions for bash here:

https://devhints.io/bash

edited Jan 02 '18 at 07:59

clemens

answered Jan 02 '18 at 07:35

Fahd Ahmed
