
How can I remove the beginning of a word using grep? For example, I have a file that contains:

www.abc.com

I only need the part

abc.com

Sorry for the basic question, but I have no experience with Linux.

Jotne
Jury A

6 Answers


You don't edit strings with grep in the Unix shell; grep is usually used to find lines in text, or to filter them out. Use sed instead:

$ echo www.example.com | sed 's/^[^\.]\+\.//'
example.com

You'll need to learn regular expressions to use it effectively.

Sed can also edit a file in place (modifying the file itself) if you pass the -i argument. Be careful, though: if you write the wrong sed command and use -i, you can easily lose data.
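
A minimal sketch of a safer in-place edit, assuming a hypothetical file domains.txt with one host name per line: giving -i a suffix makes sed keep a backup copy of the original. The attached form -i.bak is accepted by both GNU sed and the BSD sed shipped with macOS, and [^.][^.]* is the portable POSIX spelling of GNU's [^.]\+ (one or more non-dot characters).

```shell
# Work in a scratch directory so nothing real is touched.
cd "$(mktemp -d)" || exit 1

# Hypothetical input file, one host name per line.
printf '%s\n' www.abc.com www.example.com > domains.txt

# Strip everything up to and including the first dot, editing in place.
# The original is saved as domains.txt.bak before the file is rewritten.
sed -i.bak 's/^[^.][^.]*\.//' domains.txt

cat domains.txt       # prints abc.com and example.com
cat domains.txt.bak   # prints www.abc.com and www.example.com
```

If the result looks wrong, the .bak file still holds the untouched input.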

An example

From your comments, I guess you have a TeX document and you want to remove the first part of all .com domain names. If this is your document, test.tex:

\documentclass{article}
\begin{document}
www.example.com
example.com www.another.domain.com
\end{document}

then you can transform it with this sed command (redirect the output to a file, or edit in place with -i):

$ sed 's/\([a-z0-9-]\+\.\)\(\([a-z0-9-]\+\.\)\+com\)/\2/gi' test.tex 
\documentclass{article}
\begin{document}
example.com
example.com another.domain.com
\end{document}

Please note that:

  • A sequence of allowed characters followed by a dot is matched by [a-z0-9-]\+\.
  • I used groups in the regular expression (the parts within \( and \)) to capture the first and the second part of the URL, and I replace the entire match with its second group (\2 in the replacement)
  • The domain must be at least a third-level .com domain (every \+ repetition means at least one match)
  • The search is case-insensitive (the i flag at the end)
  • It can make more than one replacement per line (the g flag at the end)
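
For comparison, a sketch of the same substitution in extended regex syntax (sed -E), where groups and + need no backslashes. The input is the two domain lines from test.tex above, fed in with printf instead of reading the file. One assumption to flag: the i flag is left out because it is not part of POSIX sed; GNU sed accepts it, so add it back there if you need case-insensitive matching.

```shell
# Group 1 is the first label plus its dot; group 2 is the rest, ending in "com".
# Replacing the match with \2 drops the first label of each 3rd-level+ domain.
# The two domain lines from test.tex are piped in directly.
printf '%s\n' 'www.example.com' 'example.com www.another.domain.com' |
  sed -E 's/([a-z0-9-]+\.)(([a-z0-9-]+\.)+com)/\2/g'
# prints:
# example.com
# example.com another.domain.com
```

The plain example.com is left alone because it has only two labels, so the second group cannot match.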
sastanin
  • The URLs are saved in a file. So my command will be: grep'\.com$' source.text >dest.tex | sed 's/^[^\.]\+\.//' ?? It gives me error ?? – Jury A Jul 26 '12 at 17:32
  • I also need to write the names (they are many lines not one) in another text file after removing www. – Jury A Jul 26 '12 at 17:37
  • I tried to guess what's your task and wrote an example of a `sed` regex to edit domain names in the document, without touching the rest of the lines. If your problem is different you may need a different regex, but overall the idea is the same. – sastanin Jul 27 '12 at 12:33
  • Normally you either redirect to file (`> dest.tex`), or just use pipe (`| sed ...`), but not both. You don't need `grep` if you want to change some lines but keep the rest. A carefully written regex and `sed` is probably all you need. – sastanin Jul 27 '12 at 12:34
  • On macOS, the `sed` command does not work the same as the Linux version. But you could use this simpler version on the Mac, without GNU-only regex syntax: `echo www.example.com | sed "s/www.//"` -- It will replace `"www."` with an empty string `""` (note that the unescaped `.` still matches any character). – Mr-IDE Oct 16 '19 at 10:38
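
Following the advice in the comments above, a sketch of the asker's pipeline with the redirection moved to the end, so grep's output flows into sed and only sed's output is written to the file. The file names source.text and dest.tex are taken from the comment; the sample contents are made up for illustration.

```shell
cd "$(mktemp -d)" || exit 1

# Made-up sample contents for source.text.
printf '%s\n' www.abc.com www.example.org www.example.com > source.text

# grep keeps only lines ending in ".com", sed strips the first label,
# and the single redirection at the end writes the result to dest.tex.
grep '\.com$' source.text | sed 's/^[^.][^.]*\.//' > dest.tex

cat dest.tex   # prints abc.com and example.com
```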

As the others have noted, grep is not well suited for this task. sed is a good option, or, if the text is well ordered, a simple cut might be easier to type:

echo www.abc.com | cut -d. -f2-
  • -d. tells cut to use . as a delimiter.
  • -f2- tells cut to return field 2 through the last field.
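
A quick sketch of how cut behaves on a few sample lines (made up for illustration): because -f2- keeps every field from the second one on, multi-part suffixes survive intact, and a line that contains no . at all is passed through unchanged unless you add -s.

```shell
# Sample host names; "localhost" contains no dot.
printf '%s\n' www.abc.com www.example.co.uk localhost | cut -d. -f2-
# prints abc.com, example.co.uk, localhost

# -s suppresses lines that contain no delimiter.
printf '%s\n' www.abc.com www.example.co.uk localhost | cut -s -d. -f2-
# prints abc.com, example.co.uk
```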
Thor

You can do this using grep easily:

$ echo www.google.com | grep -o '[^.]*\.com'
google.com

Instead of using echo, give grep your file:

$ grep -o '[^.]*\.com$' < file

Here I used the regular expression '[^.]*\.com'. It means: find a word without a . in it ([^.]*), followed by .com (written \.com in the regex). The -o option tells grep to print only the part of the line that matched.
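
To see what the anchored pattern does on input with several kinds of lines, a sketch with made-up host names: only the last label before .com is kept, and lines that don't end in .com produce no output at all, since grep -o prints nothing for non-matching lines.

```shell
# Made-up input: a plain .com host, a deeper .com host, and a .org host.
printf '%s\n' www.abc.com www.sub.example.com www.abc.org |
  grep -o '[^.]*\.com$'
# prints abc.com and example.com; www.abc.org is dropped
```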

Igor Chubin

with grep's --only-matching and \K

You can do this with grep's --only-matching flag, together with --perl-regexp so that \K is supported:

echo "www.abc.com" | grep --perl-regexp --only-matching 'www.\K.*'

which can be shortened to

echo "www.abc.com" | grep -Po 'www.\K.*'

Both commands produce

abc.com

with grep (GNU grep) 3.3.

Instead of echo, I'll use a here string to shorten the command further:

grep -Po 'www.\K.*' <<< "www.abc.com"

\K resets the starting point of the match, essentially forgetting the matched "www.". See this for more on \K.
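
One detail worth knowing: in 'www.\K.*' the unescaped . matches any character, and the pattern is not anchored. A tightened sketch (assuming GNU grep built with PCRE support, which -P requires) escapes the dot and anchors the match to the start of the line:

```shell
# ^ anchors to line start, \. matches a literal dot, \K drops "www." from the output.
printf '%s\n' www.abc.com wwwxfoo.com abc.org | grep -Po '^www\.\K.*'
# prints only abc.com; the unanchored 'www.\K.*' would also turn wwwxfoo.com into foo.com
```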

with grep's positive lookbehind

You can also do this with a positive lookbehind:

grep -Po '(?<=www.).*' <<< "www.abc.com"
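
The two approaches differ when the prefix varies in length. PCRE rejects a lookbehind containing an unbounded quantifier, so something like (?<=^[^.]+\.) is an error, while \K has no such restriction. A sketch that drops whatever the first label is, again assuming grep -P is available:

```shell
# ^[^.]+\. matches the first label and its dot, whatever its length;
# \K then discards it, so only the remainder is printed.
printf '%s\n' www.abc.com mail.example.org | grep -Po '^[^.]+\.\K.*'
# prints abc.com and example.org
```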

with awk's field separator -F

awk -F 'www.' '$2{print $2}' <<< "www.abc.com"

This prints

abc.com

The $2{print $2} part prints the second field only if it is non-empty. This is necessary for multi-line input, to avoid printing blank lines for input lines that don't contain the field separator.
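
If you would rather keep every line and strip www. only where it occurs, awk's sub() function is an alternative to splitting with -F. A sketch with made-up input; the regex is anchored and the dot escaped, so only a literal leading www. is removed:

```shell
# sub() edits $0 in place; lines without a leading "www." are printed unchanged.
printf '%s\n' www.abc.com abc.org | awk '{ sub(/^www\./, ""); print }'
# prints abc.com and abc.org
```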

Matthias Braun

grep is not meant for manipulating or changing text, only for searching for text or patterns within text.

You should look into something like sed, awk, or cut if you want a command-line tool to do it, or write a script in Python/Perl/Ruby/whatever.

Daniel DiPaolo

You can actually do this without invoking other programs, by using a builtin parameter expansion in bash:

while IFS= read -r line; do printf '%s\n' "${line#*.}"; done < file

Here, #*. tells the shell to remove the shortest prefix matching zero or more characters followed by a . (dot).
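
A short sketch of how the prefix-removal operators differ, using a made-up host name: # removes the shortest matching prefix and ## the longest. Because these patterns are shell globs rather than regular expressions, the . in ${line#www.} is a literal dot.

```shell
line=www.sub.example.com
echo "${line#*.}"    # sub.example.com (shortest prefix ending in "." removed)
echo "${line##*.}"   # com (longest prefix ending in "." removed)
echo "${line#www.}"  # sub.example.com (only a literal "www." prefix is removed)
```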

You can view a cheatsheet with the different parameter expansions for bash here:

https://devhints.io/bash

clemens
Fahd Ahmed