How can I remove the beginning of a word using grep? Example: I have a file that contains:
www.abc.com
I only need the part
abc.com
Sorry for the basic question, but I have no experience with Linux.
You don't edit strings with grep in a Unix shell. grep is normally used to select the lines of a text that match (or, with -v, don't match) a pattern. Use sed instead:
$ echo www.example.com | sed 's/^[^\.]\+\.//'
example.com
You'll need to learn regular expressions to use it effectively.
sed can also edit a file in place (modifying the file itself) if you pass the -i option, but be careful: a wrong sed command combined with -i can easily destroy data.
From your comments, I guess you have a TeX document and you want to remove the first part of all .com domain names. If this is your document test.tex:
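One way to lower the risk of -i is to preview the result first and keep a backup copy. This is only a sketch: the file name domains.txt is made up for the illustration, and the \+ in the pattern assumes GNU sed.

```shell
# Hypothetical sample file, created only for this illustration.
printf 'www.abc.com\nwww.example.com\n' > domains.txt

# 1. Preview: without -i, sed prints the result and leaves the file untouched.
sed 's/^[^.]\+\.//' domains.txt

# 2. Edit in place, keeping the original content as domains.txt.bak.
sed -i.bak 's/^[^.]\+\.//' domains.txt

# 3. Inspect the edited file; the untouched copy is still in domains.txt.bak.
cat domains.txt
```

If the result is wrong, copying domains.txt.bak back over domains.txt restores the original.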
\documentclass{article}
\begin{document}
www.example.com
example.com www.another.domain.com
\end{document}
then you can transform it with this sed command (redirect the output to a file, or edit in place with -i):
$ sed 's/\([a-z0-9-]\+\.\)\(\([a-z0-9-]\+\.\)\+com\)/\2/gi' test.tex
\documentclass{article}
\begin{document}
example.com
example.com another.domain.com
\end{document}
Please note that:
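- [a-z0-9-]\+\. matches one domain label followed by a dot
- I use groups (\( and \)) to indicate the first and the second part of the URL, and I replace the entire match with its second group (\2 in the substitution pattern)
- the second group needs at least one label before com (\+ repetition means at least one match)
- matching is case-insensitive (i flag at the end)
- every match on a line is replaced, not only the first one (g flag at the end)
As the others have noted, grep is not well suited for this task. sed is a good option, or, if the text is well ordered, a simple cut might be easier to type: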
echo www.abc.com | cut -d. -f2-
-d. tells cut to use . as the delimiter.
-f2- tells cut to return everything from field 2 to the end of the line.
You can do this using grep easily:
$ echo www.google.com | grep -o '[^.]*\.com'
google.com
Instead of using echo, give grep your file:
$ grep -o '[^.]*\.com$' < file
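I used the regular expression '[^.]*\.com' here. It means: find a run of characters that contains no . ([^.]*), followed by .com (written \.com in the regular expression). The $ in the second command anchors the match to the end of the line. The -o option tells grep to print only the part of the line that matched.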
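To see how -o behaves on a file with several lines, here is a small sketch. The file name hosts.txt and its contents are invented for the illustration.

```shell
# Hypothetical sample input, created only for this illustration.
printf 'www.abc.com\nplain text line\nwww.another.domain.com\n' > hosts.txt

# -o prints only the matched part; lines without a match print nothing.
grep -o '[^.]*\.com$' hosts.txt
```

This prints abc.com and domain.com. Note that for the third line only the last label before .com survives, because [^.]* cannot cross a dot; if you need another.domain.com there, the sed approach is a better fit.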
--only-matching and \K
You can do this with grep's --only-matching flag:
echo "www.abc.com" | grep --perl-regexp --only-matching 'www\.\K.*'
which can be shortened to
echo "www.abc.com" | grep -Po 'www\.\K.*'
Both commands produce
abc.com
with grep (GNU grep) 3.3.
Instead of echo, I'll use a here string to shorten the command further:
grep -Po 'www\.\K.*' <<< "www.abc.com"
\K resets the starting point of the match, so the already matched "www." is left out of the output.
You can also do this with a positive lookbehind:
grep -Po '(?<=www\.).*' <<< "www.abc.com"
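The two forms are not fully interchangeable: lookbehinds in PCRE are length-restricted (traditionally they must be fixed-length), while the text in front of \K can be of any length. A sketch, assuming a GNU grep built with PCRE support, that strips whatever the first label is rather than a literal www (the sample host names are invented):

```shell
# \K after a variable-length prefix: drop the first dot-separated label,
# whatever it happens to be.
printf 'www.abc.com\nmail.example.org\n' | grep -Po '^[^.]+\.\K.*'
```

This prints abc.com and example.org.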
awk -F
awk -F 'www.' '$2{print $2}' <<< "www.abc.com"
This prints
abc.com
The $2{print $2} part will print the second field if it's defined. This is necessary in case of multi-line input to avoid outputting blank lines for input lines that don't contain the field separator.
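A short sketch of that multi-line case, with made-up input where only the first and third lines contain the separator:

```shell
# Only lines 1 and 3 contain the field separator "www.";
# the $2 guard skips line 2 instead of printing an empty line for it.
printf 'www.abc.com\nno separator here\nwww.example.com\n' |
  awk -F 'www.' '$2 { print $2 }'
```

This prints abc.com and example.com. With a bare '{ print $2 }' the middle line would produce a blank line in the output.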
grep is not used to manipulate or change text, only to search for text or patterns within text.
You should look into something like sed or awk or cut if you want a command line tool to do it. Or write a script in Python/Perl/Ruby/whatever.
You can actually do this without invoking other programs, by using a builtin parameter expansion in bash:
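while IFS= read -r line; do echo "${line#*.}"; done < file
Where #*. tells the shell to remove the shortest prefix that matches *., that is, zero or more characters followed by a dot. IFS= and -r make read keep each line exactly as it is, and the quotes stop the shell from splitting or globbing the result.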
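The difference between a single and a double # is easy to miss, so here is a small sketch with an invented host name:

```shell
host="www.another.domain.com"

# One '#' removes the shortest prefix matching '*.': only "www." goes away.
echo "${host#*.}"    # another.domain.com

# Two '#' remove the longest matching prefix: everything up to the last dot.
echo "${host##*.}"   # com
```

For stripping just the leading www., the single # form is the one you want.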
You can view a cheatsheet with the different parameter expansions for bash here: