unix tr find and replace

Question

This is the command I'm running on an ordinary web page that I fetched from a web site with wget:

tr '<' '\n<' < index.html

However, it gives me newlines but does not add the left angle bracket back in. E.g.

 echo "<hello><world>" | tr '<' '\n<'

returns

 (blank line which is fine)
 hello>
 world>

instead of

 (blank line or not)
 <hello>
 <world>

What's wrong?

ephemient · Answer 1 · 2011-12-01T23:46:29.123

28

That's because tr only does character-for-character substitution (or deletion).
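To make the one-to-one mapping concrete, here is a minimal Python sketch of the behaviour seen in the question. The helper name `tr_like` is made up for illustration and this is not how tr is implemented; it only models the case where the second set is longer than the first, in which case the excess characters go unused.

```python
def tr_like(text, set1, set2):
    """Map each character of set1 to the character at the same position in set2."""
    # zip() pairs characters by position and drops the excess characters of
    # set2, so for '<' and '\n<' the only mapping built is '<' -> '\n'.
    table = str.maketrans(dict(zip(set1, set2)))
    return text.translate(table)


# The '<' is replaced by a newline; the second character of set2 is never used.
print(repr(tr_like("<hello><world>", "<", "\n<")))
```

Because every input character maps to exactly one output character, no choice of the second set can turn `<` into the two-character sequence newline plus `<`.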

Try sed instead.

echo '<hello><world>' | sed -e 's/</\n&/g'

Or awk.

echo '<hello><world>' | awk '{gsub(/</,"\n<",$0)}1'

Or perl.

echo '<hello><world>' | perl -pe 's/</\n</g'

Or ruby.

echo '<hello><world>' | ruby -pe '$_.gsub!(/</,"\n<")'

Or python.

echo '<hello><world>' \
| python3 -c 'for l in __import__("fileinput").input(): print(l.replace("<","\n<"), end="")'

edited Dec 01 '11 at 23:46

answered Dec 01 '11 at 23:23

ephemient

I tried that but I get nn. I don't know what the sed newline character is – Kamran224 Dec 01 '11 at 23:26
@Kamran224 This works for me but try: echo -e '<hello><world>' | sed -e 's/</\n&/g' – Dec 01 '11 at 23:29
@Kamran224 `\n` is a GNU sed extension. What system are you on? – ephemient Dec 01 '11 at 23:36
@ephemient SunOS (afs system on my campus) – Kamran224 Dec 01 '11 at 23:43
On SunOS you will have to put the new line manually. In substitution field, hit `enter` and continue with your replacement stuff. For `tab` you will have to manually hit spaces (8 times) or whatever is the default `tab` limit on your machine. – jaypal singh Dec 01 '11 at 23:48
@ephemient You didn't give a `c++` implementation of it!! :P +1 – jaypal singh Dec 01 '11 at 23:50
@Jaypal A string of 8 spaces does not equal a tab; you need a literal tab character. The 8-space thing is about tab stops, not tabs. – Michael J. Barber Dec 04 '11 at 07:27
Use `perl` when you are on an unspecified Unix machine. Using `sed` or `tr` on those machines can reveal they don't support expected features. – Yuri Mar 29 '19 at 09:43

score 3 · Answer 2 · edited May 23 '17 at 11:54

If you have GNU grep, this may work for you:

grep -Po '<.*?>[^<]*' index.html

which should pass through all of the HTML, with each tag starting at the beginning of a line, possibly followed by non-tag text on the same line.

If you want nothing but tags:

grep -Po '<.*?>' index.html
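For readers without GNU grep, the two patterns can be tried out in Python. This is a rough sketch, assuming that scanning line by line and collecting every match is a fair stand-in for `grep -o`; the helper name `grep_o` is hypothetical.

```python
import re

TAG_WITH_TEXT = re.compile(r"<.*?>[^<]*")  # a tag plus any text that follows it
TAG_ONLY = re.compile(r"<.*?>")            # nothing but the tag


def grep_o(pattern, text):
    """Return every match of pattern, scanning one line at a time as grep does."""
    matches = []
    for line in text.splitlines():
        matches.extend(pattern.findall(line))
    return matches


for match in grep_o(TAG_WITH_TEXT, "<hello>some text<world>"):
    print(match)
```

As with grep, a tag that is split across two lines is not matched by either pattern.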

You should know, however, that it's not a good idea to parse HTML with regexes.
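One alternative to regexes is a real HTML parser. The following is a minimal sketch using Python's standard `html.parser` module; the class name `TagLister` and the function `list_tags` are invented for this example, and end tags are rebuilt from the tag name rather than copied from the input.

```python
from html.parser import HTMLParser


class TagLister(HTMLParser):
    """Collect the tags of a document, one entry per tag."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag as it appeared in the input.
        self.tags.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # The parser passes the lowercased tag name, so the end tag is rebuilt.
        self.tags.append(f"</{tag}>")


def list_tags(html):
    parser = TagLister()
    parser.feed(html)
    parser.close()
    return parser.tags


print("\n".join(list_tags('<p class="x">hi</p><br>')))
```

Unlike `<.*?>`, a parser tracks quoting inside attributes, so it should be less easily confused by input such as a `>` inside an attribute value.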

edited May 23 '17 at 11:54

Community

answered Dec 04 '11 at 06:30

Dennis Williamson

score 2 · Answer 3 · edited Nov 25 '20 at 18:50

The order of where you put your newline is important. Also you can escape the "<".

`tr '<' '<\n' < index.html`

works as well.

edited Nov 25 '20 at 18:50

blizz

answered Oct 03 '13 at 21:27

felix747

jaypal singh · Answer 4 · 2011-12-02T00:24:21.343

2

Does this work for you?

awk -F"><" -v OFS=">\n<" '{print $1,$2}'

[jaypal:~/Temp] echo "<hello><world>" | awk -F"><" -v OFS=">\n<" '{$1=$1}1';
<hello>
<world>

You can put a regex pattern (/ /) in front of the awk {} action to restrict it to the lines you want this to happen for.
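The awk command above splits each line on `><` and rejoins the fields with a newline between the brackets. A rough Python equivalent, as a sketch with a made-up function name `split_tags`, looks like this:

```python
def split_tags(line, fs="><", ofs=">\n<"):
    """Split line on fs and rejoin the pieces with ofs, like awk's -F and OFS."""
    # awk -F"><" splits the record into fields, and assigning $1=$1 rebuilds
    # the record with OFS between the fields; split/join does both in one step.
    return ofs.join(line.split(fs))


print(split_tags("<hello><world>"))
```

Like the `{$1=$1}1` form, this handles any number of adjacent tags on a line, not just two.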

edited Dec 02 '11 at 00:24

answered Dec 01 '11 at 23:38

jaypal singh

`'{$1=$1}1'` is shorter and will work if there is more than `> – ephemient Dec 02 '11 at 00:10
This would replace fewer of the ` – Michael J. Barber Dec 04 '11 at 07:29
