0

How do I replace single zeros (0) with string NA in a tab-separated-values file?

Suppose I have the table:

0\t0.15\t0\t8.05\t0\t0\t0.15\t7.0306\n
5\t0.18\t0\t8.05\t0\t0\t0.5\t50\n
1\t15\t0205\t0\t0.16\t200\t40.90\n

I would like to get:

NA\t0.15\tNA\t8.05\tNA\tNA\t0.15\t7.0306\n
5\t0.18\tNA\t8.05\tNA\tNA\t0.5\t50\n
1\t15\t0205\tNA\t0.16\t200\t40.90\n

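That is, I would like to match only the null measurements in the data frame: fields whose entire value is 0, not zeros that are part of a larger value such as 0.15 or 50.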

mklement0
fred

3 Answers

4

awk enables a robust, portable solution:

awk 'BEGIN {FS=OFS="\t"} {for (i=1; i<=NF; ++i) { if ($i=="0") {$i="NA"} }; print}' file
  • BEGIN {FS=OFS="\t"} tells awk - before input processing begins (BEGIN) - to split input lines into fields by tab characters (FS="\t") and to also separate them by tab characters on output (OFS="\t").

    • Reserved variable FS is the [input] field separator; OFS is the output field separator.
  • for (i=1; i<=NF; ++i) loops over all input fields (NF is the count of input fields), resulting from splitting each input line by tabs.

    • if ($i=="0") {$i="NA"} tests each field for being identical to string 0 and, if so, replaces that field ($i) with string NA.

    • On assigning to a field, the input line at hand is implicitly rebuilt from the (modified) field values, using the value of OFS as the separator.

  • print simply prints the (potentially modified) input line at hand.
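
To try the command on a small sample, it can be wrapped in a shell function and fed input built with printf. This is only a minimal sketch: the function name zeros_to_na is made up for illustration, and the sample row is adapted from the question.

```shell
# Hypothetical helper (the name is not from the answer): replace every field
# whose entire value is the string "0" with "NA" in tab-separated input.
zeros_to_na() {
  awk 'BEGIN {FS=OFS="\t"} {for (i=1; i<=NF; ++i) { if ($i=="0") {$i="NA"} }; print}' "$@"
}

# printf expands \t and \n, so this produces one real tab-separated row.
# With no file argument, awk reads from standard input.
printf '0\t0.15\t0\t8.05\t0\t0\t0.15\t7.0306\n' | zeros_to_na
```

Because the test is a string comparison against "0", values such as 0.15, 50, or 0205 are left untouched.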

mklement0
0

With GNU sed:

sed -E ':a;s/(\t)*\b0\b(\t)/\1NA\2/g;ta;' file

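Using backreferences, this replaces each 0 that is followed by a tab (\t), and possibly preceded by one, with NA, keeping the captured tabs. Note that a 0 in the last field is not matched, because the pattern requires a trailing tab.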

Graham
SLePort
0

With GNU or OSX sed, using -E for EREs:

$ sed -E 's/(^|\t)0(\t|$)/\1NA\2/g; s/(^|\t)0(\t|$)/\1NA\2/g' file
NA      0.15    NA      8.05    NA      NA      0.15    7.0306
5       0.18    NA      8.05    NA      NA      0.5     50
1       15      NA      205     NA      0.16    200     40.90

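See https://stackoverflow.com/a/44908420/1745001 for why it takes 2 passes: a tab consumed as the trailing delimiter of one match cannot also serve as the leading delimiter of the next, so adjacent zero fields are skipped in a single pass.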
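
The overlap problem is easy to reproduce with three adjacent zero fields. The sketch below assumes a sed that understands \t inside a regex (GNU sed does); the function names one_pass and two_pass are made up for illustration.

```shell
# Hypothetical helpers (names are illustrative, not from the answer).
one_pass() { sed -E 's/(^|\t)0(\t|$)/\1NA\2/g'; }
two_pass() { sed -E 's/(^|\t)0(\t|$)/\1NA\2/g; s/(^|\t)0(\t|$)/\1NA\2/g'; }

# Three adjacent zero fields: the tab after the first 0 is consumed by the
# first match, so the middle 0 has no leading delimiter left to match against.
printf '0\t0\t0\n' | one_pass   # the middle field is still 0
printf '0\t0\t0\n' | two_pass   # all three fields become NA
```

Two passes are enough because a single pass can only skip every other zero in a run of adjacent zero fields, and the second pass picks up the ones that were skipped.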

Ed Morton