Some cell entries in my data frame contain garbled text, and I would like to replace them with a string, but R returns the following message:

#load data from dropbox
library(foreign)
data <- read.csv("https://www.dropbox.com/s/anm8xrovxc5xtr5/comtrade2009.csv?dl=1")
unique(data$ptTitle)[75]
[1] <NA>
#this is not an NA because the text on the CSV file appears to be some garbled string due to encoding, 
#it shows "C<U+00F4>te d'Ivoire"

data$ptTitle[data$ptTitle == "<NA>"] <- "Cote d'Ivoire"
Warning message:
In `[<-.factor`(`*tmp*`, ct2009$ptTitle == "<NA>", value = c(238L,  :
  invalid factor level, NA generated

R does not allow me to replace those garbled values with a character string. Does anyone know how to overwrite them with my preferred string?

Update

So I guess a better way to work around this is to pass stringsAsFactors=F to read.csv when loading the CSV file, so that the missing cells come in as a plain NA (instead of the factor's <NA>) and are much easier to replace. Sorry for any hassle this thread might have caused.
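
For anyone landing here later, this is a minimal sketch of the workaround, not a definitive fix. It uses a small made-up CSV string (`csv_text`, hypothetical rows, not the Comtrade file) so it runs without the Dropbox download, and it assumes the problem cell has already been read in as a missing value. The second half shows what I believe is the alternative if the column stays a factor: the replacement string has to be added as a level before it can be assigned.

```r
# Toy stand-in for the real CSV (hypothetical rows, not the Comtrade data).
# The bare NA in the second row is read as a missing value by default.
csv_text <- "ptTitle,value\nFrance,10\nNA,20\nGhana,30"

# 1) Read the column as character: missing cells can be overwritten directly.
df_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
df_chr$ptTitle[is.na(df_chr$ptTitle)] <- "Cote d'Ivoire"

# 2) Keep the column as a factor: register the new label as a level first,
#    otherwise R warns "invalid factor level, NA generated".
df_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
levels(df_fct$ptTitle) <- c(levels(df_fct$ptTitle), "Cote d'Ivoire")
df_fct$ptTitle[is.na(df_fct$ptTitle)] <- "Cote d'Ivoire"

print(df_chr$ptTitle)
print(as.character(df_fct$ptTitle))
```

In both cases the test is `is.na(...)`, not a comparison against `"<NA>"`; `<NA>` is only how a missing factor value is printed, not a string stored in the column.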

Chris T.
  • To capture NA in R, use `is.na(data$ptTitle)`. – Sotos Nov 09 '18 at 12:33
  • `<NA>` means it's a factor, and the thread marked by user zx8754 did not actually answer my question. – Chris T. Nov 09 '18 at 12:50
  • I guess a better alternative to work around this is to add `stringsAsFactors=F` when loading the data using `read.csv`, so it's easier to replace those `NA` (instead of `<NA>`). – Chris T. Nov 09 '18 at 12:55
  • There is one NA row in data, row 94. Linked post solution works. Try: `data$ptTitle – zx8754 Nov 09 '18 at 13:12
  • The strange thing is that after all these replacement tricks, that cell still keeps its garbled form `C<U+00F4>te d'Ivoire`. – Chris T. Nov 09 '18 at 13:20
  • Some updates here, after I applied your code `data$ptTitle te d'Ivoire"] – Chris T. Nov 09 '18 at 13:38

0 Answers