52

I am trying to import a csv that is in Japanese. This code:

url <- 'http://www.mof.go.jp/international_policy/reference/itn_transactions_in_securities/week.csv'
x <- read.csv(url, header=FALSE, stringsAsFactors=FALSE)

returns the following error:

Error in type.convert(data[[i]], as.is = as.is[i], dec = dec, na.strings = character(0L)) : 
invalid multibyte string at '<91>ΊO<8b>y<82>ёΓ<e0><8f>،<94><94><84><94><83><8c>_<96>̏@(<8f>T<8e><9f><81>E<8e>w<92><e8><95>@<8a>փx<81>[<83>X<81>j'

I tried changing the encoding (Encoding(url) <- 'UTF-8' and also to latin1) and tried removing the read.csv parameters, but received the same "invalid multibyte string" message in each case. Is there a different encoding that should be used, or is there some other problem?

bartektartanus
jaredwoodard

9 Answers

90

Encoding sets the encoding of a character string. It doesn't set the encoding of the file represented by the character string, which is what you want.
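To make the distinction concrete, here is a minimal sketch of what Encoding() does and does not do. The URL and the path in the final comment are placeholders for illustration, not the file from the question.

```r
# A plain ASCII string, such as a URL, cannot carry an encoding mark.
url <- "http://example.com/data.csv"  # hypothetical URL, for illustration only
Encoding(url) <- "UTF-8"
Encoding(url)   # still "unknown": ASCII strings are never marked

# A non-ASCII string can be marked, but this only labels the bytes in memory.
s <- "fa\xE7ile"          # the byte E7 is c-cedilla in latin1
Encoding(s) <- "latin1"
Encoding(s)

# To describe the bytes of a file, pass the encoding to the reader instead:
# x <- read.csv(path, fileEncoding = "latin1")   # 'path' is a placeholder
```

In both cases the file behind the URL is untouched, which is why changing Encoding(url) made no difference to the error.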

This worked for me, after trying "UTF-8":

x <- read.csv(url, header=FALSE, stringsAsFactors=FALSE, fileEncoding="latin1")

And you may want to skip the first 16 lines, and read in the headers separately. Either way, there's still quite a bit of cleaning up to do.

x <- read.csv(url, header=FALSE, stringsAsFactors=FALSE,
  fileEncoding="latin1", skip=16)
# get started with the clean-up
x[,1] <- gsub("\u0081|`", "", x[,1])    # get rid of odd characters
x[,-1] <- as.data.frame(lapply(x[,-1],  # strip thousands separators, convert to numbers
  function(d) type.convert(gsub(",", "", d), as.is=TRUE)))  # as.is=TRUE avoids factors
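The byte pairs in the error message (for example <83>x<81>[<83>X<81>j) decode cleanly as Shift-JIS, so latin1 probably avoids the error without actually decoding the Japanese text. A minimal sketch of reading Shift-JIS directly, assuming your platform's iconv accepts the name "CP932" (check iconvlist()) and that the session runs in a UTF-8 locale. The temporary file is a stand-in built for this example, not the real data.

```r
# Bytes for the katakana word "ベース" in Shift-JIS, then ",1" and a newline.
sjis_bytes <- as.raw(c(0x83, 0x78, 0x81, 0x5B, 0x83, 0x58))
tmp <- tempfile(fileext = ".csv")
writeBin(c(sjis_bytes, charToRaw(",1\n")), tmp)

# fileEncoding names the encoding of the file; R re-encodes it while reading.
x <- read.csv(tmp, header = FALSE, stringsAsFactors = FALSE,
              fileEncoding = "CP932")   # assumption: "CP932" is supported here
x
```

If "CP932" is not listed on your system, "SHIFT-JIS" or "SJIS" may be, depending on the iconv implementation.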
Joshua Ulrich
  • Thanks. Following [this question](http://stackoverflow.com/questions/11069908/r-extracting-clean-utf-8-text-from-a-web-page-scraped-with-rcurl), I tried setting the locale to Japanese with `Sys.setlocale`, but that didn't work either ("OS reports request to set locale to "japanese" cannot be honored"). – jaredwoodard Jan 16 '13 at 17:06
  • Yes, read.csv("foobar.csv", fileEncoding = "latin1") worked for me. I had an Excel file and saved as CSV, then had to set the fileEncoding to "latin1" to read that CSV in R. – Dan Jarratt Apr 26 '17 at 19:17
  • @Joshua Ulrich, what if my code looks like this? `file.list – Rollo99 Nov 08 '19 at 10:18
15

You may have encountered this issue because of an incompatible system locale. Try setting the system locale with this code: Sys.setlocale("LC_ALL", "C")
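Changing LC_ALL affects the whole session, including sorting and number formatting, so it may be worth narrowing the change and undoing it afterwards. A minimal sketch, assuming that only the character-type category (LC_CTYPE) matters for this error, which may not hold on every platform:

```r
old_ctype <- Sys.getlocale("LC_CTYPE")        # remember the current setting
invisible(Sys.setlocale("LC_CTYPE", "C"))     # treat text as plain bytes

# ... read the file here, e.g. x <- read.csv(path) with your own 'path' ...

invisible(Sys.setlocale("LC_CTYPE", old_ctype))  # restore the original locale
```

Note that in the "C" locale non-ASCII bytes are not interpreted at all, so this silences the error rather than decoding the text.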

dpel
user3670684
11

The readr package from the tidyverse might help.

You can set the encoding via the locale argument of the read_csv() function by using the locale() function and its encoding argument:

library(readr)
read_csv(file = "http://www.mof.go.jp/international_policy/reference/itn_transactions_in_securities/week.csv",
         skip = 14,
         locale = locale(encoding = "latin1"))
Je Hsers
1

I had the same error and tried all the above to no avail. The issue vanished when I upgraded from R 3.4.0 to 3.4.3, so if your R version is not up to date, update it!

stevec
1

The simplest solution I found that does not lose any data or special characters (with fileEncoding="latin1", for example, characters like the Euro sign € are lost) is to open the file first in a text editor like Sublime Text and choose "Save with encoding - UTF-8".

Then R can import the file with no issue and no character loss.
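The same re-encoding can be done inside R if you know the source encoding, which the text editor otherwise guesses for you. A minimal sketch, assuming the source file is latin1. The file built here is a small stand-in, and working on raw bytes keeps the conversion independent of the session locale.

```r
# Build a small stand-in file encoded as latin1 ("café,10"); E9 is e-acute.
src <- tempfile(fileext = ".csv")
writeBin(c(charToRaw("caf"), as.raw(0xE9), charToRaw(",10\n")), src)

# Convert the raw bytes from the assumed source encoding to UTF-8.
bytes_in  <- readBin(src, what = "raw", n = file.size(src))
bytes_out <- iconv(list(bytes_in), from = "latin1", to = "UTF-8",
                   toRaw = TRUE)[[1]]

# Write the converted bytes to a new file and read it back as UTF-8.
dst <- tempfile(fileext = ".csv")
writeBin(bytes_out, dst)
x <- read.csv(dst, header = FALSE, stringsAsFactors = FALSE,
              fileEncoding = "UTF-8")
```

For a file that contains the Euro sign, the source encoding is more likely Windows-1252 than latin1, so the from argument would need to change accordingly.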

0

For those using Rattle with this issue, here is how I solved it:

  1. First make sure to quit Rattle so you're at the R command prompt
  2. > library(rattle) (if not done already)
  3. > crv$csv.encoding="latin1"
  4. > rattle()
  5. You should now be able to carry on, i.e. import your csv > Execute > Model > Execute etc.

That worked for me; hopefully it helps a weary traveller.

wired00
0

I had a similar problem with scientific articles and found a good solution here: http://tm.r-forge.r-project.org/faq.html

By using the following line of code:

tm_map(yourCorpus, content_transformer(function(x) iconv(enc2utf8(x), sub = "byte")))

you replace the bytes that cannot be converted with their hex codes (in the form <xx>). I hope this helps.
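The effect of sub = "byte" is easier to see on a single string in base R, without the tm package. A minimal sketch using a latin1 string that has no ASCII representation:

```r
x <- "fa\xE7ile"            # E7 is c-cedilla in latin1
Encoding(x) <- "latin1"

# Bytes that cannot be converted are replaced by their hex code in <xx> form.
iconv(x, from = "latin1", to = "ASCII", sub = "byte")   # "fa<e7>ile"

# Any other substitution string can be used instead.
iconv(x, from = "latin1", to = "ASCII", sub = "?")      # "fa?ile"
```

The original characters are not recovered this way. The offending bytes are only made visible so that later processing does not fail.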

Carlos
0

If the file you are trying to import into R was originally an Excel file, open the original file and use Save As to save it as a CSV. That fixed this error for me when importing into R.

822_BA
0

I came across this error (invalid multibyte string 1) recently, but my problem was a bit different:

We had forgotten to save a csv.gz file with an extension, and tried to use read_csv() to read it. Adding the extension solved the problem.
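readr decides whether to decompress from the file extension, so a gzipped file without one is read as raw compressed bytes. If renaming the file is not an option, base R can be told explicitly that the file is gzipped. A minimal sketch with a temporary stand-in file, using read.csv rather than read_csv():

```r
# Create a gzip-compressed CSV whose name deliberately has no .gz extension.
path <- tempfile()
con <- gzfile(path, "w")
writeLines(c("a,b", "1,2"), con)
close(con)

# gzfile() decompresses regardless of the file name.
x <- read.csv(gzfile(path))
x
```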

Mirabilis