rendering Burmese characters in R

Question

I am working with text in Burmese and am attempting to run a topic model in R. R seems to be having trouble displaying and rendering Burmese characters. When I set the data as a data.frame, the Burmese characters are rendered correctly:

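library(tm)  # provides DataframeSource() and Corpus()

# Read the csv as UTF-8
data <- read.csv("data.csv", fileEncoding = "UTF8", encoding = "UTF-8", stringsAsFactors = FALSE)
filenames <- data[, 2]  # document names
txts <- data[, 5]       # Burmese text
docs <- data.frame(docs = txts, row.names = filenames)
ds <- DataframeSource(docs)
cases <- Corpus(ds)
cases[[1]]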

လိုက်... #[the rest is a text file with properly rendered Burmese]

However, when the text is not from a data.frame, or is taken directly from the csv file, several characters are displayed as Unicode escapes instead of being rendered:

data[1,5]

လိုက\u103a

The rest is a paragraph of text in which some accent marks are displayed incorrectly as in this example.
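
One way to check whether the data itself is damaged or only its printed form is to compare print() with writeLines(). This is a minimal sketch, not a confirmed diagnosis: it assumes the escapes come from how the console prints the value, since print() may escape characters that the current locale reports as non-printable, while writeLines() and cat() write the string unchanged. The string x is a hypothetical stand-in for data[1,5], built from the code points of the sample word so that it does not depend on the csv file.

```r
# Hypothetical stand-in for data[1, 5]: the sample word built from its
# code points (the last one, U+103A, is the character shown escaped above).
x <- intToUtf8(c(0x101C, 0x102D, 0x102F, 0x1000, 0x103A))

# Auto-printing at the console goes through print(), which may show
# some characters as \uXXXX escapes depending on the platform and locale.
print(x)

# writeLines() and cat() write the string to the console without escaping.
writeLines(x)
cat(x, "\n")
```

If writeLines() shows the text correctly while print() shows escapes, the stored string is intact and only the console display differs.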

I have checked the encodings using Encoding() and R confirms that in both cases I am using UTF-8.
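
For reference, the kind of check meant here can be sketched as follows. Encoding() only reports the encoding mark declared on the string, so the sketch also inspects the code points and the session locale. The variable x is again a hypothetical stand-in for data[1,5], and validUTF8() assumes a version of R recent enough to provide it.

```r
# Hypothetical stand-in for data[1, 5], built from code points.
x <- intToUtf8(c(0x101C, 0x102D, 0x102F, 0x1000, 0x103A))

# The declared encoding mark of the string.
enc <- Encoding(x)
print(enc)

# Whether the underlying bytes form valid UTF-8.
ok <- validUTF8(x)
print(ok)

# The individual code points, formatted as U+XXXX for comparison
# with the escapes that appear in the printed output.
cps <- sprintf("U+%04X", utf8ToInt(x))
print(cps)

# Locale information for the session, including whether it is UTF-8.
print(l10n_info())
```

Comparing the code points from both access paths (the corpus and data[1,5]) would show whether the two strings are actually identical.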

FYI, I use a Mac running R64. I have a colleague who uses a PC and did not encounter this issue, but we could not isolate the problem.

see http://stackoverflow.com/questions/12857021/how-to-display-burmese-characters-on-website-and-record-them-in-mysql-if-the-use — user2510479, Jul 29 '13 at 21:58
FYI, I already looked at question 17715956, which is a similar issue but not quite the same (I can't get characters in non-data-frame sources, opposite of that problem; also, that problem was on a PC) — dnardi, Jul 29 '13 at 22:10
