1

I have a text file containing several languages. How can I read it in R using the read.delim function?

Encoding("file.tsv")
#[1] "unknown"

source_data = read.delim(file, header= F, fileEncoding= "windows-1252",
               sep = "\t", quote = "")
source_D[360]
#[1] "ð¿ð¾ð¸ñðº ð½ð° ññ‚ð¾ð¼ ñð°ð¹ñ‚ðµ"

But in Notepad, source_D[360] is displayed as 'поиск на этом сайте'.

Ronak Shah
Fiona_Wang

2 Answers

3

tidyverse approach:

Use the locale argument of read_delim. (readr functions have _ instead of . in their names, and are usually faster and smarter about parsing.) More details here: https://r4ds.had.co.nz/data-import.html#parsing-a-vector

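library(readr)

# read_delim() uses delim/col_names where read.delim() uses sep/header
source_data = read_delim(file, delim = "\t", col_names = FALSE,
                         locale = locale(encoding = "windows-1252"),  # set to the file's actual encoding
                         quote = "")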
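
The garbled output in the question has the pattern typical of UTF-8 bytes decoded as windows-1252, so reading with windows-1252 again would likely reproduce it. Below is a minimal base R sketch of that mechanism, assuming the file is in fact UTF-8 (the question does not confirm this). The short word "на" and all variable names are illustrative only.

```r
# "на" from the question's phrase, written with \u escapes so the script
# does not depend on the encoding it is saved in.
original <- "\u043d\u0430"

# Take the UTF-8 bytes as an unmarked string, then decode them as if they
# were windows-1252. This mimics reading a UTF-8 file with the wrong encoding.
utf8_bytes <- rawToChar(charToRaw(enc2utf8(original)))
garbled <- iconv(utf8_bytes, from = "windows-1252", to = "UTF-8")
print(garbled)

# Undo the mistake: encode back to windows-1252 to recover the original
# bytes, then mark those bytes as UTF-8.
repaired <- iconv(garbled, from = "UTF-8", to = "windows-1252")
Encoding(repaired) <- "UTF-8"
print(identical(repaired, original))
```

If the assumption holds, passing "UTF-8" as the encoding when reading is more direct than repairing strings afterwards.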
Viviane
0
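source_data = read.delim(file, header = F, sep = "\t", quote = "", stringsAsFactors = FALSE)
# Encoding() only accepts a character vector, so mark the text column, not the data frame
source_D = source_data[[1]]  # assumes the text is in the first column
Encoding(source_D) = "UTF-8"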

I have tried this: if you run R on Windows, the code above works for me. If you run R on Unix, you can use the following code instead:

source_data = read.delim(file, header = F, fileEncoding="UTF-8", sep = "\t", quote = "", stringsAsFactors = FALSE)
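
The mark-as-UTF-8 approach can be checked end to end on a throwaway file. This is a sketch only: the temporary file, its two-column layout, and the variable names are assumptions made for illustration, not details of the original file.tsv.

```r
# The phrase from the question, written with \u escapes so the script
# does not depend on the encoding it is saved in.
expected <- paste0("\u043f\u043e\u0438\u0441\u043a \u043d\u0430 ",
                   "\u044d\u0442\u043e\u043c \u0441\u0430\u0439\u0442\u0435")

# Write one tab-separated row as raw UTF-8 bytes to a temporary file
# (a hypothetical stand-in for the real file).
path <- tempfile(fileext = ".tsv")
writeBin(charToRaw(enc2utf8(paste0("1\t", expected, "\n"))), path)

# Read the bytes as they are, with no conversion on the way in.
source_data <- read.delim(path, header = FALSE, sep = "\t", quote = "",
                          stringsAsFactors = FALSE)

# Encoding() needs a character vector, so pull out the text column first.
source_D <- source_data[[2]]

# Declare how the existing bytes should be interpreted; no bytes change.
Encoding(source_D) <- "UTF-8"

print(Encoding(source_D))
print(identical(source_D, expected))

unlink(path)
```

Marking does not convert anything, so it only helps when the bytes in the file really are UTF-8. For a file in another encoding, fileEncoding is the argument that performs an actual conversion.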
Ronak Shah
Fiona_Wang