
Creating a document-term matrix with

dtm <- DocumentTermMatrix(docs, control = params)

fails with this error:

Error in nchar(rownames(m)) : invalid multibyte string, element 1

Does anyone know how to tackle this error? I'm working in RStudio.

Allan Cameron
    Please read how to create a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) – Conor Neilson Mar 29 '20 at 03:58

2 Answers


This error happens when your input text isn't valid UTF-8, so `nchar()` can't count its characters. You can read about character encoding here.

Another good reference is this.
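To find out which documents trigger the error, a minimal base-R sketch can flag the elements that aren't valid UTF-8 with `validUTF8()`. The vector `docs_text` below is a hypothetical stand-in for your own text, not the asker's data.

```r
# Hypothetical example texts; the second one contains a lone Latin-1 byte (0xE9)
docs_text <- c("plain ascii", "caf\xe9", "na\u00efve")

# validUTF8() checks the raw bytes of each element, independent of the locale
is_valid <- validUTF8(docs_text)

# Indices of the elements that would make nchar() fail in a UTF-8 session
bad_idx <- which(!is_valid)
print(bad_idx)
```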

I've found that the best way to handle these issues is to use stringr::str_conv.

mydocs <- c("doc1", "doc2", "doc3")

# Convert to UTF-8 and keep the result; bytes that aren't valid UTF-8 produce a warning
mydocs <- stringr::str_conv(mydocs, "UTF-8")

Where you have non-UTF-8 characters, you'll get a warning, but the character vector that comes out the other side will be usable.

Do that to your docs vector before calling `DocumentTermMatrix`.
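If you'd rather not depend on stringr, base R's `iconv()` can do a similar cleanup. This is a swapped-in technique, not the approach above: a minimal sketch that assumes the input is meant to be UTF-8 and simply drops any invalid bytes (`sub = ""`). `docs_text` is a hypothetical example vector.

```r
# Hypothetical example texts; the second contains an invalid byte (0xE9)
docs_text <- c("plain ascii", "caf\xe9")

# Treat the input as UTF-8 and re-encode it as UTF-8; any byte that is not
# valid UTF-8 can't be converted and is replaced by `sub` (here: dropped)
clean <- iconv(docs_text, from = "UTF-8", to = "UTF-8", sub = "")

# Every element is now valid UTF-8, so nchar() no longer errors
print(all(validUTF8(clean)))
print(nchar(clean))
```

If you know the text is really Latin-1, `iconv(docs_text, from = "latin1", to = "UTF-8")` keeps the accented characters instead of dropping them (`"caf\xe9"` becomes `"café"`). Either way, build the corpus from the cleaned vector before calling `DocumentTermMatrix()`.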

Tommy Jones
Sys.setlocale("LC_ALL", "C")

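Run this in RStudio before building the document-term matrix. It sets the locale to "C", so R no longer interprets native-encoded strings as multibyte text and `nchar()` stops failing. It has worked for me many times.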
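Setting `LC_ALL` to `"C"` lasts for the rest of the session, which also changes things such as sort order and how non-ASCII text is printed. A minimal sketch that switches only `LC_CTYPE` (the category that governs character encoding) and restores it afterwards, assuming that category alone is enough in your setup; `with_c_ctype()` is a hypothetical helper, not part of base R or tm.

```r
# Hypothetical helper: evaluate `expr` with LC_CTYPE set to "C", then restore
# the previous value, even if `expr` throws an error
with_c_ctype <- function(expr) {
  old <- Sys.getlocale("LC_CTYPE")
  on.exit(Sys.setlocale("LC_CTYPE", old), add = TRUE)
  Sys.setlocale("LC_CTYPE", "C")
  expr  # the lazily evaluated argument is forced here, after the switch
}

before <- Sys.getlocale("LC_CTYPE")
inside <- with_c_ctype(Sys.getlocale("LC_CTYPE"))
print(inside)                     # "C" while the helper is running
print(Sys.getlocale("LC_CTYPE"))  # back to the original value
```

You would then wrap the failing call, e.g. `dtm <- with_c_ctype(DocumentTermMatrix(docs, control = params))`. Because R evaluates function arguments lazily, `expr` only runs after the locale has been switched, and `on.exit()` puts the old setting back when the helper returns.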

    Your answer could be improved with additional supporting information. Please [edit] to add further details, such as citations or documentation, so that others can confirm that your answer is correct. You can find more information on how to write good answers [in the help center](/help/how-to-answer). – Community Feb 27 '22 at 01:17
  • I despise it because of this error, but it works for me. Thank you very much. – akunyer Apr 27 '22 at 20:20