> str(db$Installs)

$ Installs : Factor w/ 21 levels "","0+","1+","1,000+",..: 8 20 15 18 11 17 17 5 5 8 ...

 db$Installs = as.character(gsub("\\+", "", db$Installs))

 str(db$Installs)
  chr [1:10841] "10,000" "500,000" "5,000,000" "50,000,000" "100,000" "50,000" "50,000" "1,000,000" "1,000,000" "10,000" ...

 db$Installs = as.double(gsub(",","",db$Installs))

 str(db$Installs)
  num [1:10841] 1e+04 5e+05 5e+06 5e+07 1e+05 5e+04 5e+04 1e+06 1e+06 1e+04 ...

I want the values to look like this, without exponential notation:

"10000" "500000" "5000000" "50000000" "100000" "50000" "50000" "1000000" "1000000" "10000" ...

I tried this code:


db$Installs.factor <- factor(db$Installs) 
db$Installs = as.character(gsub("\\+", "", db$Installs))
db$Installs = as.double(gsub(",","",db$Installs))

Rui Barradas
Adarsh Pawar

1 Answer


Try this

Input-

sample <- c("10,000+" ,"500,000+", "5,000,000+", "50,000,000+" ,"100,000+", "50,000+" ,"50,000+" ,"1,000,000+" )

Solution-

sample <- as.numeric(gsub("\\D", "", sample))

Output-

[1]    10000   500000  5000000 50000000   100000    50000    50000  1000000
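The same call can be sketched on a factor column like the one in the question. This is a minimal sketch, assuming a small made-up factor (`installs`, a hypothetical stand-in for `db$Installs`) that includes the empty level `""` shown in the `str()` output. `gsub()` coerces a factor to character before substituting, so no explicit `as.character()` step is needed, and the empty string becomes `NA`.

```r
# Hypothetical stand-in for db$Installs: a factor that includes an empty level
installs <- factor(c("10,000+", "500,000+", "", "1,000,000+"))

# gsub() converts the factor to character first;
# "\\D" matches every non-digit character, so both "," and "+" are removed
digits_only <- gsub("\\D", "", installs)

# An empty string has no digits to parse, so it converts to NA
installs_num <- as.numeric(digits_only)
```

Note that `"\\D"` also strips decimal points and minus signs, so it only suits non-negative whole-number values like these install counts.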

Note- The values are stored as plain numbers either way; exponential notation only affects how R prints them. To stop R from printing in exponential notation, you can use -

options("scipen"=100, "digits"=4)

‘scipen’: integer. A penalty to be applied when deciding to print numeric values in fixed or exponential notation. Positive values bias towards fixed and negative towards scientific notation: fixed notation will be preferred unless it is more than ‘scipen’ digits wider.
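If changing a global option is not desirable, or if character strings such as "10000" are what is actually wanted, `format()` with `scientific = FALSE` is one alternative. This is a minimal sketch, assuming a short made-up vector (`installs_num` is a hypothetical name, not from the question).

```r
# Hypothetical numeric vector, as produced by the as.numeric(gsub(...)) step
installs_num <- c(10000, 500000, 5000000, 50000000)

# scientific = FALSE forces fixed notation;
# trim = TRUE drops the leading spaces format() adds to align widths
installs_chr <- format(installs_num, scientific = FALSE, trim = TRUE)

print(installs_chr)
```

The result is a character vector, so it is suited to display or export; keep the numeric column for arithmetic.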

Rushabh Patel
  • The OP may still find things being printed in scientific notation, which is a separate issue, for which they might want to look [here](https://stackoverflow.com/q/9397664/324364). – joran Apr 10 '19 at 20:03
  • `> db str(db$Installs)` Factor w/ 21 levels "","0+","1+","1,000+",..: 8 20 15 18 11 17 17 5 5 8 ... `> db$Installs = as.numeric(gsub("\\D", "", db$Installs))` `> str(db$Installs)` num [1:10841] 1e+04 5e+05 5e+06 5e+07 1e+05 5e+04 5e+04 1e+06 1e+06 1e+04 ... – Adarsh Pawar Apr 10 '19 at 20:19
  • The solution above does convert the column to numeric. To stop R from printing it in exponential notation, use the link provided by @joran or set `options("scipen"=100, "digits"=4)` – Rushabh Patel Apr 10 '19 at 20:22
  • yes! Done Thanks.....`options("scipen"=100, "digits"=4)` It worked. – Adarsh Pawar Apr 10 '19 at 20:25