
I am using the R programming language. I am trying to figure out how to "recreate" plots in ggplot2/plotly, once they have been created in base R.

For example, I created some data and made a plot:

library(Rtsne)
library(cluster)
library(ggplot2)
library(dplyr)
library(dbscan)
library(plotly)


#generate data

var_1 <- rnorm(100, 1, 4)
var_2 <- rnorm(100, 10, 5)
var_3 <- sample( LETTERS[1:4], 100, replace=TRUE, prob=c(0.1, 0.2, 0.65, 0.05) )
response_variable <- sample( LETTERS[1:2], 100, replace=TRUE, prob=c(0.4, 0.6) )


#put them into a data frame called "f"
f <- data.frame(var_1, var_2, var_3, response_variable)

#declare var_3 and response_variable as factors
f$response_variable = as.factor(f$response_variable)
f$var_3 = as.factor(f$var_3)

#create id
f$ID <- seq_along(f[,1])

#gower distance

gower_dist <- daisy(f[, -c(4,5)],
                    metric = "gower")

#lof plot

lof <- lof(gower_dist, k=3)


plot(gower_dist, pch = ".", main = "LOF (k=3)")
points(gower_dist, cex = (lof-1)*3, pch = 1, col="red")
text(gower_dist[lof>2,], labels = round(lof, 1)[lof>2], pos = 3)

Here is a picture of the plot:

[image: base R plot titled "LOF (k=3)"]

Now I am trying to recreate a similar plot in ggplot2 and plotly. I ran a dimensionality-reduction algorithm called t-SNE:

# tsne


tsne_obj <- Rtsne(gower_dist,  is_distance = TRUE)

tsne_data <- tsne_obj$Y %>%
    data.frame() %>%
    setNames(c("X", "Y")) %>%
    mutate(
           name = f$ID)

I want the axes of the new graph to be tsne_data$X and tsne_data$Y.

I am not sure whether it is still possible to recreate the previous plot in ggplot2 with the red circles (the radius of each red circle is proportional to its "lof" score). Can I make a plot in ggplot2 similar to the one above, with the size of each point proportional to its "lof" value?

I tried the following:

plot = ggplot(aes(x = X, y = Y), data = tsne_data) + geom_point(aes())

and got something like this:

[image: ggplot2 scatterplot of tsne_data]

Is it possible to change the size of these points based on the values of "lof"? For example, these are the values I would like to use:

summary(lof)
hist(lof, breaks=10)

Is it also possible for a plotly rendition of the ggplot2 plot to display the lof score and f$ID when you hover the mouse over a point?

plotly_plot = ggplotly(plot)

Thanks!

Sources:
https://www.rdocumentation.org/packages/dbscan/versions/1.1-5/topics/lof
https://dpmartin42.github.io/posts/r/cluster-mixed-types

stats_noob

1 Answer


You can vary the point size by mapping lof to the size aesthetic. The tooltip in the ggplotly graph can also be adjusted to show lof and name.

Edit: Added var1, var2 and var3 to the tooltip

tsne_obj <- Rtsne(gower_dist,  is_distance = TRUE)

tsne_data <- tsne_obj$Y %>%
  data.frame() %>%
  setNames(c("X", "Y")) %>%
  mutate(
    name = f$ID, 
    lof=lof,
    var1=f$var_1,
    var2=f$var_2,
    var3=f$var_3
    )

p1 <- ggplot(aes(x = X, y = Y, size=lof, key=name, var1=var1, 
  var2=var2, var3=var3), data = tsne_data) + 
  geom_point(shape=1, col="red")+
  theme_minimal()
p1

ggplotly(p1, tooltip = c("lof", "name", "var1", "var2", "var3"))
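
If you want more control over the hover text than a list of variable names gives you, one option is the unofficial "text" aesthetic that ggplotly() reads. The following is only a minimal sketch: `toy`, `label` and `p2` are hypothetical names, and `toy` is a made-up stand-in for the tsne_data built above, so the values are illustrative and not real LOF scores.

```r
library(ggplot2)
library(plotly)

# Hypothetical stand-in for tsne_data (random values, for illustration only)
set.seed(1)
toy <- data.frame(
  X    = rnorm(20),
  Y    = rnorm(20),
  lof  = runif(20, 0.9, 3),
  name = seq_len(20)
)

# Build the hover label once; "<br>" starts a new line in a plotly tooltip
toy$label <- paste0("ID: ", toy$name, "<br>lof: ", round(toy$lof, 2))

# size = lof scales each circle by its score; text = label feeds the tooltip
p2 <- ggplot(toy, aes(x = X, y = Y, size = lof, text = label)) +
  geom_point(shape = 1, colour = "red") +
  theme_minimal()

# tooltip = "text" shows only the custom label on hover
w <- ggplotly(p2, tooltip = "text")
w
```

With the real data, the same pattern should work by adding a label column to tsne_data and mapping it with text = label in p1.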

user12728748
  • this is perfect! thank you so much! I am trying to follow this stackoverflow post over here : https://stackoverflow.com/questions/36325154/how-to-choose-variable-to-display-in-tooltip-when-using-ggplotly ... and add labels from the original file (f$var_1, f$var_2) when you move the mouse over each observation. I posted my code below, can you please take a look at it when you have time? – stats_noob Dec 23 '20 at 18:58
  • f$X = tsne_data$X; f$Y = tsne_data$Y; p1 – stats_noob Dec 23 '20 at 18:59
  • Error: Mapping should be created with `aes()` or `aes_()`. – stats_noob Dec 23 '20 at 18:59
  • (I added semicolons ; to show a break where each new line of code starts) – stats_noob Dec 23 '20 at 19:00
  • You have to define var1, var2 and var3 in your `ggplot` object to be able to call them. See my latest edit. – user12728748 Dec 23 '20 at 19:50
  • thank you - these edits solve the problem! – stats_noob Dec 23 '20 at 20:19
  • is there any chance you can please take a look at this question? https://stackoverflow.com/questions/65434105/r-formatting-plotly-hover-text?noredirect=1#comment115685527_65434105 thank you! – stats_noob Dec 25 '20 at 20:09