
In my master's program I am trying to implement a decision tree. At some point I therefore have a vector of the sorted, unique values of a variable, e.g.

sorted_unique <- c(1, 3, 5, 7)

Now, in the next step, I am looking for all splitting points: I want to obtain the midpoint between each pair of adjacent values in the original vector.

splits <- double(length(sorted_unique) - 1)

for (i in seq_along(splits)) {
  splits[i] <- mean(sorted_unique[i:(i+1)])
}

This indeed yields the desired result:

> splits
[1] 2 4 6
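Each iteration of the loop computes (sorted_unique[i] + sorted_unique[i + 1]) / 2. A minimal sketch wrapping the loop in a reusable function makes it easier to compare against alternatives; the name `midpoints_loop` is hypothetical, and the input is assumed to be already sorted and free of duplicates. It uses `seq_len()` so that a single-value input returns an empty vector instead of `NA`:

```r
# Hypothetical helper: the question's loop wrapped in a function.
# Assumes `sorted_unique` is already sorted and contains no duplicates.
midpoints_loop <- function(sorted_unique) {
  n_splits <- max(length(sorted_unique) - 1, 0)
  splits <- double(n_splits)
  # seq_len() yields an empty sequence when n_splits is 0,
  # whereas 1:length(splits) would iterate over c(1, 0)
  for (i in seq_len(n_splits)) {
    # midpoint of the i-th value and its right-hand neighbour
    splits[i] <- mean(sorted_unique[i:(i + 1)])
  }
  splits
}

midpoints_loop(c(1, 3, 5, 7))
midpoints_loop(5)
```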

However, since I have to run this procedure many times, I am very interested in whether there is a more efficient way to implement it.

Kind regards

Duesser

2 Answers


One option could be:

sapply(seq_along(sorted_unique), function(x) mean(sorted_unique[c(x, x + 1)]))[-length(sorted_unique)]

[1] 2 4 6
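The `sapply()` call still evaluates `mean()` once per element. A minimal sketch of a fully vectorized alternative, assuming plain arithmetic midpoints are what is wanted, adds the vector to a shifted copy of itself using base R's `head()` and `tail()`; the second form with `diff()` is an equivalent formulation:

```r
sorted_unique <- c(1, 3, 5, 7)

# head(x, -1) drops the last value and tail(x, -1) drops the first,
# so element i of each vector are neighbours in the original vector
splits <- (head(sorted_unique, -1) + tail(sorted_unique, -1)) / 2
splits

# Equivalent: each left value plus half the gap to its right neighbour
splits_diff <- head(sorted_unique, -1) + diff(sorted_unique) / 2
splits_diff
```

Both forms avoid a function call per element, so they are typically faster on long vectors, though actual timings depend on the input and are worth measuring.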
tmfmnk

Taking into account this question:

How can I efficiently obtain a vector with values that are between the original vector's values?

And taking into account that you have (as a starting point) a sorted vector of unique values, you can try this:

sorted_unique <- c(1, 3, 5, 7)
all_values <- sorted_unique[[1]]:sorted_unique[[length(sorted_unique)]]
between <- all_values[!all_values %in% sorted_unique]
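This integer-range approach gives midpoints only when neighbouring values are integers exactly two apart, as in `c(1, 3, 5, 7)`. A small sketch, using an illustrative input chosen here (not from the original thread), contrasts it with midpoints on uneven, non-integer gaps:

```r
# Illustrative input with uneven and non-integer gaps (an assumption)
sorted_unique <- c(1, 4, 4.5, 10)

# Integer-range approach from this answer: every integer from 1 to 10
# that is not itself one of the original values
all_values <- sorted_unique[[1]]:sorted_unique[[length(sorted_unique)]]
between <- all_values[!all_values %in% sorted_unique]
between     # seven values, although there are only three gaps

# Midpoints, one per pair of adjacent values: 2.5, 4.25, 7.25
midpoints <- (head(sorted_unique, -1) + tail(sorted_unique, -1)) / 2
midpoints
```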
gss
  • This does not yield the desired result. I would like to get only one split point. If I input `sorted_unique – Duesser Nov 21 '21 at 11:51