4

I want to find the unique sequences in my vector. A sequence is a run of consecutive identical values. If the same value forms a run again later, that counts as a separate sequence, as long as another sequence lies in between. A sequence can be a single value long.

So if my function is called find_Sequences(), it would work like this:

my_vector = c('a', 'a', 'b', 'a', 'c', 'c', 'b')

find_Sequences(my_vector)

> 'a', 'b', 'a', 'c', 'b'

unique() and distinct() don't do this.

Uwe Keim
petyar

4 Answers

8

You can use rle.

rle(my_vector)$values
#[1] "a" "b" "a" "c" "b"
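
As a rough sketch, the call above could be wrapped in the find_Sequences() function the question asks for. The wrapper body is an assumption for illustration, not part of the original answer.

```r
# Hypothetical wrapper, named after the function in the question.
# rle() collapses each run of consecutive equal values; $values keeps
# one representative per run, in order of appearance.
find_Sequences <- function(x) {
  rle(x)$values
}

my_vector <- c('a', 'a', 'b', 'a', 'c', 'c', 'b')
find_Sequences(my_vector)
#[1] "a" "b" "a" "c" "b"
```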
GKi
4

You can use comparisons with the preceding item:

my_vector[c(TRUE, my_vector[-1] != my_vector[-length(my_vector)])]

This should be cheaper than rle: it does the same neighbour comparison that rle performs internally, but skips computing the run lengths.
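
To make the one-liner easier to follow, here is the same comparison split into steps. The intermediate names changed and keep are only illustrative.

```r
my_vector <- c('a', 'a', 'b', 'a', 'c', 'c', 'b')

# Compare every element from the second onward with its predecessor.
changed <- my_vector[-1] != my_vector[-length(my_vector)]
changed
#[1] FALSE  TRUE  TRUE  TRUE FALSE  TRUE

# The first element always starts a run, so prepend TRUE.
keep <- c(TRUE, changed)
my_vector[keep]
#[1] "a" "b" "a" "c" "b"
```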

Clemsang
2

You can use the run length encoding rle function:

rle(c('a', 'a', 'b', 'a', 'c', 'c', 'b'))
Run Length Encoding
  lengths: int [1:5] 2 1 1 2 1
  values : chr [1:5] "a" "b" "a" "c" "b"

The values field tells you what you need.
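
A small sketch of how the two components relate, in case the run lengths are needed as well. The variable name runs is an assumption; inverse.rle() is the base R function that reverses the encoding.

```r
runs <- rle(c('a', 'a', 'b', 'a', 'c', 'c', 'b'))

# One entry per run: the value and how many times it repeats.
runs$values
#[1] "a" "b" "a" "c" "b"
runs$lengths
#[1] 2 1 1 2 1

# inverse.rle() rebuilds the original vector from the encoding.
inverse.rle(runs)
#[1] "a" "a" "b" "a" "c" "c" "b"
```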

user2474226
2

We can also use data.table::rleid and duplicated to get unique sequences.

my_vector[!duplicated(data.table::rleid(my_vector))] 
#[1] "a" "b" "a" "c" "b"
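
rleid assigns an integer id to each run, so duplicated() flags every element after the first in its run. If data.table is not available, a base R stand-in for the ids might look like the sketch below. The helper name run_id is hypothetical, and it is only meant to mimic rleid for a single vector without NA values.

```r
my_vector <- c('a', 'a', 'b', 'a', 'c', 'c', 'b')

# Hypothetical base R stand-in for data.table::rleid() on one vector:
# the id goes up by one every time the value differs from its predecessor.
run_id <- function(x) {
  cumsum(c(TRUE, x[-1] != x[-length(x)]))
}

run_id(my_vector)
#[1] 1 1 2 3 4 4 5

# Keep the first element of each run.
my_vector[!duplicated(run_id(my_vector))]
#[1] "a" "b" "a" "c" "b"
```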
Ronak Shah