I am playing around with binary data.

I have test data in columns in the following manner:

       A   B   C   D   E   F   G   H   I   J   K   L   M   N
       -----------------------------------------------------
       1   1   1   1   1   1   1   1   1   0   0   0   0   0
       0   0   0   0   1   1   1   0   1   1   0   0   1   0

A 1 indicates that the system was on and a 0 indicates that it was off.

I have found a way to summarize the runs between the on/off transitions of these systems.

For example:

  • for the first row, it stops working after I

  • for the second row, it works from E to G, then again in I-J and in M, and is off in all other columns.

I see my result in the following form (table1)

    row-number   value      grp_num     num       Range
    ------------  -----     --------    ------    ------ 
    1              1           1          9         A-I
    1              0           2          5         J-N
    2              0           1          4         A-D
    2              1           2          3         E-G
    2              0           3          1         H-H
    2              1           4          2         I-J
    2              0           5          2         K-L
    2              1           6          1         M-M
    2              0           7          1         N-N

The code I used is this:

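    library(dplyr)
    library(tidyr)   # gather()
    library(tibble)  # rownames_to_column()

    # Drop the first column, reshape to long form, then number each run of equal values
    table1 <- test[, -1] %>%
      rownames_to_column() %>%
      gather(col, val, -rowname) %>%
      group_by(rowname) %>%
      # a new group starts wherever the value differs from the previous one
      mutate(grp_num = cumsum(val != lag(val, default = -99))) %>%
      group_by(rowname, val, grp_num) %>%
      dplyr::summarise(num = n(),
                       range = paste0(first(col), "-", last(col)))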
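
To illustrate the grouping step on its own, here is a minimal sketch that applies the same `cumsum()`/`lag()` idea to a plain vector. The vector `val` is just the second row of the example data typed in by hand, and the `rle()` call is only a cross-check in base R, not part of the original code.

```r
library(dplyr)

# Second row of the example data, columns A to N
val <- c(0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0)

# TRUE wherever the value differs from the previous one; the default of -99
# never equals a real value, so the first element always starts a new run
changed <- val != lag(val, default = -99)

# The running count of changes serves as the run id
grp_num <- cumsum(changed)
grp_num
# 1 1 1 1 2 2 2 3 4 4 5 5 6 7

# Cross-check of the run lengths with base R
rle(val)$lengths
# 4 3 1 2 2 1 1
```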

My question is: if my data had blank entries, how could I exclude them so that they are not part of any group?

       A   B   C   D   E   F   G   H   I   J   K   L   M   N
       -----------------------------------------------------
           1   1   1   1   1   1   1   1   0   0   0   0   0
                           1   1   1   0   1   1   0   0   1   0

The expected result is very similar, but with the blank values excluded:

         row-number   value      grp_num     num       Range
        ------------  -----     --------    ------    ------ 
        1              1           1          8         B-I
        1              0           2          5         J-N
        2              1           1          3         E-G
        2              0           2          1         H-H
        2              1           3          2         I-J
        2              0           4          2         K-L
        2              1           5          1         M-M
        2              0           6          1         N-N
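
One possible approach, sketched under assumptions: the blanks are taken to be read in as `NA`, and the data frame `test` below is a hypothetical reconstruction with a made-up leading `id` column (the code above drops the first column). The only change to the pipeline is a `filter(!is.na(val))` before the runs are numbered. If the blanks are empty strings rather than `NA`, they would need converting first.

```r
library(dplyr)
library(tidyr)
library(tibble)

# Hypothetical reconstruction of the data; blanks are assumed to be NA
test <- data.frame(
  id = c("sys1", "sys2"),  # made-up first column, dropped below
  rbind(
    c(NA, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0),
    c(NA, NA, NA, NA, 1, 1, 1, 0, 1, 1, 0, 0, 1, 0)
  )
)
names(test)[-1] <- LETTERS[1:14]

table2 <- test[, -1] %>%
  rownames_to_column() %>%
  gather(col, val, -rowname) %>%
  filter(!is.na(val)) %>%   # drop blanks before numbering the runs
  group_by(rowname) %>%
  mutate(grp_num = cumsum(val != lag(val, default = -99))) %>%
  group_by(rowname, val, grp_num) %>%
  summarise(num = n(),
            range = paste0(first(col), "-", last(col)),
            .groups = "drop") %>%
  arrange(rowname, grp_num)

table2
```

Note that filtering removes the blank cells entirely, so a blank sitting between two runs of the same value would cause those runs to be merged into one group. In the example above the blanks are all leading, so this does not arise.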

  • Why `I-N`? `I` is on but you are including it in the off time. In the second row though, you have `H-H` and not including previous columns in your off time. Your output is inconsistent. – M-- Oct 25 '19 at 21:52
  • @M-- changes made. Thanks – Jeet Oct 25 '19 at 22:05
  • You should provide a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). I am not sure how exactly you have blanks in your data. If you share `dput(head(df, 2))` with us as an [edit] to your question, I will try to see if I can be of any help. P.S. Sorry for the delayed response to your update and for not including these points in my first comment. Cheers. – M-- Oct 27 '19 at 18:41