I have a data set that looks something like this:
```
Date        Remaining  Volume  ID
1990-01-01          0    1000   1
1990-01-01          1    2000   2
1990-01-01          1    5000   3
1990-02-01          0     200   4
1990-03-01          1    4000   5
1990-03-01          0    3000   6
```
I filter the data with a series of conditional statements and record the result in a binary flag variable (Remaining above) in the data.table. A value of 0 means that the row does not meet the defined requirements and is therefore excluded; rows flagged 1 remain in the data.table. The key is ID, which is unique for each row.
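For a reproducible example, the table above can be rebuilt as a data.table. This is a minimal sketch, assuming Date is parsed as a Date and the flag as an integer; the names dt and dt_kept are hypothetical, and the subset only illustrates that keeping flag-1 rows retains IDs 2, 3 and 5 (the actual conditional statements are not shown in the question).

```r
library(data.table)

# Hypothetical reconstruction of the example table;
# Remaining is the 0/1 flag: 1 = row passes the filters, 0 = row is excluded
dt <- data.table(
  Date      = as.Date(c("1990-01-01", "1990-01-01", "1990-01-01",
                        "1990-02-01", "1990-03-01", "1990-03-01")),
  Remaining = c(0L, 1L, 1L, 0L, 1L, 0L),
  Volume    = c(1000, 2000, 5000, 200, 4000, 3000),
  ID        = 1:6
)
setkey(dt, ID)  # ID is unique per row

# Filtering amounts to keeping only the rows flagged 1
dt_kept <- dt[Remaining == 1]
print(dt_kept)
```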
I would like to show two relationships:
(1) A stacked, normalized (percentage) bar chart over the monthly time series showing, for each month, the percentage of entries that remain in vs. are excluded from the data set.
E.g. Jan 1990: 2 of 3 entries remain, i.e. 66.7% of entries remain vs. 33.3% are excluded.
(2) A stacked, normalized (percentage) bar chart showing, for each month, the percentage of volume that remains vs. is excluded by the filtering.
E.g. Jan 1990: 2k + 5k of 8k remain, i.e. 87.5% of volume remains vs. 12.5% is excluded.
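The two target percentages can be checked numerically before any plotting. A minimal data.table sketch, assuming the example data from the table above; the Month column and the shares table with its share_entries / share_volume columns are hypothetical names chosen for illustration.

```r
library(data.table)

# Example data from the table above (hypothetical reconstruction)
dt <- data.table(
  Date      = as.Date(c("1990-01-01", "1990-01-01", "1990-01-01",
                        "1990-02-01", "1990-03-01", "1990-03-01")),
  Remaining = c(0L, 1L, 1L, 0L, 1L, 0L),
  Volume    = c(1000, 2000, 5000, 200, 4000, 3000),
  ID        = 1:6
)

# Month label; every Date here is already the first of its month,
# but format() makes the monthly grouping explicit for arbitrary dates
dt[, Month := format(Date, "%Y-%m")]

# Per month: (1) fraction of entries kept, (2) fraction of volume kept
shares <- dt[, .(
  share_entries = mean(Remaining == 1),
  share_volume  = sum(Volume[Remaining == 1]) / sum(Volume)
), by = Month]
print(shares)
```

For January this reproduces the 2/3 and 7000/8000 = 0.875 figures; the excluded shares are simply 1 minus these values.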
So far I have tried various approaches, e.g. computing the number of occurrences of each flag value per month and the summed volume of the corresponding 0/1 "bucket", but none of my attempts has worked.
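Counting rows and summing volume per month and flag value is a workable route if the result is kept in long format, one row per month and flag value, since that is the shape a stacked bar chart maps naturally. A sketch assuming the example data; the Status factor, its labels and the agg table are hypothetical names.

```r
library(data.table)

# Example data from the table above (hypothetical reconstruction)
dt <- data.table(
  Date      = as.Date(c("1990-01-01", "1990-01-01", "1990-01-01",
                        "1990-02-01", "1990-03-01", "1990-03-01")),
  Remaining = c(0L, 1L, 1L, 0L, 1L, 0L),
  Volume    = c(1000, 2000, 5000, 200, 4000, 3000),
  ID        = 1:6
)

# Discrete label for the flag (hypothetical names)
dt[, Status := factor(Remaining, levels = c(0, 1),
                      labels = c("excluded", "remaining"))]

# One row per month x status: number of entries and summed volume
agg <- dt[, .(N = .N, Volume = sum(Volume)), by = .(Date, Status)]

# Percentages within each month (computed by reference with :=)
agg[, `:=`(pct_N      = 100 * N / sum(N),
           pct_Volume = 100 * Volume / sum(Volume)), by = Date]
print(agg)
```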
```r
library(data.table)
library(ggplot2)

`%notin%` <- Negate(`%in%`)  # %notin% is not part of base R

# dt_1 is the original data.table
dt_2 <- copy(dt_1)  # plain `<-` would not copy; := would then also modify dt_1
# ... dt_1 is filtered subsequently ...
id_remaining.vec <- dt_1[, ID]

# := modifies dt_2 in place, so no reassignment is needed
dt_2[ID %in% id_remaining.vec, REMAIN := 1]
dt_2[ID %notin% id_remaining.vec, REMAIN := 0]
dt_2[REMAIN == 1, N_REMAIN := .N]                 # total number of remaining rows
dt_2[REMAIN == 1, N_REMAIN_MON := .N, by = Date]  # remaining rows per month

# Tried the code below to no avail
ggplot(data = dt_2, aes(x = Date, y = REMAIN, color = REMAIN, fill = REMAIN)) +
  geom_bar(position = "fill", stat = "identity")
```
I usually find the ggplot grammar very intuitive, but I suspect I am overlooking something here, or the data set is not in the right format.
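A likely cause is indeed the data format: with y = REMAIN, every excluded row contributes a bar segment of height 0, so position = "fill" shows 100% remaining (or nothing, for a month with no remaining rows), and a numeric REMAIN is mapped to a continuous colour scale rather than two discrete groups. A minimal sketch of an alternative, assuming the example data: aggregate per month and a discrete status (hypothetical Status column and agg table), use counts or volumes as bar heights, and let geom_col(position = "fill") rescale each stacked bar to 100%.

```r
library(data.table)
library(ggplot2)

# Example data from the table above (hypothetical reconstruction)
dt <- data.table(
  Date      = as.Date(c("1990-01-01", "1990-01-01", "1990-01-01",
                        "1990-02-01", "1990-03-01", "1990-03-01")),
  Remaining = c(0L, 1L, 1L, 0L, 1L, 0L),
  Volume    = c(1000, 2000, 5000, 200, 4000, 3000),
  ID        = 1:6
)
dt[, Status := factor(Remaining, levels = c(0, 1),
                      labels = c("excluded", "remaining"))]
agg <- dt[, .(N = .N, Volume = sum(Volume)), by = .(Date, Status)]

# (1) Share of entries per month: heights are counts, "fill" normalises each bar
p_entries <- ggplot(agg, aes(x = Date, y = N, fill = Status)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Share of entries")

# (2) Share of volume per month: same idea with summed volume as height
p_volume <- ggplot(agg, aes(x = Date, y = Volume, fill = Status)) +
  geom_col(position = "fill") +
  scale_y_continuous(labels = scales::percent) +
  labs(y = "Share of volume")

print(p_entries)
print(p_volume)
```

Because the fill aesthetic is a factor, ggplot draws two discrete groups per month instead of a colour gradient.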
Any pointers or ideas are highly appreciated!