About the edges
We can reproduce the image with the code below, which obtains the data from the European Centre for Disease Prevention and Control (ECDC). They even provide a script on their website to download the data directly into R.

I always see two issues near the edges.
First, when I compute a rolling average I often use the Savitzky-Golay filter, which is a bit more general. For a filter of length 7, the smoothed value at position $k$ uses the values $$x_{k-3},x_{k-2},x_{k-1},x_{k}, x_{k+1}, x_{k+2}, x_{k+3}$$
This is problematic at the edges: for the first and last three points the filter needs values beyond the edge of the data, and those values do not exist.
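To make the edge problem concrete, here is a minimal base-R sketch. It uses a plain centered 7-point moving average, $y_k = \frac{1}{7}\sum_{j=-3}^{3} x_{k+j}$, computed with stats::filter rather than the Savitzky-Golay function used in the code. The series x is made up for illustration. With sides = 2 the result is NA wherever the window would need values beyond the data, which amounts to cutting off the ends.

```r
# Made-up daily counts, for illustration only
x <- c(5, 8, 12, 15, 21, 30, 41, 50, 66, 80)

# Centered 7-point moving average: output k uses x[k-3], ..., x[k+3]
ma7 <- stats::filter(x, rep(1 / 7, 7), sides = 2)

# The first and last three values are NA because the window runs past the data
print(as.numeric(ma7))
```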
You can deal with this in two ways.
- You cut off the values at the ends.
- You extrapolate the filter at the end. In the image below you can see this happening: near the end the smoothed curve turns into a straight line. The Savitzky-Golay function in R (sgolayfilt from the signal package) keeps the polynomial fitted to the last 7 points fixed for the last 4 points. I have plotted two different filters. One is zero-order and based on averaging, which gives a flat line at the end. The other is based on a linear fit. In the interior this is effectively also a moving average when the points are equally spaced, but near the end points the extrapolation is different: it follows a sloped line.
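The difference between the zero-order and the linear filter can be sketched in base R. The helper sg_weights below is hypothetical (it is not part of any package): it computes Savitzky-Golay weights by a least-squares polynomial fit over equally spaced points. As far as I understand it, signal::sgolayfilt handles the ends with the same idea, evaluating the polynomial fitted to the last n points at the last positions, but this sketch is not its actual implementation. In the interior both p = 0 and p = 1 give equal weights of 1/7 (a plain moving average); at the end p = 0 stays flat while p = 1 continues along a sloped line.

```r
# Hypothetical helper: Savitzky-Golay weights for a window of n = 2h + 1
# equally spaced points. Fitting a polynomial of degree p by least squares
# to the window and evaluating it at offset `at` from the window centre is
# a fixed linear combination of the n values in the window.
sg_weights <- function(p, n = 7, at = 0) {
  h <- (n - 1) / 2
  j <- -h:h                        # offsets of the points in the window
  X <- outer(j, 0:p, `^`)          # design matrix with columns j^0, ..., j^p
  H <- solve(crossprod(X), t(X))   # (X'X)^{-1} X' maps data to coefficients
  drop(at^(0:p) %*% H)             # evaluate the fitted polynomial at `at`
}

x <- c(5, 8, 12, 15, 21, 30, 41, 50, 66, 80)   # made-up counts
last7 <- tail(x, 7)

# The last four smoothed values all come from the polynomial fitted to the
# last 7 points, evaluated at offsets 0, 1, 2, 3 from that window's centre
flat   <- sapply(0:3, function(a) sum(sg_weights(0, 7, a) * last7))  # p = 0
sloped <- sapply(0:3, function(a) sum(sg_weights(1, 7, a) * last7))  # p = 1
print(rbind(flat, sloped))
```

Both filters agree at the centre of the last window (offset 0); they differ only in the three extrapolated points.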
Second, the data in the most recent period might not be up to date, which can give a false impression of the trend. The data show the reported cases, not the actual cases, and for the last one or two weeks some reports might still be incomplete, which is not the case for the earlier weeks.
(Besides this reporting effect there is much more going on with these COVID-19 reports, so I am not claiming that solving this issue removes all problems with interpreting the trend. The data from March and April are not directly comparable to the data from October and November.)
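A small simulation with made-up numbers shows how incomplete recent reports can fake a downturn. I assume, hypothetically, that true cases grow by 3% per day and that the last six days are only partly reported (90%, 80%, ..., 40% complete). Neither number comes from the ECDC data; they only illustrate the mechanism.

```r
# Hypothetical true cases: +3% per day over 60 days
true_cases <- 1000 * 1.03^(0:59)

# Hypothetical reporting completeness: complete up to day 54,
# then 90%, 80%, ..., 40% for the last six days
completeness <- c(rep(1, 54), seq(0.9, 0.4, length.out = 6))
reported <- true_cases * completeness

# Ratio between the last day and one week earlier
true_cases[60] / true_cases[53]   # about 1.23: the true trend keeps rising
reported[60] / reported[53]       # about 0.49: the reported data seem to halve
```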
A comparable effect shows up in data about posts on Stack Exchange. The image below is from a meta post ("We have a very large & widening gap between questions and answers. How do we fix it?") and shows the fraction of answered questions as a function of the question score (one curve per score) and the date (x-axis). In the last year the yellow curve drops. This is not because the answering rate of such questions suddenly changed in the last year. Instead, it is because after one year the system deletes unanswered questions with a low score: in older periods those questions are gone and the remaining ones are mostly answered, while in the last year they are still counted.

I consider this second point more problematic than the first. The first effect is very small, only affects the last three days, and is barely visible.
By the way, these types of growth curves change multiplicatively, so they are often easier to interpret on a logarithmic scale. Then you would get the following graph:
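A minimal sketch with a hypothetical series growing by 5% per day: on a linear axis it bends upward, but with log = "y" it becomes a straight line, and the daily growth rate is the constant slope of its logarithm.

```r
# Hypothetical series with a constant 5% daily growth
days <- 0:30
growth_curve <- 100 * 1.05^days

# On a log axis this is a straight line,
# because log(growth_curve) = log(100) + days * log(1.05)
plot(days, growth_curve, log = "y", type = "l", ylab = "cases (log scale)")

# The daily growth rate is the (constant) slope of the logarithm
growth <- diff(log(growth_curve))
```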

About the simplistic comparison
Also note that comparisons on such a small timescale are not very meaningful. The curve goes up and down a lot on short time scales (mostly due to differences in reporting during the weekends), and there can be an occasional dip or peak. The curve in the image from the New York Times has a very sharp bend. This might be due to some extreme smoothing (maybe they used a higher-order Savitzky-Golay filter), or they may have picked up a few days with lower reporting.
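To illustrate why day-to-day comparisons mislead, here is a made-up series with a constant level of 1000 cases and a hypothetical weekly reporting pattern (lower on the weekend, catch-up on Monday). The weights are my assumption, chosen so that they average to 1 over the week. Single days differ by up to a third, while a centered 7-day average spans exactly one week and removes the pattern completely.

```r
# Hypothetical weekly reporting pattern (Mon, Tue, ..., Sun), averaging to 1
weekly_pattern <- c(1.2, 1.0, 1.0, 1.0, 1.0, 0.9, 0.9)
daily <- 1000 * rep(weekly_pattern, times = 6)   # six weeks at a flat level

# Relative day-to-day changes suggest large jumps ...
day_to_day <- diff(daily) / head(daily, -1)

# ... while every 7-day window contains each weekday once, so the
# centered 7-day average recovers the flat level of 1000
ma7 <- stats::filter(daily, rep(1 / 7, 7), sides = 2)
```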
The media are very eager to report on these short-timescale trends, take them out of context, and blow up the story. They make money by doing that.
Edward Tufte has a good example on pages 74-75 of his book "The Visual Display of Quantitative Information". It is about traffic deaths in Connecticut, compared over just two years versus over a longer stretch of years. You can see it being discussed on his blog.
Code
# utils (attached by default) provides read.csv; the 'signal' package must be installed for sgolayfilt() at the end
library(utils)
# read the Dataset sheet into R; the dataset will be called "data"
data <- read.csv("https://opendata.ecdc.europa.eu/covid19/casedistribution/csv",
                 na.strings = "", fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
countries <- c('Austria','Belgium','Bulgaria','Croatia','Cyprus','Czechia','Denmark',
               'Estonia','Finland','Germany','Greece','Hungary','Iceland','Ireland',
               'Italy','Latvia','Liechtenstein','Luxembourg','Malta','Netherlands',
               'Norway','Poland','Portugal','Romania','Slovakia','Slovenia','Spain',
               'Sweden','United_Kingdom')
#View(data[data$countriesAndTerritories == "Croatia",])
# combining all the cases for the European countries
lc <- length(countries)
M <- c()          ### this will be a matrix with the cases from each country in each column
first_day <- c()  ### first date listed for each country, to verify the data (rows are newest first, so this is actually the last day)
len <- c()        ### number of rows for each country, to verify the data
for (i in 1:lc) {
  # info about the date
  fd <- data$dateRep[data$countriesAndTerritories == countries[i]][1]
  l <- length(data$dateRep[data$countriesAndTerritories == countries[i]])
  first_day <- c(first_day, fd)
  len <- c(len, l)
}
# in this example, executed on 19/11/2020, Spain is missing a day,
# so we shift its data by one and add a 0
pre <- rep(0, lc)
pre[countries == "Spain"] <- 1
# we add zeros for the vectors that are shorter
post <- max(len) - len - pre
for (i in 1:lc) {
  # extract the data for each country
  m <- data$cases[data$countriesAndTerritories == countries[i]]
  mcorr <- c(rep(0, pre[i]), m, rep(0, post[i]))
  M <- cbind(M, mcorr)
}
M <- M[-1, ]  ### remove the first (most recent) row because Spain has a zero here
cases <- rev(rowSums(M))
# extract points for plotting dates on the axis
dates <- data$dateRep[data$countriesAndTerritories == countries[1]]
dates <- rev(dates[1:max(len)][-1])  ### cut to the length of the data and remove the most recent day
date_x <- which(dates %in% c("01/03/2020","01/04/2020","01/05/2020","01/06/2020","01/07/2020",
                             "01/08/2020","01/09/2020","01/10/2020","01/11/2020"))
labels <- c("Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov")
# extract days since first day with 10 cases
first_10 <- min(which(cases >= 10))
axis_length <- floor((max(len) - 1 - first_10) / 20)
f10labs <- c(0:axis_length) * 20
f10labs[c(0:axis_length) %% 5 != 0] <- ""
# plotting
par(mar = c(8,5,1,1), mgp = c(3,1,0))
plot(first_10:(max(len)-1), cases[first_10:(max(len)-1)],
     xlab = "", ylab = "detected/reported cases", xaxt = "n", yaxt = "n", type = "l",
     xlim = c(first_10, max(len)), bty = "n", col = 8)
axis(2, at = c(0,50,100,150,200) * 1000, labels = c("0", "50k", "100k", "150k", "200k"), las = 2)
axis(1, at = date_x, labels = rep("", length(date_x)))
axis(1, at = date_x + 15, labels = labels, tck = 0)
axis(1, at = first_10 + c(0:axis_length) * 20, labels = f10labs, line = 3)
axis(1, at = first_10 + 0.5 * (max(len) - first_10),
     labels = c("days since first day with 10 cases"), tck = 0, line = 4.5)
# Savitzky-Golay smoothing, window length 7: p = 0 (moving average, solid line)
rolling0 <- signal::sgolayfilt(cases, p = 0, n = 7)
lines(1:length(cases), rolling0)
# p = 1 (local linear fit, dashed line)
rolling1 <- signal::sgolayfilt(cases, p = 1, n = 7)
lines(1:length(cases), rolling1, lty = 2)
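As a possible alternative to shifting Spain by hand, one could sum the cases per date instead of per row position. Below is a minimal sketch on a small made-up data frame, toy, with the same column names as the ECDC file (dateRep, countriesAndTerritories, cases). It assumes that a missing row means a country has not reported that day yet, and it drops dates where not every country is present, which has the same effect as the M[-1, ] step in the code above.

```r
# Hypothetical toy data in the same long format (one row per country and day);
# Spain lacks the newest day
toy <- data.frame(
  dateRep = c("03/11/2020", "02/11/2020", "01/11/2020",
              "02/11/2020", "01/11/2020"),
  countriesAndTerritories = c("Austria", "Austria", "Austria", "Spain", "Spain"),
  cases = c(10, 20, 30, 5, 7)
)

# Parse the dd/mm/yyyy strings so that dates sort chronologically
toy$date <- as.Date(toy$dateRep, format = "%d/%m/%Y")

daily <- tapply(toy$cases, toy$date, sum)   # total cases per date, oldest first
n_rep <- table(toy$date)                    # number of countries reporting per date
n_countries <- length(unique(toy$countriesAndTerritories))

# Keep only dates for which every country has a row
daily_complete <- daily[as.vector(n_rep) == n_countries]
print(daily_complete)
```

This avoids hard-coding which country is missing a day, at the cost of dropping any incomplete date, not only the most recent one.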