
I have asked on SO about some implementation details of my problem, but I have realized that it is hard to ask (and search) if you do not know the name of the graph. Because this community is more about statistics, I am asking here: what is the name of the following kind of data visualization?

I have an experiment where, for every location (1-100), the value of a variable Y changes over time. I want to show this with a 2D graph, where the X-axis corresponds to the locations (1-100) and the Y-axis corresponds to the average value of Y for that location in the chosen time interval (e.g. yesterday).

EDIT: I am talking about a line that is overlaid on the heatmap. Every point on the line is the average value of the corresponding column. Maybe this is too trivial to have a name?
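To make the description concrete, here is a minimal sketch of the computation and the overlay. The data are made up, and the names `values`, `column_means` and `plot_heatmap_with_means` are hypothetical. It assumes the heatmap has one column per location and one row per time step, and it draws the means against a second y-axis, since they are in units of Y rather than time.

```python
import random

# Hypothetical data: values[t][loc] is the measurement of Y at time step t
# for location loc (24 time steps, 100 locations).
random.seed(0)
n_times, n_locations = 24, 100
values = [
    [20 + 0.05 * loc + random.gauss(0, 1) for loc in range(n_locations)]
    for _ in range(n_times)
]


def column_means(grid):
    """Average each column (one location) over all rows (time steps)."""
    n_rows = len(grid)
    return [sum(column) / n_rows for column in zip(*grid)]


means = column_means(values)


def plot_heatmap_with_means(grid, means):
    """Draw the heatmap and overlay the per-location means on a twin y-axis."""
    import matplotlib.pyplot as plt  # imported here; only needed for drawing

    fig, ax = plt.subplots()
    image = ax.imshow(grid, aspect="auto", origin="lower")
    ax.set_xlabel("location")
    ax.set_ylabel("time step")
    fig.colorbar(image, ax=ax, label="Y")

    # The overlay gets its own y-axis because it shows values of Y, not time.
    ax_mean = ax.twinx()
    ax_mean.plot(range(len(means)), means, color="red")
    ax_mean.set_ylabel("mean of Y over the interval")
    return fig
```

Calling `plot_heatmap_with_means(values, means)` returns a matplotlib figure that can be shown or saved.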


mkt
meolic
  • It sounds like there would be only a single value over each x value (i.e., for each observation), so it would just be a plot of 100 points. But that probably is not what you want, in particular there is nothing "heatmap" about this, so I'm misunderstanding something. Can you clarify? Do you plan on plotting multiple dots over each x value, heatmap-colored to indicate what period they are from? – Stephan Kolassa Jul 23 '21 at 13:44
  • The word "heatmap" in the title seems to be misleading. I used it because I want to show an average of all values in one section of a heatmap (all values that belong to the same location). I am thinking about this as an overlay on a heatmap. – meolic Jul 23 '21 at 14:12
  • A generic term is that the line is a "smooth" of the conditional distributions. It can also be called a "regression line" [sic]. To be meaningful, we have to view "time" as being a stochastic variable. This would be the case if, say, it were a measured duration of some event; but if it's actual clock time, one would question the meaningfulness of the entire procedure. – whuber Jul 23 '21 at 14:35
  • Hm. Since there is no natural ordering of your locations, such a line would be misleading. Unless you order the locations by each location's average. (Which in itself can elicit misunderstandings.) If that is what you are doing, then I would indeed just call it a "heatmap with location averages overlaid". Also, I don't quite see what average you are calculating, if for each location and each time bucket you have some value your heatmap is showing. – Stephan Kolassa Jul 23 '21 at 14:43
  • Locations are next to one another, i.e. they follow some line (e.g. an overhead power line). – meolic Jul 23 '21 at 14:48
  • OK, that makes sense. I'm still kind of wondering what you are calculating the average of. It would be an average time per location (since the line would be compared to the vertical axis), but what's the connection to the quantity you are heatmapping? – Stephan Kolassa Jul 23 '21 at 14:50
  • There are different possibilities for the values, e.g. ambient temperature at the location. – meolic Jul 23 '21 at 14:51
  • @whuber Correct. Location is not a continuous variable, and therefore bars would be a more appropriate type of visualization. Unfortunately, they are not very nice as an overlay on a heatmap. – meolic Jul 24 '21 at 07:24

1 Answer


I would guess that this visualisation does not have a name. And there's a good reason for that - this is just not a good way to visualise data and should be avoided. It breaks some basic principles of visualisation.

  1. If you use time and location as your X- and Y- axes in a 2D plot, readers will reasonably conclude that any line you draw indicates the relationship between time and location. Using a line to indicate some property of an implicit third dimension (colour) is unintuitive.
  2. 'Locations' isn't continuous but is shown as though it is. Even if there is some natural ordering to the locations, this depiction (both the colour variation and the continuous line) paints a misleading picture unless these are evenly spaced sampling locations (in which case location = distance).

There are much better ways to visualise this data. It's possible to indicate both the time series and the overall means on one plot but I would recommend splitting these into two.

First, the time series. I would recommend a joy plot (also known as a ridgeline plot) to depict them if there aren't too many locations. Small multiples line plots are another option.

Second, use a Cleveland dot plot to indicate the means.
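As a rough illustration of the second suggestion, here is a minimal sketch of a Cleveland dot plot of per-location means. The data are made up, and `readings`, `location_means` and `cleveland_dot_plot` are hypothetical names. Locations are kept in their given order on the assumption that the order is meaningful; sorting by mean is a common alternative when it is not.

```python
import random

# Hypothetical data: readings[name] holds the Y values recorded at that location.
random.seed(1)
readings = {
    f"loc {i:02d}": [random.gauss(15 + i % 5, 2) for _ in range(24)]
    for i in range(1, 11)
}


def location_means(data):
    """Return (location, mean) pairs in the order the locations were given."""
    return [(name, sum(vals) / len(vals)) for name, vals in data.items()]


pairs = location_means(readings)


def cleveland_dot_plot(pairs):
    """One dot per location: mean on the x-axis, location name on the y-axis."""
    import matplotlib.pyplot as plt  # imported here; only needed for drawing

    names = [name for name, _ in pairs]
    means = [mean for _, mean in pairs]
    positions = list(range(len(names)))

    fig, ax = plt.subplots()
    ax.plot(means, positions, "o")
    ax.set_yticks(positions)
    ax.set_yticklabels(names)
    ax.grid(axis="y", linestyle=":")  # light guide lines from label to dot
    ax.set_xlabel("mean of Y over the interval")
    return fig
```

Because each location gets its own labelled row, this avoids implying that location is a continuous variable.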

mkt