
I would like to visualise the following scatterplot in a different, more intuitive way. The X-axis is trading frequency, i.e. how many trades were conducted, whereas the Y-axis is the return on investment achieved with those trades.

The sample is quite large and dominated by extreme values. I therefore considered creating a violin plot for each trading frequency, but that seemed excessive.

Are there any other alternatives to this scatterplot that would allow me to visualise this more appealingly?

[Scatterplot: trading frequency (X-axis) vs. return on investment (Y-axis)]

Richard Hardy
    Since this is a "how to visualize my data" question, it would really benefit from including (a sample of) the data. – dipetkov Sep 26 '22 at 17:52
  • An alternative to transforming the statistic on the y-axis could be to reformulate it. I suspect that ROI * Freq might work well on the y-axis. – Sextus Empiricus Sep 27 '22 at 10:00

3 Answers


I would begin by considering a transformation: very likely to be useful for the Y-axis, and possibly for the X-axis too. Use a log if there are no zeroes or negative values, a square root if there are no negative values (zeroes are fine), and a cube root if there are negative values.
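A minimal Python sketch of this rule of thumb, using only the standard library. The function name `choose_transform` and the example values are hypothetical. Note that the cube root has to be taken on the absolute value with the sign restored, because in Python `(-8) ** (1/3)` returns a complex number rather than -2.

```python
import math


def choose_transform(values):
    """Pick and apply a transform following the rule of thumb above.

    - all values > 0   -> natural log
    - all values >= 0  -> square root (zeros allowed)
    - any value < 0    -> signed cube root
    Returns (name_of_transform, transformed_values).
    """
    if all(v > 0 for v in values):
        return "log", [math.log(v) for v in values]
    if all(v >= 0 for v in values):
        return "sqrt", [math.sqrt(v) for v in values]
    # copysign keeps negative returns negative after the cube root
    return "cbrt", [math.copysign(abs(v) ** (1 / 3), v) for v in values]


# Hypothetical ROI values: some are negative, so the cube root is chosen
roi = [-8.0, -0.5, 0.0, 0.2, 27.0]
name, roi_t = choose_transform(roi)
print(name, [round(v, 3) for v in roi_t])
```

Trading frequency is a count, so for the X-axis the log (no zeroes) or square root (zeroes present) branch would normally apply.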

After transformation, you could consider:

  1. A scatterplot with some jitter and possibly transparency (probably not great).
  2. A violin or semi-violin plot.
  3. A hexbin plot (quite likely the best option).
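To show what a hexbin plot actually computes, here is a hedged pure-Python sketch of the binning step. It follows a two-offset-lattice scheme similar to the one matplotlib uses for `hexbin`; the function `hexbin_counts` and the example data are hypothetical, and the code only counts points per cell without drawing anything.

```python
import math
from collections import Counter


def hexbin_counts(xs, ys, gridsize=10):
    """Count points per hexagonal cell.

    Cell centres lie on two offset rectangular lattices in scaled
    coordinates: (i, j) and (i + 0.5, j + 0.5). Each point is assigned to
    the nearer of its two candidate centres; weighting the squared
    vertical distance by 3 makes the resulting cells hexagons.
    Keys are (i, j, lattice) with lattice 0 or 1.
    """
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    nx = gridsize
    ny = max(1, int(gridsize / math.sqrt(3)))  # fewer rows than columns
    sx = (xmax - xmin) / nx or 1.0  # fall back to 1.0 if all x are equal
    sy = (ymax - ymin) / ny or 1.0
    counts = Counter()
    for x, y in zip(xs, ys):
        ix = (x - xmin) / sx
        iy = (y - ymin) / sy
        # Nearest centre on the integer lattice
        i1, j1 = round(ix), round(iy)
        # Nearest centre on the half-integer lattice
        i2, j2 = math.floor(ix), math.floor(iy)
        d1 = (ix - i1) ** 2 + 3.0 * (iy - j1) ** 2
        d2 = (ix - i2 - 0.5) ** 2 + 3.0 * (iy - j2 - 0.5) ** 2
        if d1 < d2:
            counts[(i1, j1, 0)] += 1
        else:
            counts[(i2, j2, 1)] += 1
    return counts


# Hypothetical data: trading frequency vs. (transformed) ROI
freq = [1, 1, 2, 2, 2, 3, 8, 20]
roi = [0.1, 0.2, 0.1, -0.3, 0.15, 0.0, 1.5, -2.0]
for cell, n in sorted(hexbin_counts(freq, roi, gridsize=5).items()):
    print(cell, n)
```

In practice you would let the plotting library do this, e.g. `matplotlib.pyplot.hexbin(x, y, gridsize=30, bins="log")` in Python or `ggplot2::geom_hex()` in R; a logarithmic colour scale helps when a few cells hold most of the points.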
mkt

You could indicate the density of points with a heatmap, e.g., using a black-body radiation palette, or alternatively a grayscale. Or use a hexbin plot, which pretty much does the grayscaling for you.
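The heatmap idea amounts to a 2-D histogram followed by a mapping from counts to shades. Below is a standard-library sketch with hypothetical names (`density_grid`, `to_shades`) and made-up example data; it assumes a log scale for the shading, since the question says the data are dominated by extreme values.

```python
import math


def density_grid(xs, ys, nx=20, ny=20):
    """Count points in an nx-by-ny rectangular grid (a 2-D histogram).

    grid[row][col]: row indexes y bins from the bottom, col indexes x bins.
    """
    xmin, xmax = min(xs), max(xs)
    ymin, ymax = min(ys), max(ys)
    wx = (xmax - xmin) / nx or 1.0  # bin widths; 1.0 if the range is zero
    wy = (ymax - ymin) / ny or 1.0
    grid = [[0] * nx for _ in range(ny)]
    for x, y in zip(xs, ys):
        # The maximum value would land one past the last bin, so clamp it
        col = min(int((x - xmin) / wx), nx - 1)
        row = min(int((y - ymin) / wy), ny - 1)
        grid[row][col] += 1
    return grid


def to_shades(grid):
    """Map counts to shade levels from 0 (empty) to 255 (densest cell).

    log1p compresses the range so that a handful of very dense cells
    do not make every other non-empty cell look empty.
    """
    peak = max(max(row) for row in grid)
    scale = math.log1p(peak) or 1.0  # guard against an all-zero grid
    return [[round(255 * math.log1p(c) / scale) for c in row] for row in grid]


# Hypothetical data: trading frequency vs. (transformed) ROI
freq = [1, 1, 2, 2, 2, 3, 8, 20]
roi = [0.1, 0.2, 0.1, -0.3, 0.15, 0.0, 1.5, -2.0]
for row in reversed(to_shades(density_grid(freq, roi, nx=4, ny=3))):
    print(row)  # top row printed first
```

The resulting grid can be drawn with any image function, for instance matplotlib's `imshow` with `origin="lower"` and the `gray_r` colormap (denser cells darker), or the `hot` colormap for a black-body-like palette.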

Stephan Kolassa

Binscatter plots are designed for exactly this problem: visualizing two-way relationships in huge datasets. See here for R and Python packages and some references.

As the name suggests, the binscatter procedure partitions the data domain into bins and plots only the sample average within each bin, rather than every data point.
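The core of the procedure fits in a few lines. This is a simplified sketch assuming quantile-spaced bins of x (equal numbers of points per bin) and a fixed, hypothetical `n_bins`; dedicated binscatter packages do considerably more, for example handling covariates, which this version does not attempt.

```python
from statistics import mean


def binscatter(xs, ys, n_bins=10):
    """Return (mean x, mean y) for each of n_bins quantile bins of x.

    Points are sorted by x and split into bins of (nearly) equal size;
    only the within-bin averages are kept for plotting.
    """
    pairs = sorted(zip(xs, ys))
    n = len(pairs)
    n_bins = min(n_bins, n)  # never more bins than points, so no bin is empty
    out = []
    for b in range(n_bins):
        lo, hi = b * n // n_bins, (b + 1) * n // n_bins
        chunk = pairs[lo:hi]
        out.append((mean(x for x, _ in chunk), mean(y for _, y in chunk)))
    return out


# Hypothetical data: trading frequency vs. ROI
freq = [1, 1, 2, 2, 2, 3, 8, 20]
roi = [0.1, 0.2, 0.1, -0.3, 0.15, 0.0, 1.5, -2.0]
for x_bar, y_bar in binscatter(freq, roi, n_bins=4):
    print(round(x_bar, 2), round(y_bar, 3))
```

Plotting the returned pairs as an ordinary scatterplot gives one point per bin instead of one per observation, which is what keeps the picture readable for a very large sample.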

The main advantage of binscatter is the ease with which it handles additional covariates, so, roughly speaking, it can visualize n-way relationships. See here for an overview and some examples.

Banach