From a large-scale patent analysis, I have some results that I can show graphically, but I would like to capture them in a model so that I can quantify the fit.
The scatterplot below shows (as I read it) that, up to a certain optimum, increasing region diversity enables higher radicality. Because a huge number of data points lie close to the x-axis, I am unable to capture this with a regression (linear or polynomial): the R^2 values are below 0.01.
If I take the average radicality over each region, a (linear) regression is possible, yet I still feel it is not the most appropriate model to fit to the data:
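
The contrast between the two fits can be reproduced with a minimal sketch on synthetic data. Everything here is an assumption made for illustration, not the patent data: 200 hypothetical regions with 500 patents each, about 10% of patents being "radical", and a radicality ceiling that simply grows with diversity (the optimum mentioned above is not modelled).

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic setup (all values are illustrative assumptions).
n_regions, n_patents = 200, 500
diversity = rng.uniform(0.0, 1.0, size=n_regions)

# One row per patent: each patent inherits its region's diversity.
x = np.repeat(diversity, n_patents)

# Most patents sit at radicality 0; a minority reach up to a
# ceiling proportional to the region's diversity.
is_radical = rng.random(x.size) < 0.1
y = is_radical * rng.uniform(0.0, 1.0, size=x.size) * x


def r_squared(x, y):
    """R^2 of an ordinary least-squares straight-line fit of y on x."""
    slope, intercept = np.polyfit(x, y, 1)
    residuals = y - (slope * x + intercept)
    return 1.0 - residuals.var() / y.var()


# Fit on the raw patent-level points.
r2_raw = r_squared(x, y)

# Fit on the per-region average radicality.
region_mean = y.reshape(n_regions, n_patents).mean(axis=1)
r2_means = r_squared(diversity, region_mean)

print(f"R^2 on raw points:   {r2_raw:.3f}")
print(f"R^2 on region means: {r2_means:.3f}")
```

In this toy setup the patent-level fit explains almost none of the variance, because the mass of points near zero dominates, while the fit on region means looks much better. Averaging hides the within-region spread, though, which is part of why the averaged fit feels unsatisfying.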

Based on these images, would there be a statistic/model/measure/method to capture the enabling capacity of region diversity on radicality?
I have little experience with statistics, so apologies for any misuse of terminology.
Any help is highly appreciated!
[Edit] My variables Patent Radicality and Region Diversity are derived measures:
- Every patent is assigned one or more technology classes. Relative distances between the classes were calculated from the co-occurrence of these classes.
- The patent radicality is the average distance between the classes assigned to it.
- The region diversity is the (Rao-Stirling) diversity of technology classes occurring in it, accounting for variety, balance and disparity.
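
To make the two measures concrete, here is a minimal sketch. The distance matrix `D`, the function names, and the example inputs are hypothetical. It also assumes that a single-class patent gets radicality 0 and that Rao-Stirling diversity is the sum of p_i * p_j * d_ij over all ordered pairs i != j, where p_i is the share of class i in the region (some definitions count each unordered pair once, which halves the value).

```python
from itertools import combinations

import numpy as np

# Hypothetical symmetric distance matrix between three technology classes
# (in the real analysis this would come from class co-occurrence).
D = np.array([
    [0.0, 0.2, 0.9],
    [0.2, 0.0, 0.7],
    [0.9, 0.7, 0.0],
])


def patent_radicality(classes, dist):
    """Average pairwise distance between the classes assigned to a patent."""
    pairs = list(combinations(sorted(set(classes)), 2))
    if not pairs:
        # Assumption: a single-class patent has no pairs, so radicality is 0.
        return 0.0
    return float(np.mean([dist[i, j] for i, j in pairs]))


def rao_stirling(class_counts, dist):
    """Rao-Stirling diversity: sum over i != j of p_i * p_j * d_ij."""
    p = np.asarray(class_counts, dtype=float)
    p = p / p.sum()
    # The diagonal of dist is zero, so the quadratic form only
    # picks up the i != j terms.
    return float(p @ dist @ p)


print(patent_radicality([0, 1, 2], D))  # patent spanning all three classes
print(rao_stirling([1, 1, 1], D))       # region with evenly spread classes
print(rao_stirling([5, 0, 0], D))       # region with a single class
```

A region concentrated in one class gets diversity 0, and spreading patents over more distant classes raises the value, which matches the variety, balance, and disparity components described above.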
My hypothesis is that increased region diversity should foster the radical innovation potential of that region.
I have about 5 million patents and two region levels (TL2 and TL3, as defined by the OECD), with about 300 and 2,000 regions, respectively, that have sufficient data. Both region levels yield similar results.
