I have a large data set recording the average number of days it takes to re-lease a house of type $i$. To illustrate, I created the example below in Matlab using random data. It has five house types, $i \in \{1, \dots, 5\}$, labelled $A$ through $E$.

[Figure: average re-lease time in days, plotted by house type]

I am trying to determine whether there is a relationship between re-lease time and house type. For instance, does a house of type $A$ have a longer average re-lease time than a house of type $B$? What is the best way to determine whether such a relationship exists? As a first attempt, I have taken the mean and the mode of the average re-lease times for each house type, but I am wondering whether this will produce a skewed result, given that there is only a single data point for a house of type $E$.
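
To make the per-type summary concrete, here is a minimal Python sketch (standard library only) that reports the number of houses, the mean, and the median for each type. The dictionary `relet_days` and its values are invented purely for illustration and are not the data behind the plot; the helper name `summarise` is likewise hypothetical.

```python
from statistics import mean, median

# Hypothetical data: each number is ONE house's average re-lease time in days.
# These values are made up for illustration only.
relet_days = {
    "A": [30.0, 42.0, 35.0, 90.0],
    "B": [20.0, 25.0, 22.0],
    "C": [50.0, 55.0, 48.0, 60.0, 52.0],
    "D": [33.0, 31.0, 40.0],
    "E": [75.0],
}

def summarise(groups):
    """Return {house_type: (n_houses, mean, median)} for each house type."""
    return {k: (len(v), mean(v), median(v)) for k, v in groups.items()}

summary = summarise(relet_days)
for house_type, (n, avg, med) in summary.items():
    # A group with a single house has no spread to estimate, so flag it.
    flag = "  <- single house, treat with caution" if n < 2 else ""
    print(f"{house_type}: n={n}, mean={avg:.1f}, median={med:.1f}{flag}")
```

Reporting the count next to each statistic makes it obvious when a group, such as type $E$ here, rests on a single house. In the made-up type $A$ data, the one large value pulls the mean well above the median, which is the kind of skew the question is worried about.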

  • I'm a little confused by the graph. You have "average re-lease" measured in days on the left hand side. But you have quite a few points for this. What is this the "average" of? I would (perhaps naively) expect that the average re-lease time would be just a single point for each house. –  Jul 05 '20 at 02:57
  • @VividKraig Consider the first house type $A$. A single data point represents the average re-lease time of all the people that have leased that single house of type $A$. That is, each data point represents a house of type $i$. Does that help? –  Jul 05 '20 at 03:07
  • (1) Because some houses will have been re-leased more often than others, each average for (say) type A houses will have a different variance. (2) Your objective is not clearly stated: Do you want to know if house types have different re-lease times? Or are you trying to predict re-lease times from house types? Or something else? – BruceET Jul 05 '20 at 07:20
  • @BruceET I am assuming that re-lease time reflects demand. Therefore, can we infer which house type is more in demand? Does that help? –  Jul 05 '20 at 07:33
  • If you had data for 5 house types from five definite distributions, you might do an ANOVA on types A through D to see if there are statistically significant differences among their population means, and if so try to determine if one of the types has a significantly smaller population mean. (E has to be omitted because you have only one house.) // However, as in my previous comment, you don't have 4 specific distributions. It might be best to describe the data (by finding group means or medians) and see if you feel comfortable saying one has meaningfully lower re-lease times. – BruceET Jul 05 '20 at 07:48
  • @BruceET Would it be more appropriate to discuss group means or medians? Which gives a better representation of the data and is less affected by skewness? –  Jul 05 '20 at 07:50
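
For reference, the one-way ANOVA that BruceET mentions for types A through D can be sketched in plain Python as below. This is only an illustration of the $F$ statistic under the usual assumptions (independent observations, roughly equal variances), which BruceET's comment suggests may not hold here because each point is itself an average over a different number of leases. The data values and the function name `one_way_anova_f` are hypothetical, and no p-value is computed.

```python
from statistics import mean

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples.

    Each sample must be a non-empty list of numbers, and at least one
    group needs two or more observations so the within-group variance
    can be estimated.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the observations around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical per-house average re-lease times (days), invented for illustration.
relet_days = {
    "A": [30.0, 42.0, 35.0, 90.0],
    "B": [20.0, 25.0, 22.0],
    "C": [50.0, 55.0, 48.0, 60.0, 52.0],
    "D": [33.0, 31.0, 40.0],
    "E": [75.0],
}

# Type E is left out because a single house gives no within-group variance.
samples = [v for k, v in relet_days.items() if k != "E"]
f_stat, df_b, df_w = one_way_anova_f(samples)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

A large $F$ relative to the $F(df_b, df_w)$ distribution would suggest that at least one type's mean differs. When the group data are skewed, comparing group medians descriptively, as suggested in the comments, may be the safer summary.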
