
I am currently working on a case study where I have to estimate how much a person could earn by renting out their property. I was given the following constraint:

"avoid estimating prices that are more than 25 dollars off of the actual price"

At first, I tried modeling without considering the constraint but failed miserably: the score I was getting was around 0.25.

So I guess the constraint has to be built in somehow. As I am somewhat of a novice, I have never come across a case like this before, so I have no idea how to approach it.

The dataset I am using is: https://www.kaggle.com/datasets/karthikbhandary2/property-rentals

For the sake of context, I am sharing the full details of the case study:

You have been hired by Inn the Neighborhood, an online platform that allows people to rent out their properties for short stays. Currently, the webpage for renters has a conversion rate of 2%. This means that most people leave the platform without signing up.

The product manager would like to increase this conversion rate. They are interested in developing an application to help people estimate the money they could earn renting out their living space. They hope that this would make people more likely to sign up.

The company has provided you with a dataset that includes details about each property rented, as well as the price charged per night. They want to avoid estimating prices that are more than 25 dollars off of the actual price, as this may discourage people.

  • A typical suggestion for a machine learning book is called Elements of Statistical Learning. – Dave Jul 06 '22 at 16:15

1 Answer


There are a couple of possible approaches. An easy one is to penalize predictions that differ from the truth by more than $25$ and to give no penalty otherwise. Let $L$ be the loss function for training.

$$ L(y, \hat y) = \sum_{i=1}^N \max\bigg\{ 0, \bigg\vert y_i - \hat y_i \bigg\vert - 25 \bigg\} $$

Using this loss function penalizes misses that are more than $25$ off while treating misses of less than $25$ as "close enough".
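For concreteness, a minimal NumPy sketch of this loss could look like the following (the name `tolerance_loss` and the `k` argument are my own labels; $k = 25$ recovers the equation above):

```python
import numpy as np

def tolerance_loss(y_true, y_pred, k=25.0):
    """Sum of hinge penalties: zero when |error| <= k dollars,
    otherwise the amount by which |error| exceeds k."""
    excess = np.abs(np.asarray(y_true) - np.asarray(y_pred)) - k
    return np.sum(np.maximum(0.0, excess))
```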

However, misses by $24$ are still misses, and it is reasonable to penalize them too. In that case, you might tack the above penalty onto your usual loss function, such as squared or absolute loss.

$$ L_{\lambda}(y, \hat y) = \sqrt{\sum_{i=1}^N \bigg(y_i - \hat y_i\bigg)^2} + \lambda\bigg[\sum_{i=1}^N \max\bigg\{ 0, \bigg\vert y_i - \hat y_i \bigg\vert - 25 \bigg\} \bigg] \\ L_{\lambda}(y, \hat y) = \sum_{i=1}^N \bigg\vert y_i - \hat y_i\bigg\vert + \lambda\bigg[\sum_{i=1}^N \max\bigg\{ 0, \bigg\vert y_i - \hat y_i \bigg\vert - 25 \bigg\} \bigg] $$

The $\lambda$ hyperparameter controls the extent to which missing by more than $25$ is considered particularly serious. If $\lambda = 0$, then missing by $25$ is not at all special. As you increase $\lambda$, you increase the severity of errors in excess of $25$. By considering some optimization criterion, such as profit or conversion, you can tune $\lambda$ to fit your particular problem. If you find $\lambda = 1$ too severe a penalty (perhaps it pushes most of your predictions to within $24$ of truth), try $\lambda = 0.5$. If $\lambda = 0.5$ is too weak a penalty, try $\lambda = 0.75$.
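Here is a sketch of the second (absolute-loss) variant above, assuming arrays of actual and predicted prices (again, the names are mine):

```python
import numpy as np

def combined_loss(y_true, y_pred, lam=1.0, k=25.0):
    """Absolute loss plus a lambda-weighted hinge penalty on the
    misses that exceed k dollars (the second equation above)."""
    r = np.abs(np.asarray(y_true) - np.asarray(y_pred))
    return np.sum(r) + lam * np.sum(np.maximum(0.0, r - k))
```

With `lam=0`, this reduces to plain absolute loss, matching the observation that $\lambda = 0$ makes missing by $25$ not at all special.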

For that matter, you could replace the fixed $25$ with a tunable threshold $k$ and see how that changes your profit or conversion rate.

$$ L_{\lambda, k}(y, \hat y) = \sqrt{\sum_{i=1}^N \bigg(y_i - \hat y_i\bigg)^2} + \lambda\bigg[\sum_{i=1}^N \max\bigg\{ 0, \bigg\vert y_i - \hat y_i \bigg\vert - k \bigg\} \bigg] \\ L_{\lambda, k}(y, \hat y) = \sum_{i=1}^N \bigg\vert y_i - \hat y_i\bigg\vert + \lambda\bigg[\sum_{i=1}^N \max\bigg\{ 0, \bigg\vert y_i - \hat y_i \bigg\vert - k \bigg\} \bigg] $$

Maybe you'll find that you maximize conversion rate with $\lambda = 0.7$ and $k = 29$.
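To make the tuning concrete, here is a sketch that fits a linear model under $L_{\lambda, k}$ with `scipy.optimize.minimize` (Nelder-Mead, since the loss has kinks) and grid-searches $\lambda$ and $k$. Everything here is an assumption for illustration: the synthetic data stands in for the Kaggle set, the grid values echo those mentioned above, and the share of validation predictions within $25$ of truth serves as a crude proxy for conversion.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Synthetic placeholder standing in for the Kaggle rentals data.
X = rng.normal(size=(500, 3))
y = 100 + X @ np.array([30.0, -20.0, 10.0]) + rng.normal(scale=20.0, size=500)
X_tr, X_val, y_tr, y_val = X[:400], X[400:], y[:400], y[400:]

def predict(beta, X):
    return beta[0] + X @ beta[1:]          # linear model with intercept

def loss(beta, X, y, lam, k):
    r = np.abs(y - predict(beta, X))
    return np.sum(r) + lam * np.sum(np.maximum(0.0, r - k))

best = (-np.inf, None, None)
for lam in [0.0, 0.5, 0.7, 1.0]:
    for k in [20.0, 25.0, 29.0]:
        fit = minimize(loss, np.zeros(X.shape[1] + 1),
                       args=(X_tr, y_tr, lam, k), method="Nelder-Mead")
        # Crude proxy for conversion: share of validation misses within $25.
        within = np.mean(np.abs(y_val - predict(fit.x, X_val)) <= 25.0)
        if within > best[0]:
            best = (within, lam, k)

print("share within $25: %.2f at lambda=%s, k=%s" % best)
```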

  • @KarthikBhandary Asking for code is not on-topic on this site. – Sycorax Jul 06 '22 at 16:42
  • Can I apply these loss functions to any model, or is there a particular set of models? – Karthik Bhandary Jul 06 '22 at 17:02
  • @KarthikBhandary I was thinking of something like a linear regression or a neural network, either of which could use such a loss function. What models did you have in mind? – Dave Jul 06 '22 at 17:08
  • I used RandomForestRegressor and failed miserably. I guess I should give linear regression a try. – Karthik Bhandary Jul 06 '22 at 17:13
  • I just used linear regression and it returned 0.07 – Karthik Bhandary Jul 06 '22 at 17:16
  • @KarthikBhandary What is $0.07?$ – Dave Jul 06 '22 at 17:16
  • The score. I created a LinearRegression(), fit it on X_train and y_train, and then lr.score(X_test, y_test) gave 0.07 as the score – Karthik Bhandary Jul 06 '22 at 17:18
  • @KarthikBhandary But what does that mean? // It will be more helpful if you discuss in terms of the math. I think I recognize your Python code, but I mostly use R when I do regressions, so I might be misinterpreting the code. Further, it will solidify your understanding of what’s happening. – Dave Jul 06 '22 at 19:35
  • I think what it means is that it is trying to predict the price a property is going to have. It is scoring itself on how well it is predicting. Correct me if I am wrong. – Karthik Bhandary Jul 07 '22 at 02:34
  • Yet another loss could be stated in terms of sums of squares. $$L_{\lambda}(y, \hat y) = \sum_{i=1}^N \bigg(y_i - \hat y_i\bigg)^2 + \lambda\bigg[\sum_{i=1}^N \max\bigg\{ 0, \bigg( y_i - \hat y_i \bigg)^2 - 25^2 \bigg\} \bigg] $$ – Sycorax Jul 07 '22 at 20:37
  • @Sycorax That was the original way I wrote it. I changed the equation so that $\lambda$ could be unitless, though having units in the hyperparameter shouldn’t cause any issues. Would SSE vs RMSE be expected to change the predictions? – Dave Jul 07 '22 at 20:44
  • It's a different model, because the loss that I stated is in terms of squares -- the penalty term takes on different values. If you sketch a plot of my penalty, it looks like a parabola bounded below at 0, while the absolute value penalty is an absolute value bounded below at 0. Also, sqrt of sum of squares is different than sum of sqrt of squares. – Sycorax Jul 07 '22 at 21:13