
I am trying to figure out how the computer in this game works.

In short: it's like normal Monopoly, except that instead of owning a property completely, everybody can buy shares in a property (at most 9 are available). If another player lands on a property you hold shares in, you also profit from the rent paid. Conversely, if you own some shares and you land on the property yourself, you pay less rent to the owner. A small computer keeps track of each player's number of shares, displays the value of each share (which goes up and down slightly now and again), calculates rent, etc.

I have already figured out the formula for the price of a share. The price goes up when fewer shares are available, when the owner owns the whole street, or when houses are built on the property, for example. Now I want to find the formula for calculating the rent. A few parameters "could" influence the result (I do not know for sure that every variable is in the formula, of course):

  • the starting value of the property
  • the number of shares in play (or the number of free shares, since the maximum is 9)
  • the number of shares the current player holds when he/she lands on the property
  • the number of houses built on the property
  • whether the owner owns the complete "street" (has a monopoly for this "color")

With a fixed set of parameters, the computer always gives the same result, so there MUST be a formula somehow. Here are a few example values:

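| Property value | Shares in play (1-9) | Player-owned shares (0-4) | Houses (0-5) | Monopoly (yes/no) | Rent to pay |
|---|---|---|---|---|---|
| 60 | 1 | 0 | 0 | no | 4 |
| 60 | 2 | 0 | 0 | no | 4 |
| 60 | 2 | 1 | 0 | no | 1 |
| 60 | 5 | 0 | 0 | no | 5 |
| 160 | 7 | 0 | 0 | no | 21 |

| Property value | Shares in play (1-9) | Player-owned shares (0-4) | Houses (0-5) | Monopoly (yes/no) | Rent to pay |
|---|---|---|---|---|---|
| 320 | 1 | 0 | 0 | no | 56 |
| 320 | 2 | 0 | 0 | no | 56 |
| 320 | 2 | 1 | 0 | no | 24 |
| 320 | 3 | 0 | 0 | no | 54 |
| 320 | 3 | 1 | 0 | no | 32 |
| 320 | 4 | 0 | 0 | no | 56 |
| 320 | 4 | 1 | 0 | no | 36 |
| 320 | 4 | 2 | 0 | no | 20 |
| 320 | 5 | 0 | 0 | no | 55 |
| 320 | 5 | 1 | 0 | no | 36 |
| 320 | 5 | 2 | 0 | no | 24 |
| 320 | 6 | 0 | 0 | no | 54 |
| 320 | 6 | 1 | 0 | no | 40 |
| 320 | 6 | 2 | 0 | no | 28 |
| 320 | 6 | 3 | 0 | no | 15 |
| 320 | 7 | 0 | 0 | no | 56 |
| 320 | 7 | 1 | 0 | no | 42 |
| 320 | 7 | 2 | 0 | no | 30 |
| 320 | 7 | 3 | 0 | no | 20 |
| 320 | 8 | 0 | 0 | no | 56 |
| 320 | 8 | 1 | 0 | no | 42 |
| 320 | 8 | 2 | 0 | no | 30 |
| 320 | 8 | 3 | 0 | no | 20 |
| 320 | 8 | 4 | 0 | no | 12 |
| 320 | 9 | 0 | 0 | no | 54 |
| 320 | 9 | 1 | 0 | no | 40 |
| 320 | 9 | 2 | 0 | no | 28 |
| 320 | 9 | 3 | 0 | no | 18 |
| 320 | 9 | 4 | 0 | no | 15 |
| 320 | 1 | 0 | 0 | yes | 112 |
| 320 | 2 | 0 | 0 | yes | 112 |
| 320 | 2 | 1 | 0 | yes | 49 |
| 320 | 3 | 0 | 0 | yes | 111 |
| 320 | 3 | 1 | 0 | yes | 64 |
| 320 | 4 | 0 | 0 | yes | 112 |
| 320 | 4 | 1 | 0 | yes | 72 |
| 320 | 4 | 2 | 0 | yes | 42 |
| 320 | 5 | 0 | 0 | yes | 110 |
| 320 | 5 | 1 | 0 | yes | 76 |
| 320 | 5 | 2 | 0 | yes | 48 |
| 320 | 6 | 0 | 0 | yes | 108 |
| 320 | 6 | 1 | 0 | yes | 80 |
| 320 | 6 | 2 | 0 | yes | 56 |
| 320 | 6 | 3 | 0 | yes | 33 |
| 320 | 7 | 0 | 0 | yes | 112 |
| 320 | 7 | 1 | 0 | yes | 84 |
| 320 | 7 | 2 | 0 | yes | 60 |
| 320 | 7 | 3 | 0 | yes | 40 |
| 320 | 8 | 0 | 0 | yes | 112 |
| 320 | 8 | 1 | 0 | yes | 84 |
| 320 | 8 | 2 | 0 | yes | 60 |
| 320 | 8 | 3 | 0 | yes | 40 |
| 320 | 8 | 4 | 0 | yes | 28 |
| 320 | 9 | 0 | 0 | yes | 108 |
| 320 | 9 | 1 | 0 | yes | 80 |
| 320 | 9 | 2 | 0 | yes | 63 |
| 320 | 9 | 3 | 0 | yes | 42 |
| 320 | 9 | 4 | 0 | yes | 30 |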

I have a lot more data than this, of course (I can even get ALL possible data, but that is beside the point). How can I find the formula for such a data set? I tried entering all these values in Excel and plotting graphs to find a "line", etc., but to no avail. What would be the best way?

Wietse
  • The first, second, fourth, sixth and ninth lines are not encouraging for a simple formula. – Henry Nov 18 '23 at 00:30
  • Hi @Henry, thanks for your reply! I think the formula applies rounding somewhere, which results in these "strange" numbers. I could expand the table with more values, but then what? I think my question is a broader one: how do I approach this problem (I don't have a degree in math)? I have a feeling it would be easier to start with the "bigger" numbers (to avoid the large rounding errors with smaller numbers). –  Nov 18 '23 at 10:32
  • My issue is that it is not clear from those lines whether increases in shares in play tend to increase or reduce rent (in each case with no player owned shares, no houses and no monopoly) – Henry Nov 18 '23 at 10:39
  • I expanded the table with some more data for a single property in an attempt to answer your question. A rule of the game is that I only need to pay rent if I am NOT the owner of the property. Ownership only changes if you buy extra shares and afterwards own more shares than the current owner (so it is possible to have the same amount of shares, but you still have to pay rent since you are not yet the owner). –  Nov 18 '23 at 10:54
  • Maybe try symbolic regression: https://www.r-bloggers.com/2019/04/symbolic-regression-genetic-programming-or-if-kepler-had-r/ – jblood94 Nov 20 '23 at 13:41
  • Thanks for the suggestion! Symbolic regression looks like the term I am looking for, but I am having a hard time finding software that can "predict the formula based on this dataset". I looked at (and tried) PySR and feyn, but I failed because of a lack of knowledge. – Wietse Nov 24 '23 at 21:40
  • This question was posed many years ago at https://stats.stackexchange.com/questions/10363/. – whuber Nov 29 '23 at 22:32
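One way to follow up on the comments about rounding and about the effect of shares in play is to compare rows that differ in only one parameter. The Python sketch below does this for the 320 rows of the table above. The `pairs` comprehension is only a compact way to reproduce the table's row order (in these rows the player never holds more than `shares // 2` shares), and any conclusion applies to the data shown, not necessarily to the game as a whole.

```python
# Rent for the property worth 320, in the same row order as the table:
# (shares in play, player-owned shares) for s = 1..9 and owned = 0..s // 2
pairs = [(s, o) for s in range(1, 10) for o in range(s // 2 + 1)]

no_monopoly = [56, 56, 24, 54, 32, 56, 36, 20, 55, 36, 24, 54, 40, 28, 15,
               56, 42, 30, 20, 56, 42, 30, 20, 12, 54, 40, 28, 18, 15]
with_monopoly = [112, 112, 49, 111, 64, 112, 72, 42, 110, 76, 48, 108, 80, 56, 33,
                 112, 84, 60, 40, 112, 84, 60, 40, 28, 108, 80, 63, 42, 30]
assert len(pairs) == len(no_monopoly) == len(with_monopoly) == 29

# 1) Vary only the shares in play: the player owns nothing, no monopoly
base_rent = {s: r for (s, o), r in zip(pairs, no_monopoly) if o == 0}
print("rent by shares in play (owned = 0):", base_rent)

# 2) Vary only the monopoly flag: same (shares, owned), with vs. without monopoly
ratios = [m / n for m, n in zip(with_monopoly, no_monopoly)]
print(f"monopoly / no monopoly: min {min(ratios):.2f}, max {max(ratios):.2f}")
```

For these rows the base rent stays between 54 and 56 whatever the number of shares in play, and a monopoly multiplies the rent by 2.00 to 2.33. The larger ratios occur where the rent is small, which is consistent with (but does not prove) rounding of an intermediate value.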

2 Answers


One statistical approach is linear regression, which you can run in Excel, R, or Python.

Typical steps:

  • Try a simple linear model
    • Examine the residuals and determine if there are necessary variable transforms
  • Try to engineer variables
    • difference in shares between own and outstanding
    • percent of outstanding shares owned
    • percent of total possible shares outstanding
  • Try a variable transform on the dependent variable (log)
  • Try a non-linear model
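The first steps of that list can be sketched in Python with plain NumPy least squares standing in for R's `lm` (both fit ordinary least squares). The data are the 63 rows of the question's table, `fit_ols` is a helper name introduced here for illustration, and the engineered variables are the ones used in the R code below (`pctowned = owned / shares`, `sharepct = shares / 9`).

```python
import numpy as np

# The 63 rows of the question's table as (value, shares, owned, monopoly, rent).
# The 320 rows are generated in table order: owned runs from 0 to shares // 2,
# first without monopoly (0) and then with monopoly (1).
pairs = [(s, o) for s in range(1, 10) for o in range(s // 2 + 1)]
rent_320 = {
    0: [56, 56, 24, 54, 32, 56, 36, 20, 55, 36, 24, 54, 40, 28, 15,
        56, 42, 30, 20, 56, 42, 30, 20, 12, 54, 40, 28, 18, 15],
    1: [112, 112, 49, 111, 64, 112, 72, 42, 110, 76, 48, 108, 80, 56, 33,
        112, 84, 60, 40, 112, 84, 60, 40, 28, 108, 80, 63, 42, 30],
}
rows = [(60, 1, 0, 0, 4), (60, 2, 0, 0, 4), (60, 2, 1, 0, 1),
        (60, 5, 0, 0, 5), (160, 7, 0, 0, 21)]
rows += [(320, s, o, m, r) for m in (0, 1) for (s, o), r in zip(pairs, rent_320[m])]
value, shares, owned, monopoly, rent = np.array(rows, dtype=float).T


def fit_ols(X, y):
    """Ordinary least squares with an intercept column; returns (coefficients, R^2)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    resid = y - X1 @ coef
    r2 = 1 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, r2


# Step 1: simple linear model on the raw variables
coef_lin, r2_lin = fit_ols(np.column_stack([value, shares, owned, monopoly]), rent)

# Steps 2-3: engineered variables plus a log transform of the response
pctowned = owned / shares  # fraction of the shares in play held by the player
sharepct = shares / 9      # fraction of the 9 possible shares that are in play
coef_log, r2_log = fit_ols(
    np.column_stack([value, pctowned, sharepct, monopoly]), np.log(rent)
)

print(f"linear model R^2: {r2_lin:.4f}")
print(f"log model R^2:    {r2_log:.4f}")
print("log model coefficients:", np.round(coef_log, 4))
```

If the data are transcribed correctly, the two R² values should agree with the `Multiple R-squared` lines of the R output (0.9056 and 0.9739). Note that the R² of the log model is measured on the log scale.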

A good, but not perfect, model (with Monopoly coded as 1 for yes and 0 for no) is:

$$\mathrm{Rent} = \exp\left(0.96 + 0.0101\,\mathrm{Value} - 2.42\,\frac{\mathrm{Owned}}{\mathrm{Shares}} - 0.244\,\frac{\mathrm{Shares}}{9} + 0.71\,\mathrm{Monopoly}\right)$$
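To check the formula against individual rows, it can be evaluated directly. The Python sketch below uses the unrounded coefficients printed in the `lm4` summary further down; `predict_rent` is a hypothetical helper name, and `monopoly` is coded 1 for yes and 0 for no, as in the R data.

```python
import math

# Coefficients of the log-linear model, copied from the lm4 summary output
INTERCEPT = 0.9601256
B_VALUE = 0.0101118
B_PCTOWNED = -2.4199149
B_SHAREPCT = -0.2442338
B_MONOPOLY = 0.7076224


def predict_rent(value, shares, owned, monopoly):
    """Model estimate of the rent; monopoly is 1 (yes) or 0 (no)."""
    log_rent = (INTERCEPT
                + B_VALUE * value
                + B_PCTOWNED * owned / shares
                + B_SHAREPCT * shares / 9
                + B_MONOPOLY * monopoly)
    return math.exp(log_rent)


# A few rows from the question's table: (value, shares, owned, monopoly, actual rent)
for row in [(320, 1, 0, 0, 56), (320, 8, 4, 1, 28), (160, 7, 0, 0, 21)]:
    *inputs, actual = row
    print(f"{inputs}: predicted {predict_rent(*inputs):.1f}, actual {actual}")
```

Because the model is multiplicative, a monopoly multiplies every prediction by e^0.7076 ≈ 2.03, whatever the other inputs. The value-160 row is the one with the largest residual in the R summary (0.656 on the log scale).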

To get a better model this way, you need more examples with different property values, plus some examples with houses.

Update

I also tried the symbolic regression procedure that was mentioned in the comments and in other answers. The results you get are very dependent on the allowed functions and operators. I could not easily beat the linear regression approach with the data presented.

Here is R code to illustrate:

dat <- structure(list(value = c(60, 60, 60, 60, 160, 320, 320, 320, 
  320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 
  320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 
  320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 
  320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 320, 
  320, 320, 320), shares = c(1, 2, 2, 5, 7, 1, 2, 2, 3, 3, 4, 4, 
  4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 9, 
  9, 1, 2, 2, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 7, 8, 
  8, 8, 8, 8, 9, 9, 9, 9, 9), owned = c(0, 0, 1, 0, 0, 0, 0, 1, 
  0, 1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 
  0, 1, 2, 3, 4, 0, 0, 1, 0, 1, 0, 1, 2, 0, 1, 2, 0, 1, 2, 3, 0, 
  1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4), houses = c(0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), monopoly = c(0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 
  1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1), 
  rent = c(4, 4, 1, 5, 21, 56, 56, 24, 54, 32, 56, 36, 20, 
  55, 36, 24, 54, 40, 28, 15, 56, 42, 30, 20, 56, 42, 30, 20, 
  12, 54, 40, 28, 18, 15, 112, 112, 49, 111, 64, 112, 72, 42, 
  110, 76, 48, 108, 80, 56, 33, 112, 84, 60, 40, 112, 84, 60, 
  40, 28, 108, 80, 63, 42, 30)), class = "data.frame", row.names = c(NA, 
  -63L))

# no examples provided with houses

lm1 <- lm(rent ~ value + shares + owned + monopoly, data = dat)
summary(lm1)
#>
#> Call:
#> lm(formula = rent ~ value + shares + owned + monopoly, data = dat)
#>
#> Residuals:
#>     Min      1Q  Median      3Q     Max
#> -25.488  -5.911  -2.093   4.843  21.256
#>
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)
#> (Intercept) -8.52642    6.10570  -1.396   0.1679
#> value        0.18937    0.02116   8.950  1.6e-12 ***
#> shares       1.39035    0.60704   2.290   0.0257 *
#> owned      -17.71002    1.16736 -15.171  < 2e-16 ***
#> monopoly    37.34621    2.63746  14.160  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 10.05 on 58 degrees of freedom
#> Multiple R-squared:  0.9056, Adjusted R-squared:  0.8991
#> F-statistic: 139.1 on 4 and 58 DF,  p-value: < 2.2e-16
plot(lm1, which = 1)


# try to engineer some variables

dat$sharediff <- dat$shares - dat$owned
dat$sharepct <- dat$shares / 9
dat$sharediffpct <- dat$sharediff / 9
dat$pctowned <- dat$owned / dat$shares

# try a multiplicative model

lm4 <- lm(log(rent) ~ value + pctowned + sharepct + monopoly, data = dat)
summary(lm4)
#>
#> Call:
#> lm(formula = log(rent) ~ value + pctowned + sharepct + monopoly,
#>     data = dat)
#>
#> Residuals:
#>      Min       1Q   Median       3Q      Max
#> -0.30260 -0.10133  0.01289  0.07176  0.65647
#>
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)
#> (Intercept)  0.9601256  0.0904284  10.618 3.21e-15 ***
#> value        0.0101118  0.0003152  32.078  < 2e-16 ***
#> pctowned    -2.4199149  0.1020827 -23.705  < 2e-16 ***
#> sharepct    -0.2442338  0.0736702  -3.315  0.00158 **
#> monopoly     0.7076224  0.0392421  18.032  < 2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.1496 on 58 degrees of freedom
#> Multiple R-squared:  0.9739, Adjusted R-squared:  0.9721
#> F-statistic: 540.3 on 4 and 58 DF,  p-value: < 2.2e-16
plot(lm4, which = 1, col = dat$shares, pch = 19)


################################################################################

require(gramEvol)
#> Loading required package: gramEvol
#> Warning: package 'gramEvol' was built under R version 4.3.2

ruleDef <- list(
  expr = gramEvol::grule(op(expr, expr), func(expr), var),
  func = gramEvol::grule(exp, sqrt),
  op   = gramEvol::grule('+', '-', '*', '/'),
  var  = gramEvol::grule(dat$value, dat$shares, dat$owned, dat$monopoly)
)

grammarDef <- gramEvol::CreateGrammar(ruleDef)
grammarDef
#> <expr> ::= <op>(<expr>, <expr>) | <func>(<expr>) | <var>
#> <func> ::= exp | sqrt
#> <op> ::= "+" | "-" | "*" | "/"
#> <var> ::= dat$value | dat$shares | dat$owned | dat$monopoly

set.seed(123)
gramEvol::GrammarRandomExpression(grammarDef, 6)
#> [[1]]
#> expression(sqrt(exp(dat$value + dat$owned)))
#>
#> [[2]]
#> expression(exp(exp(dat$value/dat$monopoly)) + dat$owned)
#>
#> [[3]]
#> expression((dat$value - sqrt(dat$owned))/dat$monopoly)
#>
#> [[4]]
#> expression(dat$monopoly)
#>
#> [[5]]
#> expression(exp(dat$shares))
#>
#> [[6]]
#> expression(dat$shares + dat$value)

SymRegFitFunc <- function(expr) {
  suppressWarnings(result <- eval(expr))
  if (any(is.nan(result)))
    return(Inf)
  return(mean((dat$rent - result)^2))
}

SymRegFitFunc(expression(exp(dat$shares)))
#> [1] 11838780

ge <- gramEvol::GrammaticalEvolution(grammarDef, SymRegFitFunc,
                                     terminationCost = 0.1,
                                     iterations = 2500, max.depth = 5)
ge
#> Grammatical Evolution Search Results:
#> No. Generations: 2500
#> Best Expression: sqrt(dat$value) * (exp(dat$monopoly) + dat$monopoly) + dat$owned
#> Best Cost: 693.648325144468

yhat <- eval(ge$best$expressions)
resid <- dat$rent - yhat

plot(yhat, resid, xlab = "Predicted Rent", ylab = "Residuals")


# error of linear model
var(lm4$residuals)
#> [1] 0.02093519
mean(lm4$residuals^2)
#> [1] 0.02060289

# error of symbolic regression
var(resid)
#> [1] 600.0785
mean(resid^2)
#> [1] 693.6483

Created on 2023-12-01 with reprex v2.0.2
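One caveat when reading the last comparison: `lm4` was fitted to `log(rent)`, so `var(lm4$residuals)` is in squared log units, whereas `resid` for the symbolic regression is in rent units. The Python sketch below (NumPy least squares standing in for `lm`, same 63 rows) refits the same log-linear model, back-transforms its predictions with `exp()` and computes the mean squared error in rent units, so that it can be set beside the 693.6 reported for the grammatical-evolution expression.

```python
import numpy as np

# The 63 rows of the question's table as (value, shares, owned, monopoly, rent);
# the 320 rows follow the table order (owned runs from 0 to shares // 2)
pairs = [(s, o) for s in range(1, 10) for o in range(s // 2 + 1)]
rent_320 = {
    0: [56, 56, 24, 54, 32, 56, 36, 20, 55, 36, 24, 54, 40, 28, 15,
        56, 42, 30, 20, 56, 42, 30, 20, 12, 54, 40, 28, 18, 15],
    1: [112, 112, 49, 111, 64, 112, 72, 42, 110, 76, 48, 108, 80, 56, 33,
        112, 84, 60, 40, 112, 84, 60, 40, 28, 108, 80, 63, 42, 30],
}
rows = [(60, 1, 0, 0, 4), (60, 2, 0, 0, 4), (60, 2, 1, 0, 1),
        (60, 5, 0, 0, 5), (160, 7, 0, 0, 21)]
rows += [(320, s, o, m, r) for m in (0, 1) for (s, o), r in zip(pairs, rent_320[m])]
value, shares, owned, monopoly, rent = np.array(rows, dtype=float).T

# Same model as lm4: log(rent) ~ value + pctowned + sharepct + monopoly
X = np.column_stack([np.ones_like(rent), value, owned / shares, shares / 9, monopoly])
coef, *_ = np.linalg.lstsq(X, np.log(rent), rcond=None)
log_pred = X @ coef

mse_log = np.mean((np.log(rent) - log_pred) ** 2)   # comparable to mean(lm4$residuals^2)
mse_rent = np.mean((rent - np.exp(log_pred)) ** 2)  # comparable to mean(resid^2)
print(f"MSE on the log scale:  {mse_log:.5f}")
print(f"MSE on the rent scale: {mse_rent:.1f}")
```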

R Carnell
  • Thank you for your input; I will try this in the coming days and get back to you! – Wietse Nov 28 '23 at 13:39

Why not use symbolic regression, e.g. with PySR, a symbolic regression package? If you can generate enough observations and include all the factors that affect the result (with no noise in the data), symbolic regression would be my go-to approach for finding formulas:

Here's a simple example of how you can use PySR to fit a model:

import numpy as np
from pysr import PySRRegressor

# Generate some example data: y = 3x plus a little Gaussian noise
np.random.seed(42)
X = np.random.rand(100, 1)
y = 3 * X.squeeze() + np.random.normal(0, 0.1, 100)

# Search for an equation that maps X to y
model = PySRRegressor(niterations=100, binary_operators=["+", "-", "*", "/"])
model.fit(X, y)

# Print the best discovered equation
print("Discovered equation:", model.sympy())

This should recover (approximately) the formula y = 3x that the data were generated from.

Ggjj11