
I am looking for what I think is probably a statistical analysis model. It could actually be a machine learning or fuzzy logic algorithm, but my problem is that I don’t know whether it even exists. I think it must be quite a common requirement, but I only know how to explain it verbosely, so it’s very hard to Google.

I have a number of different independent variables, all of which are categorical. The dependent variable is interval-scaled. I want to use my data to predict the dependent variable given a combination of the independent variables. It’s quite possible that the combination exists in the dataset once or many times. It is more likely that it does not, though, and I still want to make a prediction. Below is a crude and simplistic example of the kind of thing that I am looking for.

\begin{array}{|l|r|r|r|r|}
\hline
 & A & B & C & \text{Dependent} \\ \hline
\text{Record 1} & 1 & 1 & 1 & 100 \\ \hline
\text{Record 2} & 2 & 2 & 1 & 200 \\ \hline
\text{Record 3} & 2 & 2 & 3 & 210 \\ \hline \hline
? & 1 & 1 & 3 & ? \\ \hline
\end{array}

There is no record that exactly matches the independent variables of the prediction row.

But record 1 matches all of the variables apart from C.

Record 2: variable C matches variable C of record 1, and everything else matches record 3.

Record 3: variable C matches C of the prediction row.

So the difference between records 2 and 3 is 10, and they differ in exactly the same variable as record 1 and the prediction row do. In this case it is plausible to assume that the dependent variable for the prediction is 100 (record 1) + 10 (the difference between records 2 and 3) = 110.
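
To make that reasoning concrete, here is a minimal Python sketch of it. The helper name `predict_by_analogy` and the dictionary layout are my own illustrative assumptions, not an established method; it only handles the case where a record differs from the target in exactly one variable.

```python
# Records from the example table: (A, B, C) -> dependent value.
records = {
    (1, 1, 1): 100,  # Record 1
    (2, 2, 1): 200,  # Record 2
    (2, 2, 3): 210,  # Record 3
}


def predict_by_analogy(target, records):
    """Hypothetical helper: find a base record that differs from the target
    in exactly one variable, then find a pair of records that differ in the
    same way, and add that pair's difference to the base value."""
    for base, base_y in records.items():
        diff = [i for i in range(len(target)) if base[i] != target[i]]
        if len(diff) != 1:
            continue  # only single-variable differences are handled here
        i = diff[0]
        for low, low_y in records.items():
            # The pair must start at the same level of variable i as the base.
            if low == base or low[i] != base[i]:
                continue
            # Same record as `low`, but with variable i set to the target level.
            high = low[:i] + (target[i],) + low[i + 1:]
            if high in records:
                return base_y + (records[high] - low_y)
    return None  # no usable analogy found


prediction = predict_by_analogy((1, 1, 3), records)
print(prediction)
```

On the example table this uses record 1 as the base and records 2 and 3 as the analogous pair. It returns `None` whenever no such pair exists, which is exactly the situation I expect to hit often with many variables.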

This is a very simple case of my data. There are actually many variables so it’s possible that there are cases where the best matching record still has many differences. I would still like to calculate a prediction in these cases though.

Eds
  • It seems like you're trying to model this as an overdetermined system of equations. Search on those grounds. – Vikram Venkat Mar 19 '16 at 10:10

1 Answer


I am not 100% clear about what you need, but provided you avoid overfitting, linear regression should do it.
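
As a rough sketch of what that could look like (assuming NumPy and a dummy coding in which the first level of each variable is the baseline; the `encode` helper is made up for illustration), you could fit a main-effects model on the three example records:

```python
import numpy as np

# Example records (A, B, C) and their dependent values.
rows = [(1, 1, 1), (2, 2, 1), (2, 2, 3)]
y = np.array([100.0, 200.0, 210.0])
target = (1, 1, 3)

# Levels observed for each categorical variable.
levels = [sorted({r[j] for r in rows}) for j in range(3)]


def encode(row):
    """Illustrative dummy coding: intercept plus one indicator per
    non-baseline level (the first level of each variable is the baseline)."""
    feats = [1.0]
    for j, lv in enumerate(levels):
        feats += [1.0 if row[j] == v else 0.0 for v in lv[1:]]
    return feats


X = np.array([encode(r) for r in rows])

# Least-squares fit. With only three records the system is rank-deficient
# (A and B always change together), so lstsq returns the minimum-norm solution.
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

prediction = float(np.array(encode(target)) @ coef)
print(round(prediction, 6))
```

On this toy data the fitted model reproduces the three records exactly and assigns the whole 200 to 210 change to variable C, so the prediction agrees with the 100 + 10 reasoning in the question. With real data you would need many more records than coefficients, and probably some regularisation, to keep the overfitting under control.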