My dependent variable is measured on a 4-point Likert scale and my independent variable on a 7-point Likert scale. Is it appropriate to run a regression analysis on data measured with Likert scales of different lengths, specifically a 4-point scale regressed on a 7-point scale?
-
Just to clarify, are you talking about single items with 4 and 7 response options respectively, or scales that are the result of taking the sum or mean of a set of items where each item happens to have 4 or 7 response options? – Jeromy Anglim Oct 01 '13 at 03:22
-
Yes sir I am talking about the same scenario. – Salman Oct 01 '13 at 03:26
-
Which one: individual items or composite scales? – Jeromy Anglim Oct 01 '13 at 03:27
-
composite scales – Salman Oct 01 '13 at 11:08
1 Answer
It is generally fine to use predictor and outcome variables that use different metrics when performing multiple regression.
To demonstrate the point, you can rescale predictor or dependent variables using a linear transformation (e.g., z-scores, centering, and so on) and this will not influence your $R^2$ or your standardised regression coefficients (note that I'm not saying you should do this; I'm just pointing out that this aspect of scaling is not the issue). Of course, using 4 or 7 point response scales is more than just rescaling, but in my experience, correlations and $R^2$ won't change a lot based on whether you use a 4 or 7 point scale.
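The rescaling point can be checked numerically. The following is a minimal sketch on simulated data (the data-generating values, seed, and helper names `pearson_r` and `ols_slope` are assumptions made purely for illustration): a 7-point predictor is linearly mapped onto a 1 to 4 range and z-scored, and $R^2$ for the simple regression stays the same while the unstandardised slope changes.

```python
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ols_slope(x, y):
    """Unstandardised slope of the simple regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(1)
# Simulated (hypothetical) data: predictor on 1..7, outcome on 1..4.
x = [random.randint(1, 7) for _ in range(200)]
y = [min(4, max(1, round(0.4 * xi + random.gauss(0, 1)))) for xi in x]

# Linear transformation 1: squeeze the 1..7 predictor onto a 1..4 range.
x_rescaled = [1 + (xi - 1) * 3 / 6 for xi in x]

# Linear transformation 2: z-scores.
mean_x = sum(x) / len(x)
sd_x = (sum((xi - mean_x) ** 2 for xi in x) / (len(x) - 1)) ** 0.5
x_z = [(xi - mean_x) / sd_x for xi in x]

# With one predictor, R^2 is the squared correlation.
r2_raw = pearson_r(x, y) ** 2
r2_rescaled = pearson_r(x_rescaled, y) ** 2
r2_z = pearson_r(x_z, y) ** 2

slope_raw = ols_slope(x, y)
slope_rescaled = ols_slope(x_rescaled, y)

print("R^2 raw / rescaled / z-scored:", r2_raw, r2_rescaled, r2_z)
print("slope raw / rescaled:", slope_raw, slope_rescaled)
```

Only the unstandardised slope moves (here it doubles, because one unit on the rescaled predictor spans two units on the original one); the fit itself is untouched.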
That said, there are several issues to consider when you have predictor or dependent variables that are single item variables with a small number of ordered response options:
- What is the best response scale for measuring the variable of interest? If you are designing a study, then you may want to think about the optimal number of response options. There are a range of debates about this. Some people argue that you should have more response options (e.g., like a 7 or 10 point scale). Others suggest that you should align the set of response options to the meaningful distinctions that respondents are able to make, and that too many response options can lead to more person-specific anchoring effects; such arguments are often used to justify 5 point scales.
- What is the best way to measure the variable of interest? If you truly have a single item measure on a four or seven point scale, you would often be better served by developing a scale with multiple items that you then sum to form an overall measure. This will tend to be more reliable and lead to more discrimination. Both of these factors may result in improved prediction.
- Can you include an item with four ordered response options as a dependent variable in a linear regression? There are different answers to this. Certainly, it is possible, and many people do this. Of course, the residuals won't be normally distributed, and it assumes that you are happy treating the categories of the response option as equally spaced. There are alternative techniques that attempt to model ordinal data more explicitly (such as ordinal logistic regression). In practice, as the number of categories increases, people are generally more willing to perform linear regression. Thus, if your dependent variable was the sum of a few items all on a four point scale, it would seem more appropriate. Four options on a single item is on the low side.
- Can you include an item with seven ordered response options as a predictor variable in linear regression? Yes, this is fine. There are many options regarding how you numerically code the variable. The standard approach would be to treat the categories as equally spaced. Of course, you could explore other codings (there's even optimal scaling, which attempts to optimise the coding of the variable subject to any constraints such as ordinality). Or you could include both a linear and a quadratic coding for the variable to incorporate non-linearity of effect.
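As a rough illustration of the last option (linear plus quadratic coding of a seven-point predictor), here is a minimal sketch on simulated data. The inverted-U data-generating process, the seed, the choice to centre the predictor at 4 before squaring, and the helper `fit_ols` are all assumptions for illustration, not part of any particular package.

```python
import random

def fit_ols(columns, y):
    """Least-squares fit with an intercept; returns (coefficients, R^2)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations: (X'X | X'y)
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    coefs = [A[i][k] / A[i][i] for i in range(k)]
    fitted = [sum(b * v for b, v in zip(coefs, row)) for row in X]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return coefs, 1 - ss_res / ss_tot

random.seed(2)
# Simulated (hypothetical) data: 7-point predictor, 4-point outcome that
# peaks at the scale midpoint, so the true effect is non-linear.
x = [random.randint(1, 7) for _ in range(300)]
y = [min(4, max(1, round(4 - 0.3 * (xi - 4) ** 2 + random.gauss(0, 0.7))))
     for xi in x]

x_centered = [xi - 4 for xi in x]          # linear coding, centred at midpoint
x_squared = [c ** 2 for c in x_centered]   # quadratic coding

b_lin, r2_lin = fit_ols([x_centered], y)
b_quad, r2_quad = fit_ols([x_centered, x_squared], y)

print("R^2 linear only:       ", r2_lin)
print("R^2 linear + quadratic:", r2_quad)
print("quadratic coefficient: ", b_quad[2])
```

Because the linear model is nested in the quadratic one, $R^2$ cannot go down when the squared term is added; whether the gain is worth the extra parameter is a substantive judgement. Centring before squaring is just a convenience that reduces the correlation between the two codings.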
Note that most of the above was written on the initial assumption that your predictor and outcome variables were single items. If you have multi-item scales that just happen to use different response scales, then there's not too much to think about. Most people treat such scales as standard numeric variables in their multiple regressions.
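For composite scales, a quick check shows that it also makes no difference to $R^2$ whether you sum or average the items, since the mean is just the sum divided by a constant. This is a minimal sketch on simulated data; the number of items, the latent-variable setup, the seed, and the helpers `likert` and `pearson_r` are assumptions for illustration only.

```python
import random

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def likert(value, points):
    """Discretise a continuous value onto a 1..points response scale."""
    midpoint = (points + 1) / 2
    return min(points, max(1, round(midpoint + value * points / 4)))

random.seed(3)
n_respondents = 300
latent = [random.gauss(0, 1) for _ in range(n_respondents)]

# Hypothetical scales: five 7-point predictor items, four 4-point outcome items.
x_items = [[likert(t + random.gauss(0, 1), 7) for _ in range(5)] for t in latent]
y_items = [[likert(0.6 * t + random.gauss(0, 1), 4) for _ in range(4)] for t in latent]

x_sum = [sum(row) for row in x_items]
y_sum = [sum(row) for row in y_items]
x_mean = [s / 5 for s in x_sum]
y_mean = [s / 4 for s in y_sum]

# With a single predictor, R^2 is the squared correlation.
r2_sum = pearson_r(x_sum, y_sum) ** 2
r2_mean = pearson_r(x_mean, y_mean) ** 2

print("R^2 using sums: ", r2_sum)
print("R^2 using means:", r2_mean)
```

The choice between sums and means therefore only affects how you read the unstandardised coefficients, not the fit.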
-
Thank you so much sir. It was very helpful. So it will be fine if I sum both the variables and run regression analysis, instead of taking the mean or rescaling the response categories by converting the 4-point items to a 7-point scale or vice versa? – Salman Oct 01 '13 at 03:58
-
My main point there was that there is nothing special about having different response scales for the predictor and outcome variable, but that there are a set of issues to consider around using variables with a small number of response options in linear regression. – Jeromy Anglim Oct 01 '13 at 04:06
-
Salman, I'd bet an upvote that Jeromy would rather you didn't call him "sir". – Glen_b Oct 01 '13 at 05:58
-
+1. I'd also add that how you present the rating scale visually to respondents is important. For example, 1-2-3-4-5 may provoke a Russian to intrapsychically distort the scale into unequal intervals, because that scale is traditionally used for school marks in that country. – ttnphns Oct 01 '13 at 11:10
-
I've only dealt with this over factorial design structures with relatively small sample sizes, but when I performed a simulation of ordinary multiple regression with the Likert scale spacings of the response randomized, the parameter biases were at times rather substantial. We were doing response surface optimization, and the proportion of times we'd be 90 degrees off compared to the underlying pre-discretized data was disturbing. The problem may have been exacerbated by the factorial structure and small sample size, but I think it does merit concern. – neverKnowsBest Oct 01 '13 at 16:57