I was thinking of either setting up a correlation table or performing a regression, to be able to see the signs. But I am not really sure what the intuitive difference between the two approaches is; i.e., if the correlation sign is +, will the coefficient in a regression always be positive as well?
Yes, provided that the regression model is a simple linear regression involving only the same two variables that you used to compute the correlation coefficient, and no others. This is because the regression coefficient is simply the (Pearson) correlation coefficient multiplied by the ratio of the standard deviations of the two variables. Since this ratio is always positive, it follows that the correlation coefficient and the regression coefficient will have the same sign.
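In symbols, writing $r_{XY}$ for the Pearson correlation between $X$ and $Y$ and $s_X$, $s_Y$ for their sample standard deviations, the slope from regressing $Y$ on $X$ is
$$\hat{\beta}_1 = r_{XY}\,\frac{s_Y}{s_X},$$
and since $s_Y / s_X > 0$, $\hat{\beta}_1$ always has the same sign as $r_{XY}$.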
The difference between the correlation coefficient and the regression coefficient is that correlation measures the strength of a linear relationship, while regression quantifies it further by telling you how much the response is expected to change for a one-unit change in the predictor. Thus, the regression model computes a line of best fit, giving a slope and an intercept. Please refer to this question and its answers for much more detail on the similarities and differences between correlation and regression:
What's the difference between correlation and simple linear regression?
If, however, you fit a multivariable regression model, then things can change dramatically, including the sign of the regression coefficient, due to confounding. A simple example demonstrates this:
First we create a dataset consisting of 3 variables:
> X <- c(1, 2, 3, 10, 11, 12)
> Y <- c(10.1, 9.2, 7.8, 14.9, 14.1, 12.9)
> C <- c(1, 1, 1, 2, 2, 2)
> (df <- cbind(Y,X,C))
        Y  X C
[1,] 10.1  1 1
[2,]  9.2  2 1
[3,]  7.8  3 1
[4,] 14.9 10 2
[5,] 14.1 11 2
[6,] 12.9 12 2
> cor(df)
          Y         X         C
Y 1.0000000 0.8661881 0.9410920
X 0.8661881 1.0000000 0.9839347
C 0.9410920 0.9839347 1.0000000
We see that all the correlations are positive.
Next, we regress Y on X:
> summary(lm(Y ~ X))
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   8.2733     1.1381   7.270   0.0019 **
X             0.4964     0.1432   3.467   0.0257 *
All good (apparently) - the regression coefficient is positive and equal to the correlation between X and Y multiplied by the ratio of their standard deviations:
> cor(X,Y) * sd(Y)/sd(X)
[1] 0.4964143
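One way to see this identity is to standardize both variables first: then the ratio of standard deviations is 1, so the slope on the standardized scale is just the correlation itself. A quick sketch, using the same X and Y as above:

Xs <- c(scale(X))      # standardized X: mean 0, sd 1
Ys <- c(scale(Y))      # standardized Y: mean 0, sd 1
coef(lm(Ys ~ Xs))[2]   # slope equals cor(X, Y), i.e. 0.8661881 here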
But now we introduce the 3rd variable, C:
> summary(lm(Y ~ X + C))
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  -3.4250     0.6491  -5.276 0.013274 *
X            -1.0750     0.0870 -12.356 0.001142 **
C            14.6083     0.7958  18.357 0.000353 ***
Whoops, the coefficient for X is now negative. Why does this happen? This is an example of Simpson's paradox, where adjusting for a 3rd variable reverses the direction of the association between X and Y. This example is based on a long answer I wrote concerning Simpson's paradox some time ago:
Can you please explain Simpson's paradox with equations, instead of contingency tables?
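To see the paradox directly in this small dataset, you can look at the relationship between X and Y separately within each level of C (a quick check using the same vectors as above); in both groups Y falls as X rises, which is exactly what the negative adjusted coefficient for X reflects:

cor(X[C == 1], Y[C == 1])   # within group C = 1: strongly negative (about -0.99)
cor(X[C == 2], Y[C == 2])   # within group C = 2: strongly negative (about -0.99)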