
For example, assume I have three observations, as given in matrix $A$:

$A = \left[ \begin{array}{ccc} 1 & 0 & 1 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ \end{array} \right]$

Each row vector of matrix $A$ has a weight associated with it, given by $w = (0.5, 0.3, 0.2)$.

Using the equation in Wikipedia which is:

$$ \bar{x} = \sum_{i = 1}^{N} w_ix_i $$

I'm confused about what $i$ indexes in the above equation.

I managed to interpret it in the two ways below:

  1. $i$ represents a row in matrix A $$ \bar{x} = 0.5*\left[ \begin{array}{ccc} 1 & 0 & 1 \\ \end{array} \right] + 0.3*\left[ \begin{array}{ccc} 1 & 1 & 0\\ \end{array} \right] + 0.2*\left[ \begin{array}{ccc} 0 & 1 & 1 \\ \end{array} \right] $$ $$ \bar{x} = \left[ \begin{array}{ccc} 0.5 & 0 & 0.5 \\ \end{array} \right] + \left[ \begin{array}{ccc} 0.3 & 0.3 & 0\\ \end{array} \right] + \left[ \begin{array}{ccc} 0 & 0.2 & 0.2 \\ \end{array} \right] = \left[ \begin{array}{ccc} 0.8 & 0.5 & 0.7 \\ \end{array} \right] $$

  2. $i$ represents a column in matrix A (column is represented as a row) $$ \bar{x} = \left[ \begin{array}{ccc} 0.5 & 0.3 & 0 \\ \end{array} \right] + \left[ \begin{array}{ccc} 0 & 0.3 & 0.2\\ \end{array} \right] + \left[ \begin{array}{ccc} 0.5 & 0 & 0.2 \\ \end{array} \right] $$ $$ \bar{x} = \left[ \begin{array}{ccc} 1.0 & 0.6 & 0.4 \\ \end{array} \right] $$

Which calculation is correct? To me, approach 1 seems correct, but I'm not sure.

Please explain the two scenarios:

  1. If the columns of matrix A represent attributes and the rows represent observations, which method is correct?
  2. If the columns of matrix A represent observations and the rows represent attributes, which method is correct?
Zen
  • 24,121
Synex
  • 125

3 Answers


Looking at the beginning of the same Wikipedia article, you will find that in the data matrix $(x_{ij})$ each row represents one of the $n$ sample values of the $k$ random variables. Hence, $x_i$ means the row vector $(x_{i1},\dots,x_{ik})$. More explicitly, $$ \left[ \begin{array}{c} x_1 \\ x_2 \\ \vdots \\ x_n \\ \end{array} \right] := \left[ \begin{array}{cccc} x_{11} & x_{12} & \dots & x_{1k} \\ x_{21} & x_{22} & \dots & x_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \dots & x_{nk} \\ \end{array} \right] \, . $$ Therefore, your first interpretation of $\bar{x}=\sum_{i=1}^n w_i x_i$ is the correct one.
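As a quick check, here is a minimal Python sketch of that sum, assuming (as in the question) that the rows of $A$ are the observations and that there is one weight per row. The function name `weighted_mean_of_rows` is made up for illustration and is not taken from the Wikipedia article.

```python
# Rows of A are observations, columns are attributes (the question's example).
A = [
    [1, 0, 1],
    [1, 1, 0],
    [0, 1, 1],
]
w = [0.5, 0.3, 0.2]  # one weight per row (observation)


def weighted_mean_of_rows(rows, weights):
    """Return sum_i weights[i] * rows[i], accumulated attribute by attribute."""
    if len(rows) != len(weights):
        raise ValueError("need exactly one weight per observation (row)")
    mean = [0.0] * len(rows[0])
    for weight, row in zip(weights, rows):
        for j, value in enumerate(row):
            mean[j] += weight * value
    return mean


x_bar = weighted_mean_of_rows(A, w)
print([round(v, 10) for v in x_bar])  # prints [0.8, 0.5, 0.7]
```

The result has one entry per attribute (column), which matches the vector obtained in the first interpretation.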

Zen
  • 24,121

Well, it depends. What is N: the number of individuals? What is A: are the rows the individuals, or are the columns? Does the other dimension represent the features?

Let's say A is N x K, where N is the number of individuals and K is the number of features. I guess what you want is the weighted mean of each feature, so you should calculate the matrix product W.A, where W is the row vector (w1,...,wN). This corresponds to your first calculation.
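A rough Python sketch of the product W.A, using made-up data with N = 4 individuals and K = 2 features so that the two dimensions cannot be confused. The helper `vector_times_matrix` and the numbers are assumptions for illustration only.

```python
def vector_times_matrix(w, A):
    """Compute the 1 x K product w . A for a length-N vector w and an N x K matrix A."""
    if len(w) != len(A):
        raise ValueError(f"{len(w)} weights cannot multiply a matrix with {len(A)} rows")
    n_rows, n_cols = len(A), len(A[0])
    return [sum(w[i] * A[i][j] for i in range(n_rows)) for j in range(n_cols)]


# Hypothetical data: N = 4 individuals (rows), K = 2 features (columns).
A = [
    [1, 0],
    [1, 1],
    [0, 1],
    [1, 1],
]
w = [0.4, 0.3, 0.2, 0.1]  # one weight per individual, summing to 1

feature_means = vector_times_matrix(w, A)  # one weighted mean per feature

# Applying the same 4 weights along the other dimension is not defined:
# the transposed matrix has only 2 rows.
A_transposed = [list(column) for column in zip(*A)]  # 2 x 4
try:
    vector_times_matrix(w, A_transposed)
    mismatch_detected = False
except ValueError:
    mismatch_detected = True
```

With a non-square matrix, the length of the weight vector only matches one dimension of A, which shows which dimension the weights belong to.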

Scratch
  • 812
  • If the columns of matrix A represent attributes and each row is an observation vector, the first calculation is correct, and if the columns represent observations and the rows represent attributes, the second method is correct? – Synex Feb 13 '14 at 16:14
  • Yes, if you want to calculate weighted means of the features and the weights apply to the observations. Try restating your problem with the number of observations != the number of variables. You will see it become clear. – Scratch Feb 13 '14 at 18:07

Which calculation is correct depends on what your matrix means. If your observations correspond to the rows of the matrix, you should use method 1. Otherwise, if they correspond to the columns of the matrix, you should use method 2.

Baumann
  • 946