If I want to list all rows of a column in a dataset in R, I am able to do it in these two ways:

> dataset[,'column'] 
> dataset$column

It appears that both give me the same result. What is the difference?

user3422637
  • Take a look [here](http://stackoverflow.com/questions/18222286/select-a-data-frame-column-using-and-the-name-of-the-column-in-a-variable) – David Arenburg Oct 12 '14 at 23:27

2 Answers

In practice, not much, as long as dataset is a data frame. The main difference is that the dataset[, "column"] form evaluates its index, so it accepts a variable: after j <- "column", dataset[, j] returns the column named column, whereas dataset$j looks for a column literally named j, which is not what you want.
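A minimal sketch of that difference, assuming a small made-up data frame (the column names and values here are hypothetical, not from the question):

```r
# Hypothetical data frame, for illustration only
dataset <- data.frame(column = c(10, 20, 30), other = c("a", "b", "c"))

j <- "column"

by_bracket <- dataset[, j]  # j is evaluated, so this selects the column "column"
by_dollar  <- dataset$j     # j is taken literally; there is no column named "j"

identical(by_bracket, dataset$column)  # TRUE
is.null(by_dollar)                     # TRUE
```

So when the column name lives in a variable, the bracket form is the one to reach for.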

dataset$column is list syntax and dataset[ , "column"] is matrix syntax. Data frames are really lists, where each list element is a column and every element has the same length. This is why length(dataset) returns the number of columns. Because they are "rectangular," we are able to treat them like matrices, and R kindly allows us to use matrix syntax on data frames.
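To see the list nature for yourself, something like the following should work (again with a hypothetical data frame):

```r
# Hypothetical data frame, for illustration only
dataset <- data.frame(column = 1:3, other = c("a", "b", "c"))

is_a_list <- is.list(dataset)  # a data frame is a list of columns
n_cols    <- length(dataset)   # list length, i.e. the number of columns

list_style   <- dataset[["column"]]   # list syntax
matrix_style <- dataset[, "column"]   # matrix syntax

identical(list_style, matrix_style)   # TRUE: both give the same vector
```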

Note that, for lists, list$item and list[["item"]] are almost synonymous. Again, the biggest difference is that the latter form evaluates its argument, whereas the former does not. This is true even in the form `$`(list, item), which is exactly equivalent to list$item. In Hadley Wickham's terminology, $ uses "non-standard evaluation."
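A short sketch of the evaluation difference on a plain list. The variable named item is deliberately misleading, and all names here are made up for illustration:

```r
lst  <- list(item = 1:3)
item <- "something_else"  # a variable that happens to share the element's name
key  <- "item"

a <- lst$item          # the name "item" is used literally; the variable is ignored
b <- `$`(lst, item)    # same call in prefix form, still not evaluated
d <- lst[[key]]        # key is evaluated to "item"
e <- lst$key           # looks for an element literally named "key"

identical(a, b)  # TRUE
identical(a, d)  # TRUE
is.null(e)       # TRUE
```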

Also, as mentioned in the comments, $ uses partial name matching, [[ does not by default (but can be told to with exact = FALSE), and [ does not allow it at all.
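The three matching behaviours can be compared side by side. This is a sketch on a hypothetical one-column data frame; the tryCatch wrapper is only there so the failing call does not stop the script:

```r
# Hypothetical data frame, for illustration only
dataset <- data.frame(column = 1:3)

p_dollar  <- dataset$col                      # partial match on "column"
p_double  <- dataset[["col"]]                 # exact matching by default: no match
p_partial <- dataset[["col", exact = FALSE]]  # partial matching switched on
p_single  <- tryCatch(dataset[, "col"],       # no partial matching: this fails
                      error = function(e) "error")

identical(p_dollar, 1:3)   # TRUE
is.null(p_double)          # TRUE
identical(p_partial, 1:3)  # TRUE
identical(p_single, "error")  # TRUE
```

Partial matching with $ is easy to trigger by accident, which is one reason to prefer [[ in scripts.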

I recently answered a similar question with some additional details that might interest you.

shadowtalker

Use the str function to see the difference:

> mydf
  user_id Gender Age
1       1      F  13
2       2      M  17
3       3      F  13
4       4      F  12
5       5      F  14
6       6      M  16
> 
> str(mydf)
'data.frame':   6 obs. of  3 variables:
 $ user_id: int  1 2 3 4 5 6
 $ Gender : Factor w/ 2 levels "F","M": 1 2 1 1 1 2
 $ Age    : int  13 17 13 12 14 16
> 
> str(mydf[1])
'data.frame':   6 obs. of  1 variable:
 $ user_id: int  1 2 3 4 5 6
> 
> str(mydf[,1])
 int [1:6] 1 2 3 4 5 6
> 
> str(mydf[,'user_id'])
 int [1:6] 1 2 3 4 5 6

> str(mydf$user_id)
 int [1:6] 1 2 3 4 5 6
> 
> str(mydf[[1]])
 int [1:6] 1 2 3 4 5 6
> 
> str(mydf[['user_id']])
 int [1:6] 1 2 3 4 5 6

mydf[1] is a data frame, while mydf[,1], mydf[,'user_id'], mydf$user_id, mydf[[1]], and mydf[['user_id']] are vectors.

rnso
  • You can use `drop=FALSE` with a couple of those if you want to get a `data.frame` back. – GSee Oct 13 '14 at 01:37
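
A sketch of the drop=FALSE suggestion, rebuilding mydf from the printed output above (the construction call itself is an assumption, since the answer only shows the result):

```r
# mydf reconstructed from the printed data frame; the constructor call is assumed
mydf <- data.frame(
  user_id = 1:6,
  Gender  = factor(c("F", "M", "F", "F", "F", "M")),
  Age     = c(13L, 17L, 13L, 12L, 14L, 16L)
)

as_vector <- mydf[, "user_id"]                # dimensions dropped: plain vector
as_frame  <- mydf[, "user_id", drop = FALSE]  # stays a one-column data frame

is.data.frame(as_vector)  # FALSE
is.data.frame(as_frame)   # TRUE
```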