I want to open the GSE9838 expression dataset from GEO using the GEOquery R package.

gset <- getGEO(filename="GSE9838_family.soft.gz")

The result of this code is an S4 object of class GSE.

How can I get the expression data of GSE9838?

Kamil S Jaron
Qiran
  • Welcome to the site. Have you looked at the GEOquery tutorials? You can list them with vignette(package="GEOquery"). Have you also looked around on this site and on support.bioconductor.org to see how other people have done this? – llrs Mar 28 '19 at 12:42
  • Not sure if that method works in this case – Oka Mar 28 '19 at 19:15
  • @Oka Why shouldn't it work? Has Qiran tested it? At least mentioning that it didn't work would help the people trying to answer the question (and future readers) – llrs Mar 29 '19 at 09:38
  • @llrs I've checked whether it would, and it didn't (at least on the first try). The method worked fine with the GSE used in the example, but didn't seem to work in this case for some reason. – Oka Mar 29 '19 at 13:00
  • OK, how did it fail? Could you post the code so I can see the error message and the options you used? – llrs Mar 29 '19 at 13:08
  • datExpr = exprs (data[[1]]) will just give an object with one header line ( GSM248431 GSM248432 GSM248433 GSM248434 GSM248435 GSM248436 GSM248437 GSM248438 GSM248439 GSM248440 GSM248441 GSM248442) – Oka Mar 29 '19 at 13:50

1 Answer

Well, if you use a SOFT-formatted file:

library(GEOquery)

# You can read your file
gds <- getGEO(filename="GSE9838_family.soft.gz")

# and check it
class(gds)
show(gds)

# then you would usually try the exprs() accessor to retrieve the expression matrix:
exprs(gds[[1]])

# but in this case you will get an error, because gds is a GSE data structure,
# i.e. a composite structure holding lists of GSM and GPL objects:
names(GSMList(gds))
names(GPLList(gds))

# now GPLList is probably not interesting in this case
# but you can look at the GSM-objects from GSMList:
first <- GSMList(gds)[[1]]
first
class(first)
Meta(first)

# at this point, if you read the output, you'll notice that
#    title 
#    [1] "X_intercept"

# and 
#   An object of class "GEODataTable"
#   ****** Column Descriptions ******
#   [1] Column      Description
#   <0 rows> (or 0-length row.names)
#   ****** Data Table ******
#   data frame with 0 columns and 0 rows

So here you would probably wonder whether the data you want is in these objects at all. If you go back to the GEO page for GSE9838, you will find the note "Raw data provided as supplementary file. Processed data is available on Series record". So you may want to check the series record instead:

# you can query GSE9838 directly like this:
data <- getGEO("GSE9838")

# then you can check what you've got
names(data)
#[1] "GSE9838_series_matrix.txt.gz"
class(data$GSE9838_series_matrix.txt.gz)

# and check what's inside:
exprs(data$GSE9838_series_matrix.txt.gz)
# but apparently it just shows one header row (the sample names) and no values.

So it wasn't there either. And if you check the size of the file, you'll find that it is just 4.8 kB. My educated guess is that the expression values are not in the series matrix itself but in the supplementary files.
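One way to confirm an empty matrix without reading through printed output is to test its dimensions. Below is a minimal sketch in base R. Note that has_expression_values is a hypothetical helper name, and the small matrix is a synthetic stand-in for what exprs() returned here (sample columns but no rows), not real GEO data:

```r
# Hypothetical helper: TRUE only if the matrix holds at least one value.
has_expression_values <- function(m) {
  is.matrix(m) && nrow(m) > 0 && ncol(m) > 0
}

# Synthetic stand-in for the result seen above: sample columns, zero probe rows.
empty_expr <- matrix(
  numeric(0), nrow = 0, ncol = 3,
  dimnames = list(NULL, c("GSM248431", "GSM248432", "GSM248433"))
)

dim(empty_expr)                    # number of rows, then number of columns
has_expression_values(empty_expr)  # FALSE for a header-only matrix
```

On the real object, has_expression_values(exprs(data[[1]])) should likewise return FALSE if the values are only available in the supplementary files.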

If you want to download them, you can:

# download the supplementary files into ./GSE9838/
filePaths = getGEOSuppFiles("GSE9838")
filePaths
# unpack the archive of raw CEL files
untar("GSE9838/GSE9838_RAW.tar", exdir = "GSE9838/CEL")
list.files("GSE9838/CEL")

...which will let you see all the CEL files containing the data.
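From there, one possible way to get an actual expression matrix is to read the CEL files and normalise them. The sketch below assumes the arrays are a platform that the Bioconductor affy package supports and that the matching CDF annotation package can be installed; I have not verified this for GSE9838, and for other Affymetrix platforms the oligo package would be the usual alternative. The helper names find_cel_files and cel_to_expression are hypothetical:

```r
# Hypothetical helper: list CEL files (optionally gzipped) in a directory.
find_cel_files <- function(cel_dir) {
  list.files(cel_dir, pattern = "\\.cel(\\.gz)?$",
             ignore.case = TRUE, full.names = TRUE)
}

# Hypothetical helper: read the CEL files and return an RMA-normalised
# expression matrix (probe sets in rows, samples in columns).
# Assumes the affy package and a matching CDF package are available.
cel_to_expression <- function(cel_dir) {
  cel_files <- find_cel_files(cel_dir)
  stopifnot(length(cel_files) > 0)
  raw <- affy::ReadAffy(filenames = cel_files)
  eset <- affy::rma(raw)
  Biobase::exprs(eset)
}

# Usage (not run here, needs the downloaded files):
# expr <- cel_to_expression("GSE9838/CEL")
# dim(expr)
```

RMA is only one common choice of normalisation; if the submitters describe a different processing method on the GEO page, that would be the one to follow to reproduce their values.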

Oka