
My group has a complex dataset, and I would like to validate the results and check the methods I use on a similar dataset (ideally one that has already been studied).

Characteristics of my dataset:

  • It is related to inflammatory bowel disease (IBD)

    More specifically to Crohn's disease.

  • It has RNA-seq

    I have explored GEO and there are only a few projects (10) with RNA-seq on tissue from my disease, and I don't know how to find the 16S-seq data for those projects, if it exists (see point below).

  • It has 16S-seq (and not only from stool)

    ENA seems to have 16S sequencing data, but most of it comes from stool rather than biopsies, and I still don't know how to find the RNA-seq data for those projects, if it exists (see point above).

  • It has both RNA-seq and 16S-seq from the same individuals at the same time point.

    The NIH Human Microbiome Project for my disease has only one individual with both 16S-seq and RNA-seq from the same time point.

How can I find a dataset similar to mine?

llrs

1 Answer


The snarky answer is that it's better to find a matching dataset at the experimental design phase (i.e. before these studies are carried out), so that the bioinformatics work afterwards ends up being a lot easier and more robust. Although a database search at the start can be quite time-consuming, it's even harder to find matching projects later if that legwork wasn't done during the design phase. Researchers should be thinking about how hypotheses will be validated before the experiments are carried out.

Every additional characteristic chosen for comparison makes matching data harder to find, and reduces the applicability of results when there is a mismatch. You would have to be somewhat lucky to find an IBD (Crohn's) dataset with paired RNA-seq and 16S-seq data (presumably short-read; is 454 good enough?) from a specific non-stool tissue.
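One way to check the "paired data from the same individuals" requirement in public metadata is to group runs by project and subject. The sketch below assumes a table laid out like an NCBI SRA RunInfo CSV (columns `BioProject`, `LibraryStrategy`, `Subject_ID`) and that 16S runs are registered with the `AMPLICON` library strategy; both are assumptions, and `paired_subjects` is a hypothetical helper. It does not check time point or tissue, which would need the per-sample attributes.

```python
import csv
import io
from collections import defaultdict

# Assumed labels, following the SRA RunInfo convention; adjust if your table differs.
RNA = "RNA-Seq"
AMPLICON = "AMPLICON"  # 16S runs are usually, but not always, submitted as AMPLICON


def paired_subjects(runinfo_csv: str, subject_col: str = "Subject_ID") -> dict[str, set[str]]:
    """Hypothetical helper: return {BioProject: {subject, ...}} for subjects that
    have both RNA-Seq and AMPLICON runs within the same project."""
    strategies = defaultdict(set)  # (project, subject) -> library strategies seen
    for row in csv.DictReader(io.StringIO(runinfo_csv)):
        subject = (row.get(subject_col) or "").strip()
        if not subject:
            continue  # runs without a subject identifier cannot be paired
        strategies[(row["BioProject"], subject)].add(row["LibraryStrategy"])

    paired = defaultdict(set)
    for (project, subject), found in strategies.items():
        if {RNA, AMPLICON} <= found:  # subject has both data types
            paired[project].add(subject)
    return dict(paired)
```

In practice `Subject_ID` is often empty, so you may need to pass another column (for example `SampleName`) as `subject_col` and check by hand that it really identifies individuals.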

Apart from the places you've already looked, there's dbGaP and SRA. Given the number of characteristics that have been mentioned, I expect you'll have to make some compromises when comparing other datasets.
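SRA can also be searched programmatically through NCBI's E-utilities. A minimal sketch, assuming the `[Strategy]` search field as it appears in SRA's advanced search builder (verify it there) and an illustrative disease term; `sra_search_url` is a hypothetical helper that only builds the `esearch` URL and does not send the request.

```python
from urllib.parse import urlencode

EUTILS_ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def sra_search_url(disease: str, strategy: str, retmax: int = 500) -> str:
    """Hypothetical helper: build an esearch URL for SRA records matching a
    disease term and a library strategy. The [Strategy] field tag is an
    assumption taken from SRA's advanced search fields."""
    term = f'"{disease}"[All Fields] AND "{strategy}"[Strategy]'
    params = {"db": "sra", "term": term, "retmax": retmax, "retmode": "json"}
    return f"{EUTILS_ESEARCH}?{urlencode(params)}"


# One query per data type; the disease term is only an example.
rna_url = sra_search_url("crohn disease", "rna seq")
amplicon_url = sra_search_url("crohn disease", "amplicon")
```

The returned IDs are SRA records, not projects, so you would still need to fetch their run metadata (for example with `efetch` and `rettype=runinfo`) and intersect the BioProjects from both queries; stool samples would then have to be excluded by inspecting the sample attributes.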

gringer
  • Thanks for the tip, but it is hard to find projects at the experimental design phase. We have an easier dataset in our lab, but I would prefer to confirm the results with other datasets. I had forgotten about these databases, thanks. – llrs Dec 08 '17 at 08:59