5

I am looking for a modular way to query data from ENCODE.

For example, I would like to get ChIP-seq or similar tracks for a specific cell line. What's the proper way to do it?

Finally, is there an API to do it?

0x90

4 Answers

7

I don't know whether there is an API, but ENCODE's website does provide an interactive data matrix where you can filter data based on assay and sample type, place data sets in a "shopping cart", and then proceed to "checkout" to download the files of interest.

Splash page

Data matrix

Checkout

Daniel Standage
  • I'll add that I had never used this website before today, so being able to navigate it so easily is a credit to the group(s) that created and maintain it! – Daniel Standage Jan 18 '19 at 20:03
5

You can just add /?format=json to any page to get the JSON output.

ENCODE REST API documentation: https://www.encodeproject.org/help/rest-api/

Example scripts: https://github.com/ENCODE-DCC/submission_sample_scripts
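A minimal sketch of the ?format=json trick in Python, using only the standard library. The helper names as_json_url and fetch_json are made up for illustration, and the Accept header is an assumption about what the server prefers rather than a documented requirement.

```python
import json
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit
from urllib.request import Request, urlopen


def as_json_url(page_url):
    """Return page_url with format=json set in its query string."""
    parts = urlsplit(page_url)
    # Keep existing query parameters, dropping any earlier format=... value.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "format"
    ]
    query.append(("format", "json"))
    # safe="/" leaves slashes in values such as /experiments/.../ unescaped.
    return urlunsplit(parts._replace(query=urlencode(query, safe="/")))


def fetch_json(page_url, timeout=30):
    """Download and parse the JSON view of a page (needs network access)."""
    request = Request(as_json_url(page_url), headers={"Accept": "application/json"})
    with urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling fetch_json('https://www.encodeproject.org/experiments/ENCSR000CKC/') should then return the experiment as a Python dict.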

burger
  • So let's say I want to get the bed file of this experiment using curl https://www.encodeproject.org/experiments/ENCSR000CKC/ – 0x90 Feb 04 '19 at 07:31
  • If you look at the JSON for that experiment, there are multiple BED files listed there: https://www.encodeproject.org/experiments/ENCSR000CKC/?format=json – burger Feb 04 '19 at 16:40
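A rough sketch of picking the BED files out of that experiment JSON once it has been parsed into a dict. It assumes the experiment carries a files list whose embedded entries have file_format, assembly, accession, and href keys; the function name bed_files is hypothetical, and the field names should be checked against the actual response.

```python
ENCODE_BASE = "https://www.encodeproject.org"


def bed_files(experiment, assembly=None):
    """Return (accession, download URL) pairs for the BED files of an experiment.

    `experiment` is the parsed JSON of an experiment page. The keys used here
    ("files", "file_format", "assembly", "accession", "href") are assumptions
    about that JSON, not a documented contract.
    """
    selected = []
    for entry in experiment.get("files", []):
        if not isinstance(entry, dict):
            continue  # plain path reference instead of an embedded object
        if entry.get("file_format") != "bed" or "href" not in entry:
            continue
        if assembly is not None and entry.get("assembly") != assembly:
            continue
        # href is assumed to be a site-relative download path.
        selected.append((entry.get("accession"), ENCODE_BASE + entry["href"]))
    return selected
```

Each returned URL can then be handed to curl, with -L so that redirects are followed.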
2

"So let's say I want to get the bed file of this experiment using curl encodeproject.org/experiments/ENCSR000CKC"

You can also search for files that belong to a given experiment, e.g. https://www.encodeproject.org/search/?type=File&dataset=/experiments/ENCSR000CKC/

You can then narrow the results by file properties (status, format, etc.) using the facets on the left of the search page.

Selecting facets simply builds up a search URL for you, and you can also write that URL by hand. For example, this returns all BED files in ENCSR000CKC with the GRCh38 assembly as JSON: https://www.encodeproject.org/search/?type=File&dataset=/experiments/ENCSR000CKC/&file_format=bed&assembly=GRCh38&format=json (See the properties tab at https://www.encodeproject.org/profiles/file.json for all of the file fields you can use.)

You can also pass one or more field parameters to return only the fields you are interested in. For example, https://www.encodeproject.org/search/?type=File&dataset=/experiments/ENCSR000CKC/&file_format=bed&assembly=GRCh38&field=s3_uri&field=cloud_metadata.url&field=dataset&field=file_format&format=json returns only the s3_uri, cloud_metadata.url, dataset, and file_format fields. The matching objects are returned in the @graph list of the response.
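Rather than concatenating strings, the search URL can be assembled from a dict of filters plus a list of fields. This is only a sketch: build_search_url is a made-up helper, and it does nothing more than reproduce the URL pattern shown above.

```python
from urllib.parse import urlencode

ENCODE_BASE = "https://www.encodeproject.org"


def build_search_url(filters, fields=(), response_format="json"):
    """Build an ENCODE /search/ URL from filters and optional field names."""
    params = list(filters.items())
    # Each requested field becomes its own repeated field=... parameter.
    params += [("field", name) for name in fields]
    params.append(("format", response_format))
    # safe="/" keeps the slashes in dataset paths readable.
    return f"{ENCODE_BASE}/search/?{urlencode(params, safe='/')}"


url = build_search_url(
    {
        "type": "File",
        "dataset": "/experiments/ENCSR000CKC/",
        "file_format": "bed",
        "assembly": "GRCh38",
    },
    fields=["s3_uri", "cloud_metadata.url", "dataset", "file_format"],
)
print(url)
```

The same helper should work for the original question of finding ChIP-seq data for one cell line, for instance with filters such as {'type': 'Experiment', 'assay_title': 'TF ChIP-seq', 'biosample_ontology.term_name': 'K562'}. Those facet names are assumptions here; the reliable way to get them is to click the facets on the search page and copy the parameters from the resulting URL.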


Putting this all together to parse in Python:

>>> import requests
>>> import pandas as pd
>>> url = (
...     'https://www.encodeproject.org'
...     '/search/?type=File'
...     '&dataset=/experiments/ENCSR000CKC/'
...     '&file_format=bed'
...     '&assembly=GRCh38'
...     '&field=file_format'
...     '&field=assembly'
...     '&field=s3_uri'
...     '&field=cloud_metadata.url'
...     '&format=json'
... )
>>> r = requests.get(url)
>>> files = r.json()['@graph']
>>> df = pd.DataFrame(files)
>>> df['cloud_metadata'] = df['cloud_metadata'].apply(lambda x: x.get('url'))
>>> df[['@id', 'assembly', 'file_format', 's3_uri', 'cloud_metadata']]
                   @id assembly file_format                                                                                 s3_uri                                                                                             cloud_metadata
0  /files/ENCFF368QQU/   GRCh38         bed  s3://encode-public/2016/08/04/10493464-917f-4290-bb13-c5fec8319887/ENCFF368QQU.bed.gz  https://encode-public.s3.amazonaws.com/2016/08/04/10493464-917f-4290-bb13-c5fec8319887/ENCFF368QQU.bed.gz
1  /files/ENCFF554NVI/   GRCh38         bed  s3://encode-public/2016/08/04/865f220e-7a47-427a-a7c1-9bdc049d4658/ENCFF554NVI.bed.gz  https://encode-public.s3.amazonaws.com/2016/08/04/865f220e-7a47-427a-a7c1-9bdc049d4658/ENCFF554NVI.bed.gz
2  /files/ENCFF217CRP/   GRCh38         bed  s3://encode-public/2016/08/04/43aa693e-1581-4886-8ace-7e1609f45790/ENCFF217CRP.bed.gz  https://encode-public.s3.amazonaws.com/2016/08/04/43aa693e-1581-4886-8ace-7e1609f45790/ENCFF217CRP.bed.gz

The s3_uri and cloud_metadata.url point to the same file. Use the s3_uri to download with awscli or boto3, or pass the https link to cURL.
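If you would rather stay in Python, here is a minimal sketch of the download step using only the standard library. The names s3_uri_to_https and download are hypothetical, and the S3-to-HTTPS mapping is inferred from the three rows in the table above rather than from any documentation.

```python
import shutil
from pathlib import Path
from urllib.parse import urlsplit
from urllib.request import urlopen


def s3_uri_to_https(s3_uri):
    """Map s3://bucket/key to https://bucket.s3.amazonaws.com/key."""
    parts = urlsplit(s3_uri)
    if parts.scheme != "s3":
        raise ValueError(f"not an S3 URI: {s3_uri!r}")
    return f"https://{parts.netloc}.s3.amazonaws.com{parts.path}"


def download(url, dest_dir="."):
    """Stream url into dest_dir and return the local path (needs network)."""
    dest = Path(dest_dir) / Path(urlsplit(url).path).name
    with urlopen(url, timeout=60) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)  # copies in chunks, not all at once
    return dest
```

With the DataFrame from above, something like `for link in df['cloud_metadata']: download(link)` would fetch every listed file. With awscli, the equivalent is `aws s3 cp` on the s3_uri, where the `--no-sign-request` flag should allow anonymous access, assuming the bucket is public.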

ksgraham
0

There are two places to look:

  1. encodeproject.org
  2. screen.encodeproject.org

Details can be found in the ENCODE3 papers published in July 2020.

Code42