14

I currently find Harvard's RESTful API for ExAC extremely useful, and I was hoping that a similar resource is available for gnomAD.

Does anyone know of a public-access API for gnomAD, or of any plans to integrate gnomAD into the Harvard API?

Daniel Standage
  • 5,080
  • 15
  • 50
Pasted
  • 243
  • 2
  • 5
  • 3
    Cellbase annotation now includes gnomAD data (exomes and genomes) and can be accessed via a RESTful API: http://bioinfo.hpc.cam.ac.uk/cellbase/webservices – Pasted Nov 02 '17 at 10:33

7 Answers

15

As far as I know, no, but the vcf.gz files sit behind an HTTP server that supports byte-range requests, so you can use tabix or any related API:

$ tabix "https://storage.googleapis.com/gnomad-public/release-170228/vcf/exomes/gnomad.exomes.r2.0.1.sites.vcf.gz" "22:17265182-17265182"
22  17265182    .   A   T   762.04  PASS    AC=1;AF=4.78057e-06;AN=209180;BaseQRankSum=-4.59400e+00;ClippingRankSum=2.18000e+00;DP=4906893;FS=1.00270e+01;InbreedingCoeff=4.40000e-03;MQ=3.15200e+01;MQRankSum=1.40000e+00;QD=1.31400e+01;ReadPosRankSum=2.23000e-01;SOR=9.90000e-02;VQSLOD=-5.12800e+00;VQSR_culprit=MQ;GQ_HIST_ALT=0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|0|1;DP_HIST_ALT=0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0|0;AB_HIST_ALT=0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0;GQ_HIST_ALL=1591|589|120|301|650|589|1854|2745|1815|4297|5061|2921|10164|1008|6489|1560|7017|457|6143|52950;DP_HIST_ALL=2249|1418|6081|11707|16538|9514|28624|23829|7391|853|95|19|1|0|0|1|0|1|0|0;AB_HIST_ALL=0|0|0|0|0|0|0|0|0|0|0|0|1|0|0|0|0|0|0|0;AC_AFR=0;AC_AMR=0;AC_ASJ=0;AC_EAS=0;AC_FIN=1;AC_NFE=0;AC_OTH=0;AC_SAS=0;AC_Male=1;AC_Female=0;AN_AFR=11994;AN_AMR=31324;AN_ASJ=7806;AN_EAS=13112;AN_FIN=20076;AN_NFE=94516;AN_OTH=4656;AN_SAS=25696;AN_Male=114366;AN_Female=94814;AF_AFR=0.00000e+00;AF_AMR=0.00000e+00;AF_ASJ=0.00000e+00;AF_EAS=0.00000e+00;AF_FIN=4.98107e-05;AF_NFE=0.00000e+00;AF_OTH=0.00000e+00;AF_SAS=0.00000e+00;AF_Male=8.74386e-06;AF_Female=0.00000e+00;GC_AFR=5997,0,0;GC_AMR=15662,0,0;GC_ASJ=3903,0,0;GC_EAS=6556,0,0;GC_FIN=10037,1,0;GC_NFE=47258,0,0;GC_OTH=2328,0,0;GC_SAS=12848,0,0;GC_Male=57182,1,0;GC_Female=47407,0,0;AC_raw=1;AN_raw=216642;AF_raw=4.61591e-06;GC_raw=108320,1,0;GC=104589,1,0;Hom_AFR=0;Hom_AMR=0;Hom_ASJ=0;Hom_EAS=0;Hom_FIN=0;Hom_NFE=0;Hom_OTH=0;Hom_SAS=0;Hom_Male=0;Hom_Female=0;Hom_raw=0;Hom=0;POPMAX=FIN;AC_POPMAX=1;AN_POPMAX=20076;AF_POPMAX=4.98107e-05;DP_MEDIAN=58;DREF_MEDIAN=5.01187e-84;GQ_MEDIAN=99;AB_MEDIAN=6.03448e-01;AS_RF=9.18451e-01;AS_FilterStatus=PASS;CSQ=T|missense_variant|MODERATE|XKR3|ENSG00000172967|Transcript|ENST00000331428|protein_coding|4/4||ENST00000331428.5:c.707T>A|ENSP00000331704.5:p.Phe236Tyr|810|707|236|F/Y|tTc/tAc||1||-1||SNV|1|HGNC|28778|YES|||CCDS42975.1|ENSP00000331704|Q5GH77||UPI000013EFAE||deleterious(0)|benign(0.055)|hmmpanther:PTHR14297&hmmpanther:PTH
R14297:SF7&Pfam_domain:PF09815||||||||||||||||||||||||||||||,T|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00000672806|TF_binding_site|||||||||||1||||SNV|1||||||||||||||||||||||||||||||||||||||||||||,T|regulatory_region_variant|MODIFIER|||RegulatoryFeature|ENSR00001729562|CTCF_binding_site|||||||||||1||||SNV|1||||||||||||||||||||||||||||||||||||||||||||
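
Once you have a record like the one above, most of what people want (allele counts and frequencies) sits in the INFO column as `key=value` pairs separated by semicolons. A minimal Python sketch for pulling those out, assuming you already have the INFO string in hand; `parse_info` and `population_frequencies` are names made up for this example, and the sample string reuses only a few fields from the record above:

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; flag-only keys map to True."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        # partition() splits on the first "=" only, so values that
        # themselves contain "=" are kept intact.
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def population_frequencies(info_fields, prefix="AF_"):
    """Collect entries such as AF_FIN as floats, keyed by the suffix."""
    freqs = {}
    for key, value in info_fields.items():
        if key.startswith(prefix) and value is not True:
            try:
                freqs[key[len(prefix):]] = float(value)
            except ValueError:
                # Skip values that are not a single number.
                continue
    return freqs


# Shortened INFO string using a few of the fields from the record above.
info = "AC=1;AF=4.78057e-06;AN=209180;AF_AFR=0.00000e+00;AF_FIN=4.98107e-05;POPMAX=FIN"
fields = parse_info(info)
print(fields["AF"], population_frequencies(fields))
```

Note that on a full record the `AF_` prefix also matches keys such as `AF_raw`, `AF_Male` and `AF_POPMAX`, so you may want to filter those out.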

UPDATE (2019): the current gnomAD server doesn't support byte-range requests.

Pierre
  • 1,536
  • 7
  • 11
  • As of 15 Jan 2020, cellbase (bioinfo.hpc.cam.ac.uk/cellbase/webservices) does. See Pasted's comment on the question. Byte-range requests are supported, and results come from GnomAD, ExAC, and other data sets. A pretty awesome resource actually, even if byte-range requests were supported at GnomAD or UCSC. – mRotten Jan 15 '20 at 18:57
6

The new gnomAD site (as of August 2019) says no, no API yet:

How do I query a batch of variants? Do you have an API?

We currently do not have a way to submit batch queries on the browser, but we are actively working on developing an API for ExAC/gnomAD. If you would like to learn about GraphQL, which we will use to work with the API, an overview can be found at https://graphql.org. You can also obtain information on all variants from the VCFs and Hail Tables available on our downloads page.

But, the web interface itself already makes POST requests to https://gnomad.broadinstitute.org/api to send and receive JSON/GraphQL. So, you can make those same queries programmatically right now, even if it's not officially a public API.

Here's an example in Python to get some basic info on variants for a particular gene. This way you get simple nested Python objects to work with:

  { 'consequence': 'intron_variant',
    'pos': 6928442,
    'rsid': 'rs782435448',
    'variant_id': '12-6928442-C-A'},
  { 'consequence': 'splice_region_variant',
    'pos': 6928462,
    'rsid': None,
    'variant_id': '12-6928462-C-A'},
  { 'consequence': 'splice_acceptor_variant',
    'pos': 6928464,
    'rsid': 'rs782577109',
    'variant_id': '12-6928464-G-A'},
  { 'consequence': 'missense_variant',
    'pos': 6928466,
    'rsid': 'rs782208003',
    'variant_id': '12-6928466-C-T'},

(I found it useful to go this route because the full metadata visible in the gnomAD web interface is then available, including per-variant details like allele counts by population. I couldn't find this information in the other APIs described here.)
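
As a rough illustration of working with those nested objects, the sketch below groups a result list by consequence and picks out variants without an rsID. It assumes the response is a list of dicts with the four keys shown in the output above; `group_by_consequence` and `without_rsid` are made-up helper names, and the sample list simply reuses the four records shown.

```python
from collections import defaultdict

# Sample data copied from the output above.
variants = [
    {"consequence": "intron_variant", "pos": 6928442,
     "rsid": "rs782435448", "variant_id": "12-6928442-C-A"},
    {"consequence": "splice_region_variant", "pos": 6928462,
     "rsid": None, "variant_id": "12-6928462-C-A"},
    {"consequence": "splice_acceptor_variant", "pos": 6928464,
     "rsid": "rs782577109", "variant_id": "12-6928464-G-A"},
    {"consequence": "missense_variant", "pos": 6928466,
     "rsid": "rs782208003", "variant_id": "12-6928466-C-T"},
]


def group_by_consequence(records):
    """Map each consequence term to the list of variant IDs carrying it."""
    grouped = defaultdict(list)
    for record in records:
        grouped[record["consequence"]].append(record["variant_id"])
    return dict(grouped)


def without_rsid(records):
    """Return the IDs of variants that have no rsID assigned."""
    return [r["variant_id"] for r in records if r["rsid"] is None]


by_consequence = group_by_consequence(variants)
print(by_consequence["missense_variant"])
print(without_rsid(variants))
```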

Jesse
  • 947
  • 6
  • 10
  • Can you expand on your example a bit? The GraphQL API is unfamiliar to me. How would you turn the JSON above into an actual query? – user3030872 Mar 31 '24 at 22:41
4

You can browse gnomAD variants with the ClinGen Allele Registry (an API specification is available).

llrs
  • 4,693
  • 1
  • 18
  • 42
user1690
  • 41
  • 1
  • 1
2

I faced the same issue recently and found these links and a Python script:

gnomAD GraphQL API (https://gnomad.broadinstitute.org/api): it works great, but GraphQL is a different kind of query language. See the docs here: https://graphql.org/learn/queries/

gnomAD Python API: https://github.com/furkanmtorun/gnomad_python_api
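
If GraphQL is new to you, it may help to see what actually gets POSTed. The sketch below only builds the request body and does not send anything. It uses GraphQL variables (the `$name` placeholders) rather than pasting values into the query string. The field names come from the other answers here, but the variable type names (`String!`, `ReferenceGenomeId!`, `DatasetId!`) are my assumption about the gnomAD schema and may need adjusting, and `build_request_body` is just a name I picked. I pair `gnomad_r2_1` with `GRCh37` because that is the build v2.1 was called on.

```python
import json

# Query text with GraphQL variables instead of string formatting.
# The type names after each colon are assumptions about the gnomAD schema.
GENE_VARIANTS_QUERY = """
query GeneVariants($geneId: String!, $referenceGenome: ReferenceGenomeId!, $dataset: DatasetId!) {
  gene(gene_id: $geneId, reference_genome: $referenceGenome) {
    variants(dataset: $dataset) {
      consequence
      pos
      rsid
      variant_id: variantId
    }
  }
}
"""


def build_request_body(gene_id, reference_genome, dataset):
    """Return the JSON-serialisable body for a GraphQL POST request."""
    return {
        "query": GENE_VARIANTS_QUERY,
        "variables": {
            "geneId": gene_id,
            "referenceGenome": reference_genome,
            "dataset": dataset,
        },
    }


body = build_request_body("ENSG00000010610", "GRCh37", "gnomad_r2_1")
print(json.dumps(body["variables"], indent=2))
# To send it, something like:
#   requests.post("https://gnomad.broadinstitute.org/api", json=body)
```

The server substitutes the variables itself, so you avoid quoting problems that can creep in with `%` formatting.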

2

I found Jesse's code quite useful! If you try to reproduce it, you now need to add the reference genome ID, for example:

#!/usr/bin/env python

import pprint

import requests

prettyprint = pprint.PrettyPrinter(indent=2).pprint


def fetch(jsondata, url="https://gnomad.broadinstitute.org/api"):
    # The server gives a generic error message if the content type isn't
    # explicitly set
    headers = {"Content-Type": "application/json"}
    response = requests.post(url, json=jsondata, headers=headers)
    result = response.json()
    if "errors" in result:
        raise Exception(str(result["errors"]))
    return result


def get_variant_list(gene_id, dataset="gnomad_r2_1"):
    # Note that this is GraphQL, not JSON.
    # gnomad_r2_1 is a GRCh37 call set and GRCh38 call sets start with
    # gnomad_r3, so the reference genome and the dataset may need to be
    # changed together.
    fmt_graphql = """
    {
      gene(gene_id: "%s", reference_genome: GRCh38) {
        variants(dataset: %s) {
          consequence
          pos
          rsid
          variant_id: variantId
        }
      }
    }
    """
    # This part will be JSON encoded, but with the GraphQL part left as a
    # blob of text.
    req_variantlist = {"query": fmt_graphql % (gene_id, dataset)}
    response = fetch(req_variantlist)
    return response["data"]["gene"]["variants"]


prettyprint(get_variant_list("ENSG00000010610"))

BretSnoop
  • 21
  • 1
  • Thanks, I updated my code to include the reference genome as well (you're right, it no longer works without that). – Jesse Apr 14 '22 at 15:38
1

I created a Python package based on SQLite databases that lets you easily query all gnomAD variants for GRCh37/38: https://github.com/KalinNonchev/gnomAD_DB The package description links to precomputed SQLite databases for gnomAD WGS on GRCh37/38. Please take a look there.
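
To give a feel for the general idea of looking up allele frequencies from SQLite, here is a small sketch. It is not the gnomAD_DB API: it uses only Python's built-in `sqlite3` module, the table name and columns are made up for illustration, and the single row reuses the exome variant quoted in the tabix answer.

```python
import sqlite3

# In-memory database with a hypothetical schema, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variants ("
    "chrom TEXT, pos INTEGER, ref TEXT, alt TEXT, af REAL, "
    "PRIMARY KEY (chrom, pos, ref, alt))"
)
conn.execute(
    "INSERT INTO variants VALUES (?, ?, ?, ?, ?)",
    ("22", 17265182, "A", "T", 4.78057e-06),
)


def lookup_af(connection, chrom, pos, ref, alt):
    """Return the stored allele frequency, or None if the variant is absent."""
    row = connection.execute(
        "SELECT af FROM variants "
        "WHERE chrom = ? AND pos = ? AND ref = ? AND alt = ?",
        (chrom, pos, ref, alt),
    ).fetchone()
    return row[0] if row else None


print(lookup_af(conn, "22", 17265182, "A", "T"))
print(lookup_af(conn, "22", 17265182, "A", "G"))
```

The composite primary key means lookups by exact variant use an index instead of scanning the table, which is what makes this kind of local database quick for bulk annotation.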

Kalin
  • 11
  • 2
-1

I think https://github.com/KalinNonchev/gnomAD_MAF would solve your problem. It is written in Python and lets you quickly annotate variants with their allele frequencies.

LinkIt
  • 1
  • 1