I want to retrieve abstracts/summaries for NCBI bookshelf entries, eg: "NBK1440"

The docs say the database name is "books", and the efetch guide says a rettype of "docsum" works for all databases. However, this query:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=books&id=NBK1440&rettype=docsum

returns:

<ERROR>UID=1440: cannot get document summary</ERROR>

The default query is useless (it just echoes back the ID):

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=books&id=NBK1440&retmode=text

returns

1440

Picking another 3:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=books&id=NBK1391,NBK410087,NBK1391&rettype=docsum

Fails the same way:

<ERROR>UID=1391: cannot get document summary</ERROR>
<ERROR>UID=410087: cannot get document summary</ERROR>
<ERROR>UID=1391: cannot get document summary</ERROR>

The same goes for esummary:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=books&id=1440

It dies on NBK1440, and removing the "NBK" prefix gives:

<ERROR>UID=1440: cannot get document summary</ERROR>

The values come from the (9 MB) ClinVar citations file: https://ftp.ncbi.nlm.nih.gov/pub/clinvar/tab_delimited/var_citations.txt - the first is from this line:

15048   9   1800562     NCBIBookShelf   NBK1440
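
To collect every Bookshelf RID from that file, something like the following might work. It is only a sketch: it assumes the file is tab-delimited with the citation source and citation id as the last two fields (as the line above suggests). The function name bookshelf_rids and the sample rows are made up for illustration; the empty fourth field and the PubMed row in particular are assumptions, not data taken from the file.

```python
import csv
import io


def bookshelf_rids(lines):
    """Return the distinct Bookshelf RIDs, in first-seen order."""
    rids = []
    for row in csv.reader(lines, delimiter="\t"):
        # Assumption: the last two fields are citation source and citation id.
        if len(row) >= 2 and row[-2] == "NCBIBookShelf" and row[-1] not in rids:
            rids.append(row[-1])
    return rids


# Hand-written sample rows standing in for var_citations.txt (illustrative only).
sample = io.StringIO(
    "15048\t9\t1800562\t\tNCBIBookShelf\tNBK1440\n"
    "15048\t9\t1800562\t\tPubMed\t12345\n"
    "15048\t9\t1800562\t\tNCBIBookShelf\tNBK1440\n"
)
print(bookshelf_rids(sample))  # ['NBK1440']
```

For the real file, pass an open file handle instead of the StringIO object.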

ANSWER

Thanks to arupgsh: yes, the ClinVar citation_id is a RID for Bookshelf, not a UID like it is for PubMed and PMC. FYI, my solution was:

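from Bio import Entrez

Entrez.email = "youremail@domain.com"  # placeholder; NCBI asks for a contact address

def get_summary_for_bookshelf_rid(bookshelf_rid):
    # Resolve the RID (e.g. "NBK1440") to Entrez UIDs first
    handle = Entrez.esearch(db="books", term=bookshelf_rid)
    search_results = Entrez.read(handle)
    handle.close()
    if not search_results['IdList']:
        return None  # no UID matched this RID

    handle = Entrez.esummary(db="books", id=','.join(search_results['IdList']))
    results = Entrez.read(handle)
    handle.close()

    # The search may match several records; keep the one with this exact RID
    for r in results:
        if r["RID"] == bookshelf_rid:
            return r

    return None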

get_summary_for_bookshelf_rid("NBK1440")

returns:

{'PubDate': '2000/04/03 00:00', 'Title': 'HFE-Associated Hereditary Hemochromatosis', 'Text': '', 'RType': 'chapter', u'Id': '1475938', u'Item': [], 'Book': 'gene', 'Parents': '', 'RID': 'NBK1440', 'ID': 'gene/chapter/hemochromatosis/PMC'}

2 Answers

Entrez requires unique identifiers (UIDs) to fetch records. The id you are using in the query is a RID, not a UID. I guess that is why you are getting <ERROR>UID=1440: cannot get document summary</ERROR>.

List of example uids using the term "ncbi+blast":

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=books&term=ncbi+blast

Summary of the first item:

https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=books&id=4459608
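
If you would rather not depend on Biopython, the two requests above can be chained with the standard library. This is only a sketch: the helper names eutils_url and uids_from_esearch_xml are mine, and the sample XML is a hand-trimmed stand-in for an esearch response (assuming the usual IdList/Id layout), not output captured from a live call.

```python
import urllib.parse
import xml.etree.ElementTree as ET

EUTILS_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def eutils_url(tool, **params):
    """Build an E-utilities URL for a tool such as 'esearch' or 'esummary'."""
    return EUTILS_BASE + tool + ".fcgi?" + urllib.parse.urlencode(params)


def uids_from_esearch_xml(xml_text):
    """Extract the UIDs listed under <IdList> in an esearch XML response."""
    root = ET.fromstring(xml_text)
    return [node.text for node in root.findall("./IdList/Id")]


# Hand-trimmed stand-in for an esearch response (illustrative only).
sample = (
    "<eSearchResult><Count>1</Count>"
    "<IdList><Id>4459608</Id></IdList></eSearchResult>"
)
uids = uids_from_esearch_xml(sample)

# Step 1: search for UIDs; step 2: ask for summaries of those UIDs.
print(eutils_url("esearch", db="books", term="ncbi blast"))
print(eutils_url("esummary", db="books", id=",".join(uids)))
```

Fetching each URL (for example with urllib.request.urlopen) and passing the esearch body to uids_from_esearch_xml would complete the round trip.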

Using Biopython's Entrez module:

from Bio import Entrez
Entrez.email = "youremail@domain.com"
handle = Entrez.esummary(db="books", id="4459608,4456634", retmode="xml")
records = Entrez.parse(handle)
for record in records:
    print(record['Title'])
handle.close()

I'm not sure about the availability of book abstracts.

arup

As arupgsh says, you need to use esearch to get a list of unique identifiers before using efetch to retrieve info about each result. I think the easiest way to do this is to use Entrez Direct, which allows you to simply pipe esearch output to efetch:

esearch -db books -query NBK1440 | efetch -format docsum

or

esearch -db books -query NBK1440 | esummary
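
If you need to drive that pipeline from a Python script, a thin wrapper along these lines might do. It is a sketch, assuming Entrez Direct is installed and on your PATH; the function names edirect_summary_command and run_edirect_summary are hypothetical, not part of EDirect.

```python
import shlex
import subprocess


def edirect_summary_command(rid):
    """Build the 'esearch | esummary' shell pipeline for one Bookshelf RID."""
    return "esearch -db books -query {} | esummary".format(shlex.quote(rid))


def run_edirect_summary(rid):
    """Run the pipeline and return its stdout (needs EDirect and network access)."""
    result = subprocess.run(
        edirect_summary_command(rid),
        shell=True,          # required because the command contains a pipe
        capture_output=True,
        text=True,
        check=True,          # raise if the last command in the pipe fails
    )
    return result.stdout


print(edirect_summary_command("NBK1440"))
```

The RID is passed through shlex.quote so that an unexpected value cannot alter the shell command.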

heathobrien