Suppose I want to find all complete human mitochondrial genome records in GenBank (more precisely, NCBI nuccore) that also have an entry in NCBI's BioSample database. I use the following query:
MYQUERY="(mitochondrion[TITLE] OR mitochondrial[TITLE]) \
AND complete genome[TITLE] \
AND (human[TITLE] OR homo sapiens[TITLE])"
I know that 61,719 such records exist on NCBI nuccore as of Sep-23-2023.
esearch -db nuccore -query "$MYQUERY" | \
xtract -pattern ENTREZ_DIRECT -element Count
I also know that 452 such records exist on NCBI Biosample as of Sep-23-2023.
esearch -db biosample -query "$MYQUERY" | \
xtract -pattern ENTREZ_DIRECT -element Count
However, linking the nuccore hits to biosample returns 0 hits, so either the two record sets do not intersect, or they are not correctly linked, or I am using the EDirect tools incorrectly.
esearch -db nuccore -query "$MYQUERY" | \
elink -target biosample | \
xtract -pattern ENTREZ_DIRECT -element Count
On the other hand, first esearching in the biosample database and then linking the hits to the nuccore database does seem to give 271 hits.
esearch -db biosample -query "$MYQUERY" | \
elink -target nuccore | \
xtract -pattern ENTREZ_DIRECT -element Count
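One way I could cross-check the two link directions without relying on elink would be to compare accession lists locally. The following is only a rough sketch: it assumes two plain-text files with one BioSample accession per line, and both the helper name `shared_accessions` and the file names are hypothetical. How the two lists are produced is left open here.

```shell
# Hypothetical helper: print the accessions that occur in both input files.
# Assumes one accession per line; the file names passed in are placeholders.
shared_accessions() {
    a=$(mktemp) && b=$(mktemp) || return 1
    # comm requires sorted input; sort -u also removes duplicate accessions
    LC_ALL=C sort -u "$1" > "$a"
    LC_ALL=C sort -u "$2" > "$b"
    # -12 suppresses lines unique to either file, leaving the intersection
    LC_ALL=C comm -12 "$a" "$b"
    rm -f "$a" "$b"
}
```

Calling `shared_accessions from_nuccore.txt from_biosample.txt | wc -l` would then give the size of the overlap, which could be compared against the 0 and 271 reported by the two elink runs.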
What is the reason for this seemingly unpredictable behavior?
You don't need a \ after a | to break a line. | is a list terminator, so it's fine as a line break character. See – terdon Sep 24 '23 at 10:01
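A small illustration of the point in this comment, using placeholder input rather than real esearch output (the function name `count_lines` is made up for the example):

```shell
# A trailing | already tells the shell that the command continues
# on the next line, so no backslash is needed after it.
count_lines() {
    printf 'a\nb\nc\n' |
        wc -l
}
```

The pipelines in the question should behave the same way with the `\` after each `|` removed.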