
I am using Standard SQL. Even though it's a basic query, it still throws errors. Any suggestions, please?

SELECT 
  fullVisitorId,
  CONCAT(CAST(fullVisitorId AS string),CAST(visitId AS string)) AS session,
  date,
  visitStartTime,
  hits.time,
  hits.page.pagepath
FROM
  `XXXXXXXXXX.ga_sessions_*`,
  UNNEST(hits) AS hits
WHERE
  _TABLE_SUFFIX BETWEEN "20160801"
  AND "20170331"
ORDER BY
  fullVisitorId,
  date,
  visitStartTime
Asked by HKE (edited by Community)

2 Answers


The only way to get this query to run is to remove the ORDER BY at the end:

SELECT 
  fullVisitorId,
  CONCAT(CAST(fullVisitorId AS string),CAST(visitId AS string)) AS session,
  date,
  visitStartTime,
  hits.time,
  hits.page.pagepath
FROM
  `XXXXXXXXXX.ga_sessions_*`,
  UNNEST(hits) AS hits
WHERE
  _TABLE_SUFFIX BETWEEN "20160801"
  AND "20170331"

The ORDER BY operation is quite expensive and its final sort cannot be processed in parallel, so try to avoid it (or apply it only to a limited result set).
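If sorted output is really needed, one option is to sort only a bounded result set. The following is a minimal sketch of the original query with a `LIMIT` added after the `ORDER BY`; in BigQuery a bounded sort generally avoids having to sort the entire result on a single node. `LIMIT 1000` is an arbitrary assumed value, and `XXXXXXXXXX` is the placeholder dataset from the question.

```sql
-- Sketch: the question's query, with the final sort bounded by LIMIT.
-- XXXXXXXXXX is the placeholder project/dataset from the question.
-- LIMIT 1000 is an arbitrary example value, not a recommendation.
SELECT
  fullVisitorId,
  CONCAT(CAST(fullVisitorId AS STRING), CAST(visitId AS STRING)) AS session,
  date,
  visitStartTime,
  hits.time,
  hits.page.pagePath
FROM
  `XXXXXXXXXX.ga_sessions_*`,
  UNNEST(hits) AS hits
WHERE
  _TABLE_SUFFIX BETWEEN "20160801" AND "20170331"
ORDER BY
  fullVisitorId,
  date,
  visitStartTime
LIMIT 1000
```

If you need every row rather than the first N, narrowing the `_TABLE_SUFFIX` range before sorting is another way to keep the sorted set small.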

Answered by Willian Fuks
  • Thanks Willian. It's working, but can you tell me why it wasn't working when I used ORDER BY? – HKE Sep 01 '17 at 21:30
  • There were too many rows to hold in memory on a single node. If you look at the "Explanation" tab for the query, it will show where it ran out of memory. – Elliott Brossard Sep 02 '17 at 09:08
  • Thanks @ElliottBrossard – HKE Sep 05 '17 at 12:30
  • I encountered the same issue. Even weirder, the query succeeded in the web UI but not through the Python API. Removing the ORDER BY clauses solved the issue, but the discrepancy is a bit odd. – Rutger Hofste Jul 31 '18 at 13:58
  • I know this is a bit old, but does `OVER (PARTITION BY ...)` have the same effect? – Islam Azab Jan 28 '21 at 18:38

In addition to the accepted answer, you might want to partition your table by date to reduce the amount of memory an expensive query uses.
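A minimal sketch of that suggestion, assuming you can write to a dataset of your own (`my_dataset.ga_hits_by_day` is a hypothetical table name): copy the hits once into a table partitioned on a real DATE column. The GA export stores `date` as a `YYYYMMDD` string, so it is converted with `PARSE_DATE` first. A later query that filters on the partition column reads only the matching days, which leaves fewer rows for the final sort.

```sql
-- Sketch: materialize the GA hits into a day-partitioned table.
-- my_dataset.ga_hits_by_day is a hypothetical destination table.
CREATE TABLE my_dataset.ga_hits_by_day
PARTITION BY session_date
AS
SELECT
  PARSE_DATE('%Y%m%d', date) AS session_date,  -- GA 'date' is a YYYYMMDD string
  fullVisitorId,
  CONCAT(CAST(fullVisitorId AS STRING), CAST(visitId AS STRING)) AS session,
  visitStartTime,
  hits.time AS hit_time,
  hits.page.pagePath AS page_path
FROM
  `XXXXXXXXXX.ga_sessions_*`,
  UNNEST(hits) AS hits
WHERE
  _TABLE_SUFFIX BETWEEN "20160801" AND "20170331";

-- Follow-up: filtering on the partition column prunes the scan to one week
-- (an arbitrary example range), so only those rows are sorted.
SELECT fullVisitorId, session, session_date, visitStartTime, hit_time, page_path
FROM my_dataset.ga_hits_by_day
WHERE session_date BETWEEN DATE '2016-08-01' AND DATE '2016-08-07'
ORDER BY fullVisitorId, session_date, visitStartTime;
```

Sorting the whole table in one go can still hit the same memory limit; the gain comes from sorting one filtered date range at a time.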

Answered by Embedded_Mugs
  • The query above pulls the GA data, which by default is already partitioned by date (one `ga_sessions_YYYYMMDD` table per day). `_TABLE_SUFFIX BETWEEN "20160801" AND "20170331"` is how I pull data for different date ranges. – HKE Feb 12 '18 at 11:53