Hi, I would like to split a large BigQuery table (10 billion event records) into multiple tables based on the event_type column in the large table.
Note that the events table is time-partitioned by day on event_time. Further assume that it has a year of data (365 days).
Let's assume event_type = ['sign-up', 'page-view'].
My approach:
- Create a new table for each event type
- Run an insert job for each event type for each day [I will be using DML inside a Python script]; a rough sketch is below
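To make the approach concrete, here is a rough sketch of what I have in mind, assuming the google-cloud-bigquery Python client. The project/dataset/table names and the start date are placeholders, and I assume each target table was created beforehand with the same schema as the source (e.g. CREATE TABLE ... LIKE):

```python
from google.cloud import bigquery
import datetime

client = bigquery.Client()

EVENT_TYPES = ["sign-up", "page-view"]
START_DATE = datetime.date(2023, 1, 1)  # placeholder: first day of the year of data

for event_type in EVENT_TYPES:
    # Placeholder target table name, one per event type, created beforehand.
    target = f"my_project.my_dataset.events_{event_type.replace('-', '_')}"
    for offset in range(365):
        day = START_DATE + datetime.timedelta(days=offset)
        sql = f"""
            INSERT INTO `{target}`
            SELECT *
            FROM `my_project.my_dataset.events`
            WHERE event_type = @event_type
              AND DATE(event_time) = @day  -- restricts the scan to one partition
        """
        job_config = bigquery.QueryJobConfig(
            query_parameters=[
                bigquery.ScalarQueryParameter("event_type", "STRING", event_type),
                bigquery.ScalarQueryParameter("day", "DATE", day),
            ]
        )
        job = client.query(sql, job_config=job_config)
        job.result()  # blocks until this insert finishes (sequential version)
```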
My questions:
- What job type should I use: a copy job or a load job?
- Can I queue these jobs in Google BigQuery? [Would they run asynchronously?]
- Would Google BigQuery process these jobs in parallel?
- Is there anything I need to do in terms of multiprocessing in order to speed up the process? [The jobs are handled by BigQuery, so if I can queue them (see the sketch after this list) then I don't need to do any multiprocessing on the client side.]
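To clarify what I mean by "queueing": instead of waiting on each job before submitting the next (as in the sequential sketch above), I would submit all the insert jobs first and only then wait for them. A minimal sketch, again assuming the google-cloud-bigquery Python client and a placeholder statement:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholder: in practice this would be the full list of per-type/per-day
# DML statements built in the sketch above.
insert_statements = [
    "INSERT INTO `my_project.my_dataset.events_sign_up` "
    "SELECT * FROM `my_project.my_dataset.events` "
    "WHERE event_type = 'sign-up' AND DATE(event_time) = '2023-01-01'",
]

# client.query() only submits the job and returns immediately,
# so every job ends up queued/running on the BigQuery side.
jobs = [client.query(sql) for sql in insert_statements]

# Only now wait for each queued job to finish.
for job in jobs:
    job.result()
```

Is this the right way to let BigQuery handle the parallelism, rather than spinning up multiple processes on my side?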
Any pointers to an efficient solution are highly appreciated.