0

Suppose I have an expensive task, such as running an ML job or an Athena query, that is schedule to run daily. I know that the results from the previous day can be re-used if nothing has changed. Additionally, I can detect if anything has changed using an SQL query.

Can Airflow tasks be composed to implement logic like this?

  • For today's task run...
  • Get the result of SELECT MAX(last_updated) FROM items
    • If the result matches yesterday's execution, then copy the previous day's results
    • Otherwise, run the expensive task

Note I am using Airflow 2.2.2 (MWAA)

sdgfsdh
  • 28,918
  • 18
  • 108
  • 208

1 Answers1

0

Can Airflow tasks be composed to implement logic like this?

Yes. This can be achieved with the BranchPythonOperator. See this StackOverflow post for an example.

Reference: BranchPythonOperator (Airflow)

Andrew Nguonly
  • 1,714
  • 12
  • 19