I typically use the code below to write a PySpark DataFrame into a Hive table. I have a column pxn_dt that I want to partition the table by.
How can I modify the code so that, the next time I run the script, it adds a partition to the table for the new month?
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# df is the DataFrame to be written out
df.createOrReplaceTempView("mytempTable")
spark.sql("create table my_db.table as select * from mytempTable")
To get the partitioning, I tried replacing the create statement with the line below, but it doesn't work:
spark.sql("create table my_db.table as select * from mytempTable partitioned by (pxn_dt)")