
I typically use the code below to write a PySpark DataFrame into a Hive table. The DataFrame has a column pxn_dt that will be used to partition the table.

How can I modify the code so that it adds new partitions to the table (for the new month) the next time I run the script?

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# df is an existing DataFrame that includes the pxn_dt column
df.createOrReplaceTempView("mytempTable")

spark.sql("create table my_db.table as select * from mytempTable")

I'm trying the line below instead, but it doesn't work:

spark.sql("create table my_db.table from mytempTable partitioned by(pxn_dt)")
  • Check this thread https://stackoverflow.com/questions/31341498/save-spark-dataframe-as-dynamic-partitioned-table-in-hive – Mahesh Gupta Feb 23 '22 at 11:46
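
The linked thread covers dynamic-partition inserts into an existing table; a sketch of that style, assuming my_db.table already exists with pxn_dt as its partition column (the two Hive settings are the usual ones for dynamic partitioning):

from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Allow partition values to come from the data rather than the statement.
spark.conf.set("hive.exec.dynamic.partition", "true")
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")

# insertInto resolves columns by position, so pxn_dt must be the last
# column of df to line up with the table's partition column.
df.write.insertInto("my_db.table", overwrite=False)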

0 Answers