
I am new to PySpark and am facing a few issues while executing jobs.

I am submitting a job to a standalone Spark instance with 2 executors configured correctly. Sometimes both executors work in parallel, use the allocated resources as expected, and the job completes successfully. But sometimes, on submitting the SAME job that previously ran perfectly, only a single executor does the work while the other remains idle.

What could be the reason that both executors do not always take part in the job?

Below is my code:

from flask import Blueprint
import time
import pyspark
from pyspark import SparkContext
from pyspark.sql import SQLContext  # SQLContext lives in pyspark.sql, not the top-level package

jl = Blueprint('HelloWorld', __name__, url_prefix='/')

@jl.route('/join')
def join_logic():
    # Executor/driver resources plus very generous network and heartbeat timeouts
    conf = pyspark.SparkConf().setAll([
        ('spark.executor.memory', '24g'),
        ('spark.executor.cores', '3'),
        ('spark.worker.memory', '56g'),
        ('spark.driver.memory', '24g'),
        ('spark.worker.cores', '6'),
        ('spark.network.timeout', '10000001'),
        ('spark.executor.heartbeatInterval', '10000000')])

    sc = SparkContext("spark://X.X.X.X:7077", "JOB_1", conf=conf)
    sqlContext = SQLContext(sc)

    # First table, read over JDBC
    df = sqlContext.read.format('jdbc').options(
        url='jdbc:mysql://x.x.x.x/schemaName?autoReconnect=true&useSSL=false',
        driver='com.mysql.jdbc.Driver',
        dbtable='table_name',
        user='root',
        password='xxxx').load()

    # Second table, read over JDBC
    df1 = sqlContext.read.format('jdbc').options(
        url='jdbc:mysql://X.X.X.X/schema_Name?autoReconnect=true&useSSL=false',
        driver='com.mysql.jdbc.Driver',
        dbtable='Table_Name',
        user='root',
        password='xxxx').load()

    # Left join the two tables and count the rows
    result = df.join(df1, df.column == df1.column, 'left')
    res = result.count()

    sc.stop()
    return str(res)
  • Are you getting the same results anyway? Maybe it's caching it? – Andronicus Jan 30 '19 at 08:13
  • I have stopped and started the same job multiple times; sometimes both executors start executing the job in parallel, but sometimes only one starts working and the other remains idle. – Tabish Tehseen Jan 30 '19 at 08:22
  • So there is no result? – Andronicus Jan 30 '19 at 08:28
  • When both executors work in parallel I do get output. But when only one executor works, it crashes after reaching its maximum resource allocation and the job fails. – Tabish Tehseen Jan 30 '19 at 08:31
  • It doesn't sound like there is any malfunction here. You're assigning 3 cores to each executor, and the scheduler is free (it should avoid this with default settings, but that is not a guarantee) to assign both reading tasks (you literally read the data using only 2 tasks) to the same executor. Your real problem is [how you use the JDBC source](https://stackoverflow.com/a/43150938/10465355) and this is what [should be addressed here](https://stackoverflow.com/a/52678042/10465355) – 10465355 Jan 30 '19 at 11:13
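For reference, a minimal sketch of the partitioned JDBC read that the last comment points to, using Spark's standard partitionColumn/lowerBound/upperBound/numPartitions options. The column name `id`, its bounds, and the partition count are assumptions for illustration and would have to match a real numeric column in the actual table:

from pyspark.sql import SparkSession

spark = SparkSession.builder.master("spark://X.X.X.X:7077").appName("JOB_1").getOrCreate()

# Partitioned JDBC read: Spark issues numPartitions parallel queries, each
# covering a slice of [lowerBound, upperBound] on partitionColumn, so the
# read can be spread across both executors instead of a single task.
df = spark.read.format('jdbc').options(
    url='jdbc:mysql://x.x.x.x/schemaName?autoReconnect=true&useSSL=false',
    driver='com.mysql.jdbc.Driver',
    dbtable='table_name',
    user='root',
    password='xxxx',
    partitionColumn='id',      # assumed numeric column; use a real indexed column
    lowerBound='1',            # assumed minimum value of that column
    upperBound='1000000',      # assumed maximum value of that column
    numPartitions='6').load()  # e.g. total cores available across executors

With more than one input partition per table, the scheduler has enough tasks to keep both executors busy regardless of where the first task lands.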

0 Answers