I want to read from an HDFS partition one record at a time, sequentially. I found a sample Java snippet that handles this logic. Is there a way to achieve this using PySpark/Python?

Sample Java snippet below (note the while loop):

FileSystem fileSystem = FileSystem.get(conf);
Path path = new Path("/path/file1.txt");
if (!fileSystem.exists(path)) {
    System.out.println("File does not exist");
    return;
}
FSDataInputStream in = fileSystem.open(path);
byte[] b = new byte[1024];  // read buffer
int numBytes = 0;
while ((numBytes = in.read(b)) > 0) {
    // numBytes is the count of bytes read into b;
    // code to manipulate the data in b[0..numBytes) goes here
    System.out.println(new String(b, 0, numBytes));
}
in.close();
fileSystem.close();
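For context, here is a rough Python sketch of the same sequential chunked-read loop. It works against any file-like object; the idea (an assumption, not tested here) is that an HDFS stream obtained via `pyarrow.fs.HadoopFileSystem(...).open_input_stream("/path/file1.txt")` exposes the same `read()` interface, so the loop would apply unchanged. An in-memory buffer stands in for the HDFS stream for illustration:

```python
import io

def read_in_chunks(stream, chunk_size=1024):
    """Yield successive byte chunks from a file-like stream until EOF."""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:  # empty bytes object signals end of stream
            break
        yield chunk

# Usage with a local stand-in for the HDFS input stream:
data = io.BytesIO(b"record1\nrecord2\n")
for chunk in read_in_chunks(data, chunk_size=8):
    print(chunk)
```

This mirrors the Java `while ((numBytes = in.read(b)) > 0)` pattern: each iteration hands you the bytes actually read, and the loop ends when the stream is exhausted.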
Sudipto Dutta

0 Answers