0

I want to read multiple big files that are stored on a CentOS server with Python. I wrote some simple code for that and it works, but the entire file comes into a paramiko object (paramiko.sftp_file.SFTPFile), and only after that can I process the lines. Performance is poor, so I want to process the file and write to CSV piece by piece, because processing the entire file at once can affect performance. Is there a way to solve this problem?

import paramiko

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.connect(host, port, username, password)

sftp_client = ssh.open_sftp()
remote_file = sftp_client.open(r'/root/bigfile.csv')

try:
    for line in remote_file:
        pass  # process the line here
finally:
    remote_file.close()
Nestor

2 Answers

0

The following function could solve your problem.

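def lazy_loading_ftp_file(sftp_host_conn, filename):
    """
        Download a remote file over SFTP in chunks, as a fallback when a simple sftp.get call fails
        :param sftp_host_conn: callable that returns an SFTP connection usable as a context manager
        :param filename: remote path of the file to be downloaded
        :return: dict with "status" and "msg"; the file is saved in the current directory
    """
    import os
    import shutil
    try:
        with sftp_host_conn() as host:
            # Save under the remote file's base name in the current directory.
            local_name = os.path.basename(filename)
            with host.open(filename, 'r') as sftp_file_instance, open(local_name, 'wb') as out_file:
                # copyfileobj copies in fixed-size chunks, so the whole file is never held in memory.
                shutil.copyfileobj(sftp_file_instance, out_file)
            return {"status": "success", "msg": "successfully downloaded file: {}".format(filename)}
    except Exception as ex:
        return {"status": "failed", "msg": "Exception in lazy download: {}".format(ex)}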

This will avoid reading the whole thing into memory at once.
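If the goal is to process the rows and write a CSV piece by piece rather than download the file first, the same idea can be sketched as follows. This is only a minimal sketch: the names iter_text_lines, stream_to_csv, process_row and batch_size are hypothetical, and it assumes the remote file is UTF-8 encoded CSV. The function accepts any file-like object that can be iterated line by line, which should include the object returned by sftp_client.open().

```python
import csv
from itertools import islice


def iter_text_lines(file_obj, encoding="utf-8"):
    """Yield decoded lines one at a time from a file-like object."""
    for line in file_obj:
        # Depending on the open mode, lines may arrive as bytes or str; accept both.
        yield line.decode(encoding) if isinstance(line, bytes) else line


def stream_to_csv(file_obj, out_path, process_row, batch_size=10000):
    """Process rows lazily and write them to out_path in batches.

    Returns the number of rows written.
    """
    reader = csv.reader(iter_text_lines(file_obj))
    total = 0
    with open(out_path, "w", newline="") as out:
        writer = csv.writer(out)
        while True:
            # Pull at most batch_size rows from the stream at a time.
            batch = [process_row(row) for row in islice(reader, batch_size)]
            if not batch:
                break
            writer.writerows(batch)
            total += len(batch)
    return total
```

With the question's code, this would presumably be called as stream_to_csv(remote_file, 'out.csv', my_row_function) in place of the for loop, so that only one batch of rows is held in memory at any time.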

Ramsey
-1

Reading in chunks will help you here:

import pandas as pd
chunksize = 1000000
for chunk in pd.read_csv(filename, chunksize=chunksize):
    process(chunk)

Update:

Yeah, I'm aware that my answer was written for a local file. I was just giving an example of reading a file in chunks.

To answer the question, check out these:

  1. paramiko.sftp_client.SFTPClient.putfo
  2. Functions for working with remote files using pandas and paramiko (SFTP/SSH). - pass the chunk size as I mentioned above.
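A rough sketch of how the chunked read might be combined with a remote file: pd.read_csv accepts an open file-like object as well as a path, so the object returned by sftp_client.open(path, 'rb') can be passed instead of a filename. The helper name process_in_chunks and its parameters are hypothetical, and process is assumed to take a DataFrame and return a DataFrame.

```python
import pandas as pd


def process_in_chunks(file_obj, out_path, process, chunksize=100000):
    """Read CSV data from a file-like object chunk by chunk and write results to out_path.

    Returns the total number of rows written.
    """
    rows = 0
    for i, chunk in enumerate(pd.read_csv(file_obj, chunksize=chunksize)):
        result = process(chunk)
        # The first chunk creates the file with a header; later chunks are appended without one.
        result.to_csv(out_path, mode="w" if i == 0 else "a", header=(i == 0), index=False)
        rows += len(result)
    return rows
```

Only one chunk of chunksize rows is held as a DataFrame at a time, so memory use depends on the chunk size rather than on the size of the remote file.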
surya
  • The files are not on the local server; they are on an SFTP server, and the entire file comes into the SFTP object. – Nestor May 31 '21 at 07:46
  • Did you realize that the file is not present on a local file system, and that `sftp` is not a valid URL scheme (protocol) for `read_csv`? Said differently, this does not answer the current question... – Serge Ballesta May 31 '21 at 07:46