I need to download locally (macOS BigSur) a large dataset (millions of rows) from an AWS RDS database (MySQL 5.7). Thanks to this great post I am able to connect and download on my machine some data into a csv file:
mysql --host=$HOST --user $USER --password=$PASSWORD --database=$DATABASE --port=$PORT --batch \
--quick -e "$QUERY" \
| sed $'s/\\t/","/g;s/^/"/;s/$/"/;s/\\n//g' > $FILE_PATH
However, if I extend my query to thousands of records, after few seconds the process stops and the csv ends up truncated (literally the last written row is truncated half way), so I assume there is some kind of stream or timeout or buffer issue.
mysql> SHOW VARIABLES LIKE '%timeout';
+-----------------------------+----------+
| Variable_name | Value |
+-----------------------------+----------+
| connect_timeout | 10 |
| delayed_insert_timeout | 300 |
| have_statement_timeout | YES |
| innodb_flush_log_at_timeout | 1 |
| innodb_lock_wait_timeout | 50 |
| innodb_rollback_on_timeout | OFF |
| interactive_timeout | 28800 |
| lock_wait_timeout | 31536000 |
| net_read_timeout | 30 |
| net_write_timeout | 60 |
| rpl_stop_slave_timeout | 31536000 |
| slave_net_timeout | 60 |
| wait_timeout | 28800 |
+-----------------------------+----------+
Since the command does stops after ~10sec I assume it depends on the connect_timeout value. However I tried setting it with SET @@GLOBAL.connect_timeout=7200 but I get a permission error. I tried adding the --connect-timeout=7200 parameter on the command, but it does not work (which strikes me the most).
Running the query (limited to ~250k rows) on my client (SequelAce) it runs fine, so I can exclude issues with the data or the the SQL script itself.
Any ideas or suggestions?
Are there better tools for the job maybe?