S3 Parquet Import
Prerequisites
To load a Parquet file from S3, the httpfs extension is required. It can be installed with the INSTALL SQL command; this only needs to be run once.
INSTALL httpfs;
To load the httpfs extension for usage, use the LOAD SQL command:
LOAD httpfs;
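To confirm that httpfs is installed and loaded, the duckdb_extensions() table function can be queried as a quick sanity check:

-- Check the status of the httpfs extension
SELECT extension_name, installed, loaded
FROM duckdb_extensions()
WHERE extension_name = 'httpfs';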
Credentials and Configuration
After loading the httpfs extension, set up the credentials and S3 region to read data:
CREATE SECRET (
    TYPE S3,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',
    SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',
    REGION 'us-east-1'
);
Tip: If you get an IO Error (Connection error for HTTP HEAD), configure the endpoint explicitly via ENDPOINT 's3.⟨your-region⟩.amazonaws.com'.
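For example, for a bucket hosted in eu-west-1 (a hypothetical region chosen for illustration, with the usual documentation placeholder keys), a secret with an explicit endpoint could look like this:

CREATE SECRET (
    TYPE S3,
    KEY_ID 'AKIAIOSFODNN7EXAMPLE',  -- placeholder key
    SECRET 'wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY',  -- placeholder secret
    REGION 'eu-west-1',
    ENDPOINT 's3.eu-west-1.amazonaws.com'  -- region-specific endpoint
);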
Alternatively, use the aws extension to retrieve the credentials automatically:
CREATE SECRET (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN
);
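The CREDENTIAL_CHAIN provider can also be restricted to specific providers via the CHAIN parameter; the chain below ('env' then 'config') is only one illustrative choice:

CREATE SECRET (
    TYPE S3,
    PROVIDER CREDENTIAL_CHAIN,
    CHAIN 'env;config'  -- try environment variables first, then the AWS config files
);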
Querying
After the httpfs extension is set up and the S3 credentials are configured, Parquet files can be read from S3 using the following command:
SELECT * FROM read_parquet('s3://⟨bucket⟩/⟨file⟩');
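To avoid fetching the remote file on every query, the result can also be materialized into a local table; the bucket and file names below are hypothetical:

-- Hypothetical bucket/file; reads from S3 once and stores the rows locally
CREATE TABLE orders AS
    SELECT * FROM read_parquet('s3://my-bucket/data/orders.parquet');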
Google Cloud Storage (GCS) and Cloudflare R2
DuckDB can also handle Google Cloud Storage (GCS) and Cloudflare R2 via the S3 API. See the relevant guides for details.
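As a brief sketch (the dedicated guides remain the authoritative reference), GCS can be read with HMAC credentials via a GCS-type secret; all values below are placeholders:

CREATE SECRET (
    TYPE GCS,
    KEY_ID '⟨hmac_key_id⟩',      -- placeholder HMAC key
    SECRET '⟨hmac_secret⟩'       -- placeholder HMAC secret
);
SELECT * FROM read_parquet('gs://⟨gcs_bucket⟩/⟨file⟩');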