Build Parquet S3 FDW

Build Parquet S3 FDW extension alone with its deps: libarrow, libparquet, 以及 libaws-cpp

There are two major deps for parquet_s3_fdw: arrowawssdk


Build arrow

Clone arrow repo and build it with cmake:

cd ~ ; git clone [email protected]:apache/arrow.git;
mkdir -p ~/arrow/cpp/release; cd ~/arrow/cpp/release;
cmake .. -DARROW_PARQUET=ON -DARROW_S3=ON; make -j8
sudo make install

Build libaws

There are many drivers in libaws-cpp, but we only need two: core and s3:

# install building deps
sudo yum install libcurl-devel openssl-devel libuuid-devel pulseaudio-libs-devel
# sudo apt-get install libcurl4-openssl-dev libssl-dev uuid-dev libpulse-dev # debian/ubuntu

# clone libaws repo (very big!)
cd ~; git clone --recurse-submodules [email protected]:aws/aws-sdk-cpp.git

mkdir -p ~/aws-sdk-cpp/release; cd ~/aws-sdk-cpp/release;
cmake .. -DBUILD_ONLY="s3"; make -j20
sudo make install

build libarrow-s3

Collect the generated .so files, then package them into an RPM / DEB package:

mkdir -p ~/libarrow-s3
cp -d ~/arrow/cpp/release/release/libarrow.so*                                     ~/libarrow-s3/
cp -d ~/arrow/cpp/release/release/libparquet.so*                                   ~/libarrow-s3/
cp -f ~/aws-sdk-cpp/release/generated/src/aws-cpp-sdk-s3/libaws-cpp-sdk-s3.so      ~/libarrow-s3/
cp -f ~/aws-sdk-cpp/release/src/aws-cpp-sdk-core/libaws-cpp-sdk-core.so            ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/lib/libaws-c-event-stream.so*                          ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/lib/libs2n.so*                                         ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/libaws-crt-cpp.so                        ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-common/libaws-c-common.so*     ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-checksums/libaws-checksums.so*   ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-io/libaws-c-io.so*             ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-mqtt/libaws-c-mqtt.so*         ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-cal/libaws-c-cal.so*           ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-checksums/libaws-checksums.so*   ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-s3/libaws-c-s3.so*             ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-common/libaws-c-common.so*     ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-http/libaws-c-http.so*         ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-sdkutils/libaws-c-sdkutils.so* ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-auth/libaws-c-auth.so*         ~/libarrow-s3/
cp -d ~/aws-sdk-cpp/release/crt/aws-crt-cpp/crt/aws-c-compression/libaws-c-compression.so* ~/libarrow-s3/

Remove empty RPATH from generated so files (EL system), using patchelf binary:

cd ~/libarrow-s3/
patchelf --remove-rpath libarrow.so.1800.0.0
patchelf --remove-rpath libparquet.so.1800.0.0
patchelf --remove-rpath libaws-cpp-sdk-core.so
patchelf --remove-rpath libaws-cpp-sdk-s3.so

And finally package these so files into a libarrow-s3 package:

cd ~/rpmbuild/SPECS
rpmbuild -ba ~/rpmbuild/SPECS/libarrow-s3.spec
sudo rpm -ivh ~/rpmbuild/RPMS/x86_64/libarrow-s3-17.0.0-1PIGSTY.*

Last modified 2024-11-25: fix zh comma (8ec38eac)