hdfs_fdw

foreign-data wrapper for remote hdfs servers

Overview

| Package  | Version | Category | License      | Language |
|----------|---------|----------|--------------|----------|
| hdfs_fdw | 2.3.3   | FDW      | BSD 3-Clause | C        |

| ID   | Extension | Bin | Lib | Load | Create | Trust | Reloc | Schema |
|------|-----------|-----|-----|------|--------|-------|-------|--------|
| 8740 | hdfs_fdw  | No  | Yes | No   | Yes    | No    | No    | -      |

Related: pg_parquet, mongo_fdw, kafka_fdw, wrappers, multicorn, jdbc_fdw, aws_s3, duckdb_fdw

Version

| Type | Repo | Version | PG Ver             | Package     | Deps |
|------|------|---------|--------------------|-------------|------|
| EXT  | PGDG | 2.3.3   | 18, 17, 16, 15, 14 | hdfs_fdw    | -    |
| RPM  | PGDG | 2.3.3   | 18, 17, 16, 15, 14 | hdfs_fdw_$v | -    |

| OS / PG       | PG18      | PG17      | PG16      | PG15      | PG14      |
|---------------|-----------|-----------|-----------|-----------|-----------|
| el8.x86_64    |           |           |           |           |           |
| el8.aarch64   |           |           |           |           |           |
| el9.x86_64    |           |           |           |           |           |
| el9.aarch64   |           |           |           |           |           |
| el10.x86_64   |           |           |           |           |           |
| el10.aarch64  |           |           |           |           |           |
| d12.x86_64    | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS |
| d12.aarch64   | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS |
| d13.x86_64    | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS |
| d13.aarch64   | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS |
| u22.x86_64    | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS |
| u22.aarch64   | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS |
| u24.x86_64    | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS |
| u24.aarch64   | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS | PGDG MISS |

Install

To install hdfs_fdw, first make sure the PGDG repository is added and enabled:

pig repo add pgdg -u          # Add PGDG repo and update cache

Install the extension with pig, or with dnf directly on EL systems:

pig ext install hdfs_fdw           # Install for current active PG version
pig ext install -y hdfs_fdw -v 18  # PG 18
pig ext install -y hdfs_fdw -v 17  # PG 17
pig ext install -y hdfs_fdw -v 16  # PG 16
pig ext install -y hdfs_fdw -v 15  # PG 15
pig ext install -y hdfs_fdw -v 14  # PG 14
dnf install -y hdfs_fdw_18       # PG 18
dnf install -y hdfs_fdw_17       # PG 17
dnf install -y hdfs_fdw_16       # PG 16
dnf install -y hdfs_fdw_15       # PG 15
dnf install -y hdfs_fdw_14       # PG 14

Create Extension:

CREATE EXTENSION hdfs_fdw;
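To confirm the package is visible to the cluster and the extension was created, you can query the standard PostgreSQL catalogs:

```sql
-- Packaged extension visible to this cluster (before CREATE EXTENSION)
SELECT name, default_version
FROM pg_available_extensions
WHERE name = 'hdfs_fdw';

-- Installed extension and version (after CREATE EXTENSION)
SELECT extname, extversion
FROM pg_extension
WHERE extname = 'hdfs_fdw';
```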

Usage

hdfs_fdw: Foreign data wrapper for remote HDFS servers

Create Server

CREATE EXTENSION hdfs_fdw;

CREATE SERVER hdfs_server FOREIGN DATA WRAPPER hdfs_fdw
  OPTIONS (host '127.0.0.1', port '10000', client_type 'hiveserver2');

Server Options:

- host (default: localhost)
- port (default: 10000)
- client_type: hiveserver2 or spark (default: hiveserver2)
- auth_type: NOSASL or LDAP
- connect_timeout (default: 300)
- fetch_size (default: 10000)
- log_remote_sql (default: false)
- use_remote_estimate (default: false)
- enable_join_pushdown (default: true)
- enable_aggregate_pushdown (default: true)
- enable_order_by_pushdown (default: true)
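As an illustration of combining several of these options, a server definition for an LDAP-secured HiveServer2 with remote SQL logging might look like the following sketch (the hostname and option values are placeholders, not recommendations):

```sql
CREATE SERVER hive_ldap FOREIGN DATA WRAPPER hdfs_fdw
  OPTIONS (
    host 'hive.example.com',       -- placeholder hostname
    port '10000',
    client_type 'hiveserver2',
    auth_type 'LDAP',              -- credentials go in the user mapping
    connect_timeout '60',
    fetch_size '50000',            -- larger batches per fetch round trip
    log_remote_sql 'true',         -- log generated remote SQL
    use_remote_estimate 'true'     -- ask the remote side for row estimates
  );
```

Note that all FDW option values are written as strings, including numbers and booleans.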

Create User Mapping

CREATE USER MAPPING FOR postgres SERVER hdfs_server
  OPTIONS (username 'hive_user', password 'hive_password');

For NOSASL authentication, omit the OPTIONS clause entirely.
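Under NOSASL the remote server does not authenticate the client, so the mapping carries no credentials. A minimal sketch (server name is illustrative):

```sql
CREATE SERVER hdfs_nosasl FOREIGN DATA WRAPPER hdfs_fdw
  OPTIONS (host '127.0.0.1', port '10000', auth_type 'NOSASL');

-- The OPTIONS clause of CREATE USER MAPPING is optional and omitted here
CREATE USER MAPPING FOR postgres SERVER hdfs_nosasl;
```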

Create Foreign Table

CREATE FOREIGN TABLE weblogs (
  client_ip text,
  http_status_code text,
  uri text,
  request_count bigint
)
SERVER hdfs_server
OPTIONS (dbname 'default', table_name 'weblogs');

Table Options:

- dbname (default: default)
- table_name (defaults to the foreign table name)
- enable_join_pushdown, enable_aggregate_pushdown, enable_order_by_pushdown (per-table overrides of the server-level options)
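A per-table option overrides the server-level default for that table only. For example, assuming the pushdown options accept the same string booleans as the server options, aggregates on the weblogs table above could be kept local like this:

```sql
-- Disable aggregate pushdown for this one table;
-- other tables on the server keep the server default.
ALTER FOREIGN TABLE weblogs
  OPTIONS (ADD enable_aggregate_pushdown 'false');
```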

Query

SELECT client_ip, count(*) FROM weblogs GROUP BY client_ip ORDER BY count(*) DESC LIMIT 10;

Spark Example

CREATE SERVER spark_server FOREIGN DATA WRAPPER hdfs_fdw
  OPTIONS (host '127.0.0.1', port '10000', client_type 'spark');

CREATE USER MAPPING FOR postgres SERVER spark_server
  OPTIONS (username 'spark_user', password 'spark_pass');

CREATE FOREIGN TABLE spark_table (
  id int,
  name text,
  value double precision
)
SERVER spark_server
OPTIONS (dbname 'default', table_name 'my_table');

Pushdown Features

hdfs_fdw pushes down WHERE clauses, JOINs, aggregate functions, ORDER BY, and LIMIT/OFFSET to the remote Hive/Spark server. Control pushdown at the session level:

SET hdfs_fdw.enable_join_pushdown = on;
SET hdfs_fdw.enable_aggregate_pushdown = on;
SET hdfs_fdw.enable_order_by_pushdown = on;
SET hdfs_fdw.enable_limit_pushdown = on;
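To check what is actually pushed down for a given query, EXPLAIN VERBOSE typically shows the remote SQL the wrapper plans to send (and with log_remote_sql enabled on the server, the same statement appears in the PostgreSQL log). A sketch using the weblogs table from above:

```sql
EXPLAIN (VERBOSE, COSTS OFF)
SELECT client_ip, count(*)
FROM weblogs
GROUP BY client_ip;
-- If the plan's remote SQL contains the GROUP BY, the aggregate
-- was pushed to the Hive/Spark server rather than computed locally.
```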

Last Modified 2026-03-12: add pg extension catalog (95749bf)