Postgres#
Protocols#
binary
: Postgres Binary COPY protocol, recommend to use in general since fast data parsing speed.csv
: Postgres CSV COPY protocol, recommend to use when network is slow (csv
usually results in smaller size thanbinary
).cursor
: Conventional wire protocol (slowest one), recommend to use only whenbinary
andcsv
is not supported by the source (e.g. Redshift).
Postgres Connection#
Hint
Adding sslmode=require
to connection uri parameter force SSL connection. Example: postgresql://username:password@host:port/db?sslmode=require
. sslmode=disable
to disable SSL connection.
To connect to redshift, replace postgresql://
with redshift://
.
import connectorx as cx
conn = 'postgres://username:password@server:port/database' # connection token
query = "SELECT * FROM table" # query string
cx.read_sql(conn, query) # read data from Postgres
Postgres-Pandas Type Mapping#
Postgres Type |
Pandas Type |
Comment |
---|---|---|
BOOL |
bool, boolean(nullable) |
|
INT2 |
int64, Int64(nullable) |
|
INT4 |
int64, Int64(nullable) |
|
INT8 |
int64, Int64(nullable) |
|
FLOAT4 |
float64 |
|
FLOAT8 |
float64 |
|
NUMERIC |
float64 |
cannot support precision larger than 28 |
TEXT |
object |
|
BPCHAR |
object |
|
VARCHAR |
object |
|
CHAR |
object |
|
BYTEA |
object |
|
DATE |
datetime64[ns] |
|
TIME |
object |
|
TIMESTAMP |
datetime64[ns] |
|
TIMESTAMPZ |
datetime64[ns] |
|
UUID |
object |
|
JSON |
object |
|
JSONB |
object |
|
ENUM |
object |
need to convert enum column to text manually ( |
ltree |
object |
binary protocol supported only after Postgres version 13 |
lquery |
object |
binary protocol supported only after Postgres version 13 |
ltxtquery |
object |
binary protocol supported only after Postgres version 13 |
INT2[] |
object |
list of i64 |
INT4[] |
object |
list of i64 |
INT8[] |
object |
list of i64 |
FLOAT4[] |
object |
list of f64 |
FLOAT8[] |
object |
list of f64 |
NUMERIC[] |
object |
list of f64 |
Performance (db.m6g.4xlarge RDS)#
Time chart, lower is better.
Memory consumption chart, lower is better.
In conclusion, ConnectorX uses 3x less memory and 13x less time compared with Pandas.