Postgres

Postgres#

Protocols#

binary: Postgres Binary COPY protocol, recommend to use in general since fast data parsing speed.
csv: Postgres CSV COPY protocol, recommend to use when network is slow (csv usually results in smaller size than binary).
cursor: Conventional wire protocol (slowest one), recommend to use only when binary and csv is not supported by the source (e.g. Redshift).

Postgres Connection#

Hint

Adding sslmode=require to connection uri parameter force SSL connection. Example: postgresql://username:password@host:port/db?sslmode=require. sslmode=disable to disable SSL connection.

To connect to redshift, replace postgresql:// with redshift://.

import connectorx as cx
conn = 'postgres://username:password@server:port/database'         # connection token
query = "SELECT * FROM table"                                   # query string
cx.read_sql(conn, query)                                        # read data from Postgres

Postgres-Pandas Type Mapping#

Postgres Type	Pandas Type	Comment
OID	u32	a data type for identifying internal objects
BOOL	bool, boolean(nullable)
INT2	int64, Int64(nullable)
INT4	int64, Int64(nullable)
INT8	int64, Int64(nullable)
FLOAT4	float64
FLOAT8	float64
NUMERIC	float64	cannot support precision larger than 28
TEXT	object
BPCHAR	object
VARCHAR	object
CHAR	object
BYTEA	object
DATE	datetime64[ns]
TIME	object
TIMESTAMP	datetime64[ns]
TIMESTAMPZ	datetime64[ns]
UUID	object
JSON	object
JSONB	object
ENUM	object	need to convert enum column to text manually (`::text`) when using `csv` and `cursor` protocol
ltree	object	binary protocol supported only after Postgres version 13
lquery	object	binary protocol supported only after Postgres version 13
ltxtquery	object	binary protocol supported only after Postgres version 13
Inet	object
Vector	object	list of f32
HalfVec	object	list of f32
Bit	object
SparseVec	object	list of f32
BOOL[]	object	list of bool
INT2[]	object	list of i64
INT4[]	object	list of i64
INT8[]	object	list of i64
FLOAT4[]	object	list of f64
FLOAT8[]	object	list of f64
NUMERIC[]	object	list of f64

Performance (db.m6g.4xlarge RDS)#

Time chart, lower is better.

time chart

Memory consumption chart, lower is better.

memory chart

In conclusion, ConnectorX uses 3x less memory and 13x less time compared with Pandas.