Postgres#

Protocols#

  • binary: Postgres Binary COPY protocol, recommend to use in general since fast data parsing speed.

  • csv: Postgres CSV COPY protocol, recommend to use when network is slow (csv usually results in smaller size than binary).

  • cursor: Conventional wire protocol (slowest one), recommend to use only when binary and csv is not supported by the source (e.g. Redshift).

Postgres Connection#

Hint

Adding sslmode=require to connection uri parameter force SSL connection. Example: postgresql://username:password@host:port/db?sslmode=require. sslmode=disable to disable SSL connection.

To connect to redshift, replace postgresql:// with redshift://.

import connectorx as cx
conn = 'postgres://username:password@server:port/database'         # connection token
query = "SELECT * FROM table"                                   # query string
cx.read_sql(conn, query)                                        # read data from Postgres

Postgres-Pandas Type Mapping#

Postgres Type

Pandas Type

Comment

BOOL

bool, boolean(nullable)

INT2

int64, Int64(nullable)

INT4

int64, Int64(nullable)

INT8

int64, Int64(nullable)

FLOAT4

float64

FLOAT8

float64

NUMERIC

float64

cannot support precision larger than 28

TEXT

object

BPCHAR

object

VARCHAR

object

CHAR

object

BYTEA

object

DATE

datetime64[ns]

TIME

object

TIMESTAMP

datetime64[ns]

TIMESTAMPZ

datetime64[ns]

UUID

object

JSON

object

JSONB

object

ENUM

object

need to convert enum column to text manually (::text) when using csv and cursor protocol

ltree

object

binary protocol supported only after Postgres version 13

lquery

object

binary protocol supported only after Postgres version 13

ltxtquery

object

binary protocol supported only after Postgres version 13

INT2[]

object

list of i64

INT4[]

object

list of i64

INT8[]

object

list of i64

FLOAT4[]

object

list of f64

FLOAT8[]

object

list of f64

NUMERIC[]

object

list of f64

Performance (db.m6g.4xlarge RDS)#

  • Time chart, lower is better.

time chart

  • Memory consumption chart, lower is better.

memory chart

In conclusion, ConnectorX uses 3x less memory and 13x less time compared with Pandas.