SQLite

SQLite#

Since SQLite adopts a dynamic type system, we infer type as follow:

If there is a declared type of the column, we derive the type using column affinity rules, code can be found here.
Otherwise we directly adopt the value’s type in the first row of the result (in each partition), which results in INTEGER, REAL, TEXT and BLOB.
If the first row of the result is NULL in the partition, try next partition. Throw an error if first rows of all partitions are NULL for a column.

SQLite Connection#

import connectorx as cx
db_path = '/home/user/path/test.db'                         # path to your SQLite database
conn = 'sqlite://' + db_path                                # connection token
query = 'SELECT * FROM `database.dataset.table`'            # query string
cx.read_sql(conn, query)                                    # read data from SQLite

Example on windows:

import connectorx as cx
import urllib
db_path = urllib.parse.quote("C:\\user\\path\\test.db")     # url encode the path to your SQLite database         
conn = 'sqlite://' + db_path                                # connection token
query = 'SELECT * FROM `database.dataset.table`'            # query string
cx.read_sql(conn, query)                                    # read data from SQLite

SQLite Type Mapping#

SQLite Type	Pandas Type	Comment
INTEGER	int64, Int64(nullable)	declared type that contains substring “int”
BOOL	bool, boolean(nullable)	declared type is “boolean” or “bool”
REAL	float64	declared type that contains substring “real”, “floa”, “doub”
TEXT	object	declared type that contains substring “char”, “clob”, “text”
BLOB	object	declared type that contains substring “blob”
DATE	datetime64[ns]	declared type is “date”
TIME	object	declared type is “time”
TIMESTAMP	datetime64[ns]	declared type is “datetime” or “timestamp”, the format must follow `YYYY-MM-DD HH:MM:SS"/"YYYY-MM-DD HH:MM:SS.SSS`

Performance (r5.4xlarge EC2 same instance)#

Turbodbc does not support read_sql on SQLite

Time chart, lower is better.

time chart

Memory consumption chart, lower is better.

memory chart

In conclusion, ConnectorX uses 2x less memory and 5x less time compared with Pandas.

SQLite

Contents

SQLite#

SQLite Connection#

SQLite Type Mapping#

Performance (r5.4xlarge EC2 same instance)#