Datatypes and Fixed Encoding
This topic describes standard datatypes and space-saving variations for
values stored in OmniSci.
Datatypes
Each OmniSci datatype uses space in memory and on disk. Datatypes and sizes are listed in the following table.
Datatype |
Size (bytes) |
Notes |
TEXT ENCODING DICT |
4 |
Max cardinality 2 billion distinct string values. Maximum string length is
32,767. |
TEXT ENCODING NONE |
Variable |
Size of the string + 6 bytes. Maximum string length is 32,767. |
TIME |
8 |
Minimum value: 00:00:00 ; maximum value: 23:59:59 . |
TIMESTAMP(0) | 8 | Linux timestamp from -30610224000 (1/1/1000 00:00:00 ) through 29379542399 (12/31/2900 23:59:59 ). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS OR YYYY-MM-DDTHH:MM:SS (the T is dropped when the field is populated). |
TIMESTAMP(3) | 8 | Linux timestamp from -30610224000000 (1/1/1000 00:00:00.000 ) through 29379542399999 (12/31/2900 23:59:59.999 ). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.fff OR YYYY-MM-DDTHH:MM:SS.fff (the T is dropped when the field is populated). |
TIMESTAMP(6) | 8 | Linux timestamp from -30610224000000000 (1/1/1000 00:00:00.000000 ) through 29379542399999999 (12/31/2900 23:59:59.999999 ). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.ffffff OR YYYY-MM-DDTHH:MM:SS.ffffff (the T is dropped when the field is populated). |
TIMESTAMP(9) | 8 | Linux timestamp from -9223372036854775807 (09/21/1677 00:12:43.145224193 ) through 9223372036854775807 (11/04/2262 23:47:16.854775807 ). Can also be inserted and stored in human-readable format: YYYY-MM-DD HH:MM:SS.fffffffff OR YYYY-MM-DDTHH:MM:SS.fffffffff (the T is dropped when the field is populated). |
DATE * |
4 |
Minimum value: -2,147,483,648 ; maximum value: 2,147,483,647
Range in years: +/-5,883,517 around epoch. Maximum date January 1,
5885487 (approximately). Supported formats when using COPY FROM : mm/dd/yyyy , dd-mmm-yy , yyyy-mm-dd , dd/mmm/yyyy .
|
EPOCH
| 8 |
Seconds ranging from -30610224000
(1/1/1000 00:00:00 ) through 185542587100800
(1/1/5885487 23:59:59 ). |
FLOAT |
4 |
Variable precision. Minimum value: -3.4 x e^38 ; maximum value: 3.4 x e^38 . |
DOUBLE |
8 |
Variable precision. Minimum value: -1.79 x e^308 ; maximum value: 1.79 x e^308 |
INTEGER |
4 |
Minimum value: -2,147,483,647 ; maximum value: 2,147,483,647 . |
TINYINT |
1 |
Minimum value: -127 ; maximum value: 127 . |
SMALLINT |
2 |
Minimum value: -32,767 ; maximum value: 32,767 . |
BIGINT |
8 |
Minimum value: -9,223,372,036,854,775,807 ; maximum value: 9,223,372,036,854,775,807 . |
BOOLEAN |
1 |
TRUE: 'true' ,
'1' , 't' . FALSE: 'false' , '0' , 'f' . Text values are not case-sensitive. |
DECIMAL/NUMERIC |
2, 4, or 8 |
Size depends on precision:
- Up to
4 : 2 bytes
5 to 9 : 4 bytes
10 to 18 (maximum): 8 bytes
Scale must be less than precision. |
* In OmniSci release 4.4.0 and higher, you can use existing 8-byte
DATE
columns, but you can create only 4-byte DATE
columns (default) and 2-byte DATE
columns
(see DATE ENCODING FIXED(16)
).
Note |
- OmniSci does not support geometry arrays.
- Timestamp values are always stored in
8 bytes. The greater the precision, the lower the fidelity.
|
Fixed Encoding
For certain datatypes, you can use a more compact representation of these
values. The options for these datatypes are listed in the following table.
Encoding |
Size (bytes) |
Notes |
TIMESTAMP ENCODING FIXED(32) |
4 |
Range: 1901-12-13 20:45:53 - 2038-01-19 03:14:07 .
Can also be inserted and stored in human-readable format:
YYYY-MM-DD HH:MM:SS
YYYY-MM-DDTHH:MM:SS (The T is dropped when the field is populated.)
|
TIME ENCODING FIXED(32) |
4 |
Range: 00:00:00 - 23:59:59 |
DATE ENCODING FIXED(32) |
4 |
Default encoding, equivalent to DATE .
Range in years: +/-5,883,517 around epoch, maximum date January 1,
5885487 (approximately). Values range from -185542587187200 to 185542587100800.
Supported formats when using COPY FROM : mm/dd/yyyy ,
dd-mmm-yy , yyyy-mm-dd , dd/mmm/yyyy .
When you create a column using DATE ENCODING FIXED(32) OmniSci defines the
column as DATE ENCODING DAYS(16). |
DATE ENCODING FIXED(16) |
2 |
Range: -32,768 - 32,767
Range in years: +/-90 around epoch, April 14, 1880 -
September 9, 2059. Values range from -2831155200 to 2831068800. Supported formats when using COPY FROM : mm/dd/yyyy , dd-mmm-yy , yyyy-mm-dd , dd/mmm/yyyy . |
TEXT ENCODING DICT(16) |
2 |
Max cardinality 64K distinct string values. |
TEXT ENCODING DICT(8) |
1 |
Max cardinality 255 distinct string values |
INTEGER ENCODING FIXED(16) |
2 |
Same as SMALLINT |
INTEGER ENCODING FIXED(8) |
1 |
Range: -127 - 127 |
SMALLINT ENCODING FIXED(8) |
1 |
Range: -127 - 127 |
BIGINT ENCODING FIXED(32) |
4 |
Same as INTEGER |
BIGINT ENCODING FIXED(16) |
2 |
Same as SMALLINT |
BIGINT ENCODING FIXED(8) |
1 |
Range: -127 - 127 |
To use these fixed length fields, the range or of the data must fit into the
constraints as described.
For DATE types, you can use the terms FIXED and DAYS interchangeably — both are
synonymous for the DATE type in OmniSci.
These encodings are most effective on low-cardinality TEXT
fields, where you can achieve large savings of storage space and improved processing speed, and on TIMESTAMP
fields where the timestamps range between 1901-12-13 20:45:53 and 2038-01-19 03:14:07.
All encoding options are shown. Some of the INTEGER options overlap. For example, INTEGER ENCODINGFIXED(8) and SMALLINT ENCODINGFIXED(8) are essentially the same.
If a TEXT ENCODING field does not match the defined cardinality, OmniSci substitutes a NULL
value and logs the change.
Understanding your schema and the scope of potential values in each field helps you achieve significant savings by carefully applying these fixed encoding types.
Defining Array Datatypes
Define datatype arrays by appending square brackets, as shown in the arrayexamples
DDL sample.
CREATE TABLE arrayexamples (
tiny_int_array TINYINT[],
int_array INTEGER[],
big_int_array BIGINT[],
text_array TEXT[] ENCODING DICT(32), --OmniSci supports only DICT(32) TEXT arrays.
float_array FLOAT[],
double_array DOUBLE[],
decimal_array DECIMAL(18,6)[],
boolean_array BOOLEAN[],
date_array DATE[],
time_array TIME[],
timestamp_array TIMESTAMP[])
You can also define fixed-length arrays. For example:
CREATE TABLE arrayexamples (
float_array3 FLOAT[3],
date_array4 DATE[4]
Fixed length arrays require less storage space than variable length arrays.
Geospatial Data Types
Geospatial Data Types
Type |
Description |
Example |
POINT |
A point described by two coordinates. When the coordinates are longitude and latitude, OmniSci stores longitude first, and then latitude. |
POINT(0 0) |
LINESTRING |
A sequence of 2 or more points and the lines that connect them. |
LINESTRING(0 0,1 1,1 2) |
POLYGON |
A set of one or more rings (closed line strings), with the first representing the shape (external ring) and the rest representing holes in that shape (internal rings). |
POLYGON((0 0,4 0,4 4,0 4,0 0),(1 1, 2 1, 2 2, 1 2,1 1)) |
MULTIPOLYGON |
A set of one or more polygons. |
MULTIPOLYGON(((0 0,4 0,4 4,0 4,0 0),(1 1,2 1,2 2,1 2,1 1)), ((-1 -1,-1 -2,-2 -2,-2 -1,-1 -1))) |
See Using Geospatial Objects.
Example
The following demonstrates creation of a table with geospatial data types.
CREATE TABLE geo ( name TEXT ENCODING DICT(32),
p0 POINT,
p1 GEOMETRY(POINT),
p2 GEOMETRY(POINT, 4326),
p3 GEOMETRY(POINT, 4326) ENCODING NONE,
p4 GEOMETRY(POINT, 900913),
ls0 LINESTRING,
ls1 GEOMETRY(LINESTRING, 4326) ENCODING COMPRESSED(32),
ls2 GEOMETRY(LINESTRING, 4326) ENCODING NONE,
poly0 POLYGON,
poly1 GEOMETRY(POLYGON, 4326) ENCODING COMPRESSED(32),
mpoly0 GEOMETRY(MULTIPOLYGON, 4326)
);
In this example:
- p0, p1,
ls0, and poly0 are simple
(planar) geometries.
- p4 is point geometry with Web Mercator
longitude/latitude coordinates.
- p2, p3,
ls1, ls2,
poly1, and mpoly0 are geometries using WGS84 SRID=4326
longitude/latitude coordinates.
Storage
Geometry storage requirements are largely dependent on coordinate data.
Coordinates are normally stored as 8-byte doubles, two coordinates per point,
for all points that form a geometry. Each POINT geometry in the p1 column,
for example, requires 16 bytes.
Compression
WGS84 (SRID 4326) coordinates are compressed to 32 bits by default. This
sacrifices some precision but reduces storage requirements by half.
For example, columns p2,
ls1, poly1,
and mpoly0 in the table defined above are compressed.
Each geometry in the p2 column requires 8 bytes,
compared to 16 bytes for p0.
You can explicitly disable compression. WGS84 columns
p3,
ls2 are not
compressed and continue using doubles. Simple (planar) columns
p0,
p1,
ls0,
poly1 and non-4326
column
p4 are not compressed.
Shared Dictionaries
You can improve performance of string operations and optimize storage using
shared dictionaries. You can share dictionaries within a table or between
different tables in the same database. The table with which you want to share
dictionaries must exist when you create the table that references the
TEXT ENCODING DICT
field, and the column that you are referencing
in that table must also exist. The following small DDL shows the basic
structure:
CREATE TABLE text_shard (
i TEXT ENCODING DICT(32),
s TEXT ENCODING DICT(32),
SHARD KEY (i))
WITH (SHARD_COUNT = 2);
CREATE TABLE text_shard1 (
i TEXT,
s TEXT ENCODING DICT(32),
SHARD KEY (i),
SHARED DICTIONARY (i) REFERENCES text_shard(i))
WITH (SHARD_COUNT = 2);
In the table definition, make sure that referenced columns appear before the
referencing columns.
For example, this DDL is a portion of the schema for the flights
database. Because airports are both origin and destination locations, it makes
sense to reuse the same dictionaries for name, city, state,
and country values.
create table flights (
*
*
*
dest_name TEXT ENCODING DICT,
dest_city TEXT ENCODING DICT,
dest_state TEXT ENCODING DICT,
dest_country TEXT ENCODING DICT,
*
*
*
origin_name TEXT,
origin_city TEXT,
origin_state TEXT,
origin_country TEXT,
*
*
*
SHARED DICTIONARY (origin_name) REFERENCES flights(dest_name),
SHARED DICTIONARY (origin_city) REFERENCES flights(dest_city),
SHARED DICTIONARY (origin_state) REFERENCES flights(dest_state),
SHARED DICTIONARY (origin_country) REFERENCES flights(dest_country),
*
*
*
)
WITH(
*
*
*
)
To share a dictionary in a different existing table, replace the table name
in the REFERENCES
instruction. For example, if you have an existing
table called us_geography, you can share the dictionary by
following the pattern in the DDL fragment below.
create table flights (
*
*
*
SHARED DICTIONARY (origin_city) REFERENCES us_geography(city),
SHARED DICTIONARY (origin_state) REFERENCES us_geography(state),
SHARED DICTIONARY (origin_country) REFERENCES us_geography(country),
SHARED DICTIONARY (dest_city) REFERENCES us_geography(city),
SHARED DICTIONARY (dest_state) REFERENCES us_geography(state),
SHARED DICTIONARY (dest_country) REFERENCES us_geography(country),
*
*
*
)
WITH(
*
*
*
);
Note | The referencing column cannot
specify the encoding of the dictionary, because it uses the encoding from the
referenced column. |