Configuration Parameters

OmniSci requires minimal configuration and offers a number of optional settings. This topic describes the required and optional configuration changes you can make to your OmniSci instance.

Data Directory

Before starting the OmniSci server, you must initialize the persistent data directory. Choose a path for a new, empty directory, such as /var/lib/mapd, and store it in the environment variable $MAPD_STORAGE:

export MAPD_STORAGE=/var/lib/mapd

Create the directory and change its owner to the user that the server will run as ($MAPD_USER):

sudo mkdir -p $MAPD_STORAGE
sudo chown -R $MAPD_USER $MAPD_STORAGE

Where $MAPD_USER is the system user account that the server runs as, such as mapd, and $MAPD_STORAGE is the path to the parent of the OmniSci server data directory.

Finally, run $MAPD_PATH/bin/initdb with the data directory path as the argument:

$MAPD_PATH/bin/initdb $MAPD_STORAGE
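
Because the data directory is expected to start out empty, a small pre-flight check can catch a reused path before initdb writes anything. The following is a minimal sketch, not part of OmniSci: is_empty_dir is a hypothetical helper name, and the check assumes the calling user is allowed to list the directory.

```shell
# Hypothetical helper (not an OmniSci tool): succeed only if the path is an
# existing, readable directory with no entries, including hidden ones.
is_empty_dir() {
  [ -d "$1" ] && [ -r "$1" ] && [ -z "$(ls -A "$1")" ]
}

# Example use before running initdb (assumes MAPD_PATH and MAPD_STORAGE are set):
#   is_empty_dir "$MAPD_STORAGE" && "$MAPD_PATH/bin/initdb" "$MAPD_STORAGE"
```

If the check fails, inspect the directory by hand rather than deleting its contents automatically; it may hold an existing catalog.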

Configuration File

OmniSci supports storing options in a configuration file. This is useful if, for example, you need to run the OmniSci server and web server on ports other than the defaults.

If you store a copy of mapd.conf in the $MAPD_STORAGE directory, the configuration settings are picked up automatically by the sudo systemctl start mapd_server and sudo systemctl start mapd_web_server commands.

Set the flags in the configuration file using the format <flag> = <value>. Strings must be enclosed in quotes. The following is a sample configuration file. The entry for data path is a string and must be in quotes. The last entry in the first section, for null-div-by-zero, is the Boolean value true and does not require quotes.

port = 9091
http-port = 9090
data = "/var/lib/mapd/data"
null-div-by-zero = true

[web]
port = 9092
frontend = "/opt/mapd/frontend"
servers-json = "/var/lib/mapd/servers.json"
enable-https = true
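
To check which value a given process would read from a file laid out like the sample above, a lookup helper can be sketched in shell. This sketch assumes, based on the sample, that flags before the first bracketed header apply to the OmniSci server and flags under [web] apply to the web server. conf_get is a hypothetical helper, not an OmniSci command, and it handles only simple <flag> = <value> lines whose values contain no equals sign.

```shell
# conf_get FILE SECTION KEY
# Hypothetical helper: print the value of KEY from SECTION of a
# mapd.conf-style file. Pass "" as SECTION for the flags that appear
# before the first [section] header. String quotes are stripped.
conf_get() {
  awk -F'=' -v want="$2" -v key="$3" '
    /^\[/ { section = substr($0, 2, index($0, "]") - 2); next }
    section == want {
      k = $1; gsub(/ /, "", k)
      if (k == key) {
        v = $2
        gsub(/^ +| +$/, "", v)   # trim surrounding spaces
        gsub(/"/, "", v)         # drop string quotes
        print v
        exit
      }
    }
  ' "$1"
}

# Write the sample configuration to a scratch file and query it.
conf="$(mktemp)"
cat > "$conf" <<'EOF'
port = 9091
http-port = 9090
data = "/var/lib/mapd/data"
null-div-by-zero = true

[web]
port = 9092
frontend = "/opt/mapd/frontend"
servers-json = "/var/lib/mapd/servers.json"
enable-https = true
EOF

conf_get "$conf" "" port     # prints 9091
conf_get "$conf" web port    # prints 9092
```

The same flag name (port) appears twice in the sample with different meanings, which is why the helper takes the section as an explicit argument.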

Command Line Parameters

You can make ad hoc changes to your configuration by specifying parameters on the command line at runtime. Prefix the parameter name with two hyphens, and follow it with any required argument. For example, the following command starts the mapd_server using a temporary configuration file.

sudo systemctl start mapd_server --config ~/temp.conf

Configuration Parameters for OmniSci Server

Configuration Flags for OmniSci Server

| Flag | Description | Implied Value | Default Value | Why Change It? |
| --- | --- | --- | --- | --- |
| allow-cpu-retry [=arg] | Allow queries that failed on GPU to retry on CPU, even when watchdog is enabled. | TRUE[1] | FALSE[0] | When watchdog is enabled, most queries that run on GPU and throw a watchdog exception fail. Turn this on to allow queries that fail the watchdog on GPU to retry on CPU. By default, queries that run out of memory on GPU throw an error if watchdog is enabled (watchdog is enabled by default). |
| allow-loop-joins [=arg] | Enable loop joins. | TRUE[1] | FALSE[0] | Enables the loop join implementation, as opposed to the default hash join implementation. Queries loop over all rows from all tables involved in the join and evaluate the join condition. Loop joins can be effective when you compare a large inner dataset to a small outer dataset. When both datasets are large, performance is predictably slower. In most scenarios, hash join (default) performance is superior to loop join performance. There are two cases when you might use loop joins: (1) when you cannot use a hash join, for example because the join condition does not exist, as in cross joins or in some geospatial cases where it is not possible to easily create the hash table on the GPU; (2) when hash join performance is slow, likely because of highly skewed data distribution, making hash table probes expensive. For best performance, avoid using loop joins unless your requirements match one of these two scenarios. |
| bigint-count [=arg] | Use 64-bit count. | FALSE[0] | FALSE[0] | This setting is disabled by default because 64-bit integer atomics are slow on GPUs. If you see negative values for a count, indicating overflow, enable this setting. If your data set has more than 4 billion records, you will likely need to turn this on. |
| calcite-max-mem arg | Maximum memory available to the Calcite JVM. | | 1024 | Change if Calcite reports out-of-memory errors. |
| calcite-port arg | Calcite port number. | | 9093 | Change to avoid collisions with ports already in use. |
| config arg | Path to mapd.conf. | | $MAPD_STORAGE | Change for testing and debugging. |
| cpu | Run on CPU only. | | FALSE | One use case for disabling GPUs is database conversion, which requires moving a large amount of data with minimal processing. |
| cpu-buffer-mem-bytes arg | Size of memory reserved for CPU buffers [bytes]. | | 0 | Change to restrict the amount of CPU/system memory OmniSci Core can consume. A default value of 0 indicates no limit on CPU memory use (OmniSci Server uses all available CPU memory on the system). |
| cuda-block-size arg | Size of block to use on GPU. | | 0 | GPU performance tuning: number of threads per block. A default of 0 means use all threads per block. |
| cuda-grid-size arg | Size of grid to use on GPU. | | 0 | GPU performance tuning: number of blocks per device. A default of 0 means use all available blocks per device. |
| data arg | Directory path to OmniSci catalogs. | | $MAPD_STORAGE | Change for testing and debugging. |
| db-query-list arg | Path to file containing OmniSci queries. | N/A | N/A | Use a query list to autoload data to GPU memory on startup to speed performance. |
| dynamic-watchdog-time-limit [=arg] | Dynamic watchdog time limit, in milliseconds. | 10000 | 100000 | Change if Dynamic Watchdog is stopping queries that are expected to take longer than this limit. |
| enable-access-priv-check [=arg] | Check user access privileges to database objects. | TRUE[1] | TRUE[1] | Set to FALSE to disable the privileges model, which is essentially the same as running with only superusers. |
| enable-debug-timer [=arg] | Enable fine-grained query execution timers for debugging. | TRUE[1] | FALSE[0] | For debugging, logs verbose timing information for query execution (time to load data, time to compile code, and so on). |
| enable-dynamic-watchdog [=arg] | Enable dynamic watchdog. | TRUE[1] | FALSE[0] | |
| enable-filter-push-down [=arg(=1)] (=0) | Enable filter push down through joins. | TRUE[1] | FALSE[0] | Evaluates filters in the query expression for selectivity and pushes down highly selective filters into the join according to selectivity parameters. See also What is Predicate Pushdown? |
| enable-overlaps-hashjoin [=arg(=1)] (=0) | Enable the overlaps hash join framework, allowing for range join (for example, spatial overlaps) computation using a hash table. | TRUE[1] | FALSE[0] | |
| enable-watchdog [arg] | Enable watchdog. | TRUE[1] | TRUE[1] | |
| filter-push-down-low-frac | Lower threshold for selectivity of filters that are pushed down. | | filter-push-down-low-frac = 0.1 | Filters with selectivity less than this threshold are considered for a push down. |
| filter-push-down-passing-row-ubound | Upper bound on the number of rows that should pass the filter if the selectivity is less than the high fraction threshold. | | filter-push-down-passing-row-ubound = 4000000 | |
| flush-log [arg] | Immediately flush logs to disk. | TRUE[1] | TRUE[1] | Set to FALSE if this is a performance bottleneck. |
| from-table-reordering [=arg(=1)] (=1) | Enable automatic table reordering in the FROM clause. | TRUE[1] | TRUE[1] | Automatic FROM clause table reordering re-orders the sequence of a join to place large tables on the inside of the join clause and smaller tables on the outside. OmniSci also reorders tables between join clauses to prefer hash joins over loop joins. You should not need to change this value except in special cases where OmniSci engineers are working directly with you to resolve an issue. |
| gpu | Run on GPUs (default). | | TRUE | One use case for disabling GPUs is database conversion, which requires moving a large amount of data with minimal processing. |
| gpu-buffer-mem-bytes arg | Size of memory reserved for GPU buffers [bytes] (per GPU). | | 0 | Restricts the amount of memory a single process uses, so that when running multitenancy in the cloud several processes can all use the same GPUs. |
| hll-precision-bits [=arg] | Number of bits from the hash value used to specify the bucket number. | 11 | 11 | Change to increase or decrease approx_count_distinct() precision. Increased precision decreases performance. |
| http-port arg | HTTP port number. | | 9090 | Change to avoid collisions with ports already in use. |
| idle-session-duration arg | Maximum duration of an idle session, in minutes. | | 60 | Change to increase or decrease the duration of an idle session before timeout. |
| inner-join-fragment-skipping [=arg(=1)] (=0) | Enable/disable inner join fragment skipping. | | | Enables skipping fragments for improved performance during inner join operations. |
| license arg | Path to file containing the license key. | | | Change if your provided license file is in a different location or has a different name. |
| max-session-duration arg | Maximum duration of the active session, in minutes. | | 30 | Change to increase or decrease session duration before timeout. |
| null-div-by-zero [=arg] | Allows processing to complete when the dataset would cause a div/0 error. | | 0 | Set to TRUE if you prefer to return null when dividing by zero, FALSE to throw an exception. |
| num-gpus arg | Number of GPUs to use. | | -1 | In a shared environment, you can assign the number of GPUs to a particular application. The default, -1, means use all available GPUs. |
| num-reader-threads arg | Number of reader threads to use. | | 0 | Drop the number of reader threads to prevent imports from taking all available CPU power. Default is to use all threads. |
| overlaps-bucket-threshold arg (=0.10000000000000001) | The minimum size of a bucket corresponding to a given inner table range for the overlaps hash join. | | 0.10000000000000001 | |
| read-only [=arg] | Enable read-only mode. | TRUE[1] | FALSE[0] | Prevents inadvertent (or purposeful) changes to the dataset. |
| render-mem-bytes arg | Size of memory reserved for rendering [bytes]. | | 500000000 | This allocation is performed at startup on each configured GPU. It is static and persists while the server is running unless you execute a \clear_gpu_memory command. Increase if you are rendering a large number of points/symbols and have received out-of-memory exceptions that read "Not enough OpenGL memory to render the query results" during rendering. Default is 500 MB. |
| render-poly-cache-bytes arg | Size of memory reserved for polygon rendering [bytes]. | | 300000000 | The polygon render cache is used to improve polygon rendering performance from frame to frame when rendering the same query. Polygon rendering often involves more complex queries, such as choropleths that use expensive joins and aggregates. In addition, the processing time required to build the polygon buffers for rendering can be relatively expensive. The polygon cache mitigates poor performance from successive rebuilds of query results and polygon render buffers. This configuration flag limits the maximum size of the polygon render cache. In contrast to render-mem-bytes, no allocation is performed at startup for this flag, so if no polygon rendering is ever performed, no allocations count toward this limit. Polygon buffer allocations are performed dynamically when requested. If the query results and polygon buffer sizes exceed the limit of the cache, the render can still be executed (as long as there is enough GPU memory to do so), but you may see performance degradation from frame to frame. If you notice poor frame-to-frame performance with polygon rendering, consider increasing this cache size. You can get hints from the INFO log as to what a more appropriate setting should be. For instance, if you see a log message such as this: "Cannot cache bytes ( for vbo/ibo) for poly query: on gpu . There is currently of total bytes used in the poly cache.", then you can extract the size in bytes to render a specific query and adjust this setting accordingly. Default is 300 MB. |
| rendering [=arg] | Enable/disable backend rendering. | TRUE[1] | TRUE[1] | Disable rendering when it is not in use. This frees up the memory set aside for rendering by the render-mem-bytes option. To re-enable rendering, you must restart mapd_server. |
| res-gpu-mem =arg | Memory reserved on the GPU that the OmniSci allocator does not use. | | 134217728 | OmniSci is very greedy: it takes all the memory on the GPU except for (render-mem-bytes + res-gpu-mem), and it allocates all of render-mem-bytes at startup. The res-gpu-mem flag allows you to reserve some extra memory for your system (for example, if your GPU is also driving your display, as on a laptop or single-card desktop). This is also a useful flag if you have other processes sharing the GPU with OmniSci, such as a machine learning pipeline. In advanced rendering scenarios or distributed setups, increasing res-gpu-mem allows the system to grab additional memory for the renderer, or for aggregating results for the renderer from multiple leaf nodes. |
| start-gpu arg | First GPU to use. | | FALSE[0] | |
| trivial-loop-join-threshold [=arg] | The maximum number of rows in the inner table of a loop join considered to be trivially small. | 1000 | 1000 | |
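
The res-gpu-mem description above implies a simple budget: the memory left to OmniSci on each GPU is the card's total minus render-mem-bytes minus res-gpu-mem. The following is a rough sketch of that arithmetic, assuming a hypothetical 16 GiB GPU and the default values from the table. gpu_pool_bytes is an illustrative helper name, not an OmniSci tool, and the real allocator may account for additional overhead.

```shell
# gpu_pool_bytes TOTAL RENDER_MEM RES_GPU_MEM
# Illustrative only: bytes left per GPU after subtracting
# render-mem-bytes and res-gpu-mem from the card's total memory.
gpu_pool_bytes() {
  echo "$(( $1 - $2 - $3 ))"
}

total=$(( 16 * 1024 * 1024 * 1024 ))   # hypothetical 16 GiB card
render_mem_bytes=500000000             # default render-mem-bytes
res_gpu_mem=134217728                  # default res-gpu-mem (128 MiB)

gpu_pool_bytes "$total" "$render_mem_bytes" "$res_gpu_mem"   # prints 16545651456
```

Raising either flag shrinks the remaining figure by the same number of bytes, which is a quick way to estimate the effect of a change before restarting the server.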

Enterprise Edition Additional Parameters

| Flag | Description | Implied Value | Default Value | Why Change It? |
| --- | --- | --- | --- | --- |
| cluster arg | Path to data leaves list JSON file. Indicates that the OmniSci server instance is an aggregator node, and where to find the rest of its cluster. | | $MAPD_STORAGE | Change for testing and debugging. |
| compression-limit-bytes [=arg(=536870912)] (=536870912) | Compress result sets that are transferred between leaves. | 536870912 | 536870912 | Minimum length of payload above which data is compressed. |
| compressor arg (=lz4hc) | Compressor algorithm to be used by the server to compress data being transferred between servers. | lz4hc | lz4hc | See Data Compression for compression algorithm options. |
| ha-brokers arg | Location of the HA brokers. | | | Point to the Kafka broker used for High Availability. |
| ha-group-id arg | ID of the HA group this server is in. | | | Change to match the group ID used for all servers in the OmniSci Core High Availability group. |
| ha-shared-path arg | Directory path to the shared OmniSci directory. | | | Required part of the High Availability OmniSci Core setup. Specifies the shared file storage that allows multiple OmniSci Core servers to function as a High Availability cluster. |
| ha-unique-server-id arg | Unique ID to identify this server in the HA group. | | | Change to assign a unique ID to this server in the OmniSci High Availability group. |
| ldap-dn arg | LDAP Distinguished Name (DN). | | uid=%s, cn=users, cn=accounts, dc=mapd, dc=com | |
| ldap-role-query-regex arg | RegEx to use to extract the role from the role query result. | | | |
| ldap-role-query-url arg | LDAP query role URL. | | | |
| ldap-superuser-role arg | The role name to identify a superuser. | | | |
| ldap-uri arg | LDAP server URI. | | | |
| leaf-conn-timeout [=arg] | Leaf connect timeout, in milliseconds. | 20000 | 20000 | Increase/decrease to fail Thrift connections between OmniSci Core instances more or less quickly if a connection cannot be established. |
| leaf-recv-timeout [=arg] | Leaf receive timeout, in milliseconds. | 300000 | 300000 | Increase/decrease to fail Thrift connections between OmniSci Core instances more or less quickly if data is not received in the time allotted. |
| leaf-send-timeout [=arg] | Leaf send timeout, in milliseconds. | 300000 | 300000 | Increase/decrease to fail Thrift connections between OmniSci Core instances more or less quickly if data is not sent in the time allotted. |
| saml-metadata-file arg | Path to Identity provider metadata file. | | | This is a required flag for running SAML. An Identity provider (like Okta) supplies a metadata file. From this file, OmniSci uses: (1) the public key of the Identity provider, to verify that the SAML response comes from it and not from somebody else; (2) the URL of the SSO login page, which is used to obtain a SAML token. |
| saml-sp-target-url arg | URL of the service provider for which SAML assertions should be generated. | | | This is a required flag for running SAML. It is used to verify that a SAML token was issued for OmniSci and not for some other service. |
| saml-sync-roles arg (=0) | Enable mapping of SAML groups to MapD roles. | | 0 | The SAML Identity provider (for example, Okta) automatically creates users at login and assigns them roles they already have as groups in SAML. |
| string-servers arg | Path to string servers list JSON file. | | | Indicates that OmniSci Core is running in distributed mode. Required to designate a leaf server when running in distributed mode. |

Configuration Parameters for OmniSci Web Server

Configuration Flags for OmniSci Web Server

| Flag | Description | Default | Why Change It? |
| --- | --- | --- | --- |
| -b, backend-url string | URL to http-port on mapd_server. | http://localhost:9090 | Change to avoid collisions with other services. |
| cert string | Certificate file for HTTPS. | cert.pem | Change for testing and debugging. |
| -c, config string | Path to OmniSci configuration file. | | Change for testing and debugging. |
| -d, data string | Path to OmniSci data directory. | data | Change for testing and debugging. |
| db-query-list <path-to-query-list-file> | Pre-load data to memory based on SQL queries stored in a list file. | n/a | Automatically run queries that load the most frequently used data to enhance performance. See Pre-loading Data. |
| docs string | Path to documentation directory. | docs | Change if you move your documentation files to another directory. |
| enable-https | Enable HTTPS support. | | Change to enable secure HTTP. |
| -f, frontend string | Path to frontend directory. | frontend | Change if you move the location of your frontend UI files. |
| key string | Key file for HTTPS. | key.pem | Change for testing and debugging. |
| -p, port int | Frontend server port. | 9092 | Change to avoid collisions with other services. |
| -r, read-only | Enable read-only mode. | | Prevent inadvertent (or nefarious) changes to the data. |
| servers-json string | Path to servers.json. | | Change for testing and debugging. |
| timeout duration | Maximum request duration in #h#m#s format. For example, 0h30m0s represents a duration of 30 minutes. | 1h0m0s | The --timeout option controls the maximum duration of individual HTTP requests. This is used to manage resource exhaustion caused by improperly closed connections. One side effect is that it limits the execution time of queries made over the Thrift HTTP transport. Increase this timeout if queries are expected to take longer than the default duration of one hour; for example, if you perform a COPY FROM on a large file when using mapdql with the HTTP transport. |
| tmpdir string | Path for temporary file storage. | /tmp | The temporary directory is used as a staging location for file uploads. You might want to place this directory on the same file system as the OmniSci data directory. If not specified on the command line, mapd_web_server also respects the standard TMPDIR environment variable as well as a specific MAPD_TMPDIR environment variable, the latter of which takes precedence. If you use neither the command-line argument nor one of the environment variables, the default, /tmp/, is used. |
| -v, verbose | Print all log messages to stdout. | | Change for testing and debugging. |
| version | Return version. | | |
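
Two of the web server options above follow rules that are easy to get wrong by hand: the #h#m#s format for timeout, and the order in which the temporary directory is resolved (command-line value, then MAPD_TMPDIR, then TMPDIR, then /tmp). The helpers below are a hedged sketch of both rules as the table describes them. to_duration and resolve_tmpdir are hypothetical names used for illustration and are not part of mapd_web_server.

```shell
# to_duration SECONDS
# Format a whole number of seconds as an #h#m#s duration string.
to_duration() {
  printf '%dh%dm%ds\n' "$(( $1 / 3600 ))" "$(( ($1 % 3600) / 60 ))" "$(( $1 % 60 ))"
}

# resolve_tmpdir [CLI_VALUE]
# Mirror the documented precedence: an explicit command-line value wins,
# then MAPD_TMPDIR, then TMPDIR, and finally the /tmp default.
resolve_tmpdir() {
  printf '%s\n' "${1:-${MAPD_TMPDIR:-${TMPDIR:-/tmp}}}"
}

to_duration 1800   # prints 0h30m0s
to_duration 5400   # prints 1h30m0s
```

For example, the output of to_duration 7200 could be used as the value of the timeout flag to allow requests of up to two hours.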