Getting Started with OmniSci Open Source
OmniSciDB is an in-memory, column store, SQL-based, relational database designed from the ground up to run on GPUs. Developers are encouraged to contribute to this Open Source project to expand and enhance OmniSciDB capabilities.
Note | OmniSci is the rebranded name of MapD. |
Licensing
OmniSci Open Source is licensed under the Apache License, Version 2.0.
Third-party Licenses
The repository includes a number of third party packages provided under separate licenses. Details about these packages and their respective licenses are at ThirdParty/licenses/index.md.
Cloning
You can clone a copy of OmniSciDB from GitHub (https://github.com/omnisci/omniscidb).
Installing
Click the link for instructions on how to use a script to install OmniSciDB dependencies for your chosen platform.
- Installing CentOS 7 Dependencies
- Installing macOS Dependencies
- Installing Ubuntu Dependencies
- Installing Dependencies from Arch User Repository
This is the complete list of OmniSciDB dependencies.
Important | This list describes the required dependencies, but you do not have to install these dependencies individually. Follow the links above for instructions on how to configure OmniSciDB dependencies for your chosen platform. |
Package | Min Version | Required |
---|---|---|
CMake | 3.3 | yes |
LLVM | 3.8-4.0, 6.0 | yes |
GCC | 5.1 | no, if building with clang |
Go | 1.6 | yes |
Boost | 1.65.0 | yes |
OpenJDK | 1.7 | yes |
CUDA | 8.0 | yes, if compiling with GPU support |
gperftools | | yes |
gdal | 2.3 | yes |
Arrow | 0.11.0 | yes |
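If you want to confirm that a tool already on your system meets a minimum version from the table, a small comparison helper can do it. The following is a minimal sketch, not part of the OmniSciDB scripts: version_ge is a hypothetical helper name, and it assumes purely numeric dotted versions such as 3.10.2.

```shell
# version_ge INSTALLED MINIMUM   (hypothetical helper, not shipped with OmniSciDB)
# Succeeds when the dotted version INSTALLED is at least MINIMUM.
# Fields are compared numerically from left to right; missing fields count as 0.
# The body runs in a subshell, so "exit" only sets the function's return status.
version_ge() (
    installed=$1
    minimum=$2
    while [ -n "$installed" ] || [ -n "$minimum" ]; do
        a=${installed%%.*}
        b=${minimum%%.*}
        # Drop the field just read; clear the string when no dot remains.
        case $installed in *.*) installed=${installed#*.} ;; *) installed= ;; esac
        case $minimum in *.*) minimum=${minimum#*.} ;; *) minimum= ;; esac
        a=${a:-0}
        b=${b:-0}
        [ "$a" -gt "$b" ] && exit 0
        [ "$a" -lt "$b" ] && exit 1
    done
    exit 0
)

# Example: compare a version string you obtained yourself against the CMake minimum.
# version_ge 3.10.2 3.3 && echo "CMake is new enough"
```

The numeric field comparison matters because a plain string comparison would rank 3.10 below 3.3.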
Dependencies for omnisci_web_server and other Go utilities are in the ThirdParty/go directory. See ThirdParty/go/src/mapd/vendor/README.md.
Installing CentOS 7 Dependencies
OmniSciDB requires a number of dependencies that are not provided in the common CentOS/RHEL package repositories. A prebuilt package containing all of these dependencies is provided for CentOS 7 (x86_64).
Use the scripts/mapd-deps-prebuilt.sh build script to install prebuilt dependencies.
These dependencies are installed in a directory under /usr/local/mapd-deps.
The mapd-deps-prebuilt.sh script also installs Environment Modules to simplify managing the required environment variables. Log out and log back in after running the mapd-deps-prebuilt.sh script to activate the Environment Modules command, module.
The mapd-deps environment module is disabled by default. To activate for your current session, run the following command:
module load mapd-deps
To disable the mapd-deps module, run the following command:
module unload mapd-deps
WARNING | The mapd-deps package contains newer versions of packages such as GCC and ncurses that might not be compatible with the rest of your environment. Disable the mapd-deps module before compiling other packages. |
CUDA
It is best to install CUDA and the NVIDIA drivers from the .rpm packages, following the instructions provided by NVIDIA. The preferred .rpm (network) method ensures you always have the latest stable drivers, while the .rpm (local) method allows you to install without Internet access.
The .rpm method requires DKMS to be installed, which is available from the Extra Packages for Enterprise Linux (EPEL) repository:
sudo yum install epel-release
Reboot after installing to activate the NVIDIA drivers.
Environment Variables
The mapd-deps-prebuilt.sh script includes two files with the appropriate environment variables:
- mapd-deps-<date>.sh (for sourcing from your shell config)
- mapd-deps-<date>.modulefile (for use with Environment Modules, yum package environment-modules).
These files can be found in the mapd-deps install directory, usually /usr/local/mapd-deps/<date>. Either of these can be used to configure your environment: the .sh can be sourced in your shell config, while the .modulefile must be moved to the modulespath.
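Because the install directory name contains a date, a shell config that sources the .sh file needs to locate it first. The following is a minimal sketch under stated assumptions: latest_deps_env is a hypothetical helper name, and it assumes the <date> directory names sort in chronological order when sorted as text (for example, a YYYYMMDD form). Check the actual directory names on your system before relying on it.

```shell
# latest_deps_env [INSTALL_ROOT]   (hypothetical helper)
# Prints the path of the last mapd-deps-<date>.sh found under INSTALL_ROOT.
# Assumption: <date> directory names sort chronologically as text.
# The body runs in a subshell, so "exit" only sets the function's return status.
latest_deps_env() (
    root=${1:-/usr/local/mapd-deps}
    newest=
    for candidate in "$root"/*/mapd-deps-*.sh; do
        # An unmatched glob stays literal, so confirm the file exists.
        [ -f "$candidate" ] || continue
        # Globs expand in sorted order, so the last match is kept.
        newest=$candidate
    done
    [ -n "$newest" ] || exit 1
    printf '%s\n' "$newest"
)

# Example for a shell config file:
# env_script=$(latest_deps_env) && . "$env_script"
```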
Building Dependencies
Use the scripts/mapd-deps-centos.sh script to build the dependencies. Modify this script and run it if you want to change dependency versions or to build on alternative CPU architectures.
cd scripts
module unload mapd-deps
./mapd-deps-centos.sh --compress
This completes installation of dependencies. Go to the Building section to continue.
Installing macOS Dependencies
The shell script scripts/mapd-deps-osx.sh automatically installs and/or updates Homebrew, then uses it to install all dependencies. macOS must be completely up to date and Xcode must be installed before you run the script. You can find and install Xcode on the Apple App Store.
CUDA
mapd-deps-osx.sh automatically installs CUDA via Homebrew and adds the correct environment variables to your ~/.bash_profile.
Java
scripts/mapd-deps-osx.sh automatically installs Java and Maven via Homebrew and adds the correct environment variables to your ~/.bash_profile.
This completes installation of dependencies. Go to the Building section to continue.
Installing Ubuntu Dependencies
Most build dependencies required by OmniSciDB are available via APT. Certain dependencies such as Thrift, Blosc, and Folly must be built as they either do not exist in the default repositories or have outdated versions. A prebuilt package containing all these dependencies is provided for Ubuntu 18.04 (x86_64). The dependencies are installed to /usr/local/mapd-deps/ by default; see Environment Variables for how to add these dependencies to your environment.
Ubuntu 16.04
OmniSciDB requires a newer version of Boost than the version provided with Ubuntu 16.04. The scripts/mapd-deps-ubuntu1604.sh build script compiles and installs a newer version of Boost into the /usr/local/mapd-deps/ directory.
Ubuntu 18.04
Use the scripts/mapd-deps-prebuilt.sh build script to install prebuilt dependencies. These dependencies are installed to a directory under /usr/local/mapd-deps. The mapd-deps-prebuilt.sh script generates a script named mapd-deps.sh containing the required environment variables. Source this file in your current session (or symlink it to /etc/profile.d/mapd-deps.sh) to activate it:
source /usr/local/mapd-deps/mapd-deps.sh
Some installs of Ubuntu 18.04 might fail while building with a message similar to:
java.security.InvalidAlgorithmParameterException: the trustAnchors parameter must be non-empty
This is a known issue in 18.04 that should be resolved in Ubuntu 18.04.1. To resolve on 18.04:
sudo rm /etc/ssl/certs/java/cacerts
sudo update-ca-certificates -f
Environment Variables
Add the CUDA and mapd-deps lib directories to LD_LIBRARY_PATH; add the CUDA and mapd-deps bin directories to PATH. The mapd-deps-ubuntu.sh and mapd-deps-prebuilt.sh scripts generate a script named mapd-deps.sh containing the environment variables that need to be set. Source this file in your current session (or symlink it to /etc/profile.d/mapd-deps.sh) to activate it:
source /usr/local/mapd-deps/mapd-deps.sh
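If you prefer to set these variables by hand, the idea can be sketched with a helper that prepends a directory only when it is not already listed. This is an illustration rather than the contents of the generated mapd-deps.sh: prepend_dir is a hypothetical helper name, and the CUDA and mapd-deps directories in the commented example are assumptions that depend on where the packages were installed.

```shell
# prepend_dir VARIABLE DIRECTORY   (hypothetical helper)
# Prepends DIRECTORY to the colon-separated list stored in VARIABLE unless it
# is already listed, then exports VARIABLE.
prepend_dir() {
    # Read the current value of the variable whose name is in $1.
    eval "_current=\${$1:-}"
    case ":$_current:" in
        *":$2:"*) ;;   # already present, nothing to do
        *) eval "$1=\"\$2\${_current:+:\$_current}\"" ;;
    esac
    export "$1"
}

# Example (directory locations are assumptions; adjust to your install):
# prepend_dir PATH /usr/local/cuda/bin
# prepend_dir PATH /usr/local/mapd-deps/bin
# prepend_dir LD_LIBRARY_PATH /usr/local/cuda/lib64
# prepend_dir LD_LIBRARY_PATH /usr/local/mapd-deps/lib
```

Skipping directories that are already listed keeps PATH from growing each time the file is sourced.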
CUDA
Recent versions of Ubuntu provide the NVIDIA CUDA Toolkit and drivers in the standard repositories. Use the following command to install:
sudo apt install -y nvidia-cuda-toolkit
Reboot your system after install to activate the NVIDIA drivers.
Building Dependencies
The scripts/mapd-deps-ubuntu.sh and scripts/mapd-deps-ubuntu1604.sh scripts build the dependencies for Ubuntu 18.04 and 16.04, respectively. Each script installs all required dependencies (except CUDA) and builds those that must be compiled from source. Modify the script for your release and run it if you want to change dependency versions or to build on alternative CPU architectures.
cd scripts
./mapd-deps-ubuntu.sh --compress
This completes installation of dependencies. Go to the Building section to continue.
Installing Dependencies from Arch User Repository
The following uses yaourt to install packages from the Arch User Repository.
yaourt -S \
    git \
    cmake \
    boost \
    google-glog \
    extra/jdk8-openjdk \
    clang \
    llvm \
    thrift \
    go \
    gdal \
    maven

VERS=1.21-45
wget --continue https://github.com/jarro2783/bisonpp/archive/$VERS.tar.gz
tar xvf $VERS.tar.gz
pushd bisonpp-$VERS
./configure
make -j $(nproc)
sudo make install
popd
CUDA
You can install CUDA and the NVIDIA drivers with the following commands:
yaourt -S \
    linux-headers \
    cuda \
    nvidia
Reboot after install to activate the NVIDIA drivers.
Environment Variables
You must add the CUDA bin directories to your PATH variable. A convenient way to do so is to create a new file, /etc/profile.d/mapd-deps.sh that contains the following:
PATH=/opt/cuda/bin:$PATH
export PATH
This completes installation of dependencies. Go to the Building section to continue.
Building
OmniSci uses CMake as its build system. The following commands create and navigate to a new build directory, set the build type to debug, then build the OmniSci application using up to four simultaneous jobs.
cd ~/omniscidb  # Navigate to the root directory of your OmniSciDB clone.
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=debug ..
make -j 4
You can use the following optional configuration parameters to enable or disable features in your build.
Option | Description |
---|---|
-DCMAKE_BUILD_TYPE=release | Build type and compiler options to use. Options are debug, release, RelWithDebInfo, MinSizeRel, and unset. |
-DENABLE_ASAN=off | Enable address sanitizer. Default is off. |
-DENABLE_AWS_S3=on | Enable AWS S3 support, if available. Default is on. |
-DENABLE_CALCITE_DELETE_PATH=on | Enable Calcite Delete Path. Default is on. |
-DENABLE_CALCITE_UPDATE_PATH=on | Enable Calcite Update Path. Default is on. |
-DENABLE_CUDA=off | Disable CUDA. Default is on. |
-DENABLE_CUDA_KERNEL_DEBUG=off | Enable debugging symbols for CUDA kernels. Will dramatically reduce kernel performance. Default is off. |
-DENABLE_DECODERS_BOUNDS_CHECKING=off | Enable bounds checking for column decoding. Default is off. |
-DENABLE_FOLLY=on | Use Folly. Default is on. |
-DENABLE_IWYU=off | Enable include-what-you-use. Default is off. |
-DENABLE_JIT_DEBUG=off | Enable debugging symbols for the JIT. Default is off. |
-DENABLE_PROFILER=off | Enable google perftools. Default is off. |
-DENABLE_STANDALONE_CALCITE=off | Require standalone Calcite server. Default is off. |
-DENABLE_TESTS=on | Build unit tests. Default is on. |
-DENABLE_TSAN=off | Enable thread sanitizer. Default is off. |
-DENABLE_CODE_COVERAGE=off | Enable code coverage symbols (clang only). Default is off. |
-DENABLE_JAVA_REMOTE_DEBUG=on | Enable Java Remote Debug. Default is off. |
-DMAPD_DOCS_DOWNLOAD=on | Download the latest master build of the documentation / omnisci.com/docs. Default is off. Note: this is a >50MB download. |
-DPREFER_STATIC_LIBS=off | Static link dependencies, if available. Default is off. |
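Several of these options are typically passed together on one cmake command line. As an illustration, the sketch below assembles a build type and turns CUDA off when no CUDA compiler is found on PATH. cmake_flags is a hypothetical helper name, and treating the presence of nvcc as the test for CUDA availability is an assumption of this sketch, not something the OmniSciDB build does for you.

```shell
# cmake_flags [BUILD_TYPE] [CUDA_COMPILER]   (hypothetical helper)
# Prints -D options for cmake. CUDA is left at its default (on) when the
# CUDA compiler is found on PATH and is switched off otherwise.
cmake_flags() (
    build_type=${1:-release}
    cuda_compiler=${2:-nvcc}
    flags="-DCMAKE_BUILD_TYPE=$build_type"
    if ! command -v "$cuda_compiler" >/dev/null 2>&1; then
        flags="$flags -DENABLE_CUDA=off"
    fi
    printf '%s\n' "$flags"
)

# Example, run from the build directory (the expansion is left unquoted on
# purpose so that each option becomes a separate argument):
# cmake $(cmake_flags debug) ..
```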
Testing
OmniSciDB uses Google Test as its main testing framework. Tests reside under the Tests directory.
The sanity_tests target runs the most common tests. If using Makefiles to build, you can run the tests using the following command:
make sanity_tests
AddressSanitizer
You can activate AddressSanitizer by setting the ENABLE_ASAN CMake flag in a fresh build directory. At this time, CUDA must also be disabled. In an empty build directory, run CMake and compile:
mkdir build && cd build
cmake -DENABLE_ASAN=on -DENABLE_CUDA=off ..
make -j 4
Now you can run the tests:
export ASAN_OPTIONS=alloc_dealloc_mismatch=0:handle_segv=0
make sanity_tests
ThreadSanitizer
You can activate ThreadSanitizer by setting the ENABLE_TSAN CMake flag in a fresh build directory. At this time, you must also disable CUDA. In an empty build directory, run CMake and compile:
mkdir build && cd build
cmake -DENABLE_TSAN=on -DENABLE_CUDA=off ..
make -j 4
OmniSci uses a TSAN suppressions file to ignore warnings in third-party libraries. Point ThreadSanitizer at the suppressions file by adding it to your TSAN_OPTIONS environment variable:
export TSAN_OPTIONS="suppressions=/path/to/omnisci/config/tsan.suppressions"
Now you can run the tests:
make sanity_tests
Packaging
OmniSciDB uses CPack to generate packages for distribution. Packages generated on CentOS with static linking enabled can be used on most other recent Linux distributions. To generate packages on CentOS (assuming starting from top level of the omniscidb repository), execute the following commands:
mkdir build-package && cd build-package
cmake -DPREFER_STATIC_LIBS=on -DCMAKE_BUILD_TYPE=release ..
make -j 4
cpack -G TGZ
The first command creates a fresh build directory, to ensure there is nothing left over from a previous build.
The second command configures the build to prefer linking to the dependencies' static libraries instead of the (default) shared libraries, and to build using CMake's release configuration (enables compiler optimizations). Linking to the static versions of the libraries reduces the number of dependencies you must install on target systems.
The last command generates a .tar.gz package. You can replace TGZ with, for example, RPM or DEB to generate a .rpm or .deb, respectively.
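If you need more than one package format from the same build, you can run cpack once per generator. The following is a minimal sketch: build_packages is a hypothetical helper name, and the CPACK variable is an override invented for this example so the loop can be tried without running a real packaging step.

```shell
# build_packages GENERATOR...   (hypothetical helper)
# Runs cpack once for each generator name and stops at the first failure.
# CPACK optionally names a replacement executable, which is useful for a dry run.
# The body runs in a subshell, so "exit" only sets the function's return status.
build_packages() (
    for generator in "$@"; do
        "${CPACK:-cpack}" -G "$generator" || exit 1
    done
)

# Example, run from the build-package directory after make has finished:
# build_packages TGZ RPM
```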
Using
You can use the startomnisci wrapper script to start OmniSciDB in a testing environment. The script performs the following tasks:
- Initializes the data storage directory via initdb, if required.
- Starts the main OmniSciDB server, omnisci_server.
- Offers to download and import a sample dataset, using the insert_sample_data script.
If you are in the build directory, and it is a subdirectory of the omniscidb repository, you can run startomnisci with the following command:
../startomnisci
Starting Manually
If you prefer to use the startup commands individually, run the following commands from the build directory.
Initialize the data storage directory. You only run this command one time.
mkdir data && ./bin/initdb data
Start the OmniSciDB server:
./bin/omnisci_server
Optionally, insert a sample dataset by running the insert_sample_data script in a new terminal:
../insert_sample_data
You can now start using the database. You can use the omnisql utility to interact with the database from the command line:
./bin/omnisql -p HyperInteractive
where HyperInteractive is the default password. The default user (omnisci) is used if you do not provide a user name. See omnisql.
Coding
Your contributed code should compile without generating warnings by recent compilers on most Linux distributions. Changes to the code must follow the C++ Core Guidelines.
clang-format
A .clang-format style configuration, based on the Chromium style guide, is provided at the top level of the repository. Format your code using a recent version (6.0+ preferred) of ClangFormat before you submit your changes.
To use clang-format:
clang-format -i File.cpp
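To format only the files you changed rather than naming each one, you can filter a list of changed paths down to C++ sources first. The following is a minimal sketch: cpp_sources is a hypothetical helper name, the extension list is an assumption about which files you want formatted, and the git command in the commented example is one possible way to list changed files.

```shell
# cpp_sources   (hypothetical helper)
# Reads file names on standard input, one per line, and prints only those
# ending in a C or C++ source or header extension.
cpp_sources() {
    # grep exits non-zero when nothing matches; treat that as success.
    grep -E '\.(c|cc|cpp|h|hpp)$' || true
}

# Example: format the added, copied, modified, or renamed files in your working tree.
# git diff --name-only --diff-filter=ACMR HEAD | cpp_sources |
#     while IFS= read -r file; do
#         clang-format -i "$file"
#     done
```

Reading names one line at a time keeps the loop safe for paths that contain spaces and does nothing when no files match.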
clang-tidy
A .clang-tidy configuration is provided at the top level of the repository. Lint your code using a recent version (6.0+ preferred) of clang-tidy before you submit your changes.
clang-tidy requires all generated files to exist before you run the utility. The easiest way to accomplish this is to run a full build before you run clang-tidy. OmniSci provides a build target that runs clang-tidy.
To use clang-tidy:
make clang-tidy
Contributing
OmniSci welcomes and encourages your contributions to our Open Source project. You can read the roadmap for ideas on where you might contribute, and review the list of known issues and enhancements you can implement immediately.
Contributor License Agreement
To verify the intellectual property license granted with contributions from any person or entity, OmniSci must have a Contributor License Agreement (CLA) on file signed by each contributor. After you make a pull request, a bot will notify you if a signed CLA is required and provide instructions for how to sign it. Read the agreement carefully before signing and keep a copy for your records.