pymapd

The pymapd client interface provides a Python DB API 2.0-compliant (PEP 249) interface to MapD. In addition, it provides methods to return results in the Apache Arrow-based GPU DataFrame (GDF) format for efficient data interchange.
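Because pymapd follows the DB API 2.0 specification, code written only against the standard interface should also work with a pymapd connection. The following is a minimal sketch of that idea under stated assumptions: the helper fetch_rows is hypothetical, and Python's built-in sqlite3 module stands in for MapD so the example runs without a server.

```python
import sqlite3


def fetch_rows(con, sql):
    """Run a query through any DB API 2.0 connection and return all rows.

    Hypothetical helper: it uses only PEP 249 methods (cursor, execute,
    fetchall, close), so it should accept a pymapd Connection as well.
    """
    cur = con.cursor()
    try:
        cur.execute(sql)
        return cur.fetchall()
    finally:
        cur.close()


# sqlite3 stands in for MapD here so the example runs without a server.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flights (depdelay INTEGER, arrdelay INTEGER)")
con.executemany("INSERT INTO flights VALUES (?, ?)", [(1, 14), (2, 4)])
print(fetch_rows(con, "SELECT depdelay, arrdelay FROM flights ORDER BY depdelay"))
# [(1, 14), (2, 4)]
```

Swapping the sqlite3 connection for one returned by pymapd's connect() would leave fetch_rows unchanged.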

Documentation

See the pymapd repository on GitHub for full documentation.

Examples

Install Miniconda

Follow these instructions to prepare, install and configure Miniconda.

  1. Preparation
    1. Optionally, uninstall any existing pip-installed versions of pymapd and pygdf (pip uninstall pymapd pygdf).
    2. Optionally, delete your existing ~/miniconda directory.
  2. Install and configure Miniconda.
    wget https://repo.continuum.io/miniconda/Miniconda2-latest-Linux-x86_64.sh
    bash Miniconda2-latest-Linux-x86_64.sh
    source ~/.bashrc    # reload the profile the installer updated (~/.bash_profile on some systems)
    conda create -n myenv3 python=3
    source activate myenv3
    conda install -c conda-forge -c gpuopenanalytics/label/dev pygdf
    conda install -c conda-forge -c gpuopenanalytics/label/dev pymapd
    conda install cudatoolkit
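After installation, one quick way to confirm that the active environment can see both packages is to ask Python's import system for them. This is a minimal sketch; check_packages is a hypothetical helper, and it only checks that the modules can be located, not that they import cleanly or that CUDA is available.

```python
import importlib.util


def check_packages(names=("pymapd", "pygdf")):
    """Report which of the given top-level packages this interpreter can find."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # Run inside the activated myenv3 environment.
    for name, found in check_packages().items():
        print(f"{name}: {'installed' if found else 'missing'}")
```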
    

Create a Cursor and Execute a Query

Create a connection.

>>> from pymapd import connect
>>> con = connect(user="mapd", password="HyperInteractive", host="my.host.com", dbname="mapd")
>>> con
Connection(mapd://mapd:***@my.host.com:9091/mapd?protocol=binary)
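The Connection repr above encodes the connection settings as a URI. As a sketch, a hypothetical helper parse_mapd_uri can split such a URI into keyword arguments using only the standard library. The fallbacks (port 9091, protocol "binary") are taken from the repr above, and the assumption that connect() accepts port and protocol keywords in addition to the ones shown on this page is not confirmed here.

```python
from urllib.parse import urlparse, parse_qs


def parse_mapd_uri(uri):
    """Split a mapd:// URI into keyword arguments for connect().

    Hypothetical helper: keyword names mirror the connect() call above;
    port 9091 and protocol "binary" are the values shown in the repr.
    """
    parts = urlparse(uri)
    if parts.scheme != "mapd":
        raise ValueError("expected a mapd:// URI, got %r" % uri)
    query = parse_qs(parts.query)
    return {
        "user": parts.username,
        "password": parts.password,
        "host": parts.hostname,
        "port": parts.port or 9091,
        "dbname": parts.path.lstrip("/") or None,
        "protocol": query.get("protocol", ["binary"])[0],
    }


kwargs = parse_mapd_uri("mapd://mapd:HyperInteractive@my.host.com:9091/mapd?protocol=binary")
print(kwargs)
# {'user': 'mapd', 'password': 'HyperInteractive', 'host': 'my.host.com',
#  'port': 9091, 'dbname': 'mapd', 'protocol': 'binary'}
```

Under the assumption above, the result could be passed as connect(**kwargs).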

Create a cursor.

>>> c = con.cursor()
>>> c
<pymapd.cursor.Cursor object at 0x7f0117fe2490>

Query a database table of flight departure and arrival delay times.

>>> c.execute("SELECT depdelay, arrdelay FROM flights LIMIT 100")
<pymapd.cursor.Cursor object at 0x7f0117fe2490>

Display the number of rows returned.

>>> c.rowcount
100

Display the list of Description objects.

Each entry in the list is a named tuple with the attributes required by the DB API specification. There is one entry per returned column; we fill in the name, type_code, and null_ok attributes, and the remaining attributes are None.

>>> c.description
[Description(name=u'depdelay', type_code=0, display_size=None, internal_size=None, precision=None, scale=None, null_ok=True), Description(name=u'arrdelay', type_code=0, display_size=None, internal_size=None, precision=None, scale=None, null_ok=True)]
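Because each entry is a named tuple, column names can be read by attribute. The sketch below defines a local stand-in Description with the seven fields PEP 249 lists for cursor.description, so it runs without a server; with a live cursor you would pass c.description instead. column_names is a hypothetical helper.

```python
from collections import namedtuple

# Local stand-in with the seven fields PEP 249 defines for cursor.description;
# pymapd's own Description type appears in the output above.
Description = namedtuple(
    "Description",
    ["name", "type_code", "display_size", "internal_size",
     "precision", "scale", "null_ok"],
)


def column_names(description):
    """Return the column names from a cursor.description list."""
    return [col.name for col in description]


desc = [
    Description("depdelay", 0, None, None, None, None, True),
    Description("arrdelay", 0, None, None, None, None, True),
]
print(column_names(desc))  # ['depdelay', 'arrdelay']
```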

Iterate over the cursor to collect the rows as a list of tuples.

>>> result = list(c)
>>> result[:5]
[(1, 14), (2, 4), (5, 22), (-1, 8), (-1, -2)]
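Calling list(c) loads every row into memory at once. For larger results, the DB API fetchmany() method can read rows in batches instead. This is a minimal sketch: iter_batches is a hypothetical helper, and sqlite3 stands in for a pymapd cursor so the example runs without a server.

```python
import sqlite3


def iter_batches(cursor, size=1000):
    """Yield lists of up to `size` rows using the DB API fetchmany() method."""
    while True:
        rows = cursor.fetchmany(size)
        if not rows:  # an empty list means the result set is exhausted
            break
        yield rows


# sqlite3 stands in for a pymapd cursor so the sketch runs without a server.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flights (depdelay INTEGER, arrdelay INTEGER)")
con.executemany("INSERT INTO flights VALUES (?, ?)",
                [(1, 14), (2, 4), (5, 22), (-1, 8), (-1, -2)])
cur = con.cursor()
cur.execute("SELECT depdelay, arrdelay FROM flights")
print([len(batch) for batch in iter_batches(cur, size=2)])  # [2, 2, 1]
```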

Select Data into a GpuDataFrame Provided by pygdf

This example shows a pygdf query on a local MapD instance.

(myenv3) me@there:~$ python
Python 3.6.2 |Anaconda, Inc.| (default, Oct  5 2017, 07:59:26) 
[GCC 7.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.

Connect to the MapD server.

>>> from pymapd import connect
>>> con = connect(user="mapd", password="HyperInteractive", host="localhost", dbname="mapd")

Execute a SQL query.

>>> con.execute("SELECT depdelay, arrdelay FROM flights_2008_10k LIMIT 100")

To examine the results, create a cursor and execute the query through it.

>>> c = con.cursor()
>>> c.execute("SELECT depdelay, arrdelay FROM flights_2008_10k LIMIT 100")
<pymapd.cursor.Cursor object at 0x7f2453ea77b8>
>>> c.rowcount
100
>>> import pygdf as pg
>>> df = con.select_ipc_gpu("SELECT depdelay, arrdelay FROM flights_2008_10k LIMIT 100")
>>> df.head()
  depdelay arrdelay
0       -2       -7
1        0      -16
2       21        2
3       27       19
4        0       -3

Note: pymapd's select_ipc_* methods, such as select_ipc_gpu, share the same chunk of memory and results between MapD and Python, so they work only when both run on the same server. If you need to move results over the network, use execute() calls instead.
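The choice described in the note can be sketched as a small dispatch function. Both helpers are hypothetical: is_same_host only recognizes loopback names and the local hostname, so it may misjudge a server reached through another alias or address, and the two branches return different types (a pygdf GpuDataFrame from select_ipc_gpu, a list of tuples from the cursor).

```python
import socket


def is_same_host(host):
    """Rough check for whether `host` names the machine running Python.

    Hypothetical helper: it only recognizes loopback names and this
    machine's own hostname, so other aliases or IP addresses are missed.
    """
    local_names = {"localhost", "127.0.0.1", "::1", socket.gethostname()}
    return host in local_names


def select_rows(con, sql, host):
    """Fetch via GPU IPC when MapD runs locally, otherwise over the network."""
    if is_same_host(host):
        # Shares GPU memory with the local MapD server; returns a GpuDataFrame.
        return con.select_ipc_gpu(sql)
    # Plain DB API path: results travel over the network as a list of tuples.
    cur = con.cursor()
    cur.execute(sql)
    return cur.fetchall()
```

With the local connection above, select_rows(con, "SELECT depdelay, arrdelay FROM flights_2008_10k LIMIT 100", "localhost") would take the select_ipc_gpu branch.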