This documentation describes an old release, version 3.3.5. Documentation for the latest release, 3.6.1, can be found here.

JSON

Zuar Runner’s JSON plugin pipes data from any JSON file into a relational database.

../_images/json__connection.png

Example use case

Let’s take a simple JSON file with nested data and convert the data into tables in a relational database.

Example JSON file

Here’s what the source JSON file (people_pets.json) looks like:

[
  {
    "id": 1,
    "name": "Justin",
    "pets": [
      {
        "id": 1,
        "name": "Bear",
        "pet": {
          "type": "dog",
          "breed": "Goldendoodle"
        }
      },
      {
        "id": 2,
        "name": "Birdie",
        "pet": {
          "type": "dog",
          "breed": "Goldendoodle"
        }
      }
    ]
  },
  {
    "id": 2,
    "name": "Matt",
    "pets": []
  },
  {
    "id": 3,
    "name": "Ben",
    "pets": [
      {
        "id": 3,
        "name": "Zuca",
        "pet": {
          "type": "dog",
          "breed": "Cavapoo"
        }
      }
    ]
  }
]

Expected Relational Database Results

Because the JSON file has nested data in the pets array ([ ]), we will need to create two tables in the database.

  • Table 1 will include top level key/value pairs: id, name. The resulting table should have 3 rows.

  • Table 2 will include data in the pets array (id, name, and pet object’s data). Table 2 will also include data from the parent object (id) so we can join the two database tables. The resulting table should have 3 rows.

Table 1 expected result:

id

name

1

Justin

2

Matt

3

Ben

Table 2 expected result:

id

name

pet__type

pet__breed

person_id

1

Bear

dog

Goldendoodle

1

2

Birdie

dog

Goldendoodle

1

3

Zuca

dog

Cavapoo

3

Create a Zuar Runner JSON Job

The JSON file(s) must exist in Zuar Runner’s file system. Use the file manager to manually add a JSON file to Zuar Runner.

The JSON input requires creating a job by hand, so create a job using the Generic plugin. Set the job’s type to io.

Example Zuar Runner Job Configurations

Table 1 Zuar Runner job configuration:

{
    "input": {
        "use": "flatfile.iov2#JsonInput",
        "source": "/var/mitto/data/people_pets.json"
    },
    "output": {
        "tablename": "people",
        "use": "call:mitto.iov2.db#todb",
        "schema": "json",
        "dbo": "postgresql://db/analytics"
    },
    "steps": [
        {
            "use": "mitto.iov2.steps#Input",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#ExtraColumnsTransform"
                },
                {
                    "use": "mitto.iov2.transform#ColumnsTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CreateTable"
        },
        {
            "use": "mitto.iov2.steps#Output",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#FlattenTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CollectMeta"
        }
    ]
}

This example job take the source JSON file (people_pets.json) and creates a database table (json.people) in Zuar Runner’s internal PostgreSQL database (localhost). It creates columns from all the “top level” keys excluding the arrays ([ ]).

Table 2 Zuar Runner job configuration:

{
    "input": {
        "source": "/var/mitto/data/people_pets.json",
        "use": "flatfile.iov2#JsonInput"
    },
    "output": {
        "dbo": "postgresql://db/analytics",
        "schema": "json",
        "tablename": "people__pets",
        "use": "call:mitto.iov2.db#todb"
    },
    "steps": [
        {
            "use": "mitto.iov2.steps#Input",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#PluckV2Transform",
                    "jpath": "$.pets[*]",
                    "members": [
                        {
                            "name": "person_id",
                            "value": "$.id"
                        }
                    ]
                },
                {
                    "use": "mitto.iov2.transform#ExtraColumnsTransform"
                },
                {
                    "use": "mitto.iov2.transform#ColumnsTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CreateTable"
        },
        {
            "use": "mitto.iov2.steps#Output",
            "transforms": [
                {
                    "use": "mitto.iov2.transform#FlattenTransform"
                }
            ]
        },
        {
            "use": "mitto.iov2.steps#CollectMeta"
        }
    ]
}

This example job take the source JSON file (people_pets.json) and creates a database table (json.people__pets) in Zuar Runner’s internal PostgreSQL database (localhost). Based on the jpath and members parameters in the steps, it creates columns from all the keys inside the pets array and the “top level” id key.

Jpath

The jpath parameter let’s you define exactly what section of the source JSON file to pull from. This is necessary for Table 2 because of the nested data in the pets array. Learn more about JSONPath expressions below.

Members

The members parameter let’s you add additional columns from the source JSON file to the resulting database table. In our case, we need to add the id of each person to the people__pets table in order for us to join the two tables together. members also uses jpath in it’s value key, and you can name the resulting column by adjusting the name key.

JSONPath Expressions

To understand how to use JSONPath to pick specific sections of data out of a JSON object you can visit: https://goessner.net/articles/JsonPath/

To interactively learn how to use JSONPath syntax you can visit: https://jsonpath.com/