This documentation describes a release under development. Documentation for the latest release, 3.6.2, can be found here.

SDL

SDL stands for “storage definition language” and is used synonymously with “DDL.” SDL plays a large part in how Mitto IO jobs process data and how Mitto learns and defines the structure of that data.

Mitto IO jobs take date from various sources (APIs, databases, and files). Fields from input sources don’t always have data types, however data types neeed to be defined when outputting data to relational databases so destination table structures can be created or updated. A valid data type is required when sending data to relational databases: SDL gives us a mechanism for specifying appropriate data types so the data can be properly stored in the database.

SDL is mostly handled by Mitto internally through specific IO job steps and transforms (e.g. ExtraColumnsTransform, ColumnsTransform, and ReflectTransform), however there are a few use cases where specifying SDL is useful:

  • Forcing specific data types and lengths when outputting to a relational database

  • Adding new columns to the data piped from an input

  • Changing a column name in the output table

  • Removing a column from the output

As data from an input is piped through Mitto, Mitto is learning the data and assigning data types to it. Learn more about how Mitto samples data.

In some situations, you may want to force a specific data type, possibly to adjust a data type format to a specific destination database. We can do this by using SDL statements in the Mitto Job Configuration. In the Mitto Job Configuration, sdl statments are added after the input and output directives. Since SDL in the Mitto Job applies generically to the data output, SDL would be applicable to any Mitto Job that stores data to a database.

{
  input: {...}
  output: {...}
    sdl: {
      columns: [
        {
          name: bounce_rate
          type: Float
        }
        {
          name: percent_new_sessions
          type: Float
        }
      ]
    }
    steps:  {...}

How to Force Data Types with SDL

This example will have Mitto force the data type of the id column to a String. String SDL statements can accept an optional Length argument, setting the maximum string length of the destination column. NOTE: this Length argument doesn’t truncate the column value for the record, it simply forces the column length in the table.

sdl: {
  columns: [
    {
      name: id
      type: String
      length: 15
    }
  ]
}

Similar to dealing with string length, for Numeric columns we can force the data type of column value_test to a decimal value with explicit precision and scale.

sdl: {
  columns: [
    {
      name: value_test
      type: Numeric
      precision: 20
      scale: 7
    }
  ]
}

Using Python __eval__ in SDL

We are able to embed very simplistic python statements into Mitto Job configurations by usin an __eval__ parameter. Single line Python expressions can be passed via the __eval__ key. See Python’s eval() documentation: https://docs.python.org/3/library/functions.html#eval

Typically, IO jobs should extract data from an input and load it as-is into the output with further data manipulations done with subsequent Mitto SQL jobs.

However, there are cases where you may wish to do some transformation on the input data as it’s being piped to the output, or to add data columns that don’t exist in the input data. This (and other use cases) can be accomplished by using an __eval__ statement in SDL. These python expressions act on the “per record” data in from the Mitto Input as it’s being processed to the output data format.

Note

The records will be refered to by data[ ] for the purpose of consistency here; in practice, the name of the record will be dictated by the Mitto Input used.

Create UTC date time column

This SDL example creates a new column in the output database table with the current UTC date and time:

sdl: {
  columns: [
    {
      name: mitto_update_at
      type: DateTime
      __eval__: column["__type__"].python_type.utcnow()
    }
  ]
}

Create a column with static value

This SDL example creates a new string column in the output with the value “Hello World”:

sdl: {
  columns: [
    {
      name: new_string_column
      type: String
      __eval__: str("Hello World")
    }
  ]
}

Store only the first N characters for a column

This SDL example will store only the first 63 characters of the input data and truncate the remaining column data:

sdl: {
  columns: [
    {
      name: mitto_update_at
      type: String
      __eval__: data["description"][:63]
    }
  ]
}

Store the merged values of two columns into a new column

This SDL example will combine the values from two columns into a new column with the name example_full_name.

sdl: {
  columns: [
    {
      name: example_full_name
      type: String
      __eval__: str(data["first_name"], " ", data["last_name"])
    }
  ]
}