Used for multiple purposes:

  • Can be used to set metadata on the extracted DataFrame. Note that this overwrites any existing metadata.

  • Can be used to specify a schema when there are no input files. In that case this stage creates an empty DataFrame with this schema, so any downstream logic that depends on the columns in this dataset, e.g. SQLTransform, is still able to run. This allows business logic to be deployed even when it depends on a dataset that has not been enabled by an upstream sending system.
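
The empty-dataset behaviour in the second point can be illustrated with a minimal sketch. This is not how the stage is implemented: it uses Python's built-in sqlite3 module as a stand-in for a DataFrame engine, and the `extract` helper, the `payments` table, and the column names are all hypothetical. It only shows why a typed but empty dataset lets downstream SQL run.

```python
import sqlite3

# Hypothetical schema: (column name, SQL type) pairs standing in for a schema definition.
SCHEMA = [("customer_id", "INTEGER"), ("amount", "REAL")]


def extract(conn, table, schema, rows):
    """Create `table` from `schema` and load `rows`.

    With no rows the table is still created, so it is empty but has typed columns.
    """
    columns = ", ".join(f"{name} {sql_type}" for name, sql_type in schema)
    conn.execute(f"CREATE TABLE {table} ({columns})")
    placeholders = ", ".join("?" for _ in schema)
    conn.executemany(f"INSERT INTO {table} VALUES ({placeholders})", rows)


conn = sqlite3.connect(":memory:")
extract(conn, "payments", SCHEMA, rows=[])  # simulate: no input files arrived

# Downstream logic that references the columns still runs; it simply returns no rows.
cursor = conn.execute(
    "SELECT customer_id, SUM(amount) AS total FROM payments GROUP BY customer_id"
)
column_names = [d[0] for d in cursor.description]
result = cursor.fetchall()
```

Without the schema, the `payments` table would not exist and the downstream query would fail on the missing columns rather than return an empty result.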