StructuredDataset type

class flytekit.types.structured.StructuredDataset(dataframe=None, uri=None, metadata=None, **kwargs)

This is the user-facing StructuredDataset class. Do not confuse it with the literals.StructuredDataset class, which is just a model: a Python class representation of the protobuf.

Parameters:
  • dataframe (Optional[Any])

  • uri (str | None)

  • metadata (Optional[literals.StructuredDatasetMetadata])

class flytekit.types.structured.StructuredDatasetEncoder(python_type, protocol=None, supported_format=None)
Parameters:
  • python_type (Type[T])

  • protocol (Optional[str])

  • supported_format (Optional[str])

abstract encode(ctx, structured_dataset, structured_dataset_type)

Even if the user code returns a plain dataframe instance, the dataset transformer engine will wrap the incoming dataframe with defaults set for that dataframe type, so this function always receives a StructuredDataset wrapper. This simplifies the function's interface.

TODO: Do we need to add a flag to indicate whether the dataframe was wrapped by the transformer or by the user?

Parameters:
  • ctx (FlyteContext)

  • structured_dataset (StructuredDataset) – This is a StructuredDataset wrapper object. See more info above.

  • structured_dataset_type (StructuredDatasetType) – This is the StructuredDatasetType, as found in the LiteralType of the interface of the task that invoked this encoding call. It is passed along to encoders so that encoder authors can include it in the returned literals.StructuredDataset. See the IDL for more information on why this literal in particular carries the type information along with it. If the encoder doesn’t supply it, the transformer engine fills it in after the encoder runs.

Returns:

This function should return a StructuredDataset literal object. Do not confuse this with the StructuredDataset wrapper class used as input to this function, which is the user-facing Python class. This function needs to return the IDL StructuredDataset.

Return type:

StructuredDataset
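The encode contract described above (plain dataframes get wrapped before the encoder sees them, and the type is filled in afterwards if the encoder omits it) can be sketched without flytekit. Every name below (`Wrapper`, `DatasetType`, `DatasetLiteral`, `engine_encode`, `toy_encoder`, the `memory://out` default URI) is a hypothetical stand-in for illustration, not part of the flytekit API, and the engine's real defaults may differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Optional


@dataclass
class DatasetType:
    """Stand-in for StructuredDatasetType (assumption, not the flytekit class)."""
    format: str = ""


@dataclass
class Wrapper:
    """Stand-in for the user-facing StructuredDataset wrapper."""
    dataframe: Any = None
    uri: Optional[str] = None


@dataclass
class DatasetLiteral:
    """Stand-in for the IDL literals.StructuredDataset."""
    uri: str
    dataset_type: Optional[DatasetType] = None


# In-memory "storage" so the sketch needs no filesystem or remote store.
STORE: Dict[str, Any] = {}


def toy_encoder(wrapper: Wrapper, dataset_type: DatasetType) -> DatasetLiteral:
    """Persist the rows at wrapper.uri and return a literal.

    It deliberately leaves dataset_type unset to show the engine filling it in.
    """
    STORE[wrapper.uri] = list(wrapper.dataframe)
    return DatasetLiteral(uri=wrapper.uri)


def engine_encode(
    value: Any,
    dataset_type: DatasetType,
    encoder: Callable[[Wrapper, DatasetType], DatasetLiteral],
    default_uri: str = "memory://out",
) -> DatasetLiteral:
    """Mimic the two engine behaviours described in the text."""
    # 1. Wrap a plain dataframe so the encoder always receives a Wrapper.
    if isinstance(value, Wrapper):
        wrapper = Wrapper(dataframe=value.dataframe, uri=value.uri or default_uri)
    else:
        wrapper = Wrapper(dataframe=value, uri=default_uri)

    literal = encoder(wrapper, dataset_type)

    # 2. Fill in the type after the encoder runs if it did not supply one.
    if literal.dataset_type is None:
        literal.dataset_type = dataset_type
    return literal


literal = engine_encode([{"a": 1}], DatasetType(format="toy"), toy_encoder)
```

Because the wrapping happens in one place, the toy encoder never has to branch on whether it received a raw dataframe or a wrapper, which is the interface simplification the docstring refers to.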

class flytekit.types.structured.StructuredDatasetDecoder(python_type, protocol=None, supported_format=None, additional_protocols=None)
Parameters:
  • python_type (Type[DF])

  • protocol (Optional[str])

  • supported_format (Optional[str])

  • additional_protocols (Optional[List[str]])

abstract decode(ctx, flyte_value, current_task_metadata)

This is code that will be called by the dataset transformer engine to ultimately translate from a Flyte Literal value into a Python instance.

Parameters:
  • ctx (FlyteContext) – A FlyteContext, useful in accessing the filesystem and other attributes

  • flyte_value (StructuredDataset) – This will be a Flyte IDL StructuredDataset Literal. Do not confuse this with the StructuredDataset class that is also defined in this module.

  • current_task_metadata (StructuredDatasetMetadata) – Metadata object containing the type (and columns if any) for the currently executing task. This type may have more or less information than the type information bundled inside the incoming flyte_value.

Returns:

This function can either return an instance of the dataframe that this decoder handles, or an iterator of those dataframes.

Return type:

DF | Iterator[DF]
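The two permitted return shapes of decode (a single dataframe, or an iterator of dataframes) can be illustrated with a flytekit-free sketch. Here a list of dicts stands in for the dataframe type DF, and `decode_all` and `decode_chunks` are hypothetical helpers. Using the current task's column list to select a subset of columns is an assumption about how a decoder might use current_task_metadata, not something this page specifies.

```python
from typing import Dict, Iterator, List

# Stand-in for a dataframe type DF: one dict per row (assumption for illustration).
Rows = List[Dict[str, int]]


def decode_all(stored: Rows, columns: List[str]) -> Rows:
    """Return the whole dataset as one 'dataframe'.

    An empty column list means the task declared no columns, so every
    column is kept; otherwise only the requested columns are returned.
    """
    if not columns:
        return [dict(row) for row in stored]
    return [{name: row[name] for name in columns} for row in stored]


def decode_chunks(stored: Rows, columns: List[str], chunk_size: int) -> Iterator[Rows]:
    """Yield the dataset as an iterator of smaller 'dataframes'."""
    for start in range(0, len(stored), chunk_size):
        yield decode_all(stored[start:start + chunk_size], columns)


data = [{"a": 1, "b": 2}, {"a": 3, "b": 4}, {"a": 5, "b": 6}]

whole = decode_all(data, ["a"])            # single-dataframe return shape
chunks = list(decode_chunks(data, [], 2))  # iterator return shape, materialised here
```

The iterator form lets a caller process data that does not fit in memory at once, while the single-dataframe form is simpler when the dataset is small.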