FlyteFile#

class flytekit.types.file.FlyteFile(path: 'typing.Union[str, os.PathLike]', downloader: 'typing.Callable' = noop, remote_path: 'typing.Optional[typing.Union[os.PathLike, str, bool]]' = None)#
Parameters:
  • path (str | PathLike)

  • downloader (Callable)

  • remote_path (Optional[Union[os.PathLike, str, bool]])

classmethod from_source(source)#

Create a new FlyteFile object with the remote source set to the given source.

Parameters:

source (str | PathLike)

Return type:

FlyteFile

classmethod new_remote_file(name=None, alt=None)#

Create a new FlyteFile object with a remote path.

Parameters:
  • name (str | None) – If you want to specify a different name for the file, you can specify it here.

  • alt (str | None) – If you want to specify a different prefix head than the default one, you can specify it here.

Return type:

FlyteFile

open(mode, cache_type=None, cache_options=None)#

Returns a streaming file handle.

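from flytekit import task
from flytekit.types.file import FlyteFile


@task
def copy_file(ff: FlyteFile) -> FlyteFile:
    # Allocate a new remote location, then stream the source into it.
    new_file = FlyteFile.new_remote_file()
    with ff.open("rb", cache_type="readahead") as r:
        with new_file.open("wb") as w:
            w.write(r.read())
    return new_file

Parameters: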
  • mode (str) – Open mode. For example: ‘r’, ‘w’, ‘rb’, ‘rt’, ‘wb’, etc.

  • cache_type (str, optional) – Specifies the cache type. Possible values are “blockcache”, “bytes”, “mmap”, “readahead”, “first”, or “background”. This is especially useful for large file reads. See https://filesystem-spec.readthedocs.io/en/latest/api.html#readbuffering.

  • cache_options (Dict[str, Any], optional) – A Dict corresponding to the parameters for the chosen cache_type. Refer to fsspec caching options above.
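
The copy_file example above reads the whole source into memory with r.read(). For large files, copying in fixed-size chunks between the two handles may be preferable. The following is a minimal sketch using only the standard library: the in-memory streams stand in for the handles returned by open(), and the helper name copy_stream and the 1 MiB chunk size are illustrative assumptions, not part of Flytekit.

```python
import io
import shutil

CHUNK_SIZE = 1024 * 1024  # assumed chunk size (1 MiB); tune for your workload


def copy_stream(reader, writer, chunk_size: int = CHUNK_SIZE) -> None:
    """Copy from one binary file-like object to another in fixed-size chunks."""
    shutil.copyfileobj(reader, writer, chunk_size)


# Stand-ins for the streaming handles; any binary file-like objects work.
source = io.BytesIO(b"x" * (3 * CHUNK_SIZE + 17))
target = io.BytesIO()
copy_stream(source, target)
```

Inside a task, the same helper could presumably be called with the reader and writer obtained from the two open() calls, since it relies only on read() and write().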

path: str | PathLike = None#

Python has no native type for files and directories that corresponds to the Flyte Blob type (in the way that int corresponds to Flyte’s Integer type), so Flytekit provides one so that users can express that their tasks take in or return a file. pathlib.Path exists, of course, and is usable in Flytekit as a return value (though not as a return type), but a dedicated type made more sense, especially since it can carry additional properties.

Files (and directories) differ from primitive types such as floats and strings in that Flytekit typically uploads the contents of the files to the blob store connected with your Flyte installation. That is, the Python native literal that represents a file is typically just the path to the file on the local filesystem. In Flyte, however, an instance of a file is represented by a Blob literal, with the uri field set to the location in the Flyte blob store (AWS/GCS etc.). Take a look at the data handling doc for a deeper discussion.

pathlib.Path is not supported as an input/output type: if you want the automatic upload/download behavior, use the FlyteFile type; if you do not, a str works just as well.

The prefix for where uploads go is set by the raw output data prefix setting, which should be set at registration time in the launch plan. See the option listed under flytectl register examples --help for more information. If not set in the launch plan, then your Union backend will specify a default. This default is itself configurable as well. Contact your Union platform administrators to change or ascertain the value.

In short, if a task returns "/path/to/file" and the task’s signature is set to return FlyteFile, then the contents of /path/to/file are uploaded.

You can also make it so that the upload does not happen. There are different types of task/workflow signatures. Keep in mind that in the backend, in Admin and in the blob store, there is only one type that represents files, the Blob type.

Whether the uploading happens or not, the behavior of the translation between Python native values and Flyte literal values depends on a few attributes:

  • The declared Python type in the signature. These can be:
      • flytekit.FlyteFile
      • os.PathLike
    Note that os.PathLike is only a type in Python; you can’t instantiate it.

  • The type of the Python native value we’re returning. These can be:
      • flytekit.FlyteFile
      • pathlib.Path
      • str

  • Whether the value being converted is a “remote” path or not. For instance, if a task returns a value of “http://www.google.com” as a FlyteFile, obviously it doesn’t make sense for us to try to upload that to the Flyte blob store. So no remote paths are uploaded. Flytekit considers a path remote if it starts with s3://, gs://, http(s)://, or even file://.
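
The remote-path rule in the last bullet can be sketched as a simple prefix check. This is a literal reading of the prefixes listed above, not Flytekit's actual implementation; the names REMOTE_PREFIXES and looks_remote are hypothetical.

```python
# Prefixes taken from the rule above: s3://, gs://, http(s)://, and file://.
REMOTE_PREFIXES = ("s3://", "gs://", "http://", "https://", "file://")


def looks_remote(path: str) -> bool:
    """Return True if the path starts with one of the listed prefixes."""
    return path.startswith(REMOTE_PREFIXES)


# Example inputs (hypothetical paths).
remote_example = looks_remote("s3://bucket/key.csv")
local_example = looks_remote("/tmp/local_file.csv")
```

Under this reading, only paths for which the check is false would be candidates for upload.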

Converting from a Flyte literal value to a Python instance of FlyteFile

The Flyte IDL literal is a Blob in every case. The result depends on the Blob’s uri and on the expected Python type.

Blob whose uri matches http(s)/s3/gs

  Expected Python type FlyteFile: FlyteFile object stores the original string path, but points to a local file instead.

    • [fn] downloader: function that writes to path when opened.

    • [fn] download: will trigger download

    • path: randomly generated local path that will not exist until downloaded

    • remote_path: None

    • remote_source: original http/s3/gs path

  Expected Python type os.PathLike: Basically this signals Flyte should stay out of the way. You still get a FlyteFile object (which implements the os.PathLike interface).

    • [fn] downloader: noop function, even if it’s http/s3/gs

    • [fn] download: raises exception

    • path: just the given path

    • remote_path: None

    • remote_source: None

Blob whose uri matches /local/path

  Expected Python type FlyteFile or os.PathLike: FlyteFile object just wraps the string.

    • [fn] downloader: noop function

    • [fn] download: raises exception

    • path: just the given path

    • remote_path: None

    • remote_source: None
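
The lazy-download behavior described for a remote uri with an expected type of FlyteFile (a local path that does not exist until the file is opened) can be modeled with the os.PathLike protocol. The class below is a simplified stand-in written for illustration, not Flytekit's class; LazyFile, fake_fetch, and the s3://bucket/data.txt location are hypothetical, and the "download" only writes a marker string to disk.

```python
import os
import tempfile
import uuid


class LazyFile(os.PathLike):
    """Hypothetical stand-in for a FlyteFile input; not Flytekit's class."""

    def __init__(self, remote_source: str, fetch):
        self.remote_source = remote_source  # original remote location
        # Random local path that does not exist until the first download.
        self.path = os.path.join(tempfile.mkdtemp(), uuid.uuid4().hex)
        self._fetch = fetch  # callable(remote_source, local_path)
        self._downloaded = False

    def download(self) -> str:
        """Fetch the content once and return the local path."""
        if not self._downloaded:
            self._fetch(self.remote_source, self.path)
            self._downloaded = True
        return self.path

    def __fspath__(self) -> str:
        # open(obj) and os.fspath(obj) call this, which triggers the download.
        return self.download()


def fake_fetch(remote: str, local: str) -> None:
    """Pretend download: writes a marker instead of contacting a server."""
    with open(local, "w") as f:
        f.write(f"contents of {remote}")


lazy = LazyFile("s3://bucket/data.txt", fake_fetch)
exists_before = os.path.exists(lazy.path)  # nothing on disk yet
with open(lazy) as f:  # open() resolves the path via __fspath__
    text = f.read()
exists_after = os.path.exists(lazy.path)
```

Because the object implements __fspath__, it can be passed to open() and other path-accepting functions directly, which is the property the os.PathLike column relies on.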

Converting from a Python value (FlyteFile, str, or pathlib.Path) to a Flyte literal

Python value is a str or pathlib.Path

  path matches http(s)/s3/gs

    Expected Python type FlyteFile or os.PathLike: Blob object is returned with uri set to the given path. No uploading happens.

  path matches /local/path

    Expected Python type FlyteFile: Contents of file are uploaded to the Flyte blob store (S3, GCS, etc.), in a bucket determined by the raw_output_data_prefix setting. Blob object is returned with uri pointing to the blob store location.

    Expected Python type os.PathLike: No warning is logged since only a string is given (as opposed to a FlyteFile). Blob object is returned with uri set to just the given path. No uploading happens.

Python value is a FlyteFile

  path matches http(s)/s3/gs

    Expected Python type FlyteFile or os.PathLike: Blob object is returned with uri set to the given path. Nothing is uploaded.

  path matches /local/path

    Expected Python type FlyteFile: Contents of file are uploaded to the Flyte blob store (S3, GCS, etc.), in a bucket determined by the raw_output_data_prefix setting. If remote_path is given, then that is used instead of the random path. Blob object is returned with uri pointing to the blob store location.

    Expected Python type os.PathLike: Warning is logged since you’re passing a more complex object (a FlyteFile) and expecting a simpler interface (os.PathLike). Blob object is returned with uri set to just the given path. No uploading happens.
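
The Python-value-to-literal rules above can be condensed into a small decision function. This is only a model of the documented outcomes for reading convenience, not Flytekit code; Plan and plan_conversion are hypothetical names, and types are passed as plain strings to keep the sketch free of dependencies.

```python
from dataclasses import dataclass

# Prefixes named in the rules above: http(s), s3, and gs.
REMOTE_PREFIXES = ("http://", "https://", "s3://", "gs://")


@dataclass(frozen=True)
class Plan:
    upload: bool  # True if the file contents go to the blob store
    warn: bool    # True if a warning is logged


def plan_conversion(expected_type: str, value_kind: str, path: str) -> Plan:
    """Model the documented outcome of converting a Python value to a Blob.

    expected_type: "FlyteFile" or "os.PathLike" (the declared type)
    value_kind:    "FlyteFile" or "str" ("str" also covers pathlib.Path)
    """
    if path.startswith(REMOTE_PREFIXES):
        # Remote paths are passed through as the Blob uri in every case.
        return Plan(upload=False, warn=False)
    if expected_type == "FlyteFile":
        # Local path with a FlyteFile signature: contents are uploaded.
        return Plan(upload=True, warn=False)
    # Local path with an os.PathLike signature: uri is the path as given,
    # and a warning is logged only when the value is a FlyteFile.
    return Plan(upload=False, warn=(value_kind == "FlyteFile"))


# Example: a task declared to return FlyteFile that returns a local str path.
local_str_plan = plan_conversion("FlyteFile", "str", "/tmp/local_file.csv")
```

The only case that logs a warning is a FlyteFile value holding a local path under an os.PathLike signature, which matches the last rule above.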

Since Flyte file types have a format string embedded as part of the type, you can add a format by specifying a string in square brackets after the class, like so:

from flytekit.types.file import FlyteFile

def t2() -> FlyteFile["csv"]:
    return "/tmp/local_file.csv"

property remote_source: str#

If this is an input to a task and the original path is a remote location (for example, in an S3 bucket), Flytekit downloads the file for the user. The original path remains available here for users who need it.

flytekit.types.file.HDF5EncodedFile#

This can be used to denote that the returned file is of type hdf5 and can be received by other tasks that accept an hdf5 format. This is usually useful for serializing TensorFlow models.

flytekit.types.file.HTMLPage#

Can be used to receive or return an HTMLPage. The underlying type is a FlyteFile type. This is just a decoration and useful for attaching content type information with the file and automatically documenting code.

flytekit.types.file.JoblibSerializedFile#

This type represents a file that was serialized using the joblib.dump method and can be loaded back using joblib.load.

flytekit.types.file.JPEGImageFile#

Can be used to receive or return a JPEGImage. The underlying type is a FlyteFile type. This is just a decoration and useful for attaching content type information with the file and automatically documenting code.

flytekit.types.file.PDFFile#

Can be used to receive or return a PDFFile. The underlying type is a FlyteFile type. This is just a decoration and useful for attaching content type information with the file and automatically documenting code.

flytekit.types.file.PNGImageFile#

Can be used to receive or return a PNGImage. The underlying type is a FlyteFile type. This is just a decoration and useful for attaching content type information with the file and automatically documenting code.

flytekit.types.file.PythonPickledFile#

This type can be used when a serialized Python pickled object is returned and shared between tasks. This only adds metadata to the file in Flyte, but does not really carry any object information.

flytekit.types.file.PythonNotebook#

This type is used to identify a Python notebook file.

flytekit.types.file.SVGImageFile#

Can be used to receive or return an SVGImage. The underlying type is a FlyteFile type. This is just a decoration and useful for attaching content type information with the file and automatically documenting code.
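
Several of the aliases above are described as decorations over FlyteFile that attach a format to the type. One way a class can carry a format string through subscript syntax is the __class_getitem__ hook. The sketch below is an illustrative model only: this reference does not show how Flytekit implements it, and FormattedFile, PngFile, and CsvFile are hypothetical names.

```python
class FormattedFile:
    """Hypothetical stand-in for a file type with an optional format string."""

    @classmethod
    def extension(cls) -> str:
        return ""  # the unsubscripted class carries no format

    def __class_getitem__(cls, fmt: str):
        # Build a subclass whose extension() reports the requested format.
        def extension(subcls) -> str:
            return fmt

        name = f"{cls.__name__}[{fmt!r}]"
        return type(name, (cls,), {"extension": classmethod(extension)})


# Aliases in the spirit of the format-specific types listed above.
PngFile = FormattedFile["png"]
CsvFile = FormattedFile["csv"]
```

In this model each alias remains a subclass of the base type, so code that accepts the base type also accepts the format-specific aliases.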