FlyteFile
- class flytekit.types.file.FlyteFile(path: 'typing.Union[str, os.PathLike]', downloader: 'typing.Callable' = noop, remote_path: 'typing.Optional[typing.Union[os.PathLike, str, bool]]' = None)
- Parameters:
path (str | PathLike)
downloader (Callable)
remote_path (Optional[Union[os.PathLike, str, bool]])
- classmethod from_source(source)
Create a new FlyteFile object with the remote source set to the input.
- Parameters:
source (str | PathLike)
- Return type:
FlyteFile
- classmethod new_remote_file(name=None, alt=None)
Create a new FlyteFile object with a remote path.
- Parameters:
name (str | None) – If you want to specify a different name for the file, you can specify it here.
alt (str | None) – If you want to specify a different prefix head than the default one, you can specify it here.
- Return type:
FlyteFile
- open(mode, cache_type=None, cache_options=None)
Returns a streaming file handle.
    from flytekit import task
    from flytekit.types.file import FlyteFile

    @task
    def copy_file(ff: FlyteFile) -> FlyteFile:
        # Stream the input into a newly created remote file.
        new_file = FlyteFile.new_remote_file()
        with ff.open("rb", cache_type="readahead") as r:
            with new_file.open("wb") as w:
                w.write(r.read())
        return new_file
- Parameters:
mode (str) – Open mode. For example: ‘r’, ‘w’, ‘rb’, ‘rt’, ‘wb’, etc.
cache_type (str, optional) – Specifies the cache type. Possible values are “blockcache”, “bytes”, “mmap”, “readahead”, “first”, or “background”. This is especially useful for large file reads. See https://filesystem-spec.readthedocs.io/en/latest/api.html#readbuffering.
cache_options (Dict[str, Any], optional) – A Dict corresponding to the parameters for the chosen cache_type. Refer to fsspec caching options above.
- path: str | PathLike = None
Since there is no native Python implementation of files and directories for the Flyte Blob type (like how int exists for Flyte's Integer type), we need to create one so that users can express that their tasks take in or return a file. There is pathlib.Path, of course (which is usable in Flytekit as a return value, though not as a return type), but it made more sense to create a new type, especially since we can add additional properties.

Files (and directories) differ from primitive types like floats and strings in that Flytekit typically uploads the contents of the files to the blob store connected with your Flyte installation. That is, the Python native literal that represents a file is typically just the path to the file on the local filesystem. In Flyte, however, an instance of a file is represented by a Blob literal, with the uri field set to the location in the Flyte blob store (AWS/GCS etc.). Take a look at the data handling doc for a deeper discussion.

We decided not to support pathlib.Path as an input/output type because if you want the automatic upload/download behavior, you should just use the FlyteFile type. If you do not, then a str works just as well.

The prefix for where uploads go is set by the raw output data prefix setting, which should be set at registration time in the launch plan. See the option listed under flytectl register examples --help for more information. If it is not set in the launch plan, your Union backend will specify a default. This default is itself configurable. Contact your Union platform administrators to change or ascertain the value.

In short, if a task returns "/path/to/file" and the task's signature is set to return FlyteFile, then the contents of /path/to/file are uploaded.

You can also make it so that the upload does not happen. There are different types of task/workflow signatures. Keep in mind that in the backend, in Admin and in the blob store, there is only one type that represents files, the Blob type.

Whether or not the uploading happens, the behavior of the translation between Python native values and Flyte literal values depends on a few attributes:
- The declared Python type in the signature. These can be:
  - flytekit.FlyteFile
  - os.PathLike
  Note that os.PathLike is only a type in Python; you can't instantiate it.
- The type of the Python native value we're returning. These can be:
  - flytekit.FlyteFile
  - pathlib.Path
  - str
- Whether the value being converted is a "remote" path or not. For instance, if a task returns a value of "http://www.google.com" as a FlyteFile, it doesn't make sense for us to try to upload that to the Flyte blob store, so no remote paths are uploaded. Flytekit considers a path remote if it starts with s3://, gs://, http(s)://, or even file://.
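The "remote" rule in the last item can be illustrated with a small prefix check. This is only a sketch of the rule as stated above, not Flytekit's own implementation; the names REMOTE_PREFIXES and looks_remote are hypothetical.

```python
# Illustrative stand-in for the prefix rule described above.
# REMOTE_PREFIXES and looks_remote are hypothetical names, not Flytekit APIs.
REMOTE_PREFIXES = ("s3://", "gs://", "http://", "https://", "file://")


def looks_remote(path: str) -> bool:
    """Return True if the path starts with one of the listed remote prefixes."""
    return path.startswith(REMOTE_PREFIXES)
```

Under this sketch, "s3://bucket/key" and "file:///tmp/data.csv" count as remote, while "/path/to/file" does not and would therefore be a candidate for upload.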
Converting from a Flyte literal value to a Python instance of FlyteFile

Type of Flyte IDL Literal: Blob. The result depends on the expected Python type and on the uri.

- Expected Python type FlyteFile, uri matches http(s)/s3/gs: FlyteFile object stores the original string path, but points to a local file instead.
  - [fn] downloader: function that writes to path when opened
  - [fn] download: will trigger download
  - path: randomly generated local path that will not exist until downloaded
  - remote_path: None
  - remote_source: original http/s3/gs path
- Expected Python type FlyteFile, uri matches /local/path: FlyteFile object just wraps the string.
  - [fn] downloader: noop function
  - [fn] download: raises exception
  - path: just the given path
  - remote_path: None
  - remote_source: None
- Expected Python type os.PathLike, any uri: this signals that Flyte should stay out of the way. You still get a FlyteFile object (which implements the os.PathLike interface).
  - [fn] downloader: noop function, even if it's http/s3/gs
  - [fn] download: raises exception
  - path: just the given path
  - remote_path: None
  - remote_source: None
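The literal-to-Python rules above can be read as a small decision function. The sketch below only restates those rules in plain Python; it does not call Flytekit, and FlyteFileSummary, summarize_blob_conversion, and the "<random local path>" placeholder are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Prefixes the rules above treat as remote uris.
REMOTE_URI_PREFIXES = ("http://", "https://", "s3://", "gs://")


@dataclass(frozen=True)
class FlyteFileSummary:
    """Hypothetical summary of the FlyteFile attributes listed above."""

    downloader: str  # "download function" or "noop"
    download_raises: bool
    path: str
    remote_path: Optional[str]
    remote_source: Optional[str]


def summarize_blob_conversion(uri: str, expected_type: str) -> FlyteFileSummary:
    """Return the attributes listed above for a Blob uri and expected type."""
    if expected_type not in ("FlyteFile", "os.PathLike"):
        raise ValueError(f"unsupported expected type: {expected_type}")
    if expected_type == "FlyteFile" and uri.startswith(REMOTE_URI_PREFIXES):
        # Remote uri with a FlyteFile annotation: lazy download to a local path.
        return FlyteFileSummary(
            downloader="download function",
            download_raises=False,
            path="<random local path>",  # placeholder for the generated path
            remote_path=None,
            remote_source=uri,
        )
    # os.PathLike annotation, or a local uri: the object just wraps the uri.
    return FlyteFileSummary("noop", True, uri, None, None)
```

For instance, a remote uri with a FlyteFile annotation keeps the original location in remote_source, whereas the same uri with an os.PathLike annotation yields a plain wrapper whose download raises.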
Converting from a Python value (FlyteFile, str, or pathlib.Path) to a Flyte literal

Type of Python value: str or pathlib.Path

- path matches http(s)/s3/gs, expected Python type FlyteFile or os.PathLike: Blob object is returned with uri set to the given path. No uploading happens.
- path matches /local/path, expected Python type FlyteFile: Contents of file are uploaded to the Flyte blob store (S3, GCS, etc.), in a bucket determined by the raw_output_data_prefix setting. Blob object is returned with uri pointing to the blob store location.
- path matches /local/path, expected Python type os.PathLike: No warning is logged since only a string is given (as opposed to a FlyteFile). Blob object is returned with uri set to just the given path. No uploading happens.

Type of Python value: FlyteFile

- path matches http(s)/s3/gs, expected Python type FlyteFile or os.PathLike: Blob object is returned with uri set to the given path. Nothing is uploaded.
- path matches /local/path, expected Python type FlyteFile: Contents of file are uploaded to the Flyte blob store (S3, GCS, etc.), in a bucket determined by the raw_output_data_prefix setting. If remote_path is given, then that is used instead of the random path. Blob object is returned with uri pointing to the blob store location.
- path matches /local/path, expected Python type os.PathLike: Warning is logged since you're passing a more complex object (a FlyteFile) and expecting a simpler interface (os.PathLike). Blob object is returned with uri set to just the given path. No uploading happens.
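The Python-to-literal rules above reduce to two questions: is anything uploaded, and is a warning logged? The following sketch encodes just those two outcomes as stated above. It is not Flytekit code, and BlobOutcome and summarize_python_conversion are hypothetical names.

```python
from typing import NamedTuple

# Prefixes the rules above treat as remote paths.
REMOTE_PATH_PREFIXES = ("http://", "https://", "s3://", "gs://")


class BlobOutcome(NamedTuple):
    uploaded: bool
    warning_logged: bool


def summarize_python_conversion(value_type: str, expected_type: str, path: str) -> BlobOutcome:
    """Restate the rules above for one (value type, expected type, path) combination."""
    if value_type not in ("str", "pathlib.Path", "FlyteFile"):
        raise ValueError(f"unsupported value type: {value_type}")
    if expected_type not in ("FlyteFile", "os.PathLike"):
        raise ValueError(f"unsupported expected type: {expected_type}")
    if path.startswith(REMOTE_PATH_PREFIXES):
        # Remote paths are passed through as the Blob uri; nothing is uploaded.
        return BlobOutcome(uploaded=False, warning_logged=False)
    if expected_type == "FlyteFile":
        # Local path with a FlyteFile annotation: contents go to the blob store.
        return BlobOutcome(uploaded=True, warning_logged=False)
    # Local path with an os.PathLike annotation: no upload, and a warning
    # only when the value itself is a FlyteFile.
    return BlobOutcome(uploaded=False, warning_logged=(value_type == "FlyteFile"))
```

Read this way, the only combination that logs a warning is a FlyteFile value holding a local path returned through an os.PathLike signature.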
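Since Flyte file types have a string embedded in them as part of the type, you can add a format by specifying a string after the class, like so:
    # The alias below is one way to make the flytekit_typing name resolve.
    from flytekit.types import file as flytekit_typing

    def t2() -> flytekit_typing.FlyteFile["csv"]:
        return "/tmp/local_file.csv"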
- property remote_source: str
If this is an input to a task and the original path is in an s3 bucket, Flytekit downloads the file for the user. If the user wants access to the original path, it is available here.
- flytekit.types.file.HDF5EncodedFile
This can be used to denote that the returned file is of type hdf5 and can be received by other tasks that accept an hdf5 format. This is usually useful for serializing TensorFlow models.
- flytekit.types.file.HTMLPage
Can be used to receive or return an HTMLPage. The underlying type is a FlyteFile type. This is just a decoration, useful for attaching content type information to the file and automatically documenting code.
- flytekit.types.file.JoblibSerializedFile
This type represents a file that was serialized using the joblib.dump method and can be loaded back using joblib.load.
- flytekit.types.file.JPEGImageFile
Can be used to receive or return a JPEGImage. The underlying type is a FlyteFile type. This is just a decoration, useful for attaching content type information to the file and automatically documenting code.
- flytekit.types.file.PDFFile
Can be used to receive or return a PDFFile. The underlying type is a FlyteFile type. This is just a decoration, useful for attaching content type information to the file and automatically documenting code.
- flytekit.types.file.PNGImageFile
Can be used to receive or return a PNGImage. The underlying type is a FlyteFile type. This is just a decoration, useful for attaching content type information to the file and automatically documenting code.
- flytekit.types.file.PythonPickledFile
This type can be used when a serialized Python pickled object is returned and shared between tasks. This only adds metadata to the file in Flyte, but does not really carry any object information.
- flytekit.types.file.PythonNotebook
This type is used to identify a Python notebook file.
- flytekit.types.file.SVGImageFile
Can be used to receive or return an SVGImage. The underlying type is a FlyteFile type. This is just a decoration, useful for attaching content type information to the file and automatically documenting code.