FlyteDirectory

class flytekit.types.directory.FlyteDirectory(path: typing.Union[str, os.PathLike], downloader: typing.Optional[typing.Callable] = None, remote_directory: typing.Optional[typing.Union[os.PathLike, str, typing.Literal[False]]] = None)
Parameters:
  • path (str | PathLike)

  • downloader (Optional[Callable])

  • remote_directory (Optional[Union[os.PathLike, str, Literal[False]]])

crawl(maxdepth=None, topdown=True, **kwargs)

Crawl returns a generator of all files under the given FlyteDirectory, each prefixed by any sub-folders it sits in. If detail=True is passed, it returns a dictionary as specified by fsspec.

Example:

>>> list(fd.crawl())
[("/base", "file1"), ("/base", "dir1/file1"), ("/base", "dir2/file1"), ("/base", "dir1/dir/file1")]
>>> list(x.crawl(detail=True))
[('/tmp/test', {'my-dir/ab.py': {'name': '/tmp/test/my-dir/ab.py', 'size': 0, 'type': 'file',
 'created': 1677720780.2318847, 'islink': False, 'mode': 33188, 'uid': 501, 'gid': 0,
  'mtime': 1677720780.2317934, 'ino': 1694329, 'nlink': 1}})]
Parameters:
  • maxdepth (int | None)

  • topdown (bool)

Return type:

Generator[Tuple[str | PathLike[Any], Dict[Any, Any]], None, None]
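
In the default mode, each item yielded by crawl pairs a base path with a path relative to it, so callers typically join the two. A minimal sketch of that step follows. It uses sample tuples copied from the example output above rather than a live crawl, and full_paths is a hypothetical helper, not part of flytekit.

```python
import posixpath


def full_paths(crawl_results):
    """Join each (base, relative_path) pair into a single path string."""
    return [posixpath.join(str(base), str(rel)) for base, rel in crawl_results]


# Sample pairs copied from the documented example output, not from a live call.
sample = [("/base", "file1"), ("/base", "dir1/file1"), ("/base", "dir1/dir/file1")]
paths = full_paths(sample)
print(paths)
```

With detail=True the second element is a dictionary instead of a string, so this helper would not apply as written.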

classmethod from_source(source)

Create a new FlyteDirectory object with the remote source set to the input.

Parameters:

source (str | PathLike)

Return type:

FlyteDirectory

classmethod listdir(directory)

Lists all files and folders in the given directory without downloading their contents. It returns a list of FlyteFile and FlyteDirectory objects that have the ability to lazily download the contents of the file or folder. For example:

entity = FlyteDirectory.listdir(directory)
for e in entity:
    print("s3 object:", e.remote_source)
    # s3 object: s3://test-flytedir/file1.txt
    # s3 object: s3://test-flytedir/file2.txt
    # s3 object: s3://test-flytedir/sub_dir

open(entity[0], "r")  # This will download the file to the local disk.
open(entity[0], "r")  # flytekit will read data from the local disk if you open it again.
Parameters:

directory (FlyteDirectory)

Return type:

List[FlyteDirectory | FlyteFile]
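
The download-once behaviour shown in the listdir example can be modelled in a few lines. This is only an illustration of the caching idea, not flytekit's implementation: LazyEntry and fake_fetch are hypothetical names, and fake_fetch stands in for a real object-store download.

```python
import os
import tempfile


class LazyEntry:
    """Remembers a remote location and downloads it only on first access."""

    def __init__(self, remote_source, fetch):
        self.remote_source = remote_source
        self._fetch = fetch              # callable: remote_source -> local path
        self._local_path = None
        self.downloads = 0

    def local_path(self):
        if self._local_path is None:     # first access: fetch and cache
            self._local_path = self._fetch(self.remote_source)
            self.downloads += 1
        return self._local_path          # later accesses reuse the local copy


def fake_fetch(remote_source):
    """Stand-in for a real download: writes a placeholder file locally."""
    fd, path = tempfile.mkstemp(suffix=".txt")
    with os.fdopen(fd, "w") as handle:
        handle.write("contents of " + remote_source)
    return path


entry = LazyEntry("s3://test-flytedir/file1.txt", fake_fetch)
first = entry.local_path()   # triggers the (fake) download
second = entry.local_path()  # served from the cached local path
```

Creating the entry costs nothing; the transfer happens only when the contents are first needed.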

new_dir(name=None)

Creates a new folder under the current folder. If a name is given, it is used; otherwise a random string is picked. Collisions are not checked.

Parameters:

name (str | None)

Return type:

FlyteDirectory

new_file(name=None)

Creates a new file under the current folder. If a name is given, it is used; otherwise a random string is picked. Collisions are not checked.

Parameters:

name (str | None)

Return type:

FlyteFile
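
The naming rule shared by new_dir and new_file can be sketched as follows. child_path is a hypothetical helper, and uuid4 is only an assumed source of the random string; flytekit may generate its names differently.

```python
import posixpath
import uuid


def child_path(parent, name=None):
    """Return parent/name, inventing a random name when none is given."""
    if name is None:
        name = uuid.uuid4().hex  # assumed random-name scheme, for illustration only
    # No existence check is made: as noted above, collisions are not detected.
    return posixpath.join(parent, name)


named = child_path("s3://bucket/run-dir", "report.txt")
random_one = child_path("s3://bucket/run-dir")
```

Because nothing checks for an existing entry, passing the same name twice yields the same path twice.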

classmethod new_remote(stem=None, alt=None)

Create a new FlyteDirectory object using the currently configured default remote in the context (i.e. the raw_output_prefix configured in the current FileAccessProvider object in the context). This is used if you explicitly have a folder somewhere that you want to create files under. If you want to write a whole folder, you can let your task return a FlyteDirectory object, and let flytekit handle the uploading.

Parameters:
  • stem (str | None) – A stem to append to the path as the final prefix “directory”.

  • alt (str | None) – An alternate first member of the prefix to use instead of the default.

Returns:

A new FlyteDirectory object that points to a remote location.

Return type:

FlyteDirectory

path: str | PathLike = None

Warning

This class should not be used on very large datasets, as merely listing the dataset will cause the entire dataset to be downloaded. Listing on S3 and other backend object stores is not consistent, and listing ideally should not require the data to be downloaded.

Please first read through the comments on the flytekit.types.file.FlyteFile class as the implementation here is similar.

One thing to note is that the os.PathLike type that comes with Python was used as a stand-in for FlyteFile. That is, if a task’s output signature is an os.PathLike, Flyte takes that to mean FlyteFile. There is no easy way to distinguish an os.PathLike where the user means a file from one where the user means a directory. As such, if you want to use a directory, you must declare all types as FlyteDirectory. You can still return a plain string instead of a full-fledged FlyteDirectory object, provided the string is a directory path.
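
The ambiguity is easy to see with the standard library alone. In this small sketch (stdlib only, no flytekit involved), a path to a directory and a path to a file have the same type and both satisfy os.PathLike, so a type annotation by itself cannot say which one is meant.

```python
import os
import pathlib
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    directory = pathlib.Path(tmp)
    file_path = directory / "data.txt"
    file_path.write_text("example")

    # Both values satisfy os.PathLike and share one concrete type ...
    both_pathlike = isinstance(directory, os.PathLike) and isinstance(file_path, os.PathLike)
    same_type = type(directory) is type(file_path)

    # ... so only a filesystem check at run time tells them apart.
    kinds = (directory.is_dir(), file_path.is_dir())
```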

Converting from a Flyte literal value to a Python instance of FlyteDirectory

Type of Flyte IDL Literal: Multipart Blob
Python type: FlyteDirectory

uri matches http(s)/s3/gs:

FlyteDirectory object stores the original string path, but points to a local file instead.

  • [fn] downloader: function that writes to path when open’ed.

  • [fn] download: will trigger download

  • path: randomly generated local path that will not exist until downloaded

  • remote_path: None

  • remote_source: original http/s3/gs path

uri matches /local/path:

FlyteDirectory object just wraps the string

  • [fn] downloader: noop function

  • [fn] download: raises exception

  • path: just the given path

  • remote_path: None

  • remote_source: None
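
The two cases above amount to a branch on the uri scheme. The sketch below is a simplified model of that branch, not flytekit's type transformer: DirectoryView, view_from_uri and REMOTE_SCHEMES are hypothetical names, and path=None is used here as a placeholder for the random local path that flytekit assigns.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlparse

REMOTE_SCHEMES = {"http", "https", "s3", "gs"}


@dataclass
class DirectoryView:
    """Simplified stand-in for the fields listed above."""
    path: Optional[str]           # None models "random local path, assigned later"
    remote_source: Optional[str]
    download_allowed: bool


def view_from_uri(uri):
    """Model which fields get populated for a remote versus a local uri."""
    if urlparse(uri).scheme in REMOTE_SCHEMES:
        return DirectoryView(path=None, remote_source=uri, download_allowed=True)
    return DirectoryView(path=uri, remote_source=None, download_allowed=False)


remote_view = view_from_uri("s3://bucket/dir")
local_view = view_from_uri("/local/path")
```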


Converting from a Python value (FlyteDirectory, str, or pathlib.Path) to a Flyte literal

Type of Python value: str or pathlib.Path or FlyteDirectory
Expected Python type: FlyteDirectory

path matches http(s)/s3/gs:

Blob object is returned with uri set to the given path. Nothing is uploaded.

path matches /local/path:

Contents of the directory are uploaded to the Flyte blob store (S3, GCS, etc.), in a bucket determined by the raw_output_data_prefix setting. If a remote path is given (the remote_directory argument in the class signature), then that is used instead of the random path. Blob object is returned with uri pointing to the blob store location.
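
The upload decision can be sketched as a small function. This is a model of the rules stated above under simplifying assumptions, not flytekit's code: plan_upload is a hypothetical helper, its remote_directory parameter mirrors the constructor argument of the same name, and uuid4 is an assumed stand-in for the random path component.

```python
import posixpath
import uuid
from urllib.parse import urlparse

REMOTE_SCHEMES = {"http", "https", "s3", "gs"}


def plan_upload(path, raw_output_data_prefix, remote_directory=None):
    """Return (uri, needs_upload) for a directory path being returned from a task."""
    if urlparse(path).scheme in REMOTE_SCHEMES:
        # Already remote: the uri is stored as is and nothing is uploaded.
        return path, False
    # Local path: upload to the explicit target if given, else to a random
    # location under the configured prefix.
    target = remote_directory or posixpath.join(raw_output_data_prefix, uuid.uuid4().hex)
    return target, True


already_remote = plan_upload("s3://some/other/folder", "s3://bucket/raw")
explicit = plan_upload("/path/to/dir/", "s3://bucket/raw",
                       remote_directory="s3://special/output/location")
generated = plan_upload("/path/to/dir/", "s3://bucket/raw")
```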

As inputs:

def t1(in1: FlyteDirectory):
    ...

def t1(in1: FlyteDirectory["svg"]):
    ...

As outputs:

The contents of this local directory will be uploaded to the Flyte store.

return FlyteDirectory("/path/to/dir/")

return FlyteDirectory["svg"]("/path/to/dir/", remote_directory="s3://special/output/location")

Similar to the FlyteFile example, if you give an already remote location, it will not be copied to Flyte’s durable store, the uri will just be stored as is.

return FlyteDirectory("s3://some/other/folder")

Note that if you write a path starting with http(s), anything that later tries to read it (i.e. uses the literal as an input) will fail, because the http proxy doesn’t know how to download whole directories.

The format [] bit is still there because in Flyte, directories are also stored as Blob types, just like files, and the Blob type has a format field. The difference between files and directories is represented in the dimensionality field of the BlobType.
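
A small stand-in model may make this concrete. BlobTypeSketch and Dimensionality are hypothetical names that only mimic the shape of the Flyte BlobType. MULTIPART corresponds to the Multipart Blob mentioned earlier, while using SINGLE for files is an assumption made for this illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Dimensionality(Enum):
    SINGLE = 0      # assumed here to represent a single file
    MULTIPART = 1   # a directory, i.e. a multipart blob


@dataclass(frozen=True)
class BlobTypeSketch:
    """Stand-in with the two fields discussed above."""
    format: str
    dimensionality: Dimensionality


# The same format string can describe a file or a directory of such files;
# only the dimensionality field distinguishes the two.
file_type = BlobTypeSketch(format="svg", dimensionality=Dimensionality.SINGLE)
dir_type = BlobTypeSketch(format="svg", dimensionality=Dimensionality.MULTIPART)
```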

property remote_source: str

If this is an input to a task, and the original path is s3://something, flytekit will download the directory for the user. In case the user wants access to the original path, it will be here.

flytekit.types.directory.TensorboardLogs

This type can be used to denote that the output is a folder that contains logs that can be loaded in TensorBoard. This is usually the SummaryWriter output in PyTorch or Keras callbacks which record the history readable by TensorBoard.

flytekit.types.directory.TFRecordsDirectory

This type can be used to denote that the output is a folder that contains TensorFlow record files. This is usually the TFRecordWriter output in TensorFlow, which writes serialised tf.train.Example messages (or protobufs) to tfrecord files.