Tasks
Tasks are the fundamental units of compute in Union. They are independently executable, strongly typed, and containerized building blocks that make up workflows. Workflows are constructed by chaining together tasks, with the output of one task feeding into the input of the next to form a directed acyclic graph.
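As a rough illustration of this chaining, the sketch below models three hypothetical tasks (`load`, `square`, `total`) as plain Python functions. It runs them in dependency order with the standard library's `graphlib.TopologicalSorter`, passing each output along an edge of the graph. This is not how Union schedules tasks. In Union you would express the same structure with the SDK's workflow construct, and the platform would run each task in its own container.

```python
from graphlib import TopologicalSorter


# Hypothetical tasks: plain, typed Python functions.
def load(n: int) -> list[int]:
    return list(range(n))


def square(xs: list[int]) -> list[int]:
    return [x * x for x in xs]


def total(xs: list[int]) -> int:
    return sum(xs)


TASKS = {"load": load, "square": square, "total": total}
# Each entry maps a task to the upstream task whose output it consumes (None = workflow input).
UPSTREAM = {"load": None, "square": "load", "total": "square"}


def run_workflow(n: int) -> int:
    """Run each task after its upstream task, feeding outputs along the DAG's edges."""
    graph = {name: {up} if up else set() for name, up in UPSTREAM.items()}
    outputs: dict[str, object] = {}
    for name in TopologicalSorter(graph).static_order():
        up = UPSTREAM[name]
        arg = n if up is None else outputs[up]
        outputs[name] = TASKS[name](arg)
    return outputs["total"]


print(run_workflow(4))  # 0 + 1 + 4 + 9 = 14
```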
Tasks are independently executable
Tasks are designed to be independently executable, meaning that each one can run in isolation from other tasks. Since most tasks are just Python functions, they can also be executed on your local machine. This makes it easy to unit test and debug them before deploying them to Union.
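A minimal sketch of testing a task locally follows. It assumes the task body is an ordinary Python function; `normalize` is a hypothetical example. In real code the function would also carry the SDK's task decorator. The decorator is left off here so that the snippet runs with the standard library alone.

```python
def normalize(values: list[float]) -> list[float]:
    """Hypothetical task body: scale values so that they sum to 1.0."""
    total = sum(values)
    if total == 0:
        raise ValueError("cannot normalize values that sum to zero")
    return [v / total for v in values]


def test_normalize() -> None:
    # Runs in-process on a laptop: no cluster, container, or registration needed.
    result = normalize([1.0, 1.0, 2.0])
    assert result == [0.25, 0.25, 0.5]
    assert abs(sum(result) - 1.0) < 1e-9


test_normalize()
print("normalize passed")
```

A test runner such as pytest would collect `test_normalize` in the same way.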
Because they are independently executable, tasks can also be shared and reused across multiple workflows. As long as a task's logic is deterministic, its inputs and outputs can be cached. A rerun with identical inputs can then reuse the stored outputs instead of recomputing them, which saves compute resources and execution time.
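The sketch below shows why determinism matters for caching. It assumes a cache keyed on the task name, a version string, and the input values. `cache_key`, `cached_call`, and the in-memory `_CACHE` are hypothetical stand-ins, not Union's implementation; see Caching for how the platform actually behaves.

```python
import hashlib
import json
from typing import Any, Callable

# Hypothetical in-memory cache standing in for a platform-side cache.
_CACHE: dict[str, Any] = {}
CALLS = {"slow_square": 0}


def cache_key(task_name: str, cache_version: str, inputs: dict[str, Any]) -> str:
    """Hash the task identity together with its JSON-serializable inputs."""
    payload = json.dumps(
        {"task": task_name, "version": cache_version, "inputs": inputs}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_call(fn: Callable[..., Any], cache_version: str, **inputs: Any) -> Any:
    """Return the stored output for identical inputs; otherwise run fn and store it."""
    key = cache_key(fn.__name__, cache_version, inputs)
    if key not in _CACHE:  # cache miss: actually execute the task
        _CACHE[key] = fn(**inputs)
    return _CACHE[key]  # cache hit, or the freshly stored result


def slow_square(x: int) -> int:
    CALLS["slow_square"] += 1  # count real executions
    return x * x


print(cached_call(slow_square, "v1", x=3))  # 9 (computed)
print(cached_call(slow_square, "v1", x=3))  # 9 (served from the cache)
print(CALLS["slow_square"])  # 1
```

Reusing the stored result is only correct because `slow_square` is deterministic. If its output depended on the clock or on external state, a cache hit would silently return a stale value.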
Tasks are strongly typed
Tasks have strongly typed inputs and outputs, which are validated at deployment time. This helps catch bugs early and ensures that the data passing through tasks and workflows is compatible with the explicitly stated types.
Under the hood, Union uses the Flyte type system and translates between Flyte types and the types of the SDK language, in this case Python. The Python type annotations in a task's function signature declare its interface, and data passing through tasks and workflows is checked against those declared types. The Flyte type system is also used for caching, data lineage tracking, and automatic serialization and deserialization of data as it is passed from one task to another.
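The idea of checking data against a function signature can be sketched in a few lines of standard-library Python. The hypothetical helper `find_type_errors` reads a function's annotations and reports inputs that do not match. It is a toy analog only: it handles plain classes such as `int` and `str`, and it works nothing like the Flyte type system, which also handles generics, files, dataframes, and serialization.

```python
import inspect
from typing import Any, Callable, get_type_hints


def find_type_errors(fn: Callable[..., Any], **inputs: Any) -> list[str]:
    """Hypothetical checker: compare inputs with fn's annotations (plain classes only)."""
    hints = get_type_hints(fn)
    errors: list[str] = []
    for name in inspect.signature(fn).parameters:
        if name not in inputs:
            errors.append(f"missing input {name!r}")
            continue
        expected = hints.get(name)  # None if the parameter is unannotated
        value = inputs[name]
        if expected is not None and not isinstance(value, expected):
            errors.append(
                f"input {name!r}: expected {expected.__name__}, got {type(value).__name__}"
            )
    return errors


def repeat(word: str, times: int) -> str:
    return word * times


print(find_type_errors(repeat, word="ab", times=3))    # []
print(find_type_errors(repeat, word="ab", times="3"))  # ["input 'times': expected int, got str"]
```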
Tasks are containerized
Although most tasks can be executed locally, a task deployed to Union is containerized as part of the registration process and runs in its own Kubernetes pod. This allows each task to have its own set of software dependencies and hardware requirements. For example, a task that requires a GPU can be deployed to Union with a GPU-enabled container image. A task that requires a specific version of a software library can be deployed with an image that has that version installed.
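To make the per-task independence concrete, the sketch below describes each task's container and hardware needs as a separate record. `ContainerSpec`, its fields, the image names, and the task names are all hypothetical. In the real SDK these requirements are declared through arguments on the task decorator, and the exact parameter names should be taken from the SDK reference.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ContainerSpec:
    """Hypothetical per-task description of image, packages, and hardware."""

    image: str
    packages: tuple[str, ...] = ()
    cpu: int = 1
    memory_gib: int = 2
    gpu: int = 0


# Each task carries its own spec, so one task's requirements never constrain another's.
SPECS = {
    "train_model": ContainerSpec(
        image="example.com/train:gpu", packages=("torch",), gpu=1, memory_gib=16
    ),
    "summarize": ContainerSpec(image="example.com/report:cpu", packages=("pandas==1.5.3",)),
}

for task_name, spec in SPECS.items():
    print(f"{task_name}: image={spec.image} gpu={spec.gpu} packages={list(spec.packages)}")
```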
Tasks are named, versioned, and immutable
The fully qualified name of a task is a combination of its project, domain, and name. To update a task, you change it and re-register it under the same fully qualified name. This creates a new version of the task, while the old version remains available. Tasks are therefore immutable at the version level: a registered version is never changed in place. This immutability ensures that workflows are reproducible and that data lineage is accurate.
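The naming and versioning rules above can be modeled as an append-only registry. In the sketch below, `TaskId`, `TaskRegistry`, and the project, domain, and task names are hypothetical. It only illustrates the rules: the identity is (project, domain, name), re-registering adds a version, and an existing version cannot be overwritten.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskId:
    """Fully qualified task name: project + domain + name."""

    project: str
    domain: str
    name: str


class TaskRegistry:
    """Hypothetical append-only registry: registering never overwrites a version."""

    def __init__(self) -> None:
        self._versions: dict[TaskId, dict[str, str]] = {}

    def register(self, task_id: TaskId, version: str, definition: str) -> None:
        versions = self._versions.setdefault(task_id, {})
        if version in versions:
            raise ValueError(f"{task_id.name} version {version} already exists and is immutable")
        versions[version] = definition

    def get(self, task_id: TaskId, version: str) -> str:
        return self._versions[task_id][version]

    def versions(self, task_id: TaskId) -> list[str]:
        return list(self._versions.get(task_id, {}))


registry = TaskRegistry()
tid = TaskId(project="my-project", domain="development", name="my_module.normalize")
registry.register(tid, "v1", "original body")
registry.register(tid, "v2", "updated body")  # updating = registering a new version
print(registry.versions(tid))  # ['v1', 'v2']
print(registry.get(tid, "v1"))  # original body (the old version is still available)
```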
Tasks are (usually) deterministic and cacheable
When deciding if a unit of execution is suitable to be encapsulated as a task, consider the following questions:
Does the task have well-defined criteria for a graceful, successful exit?
A task is expected to exit after completion of input processing.
Is it deterministic and repeatable?
Under certain circumstances, a task might be cached or rerun with the same inputs, and it is expected to produce the same output every time. For example, avoid using a random number generator seeded with the current clock.
Is it a pure function? That is, does it have side effects that are unknown to the system?
Avoid side effects in tasks where possible.
When side effects are unavoidable, make sure the operations are idempotent, so that running them more than once has the same effect as running them once.
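The determinism and idempotence guidelines above can be sketched in plain Python. All function names are hypothetical. Passing the seed as an explicit input makes reruns reproducible. Writing to a path derived only from the inputs, and overwriting rather than appending, means a retried task leaves exactly the same result behind.

```python
import random
import tempfile
import time
from pathlib import Path


def sample_from_clock(n: int) -> list[float]:
    # Avoid: a clock-based seed makes every rerun produce different output.
    rng = random.Random(time.time_ns())
    return [rng.random() for _ in range(n)]


def sample_from_seed(n: int, seed: int) -> list[float]:
    # Prefer: the seed is an explicit input, so identical inputs give identical outputs.
    rng = random.Random(seed)
    return [rng.random() for _ in range(n)]


def write_report(out_dir: Path, run_key: str, text: str) -> Path:
    # Idempotent side effect: the path depends only on the inputs and the write
    # overwrites, so a retry leaves exactly the same file behind.
    # (Appending with open(path, "a") would not be idempotent: each retry adds a copy.)
    path = out_dir / f"report-{run_key}.txt"
    path.write_text(text)
    return path


print(sample_from_seed(3, seed=42) == sample_from_seed(3, seed=42))  # True

with tempfile.TemporaryDirectory() as tmp:
    first = write_report(Path(tmp), "run-1", "ok")
    retry = write_report(Path(tmp), "run-1", "ok")  # e.g. the task is retried
    print(first == retry, len(list(Path(tmp).iterdir())))  # True 1
```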
For details on task caching, see Caching.