An execution represents a single run of all or part of a workflow (including sub-workflows or individual tasks).
Most executions run an entire workflow. However, because workflows are composed of tasks (and sometimes sub-workflows), and Flyte caches their outputs independently of the workflows in which they participate, it sometimes makes sense to execute a task or sub-workflow on its own.
The execution view provides detailed information about the execution:
In the top bar:
Below the top bar, details on the execution (domain, cluster, time, etc.) are displayed:
Below that, three tabs provide access to the Nodes, Graph, and Timeline views:
The default tab within the execution view is the Nodes tab. It shows a list of the nodes that make up this execution. A node in Flyte is either a task or a (sub-)workflow:
Selecting an item in the list opens the right panel showing more details of that specific node:
Within the right panel, we can see:
- Node ID: n0
- Task name: workflows.diffuse.start_process
- Success status: SUCCEEDED
- Caching status: Caching was disabled for this execution
- Type: Python Task
- Rerun button
Below that, you have the tabs Executions, Inputs, Outputs, and Task.
The Executions tab gives you details on the execution of this particular node.
You can access logs by clicking the text under Logs. In AWS-based systems this will say CloudWatch Logs. In GCP-based systems this will say StackDriver Logs.
The Inputs tab displays the input to this node:
The Outputs tab displays the output of this node:
If this node is a task (as opposed to a sub-workflow), the Task tab displays the task definition:
The Graph tab displays a visual representation of the execution as a directed acyclic graph:
The Timeline tab displays a visualization showing the timing of each task in the execution:
A workflow or task execution can be in one of five states:
- QUEUED: The cluster has picked up the task or workflow but it is awaiting execution.
- RUNNING: The task or workflow is being executed.
- FAILED: The task or workflow has stopped due to an error.
- SUCCEEDED: The task or workflow has successfully completed.
- UNKNOWN: The task or workflow has not yet been picked up by the cluster. This may be due to the system hitting the 10,000 concurrent workflow executions limit (see Workflow execution limits), in which case the UNKNOWN state should not be long-lasting, or it may be due to some other anomaly. In general, a long-lasting UNKNOWN state that is not obviously the result of throttling should be regarded as an error state and should be investigated.
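The rule of thumb for the UNKNOWN state can be sketched in a few lines of Python. This is only an illustration, not part of any Union Cloud or Flyte API: the `ExecutionState` enum, the `is_suspicious_unknown` helper, and the ten-minute threshold are all assumptions made for the example, since the text above does not define how long "long-lasting" is.

```python
from enum import Enum


class ExecutionState(Enum):
    """The five execution states listed above (illustrative enum, not a real API)."""
    QUEUED = "QUEUED"
    RUNNING = "RUNNING"
    FAILED = "FAILED"
    SUCCEEDED = "SUCCEEDED"
    UNKNOWN = "UNKNOWN"


# Hypothetical threshold: "long-lasting" is not quantified in the documentation.
UNKNOWN_GRACE_SECONDS = 600


def is_suspicious_unknown(state, seconds_in_state, throttled):
    """Return True if an UNKNOWN state should be treated as an error.

    An execution is flagged only when it is UNKNOWN, has stayed that way
    longer than the grace period, and throttling does not explain the wait.
    """
    if state is not ExecutionState.UNKNOWN:
        return False
    if throttled:
        return False
    return seconds_in_state > UNKNOWN_GRACE_SECONDS
```

Under these assumptions, an execution that has been UNKNOWN for an hour while the cluster is below its concurrency limit would be flagged, whereas the same wait during throttling would not.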
By default, Union Cloud has a throttling mechanism that limits the number of concurrent workflow executions per data plane cluster (equivalently, per organization) to 10,000. This limit can be adjusted on a per-customer basis. To change your limit, contact the Union Cloud team.
Executions beyond the limit are executed as soon as resources become available. While waiting, a workflow execution is reported as being in the UNKNOWN state.
The limit prevents a large number of workflow executions (launched through an automated process, for example) from overwhelming the data plane cluster, which would in effect be a self-inflicted denial-of-service attack.
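The throttling behavior described above can be pictured with a toy model. The sketch below is not how Union Cloud implements the limit; the `ThrottledCluster` class, its method names, and the first-in-first-out admission order are assumptions chosen for illustration. It only shows the observable effect: executions within the limit are picked up, and the rest are reported as UNKNOWN until capacity frees up.

```python
from collections import deque


class ThrottledCluster:
    """Toy model of a per-cluster concurrency limit (hypothetical, for illustration)."""

    def __init__(self, limit=10_000):
        self.limit = limit       # maximum number of concurrent executions
        self.active = set()      # executions the cluster has picked up
        self.waiting = deque()   # executions held back by the limit (assumed FIFO)

    def submit(self, execution_id):
        """Admit the execution if there is capacity; otherwise hold it back."""
        if len(self.active) < self.limit:
            self.active.add(execution_id)
            return "QUEUED"
        self.waiting.append(execution_id)
        return "UNKNOWN"

    def finish(self, execution_id):
        """Mark an execution as done and admit the next waiting one, if any."""
        self.active.discard(execution_id)
        if self.waiting and len(self.active) < self.limit:
            self.active.add(self.waiting.popleft())

    def state(self, execution_id):
        """Report the state of an execution, or None if it is not being tracked."""
        if execution_id in self.active:
            return "QUEUED"
        if execution_id in self.waiting:
            return "UNKNOWN"
        return None
```

With a limit of 2, for example, a third submission is reported as UNKNOWN and moves to QUEUED once one of the first two executions finishes.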