Release Notes

Release notes and upgrade instructions for Union Operator

v1.1.0 (September 2023)

Highlights

  • 📊 UI enhancements for clearer state communication and simplified navigation.
  • 🔐 Task resource validation to quickly address resource allocation issues.
  • 💰 Billing dashboard to track and manage your usage, supporting more strategic decisions about budget and resources.
  • 📈 Increased data retention for task-level monitoring: pod-level monitoring data is now visible for 7 days after the execution was submitted.

UI enhancements

A number of enhancements were added to improve navigation and interactions within the user console.
  • The left sidebar is now the central navigation for projects, executions, workflows, tasks, launch plans, usage, and user management.
  • A new “breadcrumb” navigation bar communicates state more clearly and enables quick traversal across projects, domains, and the workflow DAG.
  • All tooltips are now clickable and copyable, making it easier to copy identifiers (like execution IDs) and other information from the UI for use in the CLI or other tools.
Union Cloud web console prior to v1.1.0
Union Cloud web console as of v1.1.0

Task resource validation

Static and dynamic workflow tasks can include annotations that request computational resources (CPU, GPU, memory, storage). Prior to this release, if the task request exceeded available resources, the status would remain as queued indefinitely, with limited visibility in the UI. With task resource validation, these annotations are checked at submission time for static tasks and at runtime for dynamic tasks, to ensure that there are nodes available that satisfy the resource requirements. If the task cannot be satisfied by any existing node, the status will immediately change from queued to failed.
Example workflow that requests 32Gi memory from the cluster
The cluster uses c5.4xlarge nodes that have 32Gi of memory but, due to overhead, can only allocate approximately 30.8Gi, so the request fails task resource validation.
When the user launches the workflow, it fails with a 400 error because node overhead leaves less allocatable memory than the task requests.
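The check described above can be sketched in plain Python. This is only an illustration of the idea, not Union Cloud's implementation: the helper names (`parse_memory`, `fits_any_node`) and the node list are hypothetical, and the sketch looks at memory only, whereas the actual validation also covers the other requested resources.

```python
import re

# Binary suffixes used by Kubernetes-style memory quantities.
_UNITS = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30, "Ti": 2**40}


def parse_memory(quantity: str) -> float:
    """Convert a quantity such as '32Gi' or '30.8Gi' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti)", quantity)
    if match is None:
        raise ValueError(f"unsupported memory quantity: {quantity!r}")
    value, unit = match.groups()
    return float(value) * _UNITS[unit]


def fits_any_node(requested: str, allocatable_per_node: list[str]) -> bool:
    """Return True if at least one node can allocate the requested memory."""
    need = parse_memory(requested)
    return any(parse_memory(node) >= need for node in allocatable_per_node)


# Hypothetical cluster of c5.4xlarge nodes: 32Gi total, about 30.8Gi allocatable.
nodes = ["30.8Gi", "30.8Gi"]

# A request no node can satisfy moves straight to "failed" instead of waiting.
status_32 = "queued" if fits_any_node("32Gi", nodes) else "failed"
status_16 = "queued" if fits_any_node("16Gi", nodes) else "failed"
print(status_32, status_16)
```

Comparing against allocatable rather than total node memory is what makes the 32Gi request fail even though the instance type nominally has 32Gi.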

Billing dashboard

The new billing dashboard offers customers increased transparency and insight into their usage costs. View compute hours by node type and usage-based totals for each billing cycle, and access historical invoices to manage your budget. With centralized billing information, customers can streamline administrative tasks and allocate resources across projects more effectively.

Increased data retention for task-level monitoring

Previously, pod-level monitoring data was only available for 72 hours after an execution was submitted. This window has been extended to 7 days after submission, giving users more time to diagnose performance issues and troubleshoot executions.
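As a small worked example of the window arithmetic, the sketch below checks whether monitoring data for an execution would still be visible under the old and new retention periods. The function name and the sample timestamps are hypothetical; only the 72-hour and 7-day figures come from the release notes.

```python
from datetime import datetime, timedelta, timezone

OLD_RETENTION = timedelta(hours=72)
NEW_RETENTION = timedelta(days=7)


def monitoring_data_available(submitted_at, now, retention):
    """True while `now` falls inside the retention window that starts at submission."""
    return submitted_at <= now <= submitted_at + retention


# Hypothetical execution submitted on 1 September 2023, inspected five days later.
submitted = datetime(2023, 9, 1, 12, 0, tzinfo=timezone.utc)
five_days_later = submitted + timedelta(days=5)

print(monitoring_data_available(submitted, five_days_later, OLD_RETENTION))  # False
print(monitoring_data_available(submitted, five_days_later, NEW_RETENTION))  # True
```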

Support for compute plugins

Union Cloud now supports plugins for Spark, Ray, PyTorch, TensorFlow, and MPI. For now, these plugins are enabled on request through customer support.

Execution tags (experimental feature from Flyte 1.9)

This feature is currently being introduced in an experimental capacity. We value feedback from users to enhance and refine the feature further.
Execution tags allow users to discover their executions and other Flyte entities more easily, by creating smarter groupings.
You can create execution tags using the following commands:
pyflyte run --remote --tag "batch" example_start.py wf
pyflyte run --remote --tag "batch" --tag "inference" example_start.py wf
To retrieve the tags using FlyteRemote, follow these steps:
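from flytekit.configuration import Config
from flytekit.remote import FlyteRemote

# Connect to the backend; the argument to Config.auto is elided in the original.
remote = FlyteRemote(
    config=Config.auto("..."),
    default_project="flytesnacks",
    default_domain="development",
)
# Fetch the execution by its ID and print the tags stored on its spec.
flyte_workflow = remote.fetch_execution(name="<execution-id>")
print(flyte_workflow.spec.tags)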
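To illustrate the kind of grouping that tags make possible, here is a minimal client-side sketch. It assumes the tags have already been collected into a plain dictionary keyed by execution name; the execution names and the `group_by_tag` helper are hypothetical and not part of flytekit.

```python
from collections import defaultdict

# Hypothetical execution names mapped to the tags passed via --tag.
executions = {
    "exec-a": ["batch"],
    "exec-b": ["batch", "inference"],
    "exec-c": [],
}


def group_by_tag(tags_by_execution):
    """Invert an execution -> tags mapping into a tag -> executions mapping."""
    groups = defaultdict(list)
    for name, tags in tags_by_execution.items():
        for tag in tags:
            groups[tag].append(name)
    return dict(groups)


groups = group_by_tag(executions)
print(groups)
```

An execution with several tags appears in several groups, and an untagged execution appears in none.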

v1.0.0 (May 2023)

Union Cloud is now in Expanded Availability! This version 1.0.0 is packed with new features, improvements, and configuration options.

Highlights

  • 📊 Task-level monitoring to view resource consumption of running tasks in real-time.
  • 🔐 Role-based access control supporting fine-grained (project-domain level) permissions.
  • 🚀 Flyte 1.6 with on-the-fly image building, UI-based runtime metrics, and tons more.
  • 🗺 European regions deployable (eu-central and eu-west). Reach out to us to add a region.
  • 📸 In-app feedback capture to quickly create support tickets and provide feedback.

Task-level monitoring

For any task execution on Union Cloud, it is now possible to visualize the resource limits, allocation, and actual usage in real-time (and historically, subject to a retention window). The resources currently tracked are:
  • Memory Quota (shown below)
  • CPU Cores Quota
  • GPU Memory Utilization
  • GPU Utilization
Task-level monitoring is designed to provide a feedback loop between Flyte resource requests and what actually happens in the cluster, enabling the following use cases:
  • Identification of cost-savings and performance improvement opportunities, such as idle GPU cores or low memory utilization.
  • Headroom estimation for repeated jobs where the input data is monotonically increasing.
  • Qualitative analysis of tasks that failed with an out-of-memory error.
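The feedback loop between requested and observed resources comes down to simple arithmetic. The sketch below is one possible way to turn a memory request and an observed peak into utilization and headroom figures; the function names, the sample numbers, and the 50% threshold are assumptions for illustration, not values used by Union Cloud.

```python
def memory_report(requested_gib: float, peak_used_gib: float) -> dict:
    """Summarize how much of a memory request was actually used."""
    return {
        "utilization": round(peak_used_gib / requested_gib, 2),
        "headroom_gib": round(requested_gib - peak_used_gib, 2),
    }


def is_overprovisioned(requested_gib: float, peak_used_gib: float,
                       threshold: float = 0.5) -> bool:
    """Flag tasks whose peak usage stays below `threshold` of the request."""
    return peak_used_gib / requested_gib < threshold


# Hypothetical task that requested 16 GiB but peaked at 4 GiB.
idle = memory_report(16, 4)
# Hypothetical task that requested 8 GiB and peaked at 7.5 GiB.
tight = memory_report(8, 7.5)

print(idle, is_overprovisioned(16, 4))
print(tight, is_overprovisioned(8, 7.5))
```

Low utilization points to a cost-saving opportunity, while small headroom on a job with growing inputs suggests raising the request before it hits an out-of-memory error.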

Role-based access control

In addition to shipping with the standard RBAC Roles (Viewer, Contributor, and Admin), Union Cloud now offers fine-grained flexibility to grant each user specific permissions within the scope of specific projects and/or domains (development, staging, production). This enables use cases like the following:
  • Limit machine applications (for example, CI bots) to only have permission to register workflows, tasks, and launch plans without being able to trigger executions.
  • Restrict some human users to only have permission to create executions without being able to register workflows, tasks, and launch plans.
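The two use cases above can be modeled as permission grants scoped to a project and domain. The following is a conceptual sketch only: the `Grant` type, the action names ("register", "execute"), and the sample identities are hypothetical and do not reflect Union Cloud's actual policy format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    """A set of allowed actions scoped to one project and one domain."""
    project: str
    domain: str
    actions: frozenset


def is_allowed(grants, project: str, domain: str, action: str) -> bool:
    """Return True if any grant covers the action in the given scope."""
    return any(
        g.project == project and g.domain == domain and action in g.actions
        for g in grants
    )


# CI bot: may register workflows, tasks, and launch plans, but not run them.
ci_bot = [Grant("flytesnacks", "development", frozenset({"register"}))]
# Human user: may create executions, but not register new entities.
analyst = [Grant("flytesnacks", "production", frozenset({"execute"}))]

print(is_allowed(ci_bot, "flytesnacks", "development", "register"))  # True
print(is_allowed(ci_bot, "flytesnacks", "development", "execute"))   # False
print(is_allowed(analyst, "flytesnacks", "production", "execute"))   # True
print(is_allowed(analyst, "flytesnacks", "development", "execute"))  # False
```

Because every grant carries its own project and domain, a permission in one scope never leaks into another.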

Flyte 1.6

The latest version of Flyte is packed with new functionality:
  • 🔎 Runtime Metrics: the timeline view now contains more granular intra-task state transition details, allowing inference about potential issues such as scheduling contention or prolonged image pull times.
  • ⬆️ ImageSpec: Users can now define and build container images for Flyte tasks and workflows by specifying the necessary components directly inline (without the need for a Dockerfile).
  • Prettified CLI: The CLI now leverages rich-click for improved and more visually appealing output.
  • 📊 Flyte Decks Execution Insights: Flyte Decks now generates a comprehensive timeline graph that showcases the duration of different components involved in task execution.
  • 🔁 Lazy loading of Flytekit Dependencies: Flytekit now handles dependency loading more efficiently and dynamically, reducing memory usage and enhancing overall performance.
  • 🔥 PyTorch Elastic Training (torchrun): Flyte now offers seamless support for distributed training using PyTorch elastic (torchrun), allowing you to efficiently harness distributed resources and tackle complex machine learning tasks.
See the Flyte 1.6 blog post for a full breakdown.

European regions

We've launched the availability of two new regions in Europe, bringing our list of supported regions to:
  • us-east
  • us-west
  • eu-west
  • eu-central

In-app feedback capture

Union Cloud can now collect your bug reports and valuable feedback! To file a ticket, click the Feedback button at the bottom-right of the UI and follow the on-screen instructions. The user feedback is reviewed by the team on a daily basis and taken extremely seriously!

v0.0.49 (Mar 2023)

Union Operator v0.0.49

  • Stability improvements and bug fixes
In order to upgrade to this version:
  1. Update your values.yaml if it doesn't already include:
    union:
      enabled: true
      enableTunnelService: true
      ...
  2. Perform the following where you initially deployed following the EKS or GKE guide:
    helm repo update
    helm upgrade -n union -f values.yaml --create-namespace union-operator unionai/union-operator

v0.0.35 (Dec 2022)

Union Operator v0.0.35

  • Fixed a sync config issue that could delay updates from Union services.
In order to upgrade to this version, perform the following where you initially deployed following the EKS or GKE guide:
helm repo update
helm upgrade -n union -f values.yaml --create-namespace union-operator unionai/union-operator
The upgrade process is not expected to be disruptive and should not interrupt your running workflow executions.

v0.0.33 (Nov 2022)

Union Operator v0.0.33

  • Includes updated support for fast registration. To enable it:
  1. Update your values.yaml:
    union:
      enableTunnelService: true
      ...
  2. Upgrade your existing installation, where you initially deployed following the EKS or GKE guide:
    helm repo update
    helm upgrade -n union -f values.yaml --create-namespace union-operator unionai/union-operator
The upgrade process is not expected to be disruptive and should not interrupt your running workflow executions.