Accessing AWS S3 buckets#

Here we will take a look at how to access data in AWS S3 buckets from Union. As a prerequisite, we assume that our AWS S3 bucket is accessible with the API keys AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY.
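Before storing the keys, it can help to confirm they are valid. Assuming the AWS CLI is installed and the keys are set in your local environment, the following command returns the account and identity they belong to:

aws sts get-caller-identity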

Creating secrets on Union#

First, we create secrets on Union by running the following command:

union create secret AWS_ACCESS_KEY_ID

This will open a prompt where we paste in our AWS credentials:

Enter secret value: 🗝️

Repeat this process for all other AWS credentials, such as AWS_SECRET_ACCESS_KEY.
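For example, the second secret is created the same way:

union create secret AWS_SECRET_ACCESS_KEY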

Using secrets in a task#

Next, we can use the secrets directly in a task! First, with the AWS CLI, we create a bucket, write a small text file, and copy the file into the bucket:

aws s3 mb s3://test-bucket
echo "Hello Union" > my_file.txt
aws s3 cp my_file.txt s3://test-bucket/my_file.txt
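To confirm the upload, you can list the bucket contents:

aws s3 ls s3://test-bucket/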

Next, we give a task access to our AWS secrets by requesting them through secret_requests. For this guide, save the following snippet as aws-s3-access.py:

from flytekit import Secret, current_context, task, workflow

@task(
    secret_requests=[
        Secret(key="AWS_ACCESS_KEY_ID"),
        Secret(key="AWS_SECRET_ACCESS_KEY"),
    ],
)
def read_s3_data() -> str:
    # s3fs must be installed in the task's container image.
    import s3fs

    # The secrets requested above are available at runtime via the task context.
    secrets = current_context().secrets

    # Authenticate to S3 with the credentials stored on Union.
    s3 = s3fs.S3FileSystem(
        key=secrets.get(key="AWS_ACCESS_KEY_ID"),
        secret=secrets.get(key="AWS_SECRET_ACCESS_KEY"),
    )

    # Read back the file we uploaded earlier.
    with s3.open("test-bucket/my_file.txt") as f:
        content = f.read().decode("utf-8")
    return content

@workflow
def main() -> str:
    return read_s3_data()

Within the task, the secrets are available through current_context().secrets and are passed to s3fs. Run the following command to execute the workflow:

union run --remote aws-s3-access.py main
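Note that the task imports s3fs, so the package must be available in the task's container image. One way to handle this, sketched here assuming a recent flytekit and a hypothetical container registry, is to build the image with ImageSpec:

from flytekit import ImageSpec

# A minimal sketch; the registry value is a placeholder, replace it
# with a registry your cluster can pull from.
image = ImageSpec(
    name="aws-s3-access",
    packages=["s3fs"],
    registry="ghcr.io/my-org",
)

You can then pass container_image=image to the @task decorator so the image with s3fs is used when the task runs remotely.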

Conclusion#

You can easily access your AWS S3 buckets from Union by creating secrets with union create secret and configuring your tasks to request them through secret_requests!