Introducing AWS Glue Flex jobs: Cost savings on ETL workloads


AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning (ML), and application development. You can use AWS Glue to create, run, and monitor data integration and ETL (extract, transform, and load) pipelines and catalog your assets across multiple data stores. Typically, these data integration jobs have varying degrees of priority and time sensitivity. For example, non-urgent workloads such as pre-production, testing, and one-time data loads often don’t require fast job startup times or consistent runtimes via dedicated resources.

Today, we’re pleased to announce the general availability of a new AWS Glue job run class called Flex. Flex allows you to optimize your costs on your non-urgent or non-time-sensitive data integration workloads such as pre-production jobs, testing, and one-time data loads. With Flex, AWS Glue jobs run on spare compute capacity instead of dedicated hardware. The start times and runtimes of jobs using Flex can vary because spare compute resources aren’t always immediately available and can be reclaimed during the run of a job.

Regardless of the run option used, AWS Glue jobs have the same capabilities, including access to custom connectors, the visual authoring interface, job scheduling, and Glue Auto Scaling. With the Flex execution option, customers can optimize the costs of their data integration workloads by configuring the execution option based on the workloads’ requirements, using the standard execution option for time-sensitive workloads and Flex for non-urgent workloads. The Flex execution class is available for AWS Glue 3.0 Spark jobs.

In this post, we provide more details about AWS Glue Flex jobs and how to enable Flex capacity.

How do you use Flexible capacity?

The AWS Glue jobs API now supports an additional parameter called execution-class, which lets you choose STANDARD or FLEX when running the job. To use Flex, you simply set the parameter to FLEX.

To enable Flex via the AWS Glue Studio console, complete the following steps:

  1. On the AWS Glue Studio console, while authoring a job, navigate to the Job details tab.
  2. Select Flex Execution.
  3. Set an appropriate value for the Job Timeout parameter (defaults to 120 minutes for Flex jobs).
  4. Save the job.
  5. After finalizing all other details, choose Run to run the job with Flex capacity.

On the Runs tab, you should be able to see FLEX listed under Execution class.

You can also enable Flex via the AWS Command Line Interface (AWS CLI).

You can set the --execution-class parameter in the start-job-run API, which lets you start a particular run of an AWS Glue job with Flex capacity:

aws glue start-job-run --job-name my-job \
    --execution-class FLEX \
    --timeout 300

You can also set --execution-class in the create-job API. This sets the default run class of all the runs of this job to FLEX:

aws glue create-job \
    --name flexCLI \
    --role AWSGlueServiceRoleDefault \
    --command "Name=glueetl,ScriptLocation=s3://mybucket/myfolder/" \
    --region us-east-2 \
    --execution-class FLEX \
    --worker-type G.1X \
    --number-of-workers 10 \
    --glue-version 3.0

The following are additional details about the relevant parameters:

  • --execution-class – The enum string that specifies whether a job should be run with FLEX or STANDARD capacity. The default is STANDARD.
  • --timeout – Specifies the time (in minutes) the job will run before it’s moved into a TIMEOUT state.
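
The same two settings can also be passed through an AWS SDK. The following is a minimal Python sketch, assuming boto3 is installed and AWS credentials are configured. The helper names build_flex_run_args and start_flex_run are hypothetical, my-job is the placeholder job name from the CLI example, and the 120-minute default simply mirrors the console default mentioned earlier.

```python
from typing import Any, Dict


def build_flex_run_args(job_name: str, timeout_minutes: int = 120) -> Dict[str, Any]:
    """Build keyword arguments for a Glue StartJobRun call that uses Flex."""
    if timeout_minutes <= 0:
        raise ValueError("timeout_minutes must be positive")
    return {
        "JobName": job_name,
        "ExecutionClass": "FLEX",    # SDK counterpart of --execution-class
        "Timeout": timeout_minutes,  # SDK counterpart of --timeout, in minutes
    }


def start_flex_run(job_name: str, timeout_minutes: int = 120) -> str:
    """Start a Flex run and return its run ID (needs boto3 and credentials)."""
    import boto3  # imported lazily so the argument builder works without the SDK

    glue = boto3.client("glue")
    response = glue.start_job_run(**build_flex_run_args(job_name, timeout_minutes))
    return response["JobRunId"]
```

Calling start_flex_run("my-job", 300) would correspond to the start-job-run command shown earlier.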

When should you use Flexible capacity?

The Flex execution class is ideal for reducing the costs of time-insensitive workloads. For example:

  • Nightly ETL jobs, or jobs that run over weekends for processing workloads
  • One-time bulk data ingestion jobs
  • Jobs running in test environments or pre-production workloads
  • Time-insensitive workloads where it’s acceptable to have variable start and end times

In comparison, the standard execution class is ideal for time-sensitive workloads that require fast job startup and dedicated resources. In addition, jobs that have downstream dependencies are better served by the standard execution class.

What’s the typical life cycle of a Flexible capacity job?

When a start-job-run API call is issued with the execution-class set to FLEX, AWS Glue begins to request compute resources. If no resources are available immediately upon issuing the API call, the job moves into a WAITING state. No billing occurs at this point.

As soon as the job is able to acquire compute resources, the job moves to a RUNNING state. At this point, even if not all of the requested compute is available, the job begins running on whatever hardware is present. As more Flex capacity becomes available, AWS Glue adds it to the job, up to the maximum value specified by Number of workers.

At this point, billing begins. You’re charged only for the compute resources that are running at any given time, and only for the duration that they ran.

While the job is running, if Flex capacity is reclaimed, AWS Glue continues running the job on the existing compute resources while it tries to meet the shortfall by requesting more resources. If capacity is reclaimed, billing for that capacity is halted as well. Billing for new capacity starts when it’s provisioned again. If the job completes successfully, the job’s state moves to SUCCEEDED. If the job fails due to user or system errors, the job’s state transitions to FAILED. If the job is unable to complete before the time specified by the --timeout parameter, whether due to a lack of compute capacity or due to issues with the AWS Glue job script, the job goes into a TIMEOUT state.
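
To make the state transitions above concrete, the following Python sketch polls a run until it leaves the WAITING and RUNNING states. It is illustrative only: wait_for_run and get_state are hypothetical names, and the state lookup is injected as a callable so the loop can be exercised without calling AWS. The terminal set includes SUCCEEDED, FAILED, and TIMEOUT from the text, plus STOPPED, ERROR, and EXPIRED, which the Glue API also reports.

```python
import time
from typing import Callable

# States after which a run makes no further progress.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "TIMEOUT", "STOPPED", "ERROR", "EXPIRED"}


def wait_for_run(get_state: Callable[[], str],
                 poll_seconds: float = 30.0,
                 max_polls: int = 1000) -> str:
    """Call get_state() repeatedly and return the first terminal state seen."""
    for _ in range(max_polls):
        state = get_state()
        if state in TERMINAL_STATES:
            return state
        # WAITING (no capacity yet) and RUNNING both mean "keep polling".
        time.sleep(poll_seconds)
    raise RuntimeError("run did not reach a terminal state within max_polls")
```

With boto3, get_state could be a small function that returns glue.get_job_run(JobName=..., RunId=...)["JobRun"]["JobRunState"] for the run being tracked.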

Flexible job runs rely on the availability of non-dedicated compute capacity in AWS, which in turn depends on several factors, such as the Region and Availability Zone, time of day, day of the week, and the number of DPUs required by a job.

A parameter of particular importance for Flex jobs is the --timeout value. It’s possible for Flex jobs to take longer to run than standard jobs, especially if capacity is reclaimed while the job is running. As a result, selecting a timeout value that’s appropriate for your workload is critical. Choose a timeout value such that the total cost of the Flex job run doesn’t exceed that of a standard job run. If the value is set too high, the job can wait for too long, trying to acquire capacity that isn’t available. If the value is set too low, the job times out, even if capacity is available and the job execution is proceeding correctly.
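
One rough way to reason about the cost guidance above is a break-even calculation. The Python sketch below assumes the worst case, in which a Flex run keeps all of its workers for the entire timeout, and compares that with a standard run on the same number of workers. The function name and both rate parameters are hypothetical; take actual DPU-hour rates from the AWS Glue pricing page.

```python
import math


def max_flex_timeout_minutes(standard_runtime_minutes: float,
                             standard_rate: float,
                             flex_rate: float) -> int:
    """Longest Flex timeout whose worst-case cost stays within a standard run.

    Assumes the same number of workers in both runs, so worker count cancels:
    flex_rate * timeout <= standard_rate * standard_runtime.
    """
    if standard_runtime_minutes <= 0 or standard_rate <= 0 or flex_rate <= 0:
        raise ValueError("all inputs must be positive")
    return math.floor(standard_runtime_minutes * standard_rate / flex_rate)
```

The result is only an upper bound on cost exposure. A Flex run that spends part of its time with fewer workers costs less than this worst case, so the bound says nothing about whether the job will finish within the timeout.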

How are Flex capability jobs billed?

Flex jobs are billed per worker at the Flex DPU-hour rates. This means that you’re billed only for the capacity that actually ran during the execution of the job, for the duration that it ran.

For example, if you ran an AWS Glue Flex job for 10 workers, and AWS Glue was only able to acquire 5 workers, you’re billed only for 5 workers, and only for the duration that those workers ran. If, during the job run, two out of those 5 workers are reclaimed, then billing for those two workers is stopped, while billing for the remaining three workers continues. If provisioning for the two reclaimed workers is successful during the job run, billing for those two starts again.
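
The arithmetic behind this example can be sketched in Python as follows. The function name flex_cost, the segment durations, and the rate are all hypothetical. The sketch treats each worker as one DPU (as with the G.1X worker type used in the create-job example) and ignores any billing minimums or rounding.

```python
from typing import Iterable, Tuple


def flex_cost(segments: Iterable[Tuple[int, float]],
              rate_per_dpu_hour: float,
              dpu_per_worker: float = 1.0) -> float:
    """Sum the cost of a run described as (active_workers, minutes) segments."""
    worker_minutes = sum(workers * minutes for workers, minutes in segments)
    dpu_hours = worker_minutes / 60.0 * dpu_per_worker
    return dpu_hours * rate_per_dpu_hour


# Hypothetical timeline for the example above: 5 workers acquired,
# 2 reclaimed for a while, then both provisioned again.
timeline = [(5, 60), (3, 60), (5, 60)]
cost = flex_cost(timeline, rate_per_dpu_hour=0.5)  # 0.5 is a placeholder rate
```

Only the workers that were actually running appear in each segment, so the five workers that were requested but never acquired contribute nothing to the total.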

For more information on Flex pricing, refer to AWS Glue pricing.

Conclusion

This post discusses the new AWS Glue Flex job execution class, which allows you to optimize costs for non-time-sensitive ETL workloads and test environments.

You can start using Flex capacity for your existing and new workloads today. However, note that the Flex class is not supported for Python Shell jobs, AWS Glue streaming jobs, or AWS Glue ML jobs.

For more information on AWS Glue Flex jobs, refer to the latest documentation.

Special thanks to everyone who contributed to the launch: Parag Shah, Sampath Shreekantha, Yinzhi Xi, and Jessica Cheng.


About the authors

Aniket Jiddigoudar is a Big Data Architect on the AWS Glue team.

Vaibhav Porwal is a Senior Software Development Engineer on the AWS Glue team.

Sriram Ramarathnam is a Software Development Manager on the AWS Glue team.