Data is a key driver for your business. Data volume can increase significantly over time, and it often requires concurrent consumption of large compute resources. Data integration workloads can become increasingly concurrent as more and more applications demand access to data at the same time. In AWS, hundreds of thousands of customers use AWS Glue, a serverless data integration service, for integrating data across multiple data sources at scale. AWS Glue jobs can be triggered asynchronously via a schedule or event, or started synchronously, on-demand.
Your AWS account has quotas, also referred to as limits, which are the maximum number of service resources for your AWS account. AWS Glue quotas helps guarantee the availability of AWS Glue resources and prevents accidental over provisioning of resources. However, with large or spiky workloads, it can be challenging to manage job run concurrency or Data Processing Units (DPU) to stay under the service quotas.
Traditionally, when you hit the quota of concurrent Glue job runs, your jobs fail immediately.
Today, we are pleased to announce the general availability of AWS Glue job queuing. Job queuing increases scalability and improves the customer experience of managing AWS Glue jobs. With this new capability, you no longer need to manage concurrency of your AWS Glue job runs and attempt retries just to avoid job failures due to high concurrency. You can simply start your jobs, and when the job runs are in Waiting state, the AWS Glue job queuing feature staggers jobs automatically whenever possible. This increases your job success rates and the experience for large concurrency workloads.
This post demonstrates how job queuing helps you scale your Glue workloads and how job queuing works.
The following are common data integration use cases where many concurrent job runs are needed:
AWS Glue has the following service quotas per Region and account related to concurrent usage:
You can also configure maximum concurrency for individual jobs.
In the aforementioned typical use cases, when you run a job through the StartJobRun
API or AWS Glue console, you may hit the upper limit defined at any of the discussed places. If this happens, your job fails immediately due to errors like ConcurrentRunsExceededException
returned by the AWS Glue API endpoint.
Job queuing helps those typical use cases without forcing you to manage concurrency between all your job runs. You no longer need to make manual retries when you get ConcurrentRunsExceededException
. Job queuing enqueues job runs when you hit the limit and automatically reattempts job runs when resources free up. It simplifies your daily operation and reduces latency for the retries. It also allows you to scale more with AWS Glue jobs.
In the next section, we describe how job queuing is configured.
To enable job queuing on the AWS Glue Studio console, complete the following steps:
In the next section, we describe how job queuing works.
In the current job run lifecycle, the job-level and account-level limits are checked when a job starts, and the job moves to a Failed state when these limits are reached. With job queuing, your job run state goes into a Waiting state to be reattempted instead of Failed. The Waiting state means that job run is queued for retry after the limits have been exceeded or resources were not unavailable. Job queueing is another retry mechanism in addition to the customer-specified max retry.
AWS Glue job queuing will improve the success rates of job runs and reduce failures due to limits, but it doesn’t guarantee job run success. Limits and resources could still be unavailable by the time the reattempt run starts.
The following screenshot shows that two job runs are in the Waiting state:
The following limits are covered by job queuing:
The retry mechanism is configured to retry for a maximum of 15 minutes or 10 attempts, whichever comes first.
Here’s the state transition diagram for job runs when job queuing is enabled.
Keep in mind the following considerations:
In this post, we described how the new job queuing capability helps you scale your AWS Glue job workload. You can start leveraging job queuing for your new jobs or existing jobs today. We are looking forward to hearing your feedback.
Noritaka Sekiyama is a Principal Big Data Architect on the AWS Glue team. He works based in Tokyo, Japan. He is responsible for building software artifacts to help customers. In his spare time, he enjoys cycling with his road bike.
 Gyan Radhakrishnan is a Software Development Engineer on the AWS Glue team. He is working on designing and building end-to-end solutions for data intensive applications.
Simon Kern is a Software Development Engineer on the AWS Glue team. He is enthusiastic about serverless technologies, data engineering and building great services.
Dana Adylova is a Software Development Engineer on the AWS Glue team. She is working on building software for supporting data intensive applications. In her spare time, she enjoys knitting and reading sci-fi.
Matt Su is a Senior Product Manager on the AWS Glue team. He enjoys helping customers uncover insights and make better decisions using their data with AWS Analytic services. In his spare time, he enjoys skiing and gardening.
Patriots "Belichick has treated most of the NFL like [expletive] for the last 20 ye
December 26th, 2024 – 7:00 PM Georgia Department of Labor – The Georgia Department of Labor announced Thursday that Rome recorded an unem
LOUDON COUNTY, Tenn. (WVLT) - Right now, 250 people were told that come February, they will not have a job.“These are significant emotional events and we donâ
Searching for a job that doesn't ask for a college degree?A new report from AI interview firm Final Round AI, based on Glassdoor data, ranked the top 10