Why EMR?
Amazon EMR is the industry-leading cloud big data platform for data processing, interactive analysis, and machine learning (ML) using open-source frameworks such as Apache Spark, Apache Hive, and Presto. Amazon EMR pricing is simple and predictable: you pay a per-second rate for every second you use, with a one-minute minimum. A 10-node cluster running for 10 hours costs the same as a 100-node cluster running for one hour. Amazon EMR pricing depends on how you deploy your EMR applications. You can run them on EMR clusters with Amazon Elastic Compute Cloud (Amazon EC2) instances, on AWS Outposts, on Amazon Elastic Kubernetes Service (Amazon EKS), or with EMR Serverless. You can run Amazon EKS on AWS using either EC2 or AWS Fargate.
You will incur standard public IPv4 address charges for IPv4 addresses used with your Amazon EMR on EC2 clusters, Amazon EMR on EKS clusters and Amazon EMR Serverless applications. Please visit the public IPv4 address section of the VPC pricing page for more details.
AWS Pricing Calculator
Calculate your Amazon EMR and architecture cost in a single estimate.
Amazon EMR on Amazon EC2
This pricing is for Amazon EMR applications running on Amazon EMR clusters with Amazon EC2 instances.
The Amazon EMR price is added to the Amazon EC2 price (the price for the underlying servers) and the Amazon Elastic Block Store (Amazon EBS) price (if you attach Amazon EBS volumes). These are also billed per second, with a one-minute minimum. You can choose from a variety of EC2 pricing options, including On-Demand (shown below), one-year and three-year Reserved Instances, Compute Savings Plans, and Spot Instances. Spot Instances are spare EC2 capacity available at up to a 90% discount compared to On-Demand prices. See Spot Instance price savings versus On-Demand by filtering for “Instance types supported by EMR” on the Spot Instance Advisor page.
Amazon EMR on Amazon EKS
This pricing is for Amazon EMR on Amazon EKS clusters.
The Amazon EMR price is added to the Amazon EKS pricing or any other services used with EKS. You can run EKS on AWS using either EC2 or AWS Fargate. If you are using EC2 (including with EKS managed node groups), you pay for AWS resources (e.g., EC2 instances or EBS volumes) you create to run your Kubernetes worker nodes. See detailed pricing information on the EC2 pricing page. If you are using AWS Fargate, pricing is calculated based on the vCPU and memory resources used from the time you start to download your container image until the EKS pod terminates, rounded up to the nearest second. A minimum charge of one minute applies. See detailed pricing information on the AWS Fargate pricing page.
Amazon EMR pricing on Amazon EKS is calculated based on the vCPU and memory resources used from the time you start to download your EMR application image until the EKS Pod terminates, rounded up to the nearest second. Pricing is based on requested vCPU and memory resources for the Task or Pod.
Amazon EMR on AWS Outposts
Amazon EMR on AWS Outposts pricing is the same as cloud-based instances of EMR. Please refer to the AWS Outposts pricing page for details on AWS Outposts pricing.
Amazon EMR Serverless
With EMR Serverless, there are no upfront costs, and you pay for only the resources you use. You pay for the amount of vCPU, memory, and storage resources consumed by your applications.
With EMR Serverless, you create an application using an open-source framework version and then submit jobs to the application. As part of the job specification, you can provide the minimum and maximum number of concurrent workers, as well as the vCPU, memory, and storage for each worker. EMR automatically adds and removes workers based on what the job requires within your specified limits. The three dimensions of compute, memory, and storage for workers can be independently configured. You can choose 1, 2, 4, 8, or 16 vCPU per worker, and memory from 2 GB to 120 GB per worker, in increments of 1 GB to 8 GB depending on the vCPU count. For storage, you can choose standard storage from 20 GB to 200 GB per worker, or shuffle-optimized storage from 20 GB to 2 TB per worker.
You are charged for aggregate vCPU, memory, and storage resources used from the time workers are ready to run your workload until the time they stop, rounded up to the nearest second with a 1-minute minimum. If you set up your application to start workers at application startup, the requested workers will start when you start your application and end when you stop the application, or when the application remains idle.
Note: When using custom images, you are charged for aggregate vCPU, memory, and storage resources used from the time EMR Serverless starts downloading the image until the workers are stopped, rounded up to the nearest second with a 1-minute minimum.
Pricing details (compute and memory)
Pricing is based on vCPU, memory, and storage resources used by workers, aggregated across all workers.
Rates are listed separately for Linux/x86 and Linux/ARM.
Pricing details (ephemeral storage)
Standard storage: The first 20 GB of ephemeral storage is available for all workers by default, and you pay only for any additional storage configured per worker.
Shuffle Optimized Storage: You pay for the entire storage configured per worker, including the first 20 GB.
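The two storage rules above can be sketched as a small helper. This is a minimal illustration, not an AWS API: the function name `billable_storage_gb` and its parameters are hypothetical, and it only computes how many GB per worker would be billed, not a dollar amount.

```python
INCLUDED_STANDARD_GB = 20  # first 20 GB of standard storage is included per worker


def billable_storage_gb(configured_gb: int, shuffle_optimized: bool = False) -> int:
    """Return the ephemeral storage (GB) billed for one worker.

    Hypothetical helper: standard storage bills only the amount above the
    included 20 GB; shuffle-optimized storage bills the full configured size.
    """
    if configured_gb < INCLUDED_STANDARD_GB:
        raise ValueError("each worker has at least 20 GB of ephemeral storage")
    if shuffle_optimized:
        return configured_gb
    return configured_gb - INCLUDED_STANDARD_GB


print(billable_storage_gb(20))                           # default standard storage
print(billable_storage_gb(200))                          # largest standard size
print(billable_storage_gb(100, shuffle_optimized=True))  # shuffle-optimized
```

Multiplying the result by the number of workers, the per-GB-hour storage rate for your Region, and the billed duration would give the storage charge.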
Supported worker configurations
CPU | Memory Values | Ephemeral Storage |
1 vCPU | Min. 2 GB and Max. 8 GB, in 1 GB increments | 20 GB - 200 GB |
2 vCPU | Min. 4 GB and Max. 16 GB, in 1 GB increments | 20 GB - 200 GB |
4 vCPU | Min. 8 GB and Max. 30 GB, in 1 GB increments | 20 GB - 200 GB |
8 vCPU | Min. 16 GB and Max. 60 GB, in 4 GB increments | 20 GB - 200 GB |
16 vCPU | Min. 32 GB and Max. 120 GB, in 8 GB increments | 20 GB - 200 GB |
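As a rough sketch, the table above can be encoded as a lookup to check a worker size before submitting a job. The names `MEMORY_RULES` and `is_supported` are hypothetical, and the check assumes that memory increments are counted from the minimum value for each vCPU size, which the table does not state explicitly.

```python
# vCPU -> (minimum GB, maximum GB, increment GB), copied from the table above
MEMORY_RULES = {
    1: (2, 8, 1),
    2: (4, 16, 1),
    4: (8, 30, 1),
    8: (16, 60, 4),
    16: (32, 120, 8),
}


def is_supported(vcpu: int, memory_gb: int) -> bool:
    """Return True if the vCPU/memory pair matches a row of the table.

    Assumption: valid memory values step up from the row's minimum.
    """
    rule = MEMORY_RULES.get(vcpu)
    if rule is None:
        return False
    low, high, step = rule
    return low <= memory_gb <= high and (memory_gb - low) % step == 0


print(is_supported(4, 30))   # within the 4 vCPU range
print(is_supported(8, 18))   # not on a 4 GB step from 16 GB
```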
Duration
Duration is calculated from the time a worker is ready to run your workload until the time it stops, rounded up to the nearest second with a 1-minute minimum.
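The duration rule can be expressed in a couple of lines. This is only a sketch of the rounding described above; `billable_seconds` is a hypothetical name, and the function assumes the worker ran for some positive amount of time.

```python
import math

MINIMUM_SECONDS = 60  # 1-minute minimum


def billable_seconds(elapsed_seconds: float) -> int:
    """Round a worker's run time up to the next whole second, with a 60-second floor."""
    return max(MINIMUM_SECONDS, math.ceil(elapsed_seconds))


print(billable_seconds(12.3))  # shorter than a minute, billed as one minute
print(billable_seconds(90.2))  # rounded up to the next second
```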
Additional charges
You may incur additional charges if your applications use other AWS services. For example, if your application uses Amazon Simple Storage Service (S3) to store and process data, then you will be charged standard Amazon S3 rates. If you move data from sources such as Amazon S3, Amazon Relational Database Service (RDS), or Amazon Redshift, you are charged standard request and data transfer rates. If you use Amazon CloudWatch, you are charged standard rates for CloudWatch logs and CloudWatch events.
Amazon EMR WAL
This pricing is for Amazon EMR on EC2 clusters with Apache HBase applications using Amazon EMR WAL. The Apache HBase write-ahead log (WAL) records all changes to data in file-based storage. With Amazon EMR on EC2, you can write your Apache HBase write-ahead logs to the Amazon EMR WAL, a durable managed storage layer that outlives your cluster. If your cluster or, in rare cases, the Availability Zone becomes unhealthy or unavailable, you can create a new cluster, point it to the same Amazon S3 root directory and Amazon EMR WAL workspace, and automatically recover the data in the WAL within a few minutes. For more information, see Amazon EMR WAL Documentation.
You will pay for what you use for the EMR WAL. If you have an active cluster which is configured to use the WAL, you will be charged for EMR WAL storage based on usage billed as EMR-WAL-WALHours, writes as WriteRequestGiB and reads as ReadRequestGiB.
EMR-WAL-WALHours: EMR WAL will create one WAL per Apache HBase Region. After your cluster is terminated, if there is still data in EMR WAL that was not flushed to Amazon S3, you can either recover the data by launching a recovery cluster, or clean up the WAL by creating a temporary cluster and using the EMR WAL CLI to delete the EMR WAL resources. If you do not delete the EMR WAL data explicitly, EMR WAL will retain the data and charge you for any unflushed data for 30 days. You can see an example below.
ReadRequestGiB and WriteRequestGiB: These two dimensions are for the read and write requests. Apache HBase API calls to write data to your table on a cluster with EMR WAL are billed as WriteRequestGiB. EMR WAL writes will occur for all Apache HBase writes such as `Put` operations. Apache HBase API calls to read data from your EMR WAL during Apache HBase recovery operations are billed as ReadRequestGiB. Reads and Writes are charged based on item sizes and EMR bills at a minimum of 1 Byte.
Pricing Examples
Example 1: EMR on EC2
Pricing based on US-East-1 pricing.
Suppose you run an Amazon EMR application deployed on Amazon EC2, and that you use one c4.2xlarge EC2 instance as your master node and two c4.2xlarge EC2 instances as core nodes. You will be charged for both EMR and for the EC2 nodes. If you run for one month, with 100% utilization during that month, and use on-demand pricing for EC2, your charges will be:
Master node:
EMR charges = 1 instance x 0.105 USD hourly x (100 / 100 utilized/month) x 730 hours in a month = 76.65 USD (EMR master node cost)
EC2 charges = 1 instance x 0.398 USD hourly x 730 hours in a month = 290.54 USD (EC2 master node cost)
Core nodes:
EMR charges = 2 instances x 0.105 USD hourly x (100 / 100 utilized/month) x 730 hours in a month = 153.30 USD (EMR core node cost)
EC2 charges = 2 instances x 0.398 USD hourly x 730 hours in a month = 581.08 USD (EC2 core node cost)
Total charges = 76.65 USD + 290.54 USD + 153.30 USD + 581.08 USD = 1101.57 USD
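The arithmetic in this example can be reproduced with a short script. It is a sketch only: the hourly rates are the ones quoted in the example rather than live prices, and `monthly_cost` is a hypothetical helper name. `Decimal` is used to avoid floating-point rounding noise.

```python
from decimal import Decimal

HOURS_PER_MONTH = 730
EMR_HOURLY = Decimal("0.105")  # EMR rate for c4.2xlarge, as quoted in the example
EC2_HOURLY = Decimal("0.398")  # EC2 On-Demand rate for c4.2xlarge, as quoted


def monthly_cost(instances: int, hourly_rate: Decimal,
                 utilization: Decimal = Decimal("1")) -> Decimal:
    """instances x hourly rate x utilization x hours in a month."""
    return instances * hourly_rate * utilization * HOURS_PER_MONTH


master = monthly_cost(1, EMR_HOURLY) + monthly_cost(1, EC2_HOURLY)
core = monthly_cost(2, EMR_HOURLY) + monthly_cost(2, EC2_HOURLY)
total = master + core

print(f"master: {master:.2f} USD, core: {core:.2f} USD, total: {total:.2f} USD")
```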
Example 2: EMR on EKS
Pricing based on US-East-1 pricing.
Suppose you are running an Amazon EMR Spark application deployed on Amazon EKS. In this case, EKS gets its compute capacity from r5.2xlarge EC2 instances (8 vCPU, 64 GB RAM). Let’s assume that the EKS cluster has 100 nodes, totaling 800 vCPU and 6400 GB of memory. Let’s assume that the application uses 100 vCPU and 300 GB of memory for 30 minutes.
Total Amazon EMR uplift charges for the job:
Total Uplift on vCPU = (number of vCPU * per vCPU-hour rate * job runtime in hours) = (100 * $0.01012 * 0.5) = $0.506
Total Uplift on memory = (amount of memory used in GB * per GB-hour rate * job runtime in hours) = (300 * $0.00111125 * 0.5) = $0.1667
Total EMR Uplift for the EMR job = $0.6727
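The uplift calculation above can be checked with a few lines of code. This is a sketch using the rates quoted in the example; `emr_on_eks_uplift` is a hypothetical helper name, and the result covers only the EMR uplift, not the EKS, EC2, or Fargate charges.

```python
from decimal import Decimal

VCPU_HOUR_RATE = Decimal("0.01012")    # per vCPU-hour, as quoted in the example
GB_HOUR_RATE = Decimal("0.00111125")   # per GB-hour, as quoted in the example


def emr_on_eks_uplift(vcpus: int, memory_gb: int, hours: Decimal):
    """Return (vCPU uplift, memory uplift) for requested resources over a runtime."""
    vcpu_cost = vcpus * VCPU_HOUR_RATE * hours
    memory_cost = memory_gb * GB_HOUR_RATE * hours
    return vcpu_cost, memory_cost


vcpu_cost, memory_cost = emr_on_eks_uplift(100, 300, Decimal("0.5"))
uplift = vcpu_cost + memory_cost

print(f"vCPU: ${vcpu_cost:.4f}, memory: ${memory_cost:.4f}, total: ${uplift:.4f}")
```

The memory term is $0.1666875 before rounding, which the example shows as $0.1667.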
Additional Costs
You pay $0.10 per hour for each Amazon EKS cluster that you create. You can use a single Amazon EKS cluster to run multiple applications by taking advantage of Kubernetes namespaces and IAM security policies. You can run EKS on AWS using either Amazon EC2 or AWS Fargate.
If you are using Amazon EC2 (including with Amazon EKS managed node groups), you pay for AWS resources (e.g. EC2 instances or Amazon EBS volumes) you create to run your Kubernetes worker nodes. You only pay for what you use, as you use it. There are no minimum fees and no upfront commitments. See detailed pricing information on the EC2 pricing page.
If you are using AWS Fargate, pricing is calculated based on the vCPU and memory resources used from the time you start to download your container image until the Amazon EKS pod terminates, rounded up to the nearest second. A minimum charge of one minute applies. See detailed pricing information on the AWS Fargate pricing page.
Example 3: EMR Serverless
Suppose you submit a Spark job to EMR Serverless. Let’s assume that the job is configured to use a minimum of 25 workers and a maximum of 75 workers, each configured with 4 vCPU and 30 GB of memory. Assume that no additional ephemeral storage was configured. If your job runs for 30 minutes using 25 workers (100 vCPU and 750 GB of memory) and automatically scales out to add 50 more workers (200 more vCPU and 1500 GB more memory) for 15 minutes:
Total vCPU-hours cost = (number of vCPU * per vCPU-hour rate * job runtime in hours) = (100 * $0.052624 * 0.5) + (200 * $0.052624 * 0.25) = $5.2624
Total GB-hours cost = (total GB of memory configured * per GB-hour rate * job runtime in hours) = (750 * $0.0057785 * 0.5) + (1500 * $0.0057785 * 0.25) = $4.333875
Total EMR Serverless Charges = $9.596275
Additional Charges: If your application uses other AWS services such as Amazon S3, you are charged standard S3 rates.
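The same calculation can be sketched in code by treating the run as two phases: the 25 baseline workers for 30 minutes and the 50 added workers for 15 minutes. The rates are those quoted in the example, and the variable names are illustrative only.

```python
from decimal import Decimal

VCPU_HOUR_RATE = Decimal("0.052624")   # per vCPU-hour, as quoted in the example
GB_HOUR_RATE = Decimal("0.0057785")    # per GB-hour, as quoted in the example
VCPU_PER_WORKER = 4
GB_PER_WORKER = 30

# (number of workers, hours they ran)
phases = [
    (25, Decimal("0.5")),    # baseline workers for 30 minutes
    (50, Decimal("0.25")),   # workers added by scaling, for 15 minutes
]

vcpu_cost = sum(w * VCPU_PER_WORKER * VCPU_HOUR_RATE * h for w, h in phases)
memory_cost = sum(w * GB_PER_WORKER * GB_HOUR_RATE * h for w, h in phases)
total = vcpu_cost + memory_cost

print(f"vCPU: ${vcpu_cost}, memory: ${memory_cost}, total: ${total}")
```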
Example 4: EMR WAL
Assume you create a new Amazon EMR cluster with Apache HBase and choose to fully back up your cluster in the US East (N. Virginia) Region. Because this is for a new application, you do not know what your traffic patterns will be. For simplicity, assume that your users create 10 HBase tables including system tables, with 2 HBase Regions per table, and that each time a user interacts with your application, they write 1 KiB of data.
For a period of 10 days, you receive little traffic to your application, resulting in 10,000 writes each day. However, on day 11 your application traffic spikes to 2,500,000 writes that day. You also decide to simultaneously update your custom code on your cluster and take a scheduled nightly downtime for your end users on Day 11. Let us assume this results in 1,000,000 reads from the EMR WAL for HBase recovery operations. Your application scales to deliver a seamless experience to your users. Your application then settles into a more regular traffic pattern of 50,000 writes each day through the end of the month.
The following table summarizes your total usage for the month.
Timeframe - (Day of Month) | Total Writes | Total Reads | EMR WAL Usage |
1 - 10 | 100,000 writes (10,000 writes x 10 days) | | |
11 | 2,500,000 writes | 1,000,000 reads | |
12 - 30 | 950,000 writes (50,000 writes x 19 days) | | |
Monthly Total | 3,550,000 writes | 1,000,000 reads | |
Monthly bill | $0.30 ($0.0883 per GiB of EMR WAL Write Requests x 3.55 million KiB writes / 1048576 KiB/GiB) | $0.08 ($0.0883 per GiB of EMR WAL Read Requests x 1 million KiB reads / 1048576 KiB/GiB) | $25.92 ($0.0018 per WAL per Hour of EMR WAL Usage X usage of 10 HBase Tables X 2 HBase regions per HBase Table X 1 WAL per HBase region X 30 days X 24 hours or usage of 14,400 EMR-WAL-WALHours) |
For the month, your bill will be $26.30, a total that includes $0.38 for ReadRequestGiB and WriteRequestGiB, and $25.92 for EMR-WAL-WALHours.
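The three line items in the table can be recomputed with the short sketch below. It uses the rates quoted in the table, assumes each read and write is 1 KiB as stated in the example, and rounds each request charge to the cent the way the table does. Summing the three items gives $26.30.

```python
from decimal import Decimal

KIB_PER_GIB = 1024 * 1024                 # 1,048,576 KiB per GiB
REQUEST_RATE_PER_GIB = Decimal("0.0883")  # read and write rate, as quoted in the table
WAL_HOUR_RATE = Decimal("0.0018")         # per WAL per hour, as quoted in the table

writes_kib = 10_000 * 10 + 2_500_000 + 50_000 * 19  # days 1-10, day 11, days 12-30
reads_kib = 1_000_000                                # recovery reads on day 11
wal_hours = 10 * 2 * 1 * 30 * 24                     # tables x regions x WALs x days x hours

write_cost = round(REQUEST_RATE_PER_GIB * writes_kib / KIB_PER_GIB, 2)
read_cost = round(REQUEST_RATE_PER_GIB * reads_kib / KIB_PER_GIB, 2)
wal_cost = WAL_HOUR_RATE * wal_hours
total = write_cost + read_cost + wal_cost

print(f"writes: ${write_cost}, reads: ${read_cost}, "
      f"WAL hours: ${wal_cost:.2f}, total: ${total:.2f}")
```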