Showcase your ability to design data models, manage data life cycles, and ensure data quality
Category Associate
Exam duration 130 minutes
Exam format 65 questions; either multiple choice or multiple response
Languages offered English, Japanese, Korean, and Simplified Chinese
AWS Certified Data Engineer - Associate validates skills and knowledge in core
data-related AWS services, ability to ingest and transform data, orchestrate
data pipelines while applying programming concepts, design data models, manage
data life cycles, and ensure data quality.
Prepare for the exam
Gain confidence by following AWS Skill Builder's 4-step exam prep plan.
Enroll in the complete plan or choose specific courses tailored to your needs,
ensuring you're ready for exam day.
Follow the 4-step plan:
1. Get to know the exam with exam-style questions
Review the exam guide. Take the AWS Certification Official Practice Question Set
to understand exam-style questions, and take the AWS Certification Official
Pretest to identify any areas where you need to refresh your AWS knowledge and
skills.
2. Refresh your AWS knowledge and skills
Enroll in digital courses where you need to fill gaps in knowledge and skills.
Practice with AWS Builder Labs, AWS Cloud Quest, and AWS Jam.
3. Review and practice for your exam
Enroll in an Exam Prep course. The Exam Prep Standard Course is available to
anyone with an AWS Skill Builder account. The Exam Prep Enhanced courses include
additional labs, exam-style questions, and flashcards.
4. Assess your exam readiness
Take the AWS Certification Official Practice Exam.
The Amazon DEA-C01 exam, officially known as the AWS Certified Data Engineer -
Associate exam, is designed for individuals who perform data engineering roles
and have hands-on experience with core data-related AWS services. (It is a
separate credential from the retired AWS Certified Data Analytics - Specialty
exam, DAS-C01.) Here are the key details:
1. Exam Overview
- Certification Name: AWS Certified Data Engineer - Associate
- Exam Code: DEA-C01
- Exam Duration: 130 minutes
- Number of Questions: 65 (multiple choice, multiple response)
- Languages Available: English, Japanese, Korean, and Simplified Chinese
2. Exam Format
- Question Type: Multiple-choice and multiple-response
- Passing Score: Results are reported as a scaled score from 100 to 1,000, with
a minimum passing score of 720.
3. Exam Content Areas
The exam covers four main domains:
- Domain 1: Data Ingestion and Transformation (34%)
- Domain 2: Data Store Management (26%)
- Domain 3: Data Operations and Support (22%)
- Domain 4: Data Security and Governance (18%)
4. Prerequisites
AWS recommends candidates have 2-3 years of experience in data engineering and
1-2 years of hands-on experience with AWS services.
5. Key Services to Focus On
- Data Ingestion: Kinesis, AWS Glue, AWS Database Migration Service (AWS DMS)
- Storage: S3, DynamoDB, Redshift
- Processing: EMR, Glue, Lambda, Kinesis Data Analytics (now Amazon Managed
Service for Apache Flink)
- Analysis and Visualization: QuickSight, Athena, Redshift
- Security and Governance: IAM, Lake Formation, and data encryption on S3 and Redshift
6. Cost and Registration
- Cost: 150 USD (subject to change)
- Registration: Register through the [AWS Training and Certification](https://aws.amazon.com/certification/)
website; exams are delivered by Pearson VUE at a test center or online.
7. Exam Preparation Resources
- AWS Whitepapers and Documentation
- AWS Certified Data Engineer - Associate (DEA-C01) exam guide and study materials
- Practice Exams and Sample Questions
QUESTION 1
A data engineer is configuring an AWS Glue job to read data from an Amazon S3 bucket. The data
engineer has set up the necessary AWS Glue connection details and an associated IAM role.
However, when the data engineer attempts to run the AWS Glue job, the data
engineer receives an
error message that indicates that there are problems with the Amazon S3 VPC
gateway endpoint.
The data engineer must resolve the error and connect the AWS Glue job to the S3
bucket.
Which solution will meet this requirement?
A. Update the AWS Glue security group to allow inbound traffic from the Amazon
S3 VPC gateway endpoint.
B. Configure an S3 bucket policy to explicitly grant the AWS Glue job
permissions to access the S3 bucket.
C. Review the AWS Glue job code to ensure that the AWS Glue connection details
include a fully qualified domain name.
D. Verify that the VPC's route table includes inbound and outbound routes for
the Amazon S3 VPC gateway endpoint.
Answer: D
Explanation:
The error message indicates that the AWS Glue job cannot access the Amazon S3
bucket through the
VPC endpoint. This could be because the VPC's route table does not have the
necessary routes to
direct the traffic to the endpoint. To fix this, the data engineer must verify
that the route table has an
entry for the Amazon S3 service prefix (com.amazonaws.region.s3) with the target
as the VPC
endpoint ID. This will allow the AWS Glue job to use the VPC endpoint to access
the S3 bucket
without going through the internet or a NAT gateway. For more information, see
Gateway
endpoints. Reference:
Troubleshoot the AWS Glue error "VPC S3 endpoint validation failed"
Amazon VPC endpoints for Amazon S3
[AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide]
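As a rough illustration of the fix in option D, the following boto3 sketch checks a
route table for an S3 gateway-endpoint route and creates the endpoint if it is
missing. The Region, VPC ID, and route table ID are hypothetical placeholders, not
values from the question.
```python
# Hedged sketch: confirm the route table routes S3 traffic through a gateway
# endpoint, creating the endpoint if no such route exists. IDs are placeholders.
import boto3

REGION = "us-east-1"                      # assumed Region of the Glue job
VPC_ID = "vpc-0123456789abcdef0"          # placeholder VPC ID
ROUTE_TABLE_ID = "rtb-0123456789abcdef0"  # placeholder route table ID

ec2 = boto3.client("ec2", region_name=REGION)

route_table = ec2.describe_route_tables(RouteTableIds=[ROUTE_TABLE_ID])["RouteTables"][0]

# Gateway-endpoint routes appear with a GatewayId of the form "vpce-...".
has_s3_route = any(
    route.get("GatewayId", "").startswith("vpce-") for route in route_table["Routes"]
)

if not has_s3_route:
    # Passing RouteTableIds makes AWS add the S3 prefix-list route automatically.
    ec2.create_vpc_endpoint(
        VpcEndpointType="Gateway",
        VpcId=VPC_ID,
        ServiceName=f"com.amazonaws.{REGION}.s3",
        RouteTableIds=[ROUTE_TABLE_ID],
    )
```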
QUESTION 2
A retail company has a customer data hub in an Amazon S3 bucket. Employees from
many countries
use the data hub to support company-wide analytics. A governance team must
ensure that the
company's data analysts can access data only for customers who are within the
same country as the analysts.
Which solution will meet these requirements with the LEAST operational effort?
A. Create a separate table for each country's customer data. Provide access to
each analyst based on
the country that the analyst serves.
B. Register the S3 bucket as a data lake location in AWS Lake Formation. Use the
Lake Formation row-level security features to enforce the company's access policies.
C. Move the data to AWS Regions that are close to the countries where the
customers are. Provide
access to each analyst based on the country that the analyst serves.
D. Load the data into Amazon Redshift. Create a view for each country. Create
separate IAM roles for
each country to provide access to data from each country. Assign the appropriate
roles to the analysts.
Answer: B
Explanation:
AWS Lake Formation is a service that allows you to easily set up, secure, and
manage data lakes. One
of the features of Lake Formation is row-level security, which enables you to
control access to specific
rows or columns of data based on the identity or role of the user. This feature
is useful for scenarios
where you need to restrict access to sensitive or regulated data, such as
customer data from different
countries. By registering the S3 bucket as a data lake location in Lake
Formation, you can use the Lake
Formation console or APIs to define and apply row-level security policies to the
data in the bucket.
You can also use Lake Formation blueprints to automate the ingestion and
transformation of data
from various sources into the data lake. This solution requires the least
operational effort compared
to the other options, as it does not involve creating or moving data, or
managing multiple tables,
views, or roles. Reference:
AWS Lake Formation
Row-Level Security
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide, Chapter 4:
Data Lakes and
Data Warehouses, Section 4.2: AWS Lake Formation
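For illustration only, here is a minimal boto3 sketch of a Lake Formation data cells
filter that restricts a table to one country's rows; the account ID, database,
table, filter name, and column expression are hypothetical. In practice you would
create one filter per country and grant SELECT on it to that country's analyst
principals with lakeformation.grant_permissions.
```python
# Hedged sketch: a row-level filter that exposes only rows where country = 'DE'.
# All identifiers below are placeholders, not values from the question.
import boto3

lakeformation = boto3.client("lakeformation")

lakeformation.create_data_cells_filter(
    TableData={
        "TableCatalogId": "123456789012",      # placeholder AWS account ID
        "DatabaseName": "customer_data_hub",   # placeholder Glue database
        "TableName": "customers",              # placeholder table
        "Name": "customers_germany_only",
        "RowFilter": {"FilterExpression": "country = 'DE'"},
        "ColumnWildcard": {},                  # all columns remain visible
    }
)
```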
QUESTION 3
A media company wants to improve a system that recommends media content to
customers based on
user behavior and preferences. To improve the recommendation system, the company
needs to
incorporate insights from third-party datasets into the company's existing
analytics platform.
The company wants to minimize the effort and time required to incorporate
third-party datasets.
Which solution will meet these requirements with the LEAST operational overhead?
A. Use API calls to access and integrate third-party datasets from AWS Data
Exchange.
B. Use API calls to access and integrate third-party datasets from AWS
C. Use Amazon Kinesis Data Streams to access and integrate third-party datasets
from AWS CodeCommit repositories.
D. Use Amazon Kinesis Data Streams to access and integrate third-party datasets
from Amazon Elastic Container Registry (Amazon ECR).
Answer: A
Explanation:
AWS Data Exchange is a service that makes it easy to find, subscribe to, and use
third-party data in
the cloud. It provides a secure and reliable way to access and integrate data
from various sources,
such as data providers, public datasets, or AWS services. Using AWS Data
Exchange, you can browse
and subscribe to data products that suit your needs, and then use API calls or
the AWS Management
Console to export the data to Amazon S3, where you can use it with your existing
analytics platform.
This solution minimizes the effort and time required to incorporate third-party
datasets, as you do
not need to set up and manage data pipelines, storage, or access controls. You
also benefit from the
data quality and freshness provided by the data providers, who can update their
data products as
frequently as needed (References 1 and 2).
The other options are not optimal for the following reasons:
B. Use API calls to access and integrate third-party datasets from AWS. This
option is vague and does
not specify which AWS service or feature is used to access and integrate
third-party datasets. AWS
offers a variety of services and features that can help with data ingestion,
processing, and analysis,
but not all of them are suitable for the given scenario. For example, AWS Glue
is a serverless data
integration service that can help you discover, prepare, and combine data from
various sources, but
it requires you to create and run data extraction, transformation, and loading (ETL)
jobs, which can
add operational overhead (Reference 3).
C. Use Amazon Kinesis Data Streams to access and integrate third-party datasets
from AWS
CodeCommit repositories. This option is not feasible, as AWS CodeCommit is a
source control service
that hosts secure Git-based repositories, not a data source that can be accessed
by Amazon Kinesis
Data Streams. Amazon Kinesis Data Streams is a service that enables you to
capture, process, and
analyze data streams in real time, such as clickstream data, application logs,
or IoT telemetry. It does
not support accessing and integrating data from AWS CodeCommit repositories,
which are meant for
storing and managing code, not data.
D. Use Amazon Kinesis Data Streams to access and integrate third-party datasets
from Amazon
Elastic Container Registry (Amazon ECR). This option is also not feasible, as
Amazon ECR is a fully
managed container registry service that stores, manages, and deploys container
images, not a data
source that can be accessed by Amazon Kinesis Data Streams. Amazon Kinesis Data
Streams does not
support accessing and integrating data from Amazon ECR, which is meant for
storing and managing
container images, not data.
Reference:
1: AWS Data Exchange User Guide
2: AWS Data Exchange FAQs
3: AWS Glue Developer Guide
4: AWS CodeCommit User Guide
5: Amazon Kinesis Data Streams Developer Guide
6: Amazon Elastic Container Registry User Guide
7: Build a Continuous Delivery Pipeline for Your Container Images with Amazon ECR
as Source
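As a hedged sketch of option A, the boto3 calls below export the assets of a
subscribed AWS Data Exchange revision to S3 so the existing analytics platform can
read them. The data set ID, revision ID, and bucket name are placeholders you would
look up after subscribing.
```python
# Hedged sketch: export every asset in a subscribed Data Exchange revision to S3.
import boto3

dx = boto3.client("dataexchange")

DATA_SET_ID = "example-data-set-id"    # placeholder: subscribed data set
REVISION_ID = "example-revision-id"    # placeholder: latest revision
BUCKET = "analytics-platform-raw"      # placeholder: existing analytics bucket

assets = dx.list_revision_assets(DataSetId=DATA_SET_ID, RevisionId=REVISION_ID)["Assets"]

job = dx.create_job(
    Type="EXPORT_ASSETS_TO_S3",
    Details={
        "ExportAssetsToS3": {
            "DataSetId": DATA_SET_ID,
            "RevisionId": REVISION_ID,
            "AssetDestinations": [
                {"AssetId": asset["Id"], "Bucket": BUCKET, "Key": asset["Name"]}
                for asset in assets
            ],
        }
    },
)
dx.start_job(JobId=job["Id"])  # the export runs asynchronously
```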
QUESTION 4
A financial company wants to implement a data mesh. The data mesh must support
centralized data
governance, data analysis, and data access control. The company has decided to
use AWS Glue for
data catalogs and extract, transform, and load (ETL) operations.
Which combination of AWS services will implement a data mesh? (Choose two.)
A. Use Amazon Aurora for data storage. Use an Amazon Redshift provisioned
cluster for data analysis.
B. Use Amazon S3 for data storage. Use Amazon Athena for data analysis.
C. Use AWS Glue DataBrew for centralized data governance and access control.
D. Use Amazon RDS for data storage. Use Amazon EMR for data analysis.
E. Use AWS Lake Formation for centralized data governance and access control.
Answer: B, E
Explanation:
A data mesh is an architectural framework that organizes data into domains and
treats data as
products that are owned and offered for consumption by different teams1. A data
mesh requires a
centralized layer for data governance and access control, as well as a
distributed layer for data
storage and analysis. AWS Glue can provide data catalogs and ETL operations for
the data mesh, but
it cannot provide data governance and access control by itself2. Therefore, the
company needs to
use another AWS service for this purpose. AWS Lake Formation is a service that
allows you to create,
secure, and manage data lakes on AWS3. It integrates with AWS Glue and other AWS
services to
provide centralized data governance and access control for the data mesh.
Therefore, option E is correct.
For data storage and analysis, the company can choose from different AWS
services depending on
their needs and preferences. However, one of the benefits of a data mesh is that
it enables data to be
stored and processed in a decoupled and scalable way1. Therefore, using
serverless or managed
services that can handle large volumes and varieties of data is preferable.
Amazon S3 is a highly
scalable, durable, and secure object storage service that can store any type of
data. Amazon Athena
is a serverless interactive query service that can analyze data in Amazon S3
using standard SQL.
Therefore, option B is a good choice for data storage and analysis in a data
mesh. Options A, C, and D
are not optimal because they either use relational databases that are not
suitable for storing diverse
and unstructured data, or they require more management and provisioning than
serverless services.
Reference:
1: What is a Data Mesh? - Data Mesh Architecture Explained - AWS
2: AWS Glue - Developer Guide
3: AWS Lake Formation - Features
4: Design a data mesh architecture using AWS Lake Formation and AWS Glue
5: Amazon S3 - Features
6: Amazon Athena - Features
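To illustrate the consumption side of option B, the sketch below runs a serverless
Athena query against data a domain team has published to S3 and cataloged in AWS
Glue. The database, table, workgroup, and output location are hypothetical names,
and Lake Formation permissions are assumed to already allow the query.
```python
# Hedged sketch: query domain data in S3 through Athena using standard SQL.
import boto3

athena = boto3.client("athena")

response = athena.start_query_execution(
    QueryString="SELECT country, COUNT(*) AS orders FROM orders GROUP BY country",
    QueryExecutionContext={"Database": "sales_domain"},          # placeholder database
    ResultConfiguration={"OutputLocation": "s3://example-athena-results/"},
    WorkGroup="primary",
)
print("Started query:", response["QueryExecutionId"])
```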
QUESTION 5
A data engineer maintains custom Python scripts that perform a data
formatting process that many
AWS Lambda functions use. When the data engineer needs to modify the Python
scripts, the data
engineer must manually update all the Lambda functions.
The data engineer requires a less manual way to update the Lambda functions.
Which solution will meet this requirement?
A. Store a pointer to the custom Python scripts in the execution context object
in a shared Amazon S3 bucket.
B. Package the custom Python scripts into Lambda layers. Apply the Lambda layers
to the Lambda functions.
C. Store a pointer to the custom Python scripts in environment variables in a
shared Amazon S3 bucket.
D. Assign the same alias to each Lambda function. Call each Lambda function by
specifying the function's alias.
Answer: B
Explanation:
Lambda layers are a way to share code and dependencies across multiple Lambda
functions. By
packaging the custom Python scripts into Lambda layers, the data engineer can
update the scripts in
one place and have them automatically applied to all the Lambda functions that
use the layer. This
reduces the manual effort and ensures consistency across the Lambda functions.
The other options
are either not feasible or not efficient. Storing a pointer to the custom Python
scripts in the execution
context object or in environment variables would require the Lambda functions to
download the
scripts from Amazon S3 every time they are invoked, which would increase latency
and cost.
Assigning the same alias to each Lambda function would not help with updating
the Python scripts,
as the alias only points to a specific version of the Lambda function code.
Reference:
AWS Lambda layers
AWS Certified Data Engineer - Associate DEA-C01 Complete Study Guide, Chapter 3:
Data Ingestion
and Transformation, Section 3.4: AWS Lambda
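A minimal boto3 sketch of option B is shown below: it publishes the shared scripts
as a new layer version and points each function at it. The layer name, zip file,
runtime, and function names are placeholders; note that
update_function_configuration replaces the function's entire layer list.
```python
# Hedged sketch: publish shared formatting code as a layer and attach it to
# several functions. Names and file paths are placeholders.
import boto3

lambda_client = boto3.client("lambda")

# The zip is assumed to contain python/format_utils.py (standard layer layout).
with open("formatting_layer.zip", "rb") as f:
    layer = lambda_client.publish_layer_version(
        LayerName="data-formatting",
        Content={"ZipFile": f.read()},
        CompatibleRuntimes=["python3.12"],
    )

for function_name in ["ingest-orders", "ingest-clicks"]:  # placeholder functions
    # Layers is replaced wholesale, so include any other layers the function needs.
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        Layers=[layer["LayerVersionArn"]],
    )
```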
QUESTION 6
A company created an extract, transform, and load (ETL) data pipeline in AWS
Glue. A data engineer
must crawl a table that is in Microsoft SQL Server. The data engineer needs to
extract, transform, and
load the output of the crawl to an Amazon S3 bucket. The data engineer also must
orchestrate the data pipeline.
Which AWS service or feature will meet these requirements MOST cost-effectively?
A. AWS Step Functions
B. AWS Glue workflows
C. AWS Glue Studio
D. Amazon Managed Workflows for Apache Airflow (Amazon MWAA)
Answer: B
Explanation:
AWS Glue workflows are a cost-effective way to orchestrate complex ETL jobs that
involve multiple
crawlers, jobs, and triggers. AWS Glue workflows allow you to visually monitor
the progress and
dependencies of your ETL tasks, and automatically handle errors and retries. AWS
Glue workflows
also integrate with other AWS services, such as Amazon S3, Amazon Redshift, and
AWS Lambda,
among others, enabling you to leverage these services for your data processing
workflows. AWS Glue
workflows are serverless, meaning you only pay for the resources you use, and
you don't have to
manage any infrastructure.
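As a rough sketch of option B, the boto3 calls below wire an existing SQL Server
crawler and an existing ETL job into one Glue workflow: a scheduled trigger starts
the crawl, and a conditional trigger runs the job when the crawl succeeds. The
workflow, crawler, and job names and the schedule are placeholders, not values from
the question.
```python
# Hedged sketch: orchestrate crawler -> ETL job with a Glue workflow.
import boto3

glue = boto3.client("glue")

glue.create_workflow(Name="sqlserver-to-s3")

# Scheduled trigger that starts the crawler inside the workflow.
glue.create_trigger(
    Name="start-crawl",
    WorkflowName="sqlserver-to-s3",
    Type="SCHEDULED",
    Schedule="cron(0 2 * * ? *)",                    # daily at 02:00 UTC
    Actions=[{"CrawlerName": "sqlserver-crawler"}],  # placeholder crawler
    StartOnCreation=True,
)

# Conditional trigger that runs the ETL job once the crawl succeeds.
glue.create_trigger(
    Name="run-etl",
    WorkflowName="sqlserver-to-s3",
    Type="CONDITIONAL",
    Predicate={
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "CrawlerName": "sqlserver-crawler",
                "CrawlState": "SUCCEEDED",
            }
        ]
    },
    Actions=[{"JobName": "sqlserver-to-s3-etl"}],    # placeholder job
    StartOnCreation=True,
)
```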
AWS Step Functions, AWS Glue Studio, and Amazon MWAA are also possible options
for
orchestrating ETL pipelines, but they have some drawbacks compared to AWS Glue
workflows. AWS
Step Functions is a serverless function orchestrator that can handle different
types of data
processing, such as real-time, batch, and stream processing. However, AWS Step
Functions requires
you to write code to define your state machines, which can be complex and
error-prone. AWS Step
Functions also charges you for every state transition, which can add up quickly
for large-scale ETL pipelines.
AWS Glue Studio is a graphical interface that allows you to create and run AWS
Glue ETL jobs without
writing code. AWS Glue Studio simplifies the process of building, debugging, and
monitoring your