Dumps Money-back Guarantee - Databricks-Certified-Professional-Data-Engineer Dumps Up To 50% Off
Updated Jan-2023 Pass Databricks-Certified-Professional-Data-Engineer Exam - Real Practice Test Questions
NEW QUESTION 64
What is the purpose of a silver layer in Multi hop architecture?
- A. Efficient storage and querying of full and unprocessed history of data
- B. Refined views with aggregated data
- C. Replaces a traditional data lake
- D. Optimized query performance for business-critical data
- E. A schema is enforced, with data quality checks.
Answer: E
Explanation:
The answer is, A schema is enforced, with data quality checks.
Medallion Architecture - Databricks
Silver Layer:
1. Reduces data storage complexity, latency, and redundancy
2. Optimizes ETL throughput and analytic query performance
3. Preserves grain of original data (without aggregation)
4. Eliminates duplicate records
5. Production schema enforced
6. Data quality checks, quarantine corrupt data
Exam focus: Understand the role of each layer (bronze, silver, gold) in the medallion architecture. You will see varying questions targeting each layer and its purpose.
NEW QUESTION 65
The data engineering team has provided 10 queries and asked the data analyst team to build a dashboard and refresh the data every day at 8 AM. Identify the best approach to set up the data refresh for this dashboard.
- A. Each query requires a separate task and setup 10 tasks under a single job to run at 8 AM to refresh the dashboard
- B. Use Incremental refresh to run at 8 AM every day.
- C. The entire dashboard with 10 queries can be refreshed at once, single schedule needs to be set up to refresh at 8 AM.
- D. Set up a job with linear dependencies to load all 10 queries into a table so the dashboard can be refreshed at once.
- E. A dashboard can only refresh one query at a time, 10 schedules to set up the refresh.
Answer: C
Explanation:
The answer is: The entire dashboard with 10 queries can be refreshed at once, single schedule needs to be set up to refresh at 8 AM.
Automatically refresh a dashboard
A dashboard's owner and users with the Can Edit permission can configure a dashboard to automatically refresh on a schedule. To automatically refresh a dashboard:
1. Click the Schedule button at the top right of the dashboard. The scheduling dialog appears.
2. In the Refresh every drop-down, select a period.
3. In the SQL Warehouse drop-down, optionally select a SQL warehouse to use for all the queries. If you don't select a warehouse, the queries execute on the last used SQL warehouse.
4. Next to Subscribers, optionally enter a list of email addresses to notify when the dashboard is automatically updated. Each email address you enter must be associated with an Azure Databricks account or configured as an alert destination.
5. Click Save. The Schedule button label changes to Scheduled.
NEW QUESTION 66
In order to use Unity Catalog features, which of the following steps needs to be taken on managed/external tables in the Databricks workspace?
- A. Enable unity catalog feature in workspace settings
- B. Upgrade to DBR version 15.0
- C. Copy data from workspace to unity catalog
- D. Migrate/upgrade objects in workspace managed/external tables/view to unity catalog
- E. Upgrade workspace to Unity catalog
Answer: D
Explanation:
Upgrade tables and views to Unity Catalog - Azure Databricks | Microsoft Docs
Managed table: Upgrade a managed table to Unity Catalog
External table: Upgrade an external table to Unity Catalog
NEW QUESTION 67
Kevin is the owner of both the sales table and the regional_sales_vw view, which uses the sales table as its underlying data source. Kevin wants to grant the SELECT privilege on the view regional_sales_vw to Steven, a newly joined team member. Which of the following is a true statement?
- A. Kevin cannot grant access to Steven since he does not have workspace admin privilege
- B. Kevin can grant access to the view, because he is the owner of the view and the underlying table
- C. Kevin cannot grant access to Steven since he does not have security admin privilege
- D. Although Kevin is the owner, he does not have ALL PRIVILEGES permission
- E. Steven will also require SELECT access on the underlying table
Answer: B
Explanation:
The answer is: Kevin can grant access to the view, because he is the owner of the view and the underlying table. Ownership determines whether or not you can grant privileges on derived objects to other users. A user who creates a schema, table, view, or function becomes its owner. The owner is granted all privileges and can grant privileges to other users.
NEW QUESTION 68
A new data engineer has started at a company. The data engineer has recently been added to the company's Databricks workspace as [email protected]. The data engineer needs to be able to query the table sales in the database retail. The new data engineer already has been granted USAGE on the database retail. Which of the following commands can be used to grant the appropriate permissions to the new data engineer?
- A. GRANT USAGE ON TABLE [email protected] TO sales;
- B. GRANT SELECT ON TABLE [email protected] TO sales;
- C. GRANT USAGE ON TABLE sales TO [email protected];
- D. GRANT SELECT ON TABLE sales TO [email protected];
- E. GRANT CREATE ON TABLE sales TO [email protected];
Answer: D
NEW QUESTION 69
You want to process the data based on two variables: either the department is "supply chain" or the process flag is set to True. Which of the following Python conditions checks this correctly?
- A. if department == "supply chain" or process = TRUE:
- B. if department == "supply chain" | process == TRUE:
- C. if department == "supply chain" | if process == TRUE:
- D. if department = "supply chain" | process:
- E. if department == "supply chain" or process:
Answer: E
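To see why option E works and the `|` variants do not, here is a minimal sketch. The function names `should_process` and `bitwise_attempt` are hypothetical and only serve the illustration; the sketch assumes `department` is a string and `process` is a boolean.

```python
def should_process(department: str, process: bool) -> bool:
    """Return True when the department is supply chain or the flag is set."""
    # `==` compares values; `or` is the boolean operator and short-circuits.
    return department == "supply chain" or process


def bitwise_attempt(department: str, process: bool) -> str:
    """Show what happens when `|` is used in place of `or`."""
    # `|` binds tighter than `==`, so the expression below is parsed as
    # department == ("supply chain" | process), and str | bool is not defined.
    try:
        return str(department == "supply chain" | process)
    except TypeError as exc:
        return f"TypeError: {exc}"


print(should_process("supply chain", False))
print(should_process("finance", True))
print(bitwise_attempt("finance", True))
```

The options that use a single `=` fail earlier still, because assignment is not allowed inside an `if` condition.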
NEW QUESTION 70
What is the purpose of the bronze layer in a Multi-hop architecture?
- A. Used as a data source for Machine learning applications.
- B. Can be used to eliminate duplicate records
- C. Perform data quality checks, corrupt data quarantined
- D. Provides efficient storage and querying of full unprocessed history of data
- E. Contains aggregated data that is to be consumed into Silver
Answer: D
Explanation:
The answer is: Provides efficient storage and querying of full unprocessed history of data.
Medallion Architecture - Databricks
Bronze Layer:
1. Raw copy of ingested data
2. Replaces traditional data lake
3. Provides efficient storage and querying of full, unprocessed history of data
4. No schema is applied at this layer
NEW QUESTION 71
You were asked to write Python code to stop all running streams. Which of the following can be used to get a list of all active streams currently running so they can be stopped? Fill in the blank.
for s in _______________:
    s.stop()
- A. spark.streams.getActive
- B. getActiveStreams()
- C. Spark.getActiveStreams()
- D. spark.streams.active
- E. activeStreams()
Answer: D
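The completed loop can be wrapped in a small helper, sketched below. The function name `stop_all_streams` is hypothetical; it assumes it is handed a SparkSession (or any object exposing the same `streams.active` attribute), so the block itself does not import PySpark.

```python
def stop_all_streams(spark) -> int:
    """Stop every active streaming query on the session; return how many were stopped."""
    stopped = 0
    # spark.streams is the streaming query manager; .active lists the running queries.
    for query in spark.streams.active:
        query.stop()
        stopped += 1
    return stopped
```

In a notebook this would be called as `stop_all_streams(spark)`. Note that `active` is an attribute, not a method call, which is why the `getActive`-style options are wrong.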
NEW QUESTION 72
A data engineer has created a Delta table as part of a data pipeline. Downstream data analysts now need SELECT permission on the Delta table. Assuming the data engineer is the Delta table owner, which part of the Databricks Lakehouse Platform can the data engineer use to grant the data analysts the appropriate access?
- A. Repos
- B. Databricks Filesystem
- C. Jobs
- D. Dashboards
- E. Data Explorer
Answer: E
NEW QUESTION 73
While investigating a performance issue, you realized that you have too many small files for a given table. Which command are you going to run to fix this issue?
- A. VACUUM table_name
- B. MERGE table_name
- C. SHRINK table_name
- D. COMPACT table_name
- E. OPTIMIZE table_name
Answer: E
Explanation:
The answer is OPTIMIZE table_name.
OPTIMIZE compacts small Parquet files into bigger files. By default, the file size is determined based on the table size at the time of OPTIMIZE; the file size can also be set manually or adjusted based on the workload.
https://docs.databricks.com/delta/optimizations/file-mgmt.html
Tune file size based on Table size
To minimize the need for manual tuning, Databricks automatically tunes the file size of Delta tables based on the size of the table. Databricks will use smaller file sizes for smaller tables and larger file sizes for larger tables so that the number of files in the table does not grow too large.
NEW QUESTION 74
Your colleague was walking you through how a job was set up, but you noticed a warning message that said, "Jobs running on all-purpose cluster are considered all purpose compute". The colleague was not sure why he was getting the warning message. How do you best explain this warning message?
- A. All-purpose clusters are less expensive than the job clusters
- B. All-purpose clusters take longer to start the cluster vs a job cluster
- C. All-purpose clusters are more expensive than the job clusters
- D. All-purpose clusters cannot be used for Job clusters, due to performance issues.
- E. All-purpose clusters provide interactive messages that cannot be viewed in a job
Answer: C
Explanation:
The warning appears because a job that runs on an all-purpose cluster is billed as all-purpose compute, and all-purpose clusters are priced higher than job clusters (AWS pricing, Aug 15th 2022).
NEW QUESTION 75
The data engineering team is using a SQL query to review data completeness every day to monitor the ETL job, and the query output is being used in multiple dashboards. Which of the following approaches can be used to set up a schedule and automate this process?
- A. They can schedule the query to run every day from the Jobs UI.
- B. They can schedule the query to refresh every 12 hours from the SQL endpoint's page in Databricks SQL
- C. They can schedule the query to run every 12 hours from the Jobs UI.
- D. They can schedule the query to refresh every day from the SQL endpoint's page in Databricks SQL.
- E. They can schedule the query to refresh every day from the query's page in Databricks SQL
Answer: E
Explanation:
The answer is: They can schedule the query to refresh every day from the query's page in Databricks SQL. The query pane view in the Databricks SQL workspace provides the ability to add, edit, and schedule individual queries to run.
You can use scheduled query executions to keep your dashboards updated or to enable routine alerts. By default, your queries do not have a schedule.
Note
If your query is used by an alert, the alert runs on its own refresh schedule and does not use the query schedule.
To set the schedule:
1. Click the query info tab.
2. Click the link to the right of Refresh Schedule to open a picker with schedule intervals.
3. Set the schedule. The picker scrolls and allows you to choose:
* An interval: 1-30 minutes, 1-12 hours, 1 or 30 days, 1 or 2 weeks
* A time. The time selector displays in the picker only when the interval is greater than 1 day and the day selection is greater than 1 week. When you schedule a specific time, Databricks SQL takes input in your computer's timezone and converts it to UTC. If you want a query to run at a certain time in UTC, you must adjust the picker by your local offset. For example, if you want a query to execute at 00:00 UTC each day, but your current timezone is PDT (UTC-7), you should select 17:00 in the picker.
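The local-offset adjustment described for the time picker can be checked with a few lines of Python. This is only a sketch of the arithmetic: the helper name `picker_time_for_utc` is hypothetical, and a fixed UTC offset stands in for a real timezone, so daylight-saving changes are not handled.

```python
from datetime import datetime, timedelta, timezone


def picker_time_for_utc(utc_hour: int, utc_minute: int, local_offset_hours: int) -> str:
    """Return the local wall-clock HH:MM that corresponds to the given UTC time."""
    local_tz = timezone(timedelta(hours=local_offset_hours))
    # The calendar date is arbitrary; only the time of day matters here.
    utc_moment = datetime(2022, 8, 15, utc_hour, utc_minute, tzinfo=timezone.utc)
    return utc_moment.astimezone(local_tz).strftime("%H:%M")


print(picker_time_for_utc(0, 0, -7))  # 17:00, matching the PDT example above
```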
NEW QUESTION 76
What is the purpose of a gold layer in Multi-hop architecture?
- A. Data quality checks and schema enforcement
- B. Eliminate duplicate records
- C. Preserves grain of original data, without any aggregations
- D. Optimizes ETL throughput and analytic query performance
- E. Powers ML applications, reporting, dashboards and adhoc reports.
Answer: E
Explanation:
The answer is Powers ML applications, reporting, dashboards and adhoc reports.
Review the below link for more info,
Medallion Architecture - Databricks
Gold Layer:
1. Powers ML applications, reporting, dashboards, ad hoc analytics
2. Refined views of data, typically with aggregations
3. Reduces strain on production systems
4. Optimizes query performance for business-critical data
NEW QUESTION 77
The sample input data below contains two columns: cartId, also known as the session ID, and items. Every time a customer makes a change to the cart, the cart contents are stored as an array in the table. The marketing team asked you to create a unique list of items that were ever added to the cart by each customer. Fill in the blanks by choosing the appropriate array functions so the query produces the expected result shown below.
Schema: cartId INT, items Array<INT>
Sample Data
SELECT cartId, ___ (___(items)) as items
FROM carts GROUP BY cartId
Expected result:
cartId items
1 [1,100,200,300,250]
- A. ARRAY_DISTINCT, ARRAY_UNION
- B. FLATTEN, COLLECT_UNION
- C. ARRAY_UNION, FLATTEN
- D. ARRAY_UNION, ARRAY_DISTINCT
- E. ARRAY_UNION, COLLECT_SET
Answer: E
Explanation:
COLLECT_SET is an aggregate function that combines a column's values from all rows into a unique list. ARRAY_UNION combines arrays and removes any duplicates.
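The intended result (group by cart, merge every stored array, drop duplicates) can be mimicked in plain Python as a rough sketch. The function name `unique_items_per_cart` and the sample rows are hypothetical, since the original sample data is not shown; they are chosen only so the output matches the expected result. Unlike this sketch, Spark does not guarantee element order for COLLECT_SET. As a side note, Spark SQL documents `array_union` as taking two array arguments, so a runnable query for this result is often written as `array_distinct(flatten(collect_set(items)))`.

```python
from typing import Dict, List, Tuple


def unique_items_per_cart(rows: List[Tuple[int, List[int]]]) -> Dict[int, List[int]]:
    """Group rows by cartId and merge their item arrays into one duplicate-free list."""
    merged: Dict[int, List[int]] = {}
    for cart_id, items in rows:
        seen = merged.setdefault(cart_id, [])
        for item in items:
            if item not in seen:  # drop duplicates, keep first-seen order
                seen.append(item)
    return merged


# Hypothetical sample rows: two snapshots of the same cart.
carts = [
    (1, [1, 100, 200, 300]),
    (1, [1, 250, 300]),
]
print(unique_items_per_cart(carts))  # {1: [1, 100, 200, 300, 250]}
```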
NEW QUESTION 78
The current ELT pipeline receives data from the operations team once a day, so you set up an AUTO LOADER process using .trigger(once=True) and scheduled a job to run once a day. The operations team recently rolled out a new feature that allows them to send data every minute. What changes do you need to make to AUTO LOADER to process the data every minute?
- A. Setup a job cluster run the notebook once a minute
- B. Change AUTO LOADER trigger to .trigger(processingTime = "1 minute")
- C. Change AUTO LOADER trigger to ("1 minute")
- D. Convert AUTO LOADER to structured streaming
- E. Enable stream processing
Answer: B
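A minimal sketch of the trigger change, assuming `stream_writer` is the `DataStreamWriter` obtained from the Auto Loader DataFrame's `writeStream` (the helper name `with_minute_trigger` is hypothetical). In PySpark the keyword argument is spelled `processingTime`, with a lowercase `p`.

```python
def with_minute_trigger(stream_writer):
    """Configure a streaming write to run a micro-batch every minute."""
    # Previously the writer used .trigger(once=True) together with a daily job schedule.
    return stream_writer.trigger(processingTime="1 minute")
```

With this trigger the stream keeps running and picks up new files each minute, so the once-a-day job schedule is no longer what drives processing.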
NEW QUESTION 79
You are designing an analytical system to store structured data from your e-commerce platform and unstructured data from website traffic and the app store. How would you approach where you store this data?
- A. Use traditional data warehouse for structured data and use data lakehouse for unstructured data.
- B. Traditional data warehouses are good for storing structured data and enforcing schema
- C. Data lakehouse can only store unstructured data but cannot enforce a schema
- D. Data lakehouse can store structured and unstructured data and can enforce schema
Answer: D
Explanation:
The answer is: Data lakehouse can store structured and unstructured data and can enforce schema.
What Is a Lakehouse? - The Databricks Blog
NEW QUESTION 80
Which of the following SQL statements can be used to update a transactions table, setting a flag on the table from Y to N?
- A. REPLACE transactions SET active_flag = 'N' WHERE active_flag = 'Y'
- B. UPDATE transactions SET active_flag = 'N' WHERE active_flag = 'Y'
- C. MODIFY transactions SET active_flag = 'N' WHERE active_flag = 'Y'
- D. MERGE transactions SET active_flag = 'N' WHERE active_flag = 'Y'
Answer: B
Explanation:
The answer is
UPDATE transactions SET active_flag = 'N' WHERE active_flag = 'Y'
Delta Lake supports UPDATE statements on the delta table, all of the changes as part of the update are ACID compliant.
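The UPDATE statement can be tried out locally with Python's built-in sqlite3 module. This is only a sketch: SQLite stands in for a Delta table, and the table contents and the helper name `deactivate_all` are hypothetical. On Databricks the same statement would be run against the Delta table directly.

```python
import sqlite3


def deactivate_all(conn: sqlite3.Connection) -> int:
    """Flip active_flag from 'Y' to 'N' and return the number of rows changed."""
    cur = conn.execute(
        "UPDATE transactions SET active_flag = 'N' WHERE active_flag = 'Y'"
    )
    conn.commit()
    return cur.rowcount


# Hypothetical demo table with two active rows and one inactive row.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, active_flag TEXT)")
conn.executemany(
    "INSERT INTO transactions VALUES (?, ?)", [(1, "Y"), (2, "N"), (3, "Y")]
)
changed = deactivate_all(conn)
remaining = conn.execute(
    "SELECT COUNT(*) FROM transactions WHERE active_flag = 'Y'"
).fetchone()[0]
print(changed, remaining)  # 2 0
```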
NEW QUESTION 81
What is the purpose of the silver layer in a Multi hop architecture?
- A. Eliminates duplicate data, quarantines bad data
- B. Refined views with aggregated data
- C. Replaces a traditional data lake
- D. Efficient storage and querying of full, unprocessed history of data
- E. Optimized query performance for business-critical data
Answer: A
Explanation:
Medallion Architecture - Databricks
Silver Layer:
1. Reduces data storage complexity, latency, and redundancy
2. Optimizes ETL throughput and analytic query performance
3. Preserves grain of original data (without aggregation)
4. Eliminates duplicate records
5. Production schema enforced
6. Data quality checks, quarantine corrupt data
NEW QUESTION 82
......
Download Free Databricks Databricks-Certified-Professional-Data-Engineer Real Exam Questions: https://www.passexamdumps.com/Databricks-Certified-Professional-Data-Engineer-valid-exam-dumps.html
