Difference between a nested query and an inner query?
ETL Engineer Interview Questions
1,581 ETL engineer interview questions shared by candidates
The coding task checked some basic knowledge about using external APIs.
Nothing very difficult to answer
Most are knowledge-based Java and database questions
Difference between a pandas DataFrame and a NumPy array?
SQL Challenges:
a. Given two tables, "Sales" and "Customers," write a SQL query to calculate the total sales amount for each customer in the last quarter of the year, considering only customers who have made at least three purchases during that period.
b. Write a SQL query to identify the top 10% of products with the highest sales revenue in the "Products" table, excluding products that have been discontinued.
Python Coding Challenge: You have been provided with two CSV files: "employees.csv" and "departments.csv." The "employees.csv" file contains information about employees, including their names, salaries, department IDs, and hire dates. The "departments.csv" file contains department IDs and their respective names. Write a Python script to perform the following tasks:
a. Calculate the average salary for each department and store the results in a new CSV file named "average_salary_per_department.csv."
b. Identify the department with the highest average salary and print its name along with the value.
c. Determine the number of employees hired in each year and output the results in descending order.
ETL Workflow Design: Describe the end-to-end workflow you would design for an ETL process that involves extracting data from a real-time streaming source, transforming the data to fit a specific data model, and loading it into a data warehouse. Consider handling late-arriving data, data consistency, and error logging in your design.
Data Quality and Validation: Explain how you would ensure data quality during the ETL process. What techniques and checks would you implement to identify and handle anomalies, missing values, and inconsistencies in the data?
Performance Optimization: Discuss strategies for optimizing the performance of an ETL pipeline when dealing with large volumes of data. How would you approach parallel processing, partitioning, and indexing to enhance the overall performance?
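One possible approach to the Python coding challenge above, as a minimal sketch using only the standard library. The question does not give the CSV headers, so the column names used here ("salary", "department_id", "hire_date", "department_name") and the YYYY-MM-DD date format are assumptions, and the function name `summarize` is hypothetical. "Descending order" in task (c) is read as descending by hire count, which is also an assumption.

```python
import csv
from collections import Counter, defaultdict


def summarize(employees_path, departments_path, out_path):
    # Map department ID -> department name (assumed column names).
    with open(departments_path, newline="") as f:
        dept_names = {row["department_id"]: row["department_name"]
                      for row in csv.DictReader(f)}

    salaries = defaultdict(list)   # department ID -> list of salaries
    hires_per_year = Counter()     # hire year -> number of employees
    with open(employees_path, newline="") as f:
        for row in csv.DictReader(f):
            salaries[row["department_id"]].append(float(row["salary"]))
            # Assumes hire_date is formatted as YYYY-MM-DD.
            hires_per_year[row["hire_date"][:4]] += 1

    # (a) Average salary per department, written to a new CSV file.
    averages = {dept_names.get(d, d): sum(v) / len(v)
                for d, v in salaries.items()}
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["department_name", "average_salary"])
        writer.writerows(sorted(averages.items()))

    # (b) Department with the highest average salary, as (name, value).
    top_department = max(averages.items(), key=lambda kv: kv[1])

    # (c) Hires per year, most hires first.
    hires = hires_per_year.most_common()
    return averages, top_department, hires
```

Calling `summarize("employees.csv", "departments.csv", "average_salary_per_department.csv")` and printing the second and third return values would cover tasks (b) and (c). With pandas available, the same steps are usually written with `merge` and `groupby`.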
SQL and SSIS questions
Flow of the project and flow of ETL testing.
A table has data for each employee's joining date and end date. Find the date difference.
Aggregate functions, SCD Type 2, star schema, joins, Informatica PowerCenter
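The date-difference question is most likely meant for SQL, where the exact function depends on the database. The underlying idea can be sketched in Python as subtracting two dates. The function name `days_between` and the ISO date format are assumptions, since the question does not say how the dates are stored.

```python
from datetime import date


def days_between(joining_date: str, end_date: str) -> int:
    # Parse ISO-formatted dates (YYYY-MM-DD); the format is an assumption.
    start = date.fromisoformat(joining_date)
    end = date.fromisoformat(end_date)
    # Subtracting two dates gives a timedelta; .days is the whole-day count.
    return (end - start).days


print(days_between("2023-01-01", "2023-01-31"))  # 30
```

A negative result would mean the end date is earlier than the joining date, which is worth flagging as a data-quality issue.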