SQL joins and window functions, Spark, ADF
Sr Data Engineer Interview Questions
2,565 sr data engineer interview questions shared by candidates
What are the challenges faced by distributed systems, and how can they be resolved most efficiently?
They asked SQL and Python questions. The questions were challenging but would have been fair if we had been given a code playground to test our queries before submitting the final answer.
A. Core Data Engineering Concepts
  SQL (joins, window functions, performance tuning)
  Data Modeling (star vs snowflake, normalization)
  ETL/ELT pipelines (batch vs streaming, orchestration tools like Airflow)
B. Apache Spark / PySpark
  Catalyst Optimizer & Tungsten
  Narrow vs Wide transformations
  Joins (broadcast, sort-merge), Skew handling
  AQE (Adaptive Query Execution)
  Partitioning, Predicate Pushdown
  Execution Plan (DAG → Stage → Tasks)
  Spark UI and Job Debugging
  SCD Type 2 Implementation in PySpark
C. AWS
  S3, Glue, Athena, Lambda, EMR, Redshift
  Event-driven design (S3 → EventBridge → Lambda)
  Security: IAM roles, bucket policies, encryption
  CI/CD in AWS (CodePipeline, CloudFormation)
D. Python
  Writing modular, reusable code
  Working with Pandas, Boto3 (for AWS interaction)
  Exception handling, logging
  Lambda functions and decorators
E. Kafka / Streaming
  Kafka topic partitioning, consumer groups
  Offset management
  Integration with Spark Structured Streaming
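The Python items in section D (exception handling, logging, decorators) are often asked together. The block below is only a minimal sketch of how they can combine; the names `log_errors`, `parse_amount`, and the logger name `pipeline` are hypothetical and not from the original post.

```python
import functools
import logging

# Hypothetical logger name, chosen for illustration only.
logger = logging.getLogger("pipeline")


def log_errors(func):
    """Decorator that logs start, finish, and any exception, then re-raises."""

    @functools.wraps(func)  # keep the wrapped function's name and docstring
    def wrapper(*args, **kwargs):
        logger.info("starting %s", func.__name__)
        try:
            result = func(*args, **kwargs)
        except Exception:
            # logger.exception records the traceback at ERROR level
            logger.exception("%s failed", func.__name__)
            raise
        logger.info("finished %s", func.__name__)
        return result

    return wrapper


@log_errors
def parse_amount(raw: str) -> float:
    """Hypothetical pipeline step: convert a raw string field to a float."""
    return float(raw)
```

Re-raising after logging leaves the decision about retrying or failing the job to the caller, which is one common choice but not the only one.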
PySpark memory optimization, different types of keys in SQL
Linked list reversal with pointers.
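One common way to answer the linked list question is the iterative three-pointer approach. The following is a sketch under assumptions: the post does not say which language or list type was expected, so `Node`, `reverse`, `from_list`, and `to_list` are illustrative names for a singly linked list in Python.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


def reverse(head: Optional[Node]) -> Optional[Node]:
    """Reverse a singly linked list in place and return the new head."""
    prev = None
    curr = head
    while curr is not None:
        nxt = curr.next   # remember the rest of the list
        curr.next = prev  # point the current node backwards
        prev = curr       # advance prev
        curr = nxt        # advance curr
    return prev


def from_list(values: List[int]) -> Optional[Node]:
    """Helper: build a linked list from a Python list."""
    head = None
    for value in reversed(values):
        head = Node(value, head)
    return head


def to_list(head: Optional[Node]) -> List[int]:
    """Helper: collect node values into a Python list."""
    out = []
    while head is not None:
        out.append(head.value)
        head = head.next
    return out
```

This version runs in O(n) time with O(1) extra space; a recursive variant is also possible but uses O(n) stack depth.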
Explain the difference between a Dataset and a DataFrame
Spark, cluster, jobs, optimization
PySpark, Scala
Real-time project-based questions
About project
Data architecture, work experience, behavioral questions
Around 50 theoretical questions related to Snowflake (theory), ADF, data modeling, CI/CD, Scrum and leadership skills, SQL, Python, and Spark basics.
2 simple live coding exercises:
SQL: writing a query (with window functions)
Python: extracting the number of fruits in a list
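The post does not give the exact wording of the Python exercise. Assuming it meant counting how many times each fruit appears in a list, a minimal sketch could look like this; `count_fruits` and the sample `basket` are made up for illustration.

```python
from collections import Counter
from typing import Iterable


def count_fruits(items: Iterable[str]) -> Counter:
    """Count occurrences of each fruit, ignoring case and surrounding spaces."""
    return Counter(item.strip().lower() for item in items)


# Hypothetical sample input, not from the original post.
basket = ["apple", "Banana", "apple", "cherry", "banana ", "apple"]
counts = count_fruits(basket)

for fruit, n in counts.most_common():
    print(fruit, n)
```

If the exercise instead asked for the number of distinct fruits, `len(counts)` gives that from the same result.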