Data Engineer Interview Questions

Data Engineer Interview Questions

I Data engineer sono professionisti informatici richiesti pressoché in tutti i settori. Si occupano di monitorare i trend dei dati per pianificare le azioni più adeguate che un'azienda deve intraprendere. Uno degli aspetti più critici del lavoro di un Data engineer è l'elaborazione dei dati grezzi e la loro trasformazione in dati utilizzabili per creare pipeline e sistemi di dati.

Domande tipiche dei colloqui per Data engineer e come rispondere

Question 1

Domanda 1: Puoi descrivere in dettaglio il tuo livello di competenza nell'ambito dei linguaggi di programmazione?

How to answer
Come rispondere: Prima del colloquio, ripassa il tuo CV e/o portfolio e stila un elenco dei programmi che conosci meglio. Se scopri di non avere una buona conoscenza del programma usato in prevalenza nell'azienda, descrivi te stesso come una persona intraprendente e altamente motivata, che si impegnerà senza sosta per imparare a usare il programma.
Question 2

Domanda 2: Spiega a parole tue che cos’è il data engineering.

How to answer
Come rispondere: Analizza il tuo ruolo in relazione all'azienda e ad altri ruoli quali il data scientist, così da definire in modo chiaro il tuo contributo al sistema aziendale nel suo complesso. Spiega la differenza tra il ruolo di un ingegnere che lavora ai database e quello di un ingegnere che si occupa di pipeline.
Question 3

Domanda 3: Puoi descrivere un'esperienza lavorativa con Apache Hadoop e in ambienti di gestione dei dati nel cloud?

How to answer
Come rispondere: Per prepararti a questa domanda, fai le dovute ricerche sul software utilizzato dall'azienda, sui prodotti cloud per i dati e sull'uso di Apache Hadoop. I Data engineer devono avere un'ottima padronanza dei linguaggi di programmazione e dei sistemi di gestione dei dati utilizzati nel settore, quali Apache Hadoop.

20,133 data engineer interview questions shared by candidates

# # sales # products # +------------------+---------+ +---------------------+---------+ # | product_id | INTEGER |>--------| product_id | INTEGER | # | store_id | INTEGER | +---<| product_class_id | INTEGER | # | customer_id | INTEGER | | | brand_name | VARCHAR | # +---<| promotion_id | INTEGER | | | product_name | VARCHAR | # | | store_sales | DECIMAL | | | is_low_fat_flg | TINYINT | # | | store_cost | DECIMAL | | | is_recyclable_flg | TINYINT | # | | units_sold | DECIMAL | | | gross_weight | DECIMAL | # | | transaction_date | DATE | | | net_weight | DECIMAL | # | +------------------+---------+ | +---------------------+---------+ # | | # | # promotions | # product_classes # | +------------------+---------+ | +---------------------+---------+ # +----| promotion_id | INTEGER | +----| product_class_id | INTEGER | # | promotion_name | VARCHAR | | product_subcategory | VARCHAR | # | media_type | VARCHAR | | product_category | VARCHAR | # | cost | DECIMAL | | product_department | VARCHAR | # | start_date | DATE | | product_family | VARCHAR | # | end_date | DATE | +---------------------+---------+ # +------------------+---------+ # */ # Question 1: # -- What percent of all products in the grocery chain's catalog # -- are both low fat and recyclable? #
avatar

Data Engineer

Interviewed at Meta

3.6
Jun 8, 2020

# # sales # products # +------------------+---------+ +---------------------+---------+ # | product_id | INTEGER |>--------| product_id | INTEGER | # | store_id | INTEGER | +---<| product_class_id | INTEGER | # | customer_id | INTEGER | | | brand_name | VARCHAR | # +---<| promotion_id | INTEGER | | | product_name | VARCHAR | # | | store_sales | DECIMAL | | | is_low_fat_flg | TINYINT | # | | store_cost | DECIMAL | | | is_recyclable_flg | TINYINT | # | | units_sold | DECIMAL | | | gross_weight | DECIMAL | # | | transaction_date | DATE | | | net_weight | DECIMAL | # | +------------------+---------+ | +---------------------+---------+ # | | # | # promotions | # product_classes # | +------------------+---------+ | +---------------------+---------+ # +----| promotion_id | INTEGER | +----| product_class_id | INTEGER | # | promotion_name | VARCHAR | | product_subcategory | VARCHAR | # | media_type | VARCHAR | | product_category | VARCHAR | # | cost | DECIMAL | | product_department | VARCHAR | # | start_date | DATE | | product_family | VARCHAR | # | end_date | DATE | +---------------------+---------+ # +------------------+---------+ # */ # Question 1: # -- What percent of all products in the grocery chain's catalog # -- are both low fat and recyclable? #

first round - written: 3 sql and one about what will you do to improve the fastness of an insert on a huge table second round - get the players with highest streak get the employee details who has maximum members in a team. python-return the numbers which have maximum count in a list round 3: behavioral questions and 1 question on python lists. from the 2 lists get the numbers that are common , and return the numbers in the following way. [1,2,3,3,1,1,1],[1,1,2,2,3] - return [1,1,2,3]
avatar

Data Engineer

Interviewed at Amazon

3.5
Apr 8, 2021

first round - written: 3 sql and one about what will you do to improve the fastness of an insert on a huge table second round - get the players with highest streak get the employee details who has maximum members in a team. python-return the numbers which have maximum count in a list round 3: behavioral questions and 1 question on python lists. from the 2 lists get the numbers that are common , and return the numbers in the following way. [1,2,3,3,1,1,1],[1,1,2,2,3] - return [1,1,2,3]

python question: given a two dimensional list for example [ [2,3],[3,4],[5]] person 2 is friends with 3 etc. find how many friends does each person has. note one person has no friends. SQL question: find the top 10 college/company that a average social person interacts with. something in those lines. I split the query in two. Not able to finish coding but was able to explain and write both the parts but didn't have time to test it. also had data modeling questions. on a social network website. cant give details.
avatar

Data Engineer

Interviewed at Meta

3.6
Nov 16, 2020

python question: given a two dimensional list for example [ [2,3],[3,4],[5]] person 2 is friends with 3 etc. find how many friends does each person has. note one person has no friends. SQL question: find the top 10 college/company that a average social person interacts with. something in those lines. I split the query in two. Not able to finish coding but was able to explain and write both the parts but didn't have time to test it. also had data modeling questions. on a social network website. cant give details.

Python 1 #1.returns the number of times a given character occurs in the given string s1='missisipi' #print(s1.find('s')) res=[] for i in range(len(s1)): #print(s1[i]) if s1[i]=='s': res.append('s') print(len(res)) #2.[1,None,1,2,None} --> [1,1,1,2,2] arr=[None,1,2,None] new_l=[] for i in range(0,len(arr)): if arr[i] != None: new_l.append(arr[i]) else: new_l.append(arr[i-1]) print(new_l) #2. (python) Given two sentences, construct an array that has the words that appear in one sentence and not the other. A = "Geeks for Geeks" B = "Learning from Geeks for Geeks" d={} for w in A.split(): if w in d: d[w]=d.get(w,0)+1 else: d[w]=1 for w in B.split(): if w in d: d[w]=d.get(w,0)+1 else: d[w]=1 unmatchedW=[w for w in d if d[w]==1] print (unmatchedW) 3. d = {"a": 4, "c": 3, "b": 12} [(k, v) for k, v in sorted(d.items(), key=lambda x: x[1], reverse=True)] #[('b', 12), ('a', 4), ('c', 3)] SQL # # sales # products # +------------------+---------+ +---------------------+---------+ # | product_id | INTEGER |>--------| product_id | INTEGER | # | store_id | INTEGER | +---<| product_class_id | INTEGER | # | customer_id | INTEGER | | | brand_name | VARCHAR | # +---<| promotion_id | INTEGER | | | product_name | VARCHAR | # | | store_sales | DECIMAL | | | is_low_fat_flg | TINYINT | # | | store_cost | DECIMAL | | | is_recyclable_flg |… Show More 1. find top 5 sales products having promotions Select Sum(s.store_sales), brand_name, count(p.product_id) from products p inner join sales s p.product_id = s.product_id where promotion_id is not null group by brand_name having count(p.product_id) =1 /* single-channel media type */ order by 1 desc limit 5 2. # -- % Of sales that had a valid promotion, the VP of marketing # -- wants to know what % of transactions occur on either # -- the very first day or the very last day of a promotion campaign. select sum(case when valid_promotion = 1 then 1 else 0 end)/count(*) * 100 as percentage from sales where day = First_day(date) or day = last_day(date) or select sum(case when transaction_date = (select min(transaction_date) from sales) then 1 else 0)/count(*) as first_day_sales, sum(case when transaction_date = (select max(transaction_date) from sales) then 1 else 0)/count(*) as last_day_sales from sales or select avg(transaction_date in (p.start_date,p.end_date))*100 as first_last_pct from sales s join promotions p using(promotion_id)
avatar

Data Engineer

Interviewed at Meta

3.6
Aug 25, 2020

Python 1 #1.returns the number of times a given character occurs in the given string s1='missisipi' #print(s1.find('s')) res=[] for i in range(len(s1)): #print(s1[i]) if s1[i]=='s': res.append('s') print(len(res)) #2.[1,None,1,2,None} --> [1,1,1,2,2] arr=[None,1,2,None] new_l=[] for i in range(0,len(arr)): if arr[i] != None: new_l.append(arr[i]) else: new_l.append(arr[i-1]) print(new_l) #2. (python) Given two sentences, construct an array that has the words that appear in one sentence and not the other. A = "Geeks for Geeks" B = "Learning from Geeks for Geeks" d={} for w in A.split(): if w in d: d[w]=d.get(w,0)+1 else: d[w]=1 for w in B.split(): if w in d: d[w]=d.get(w,0)+1 else: d[w]=1 unmatchedW=[w for w in d if d[w]==1] print (unmatchedW) 3. d = {"a": 4, "c": 3, "b": 12} [(k, v) for k, v in sorted(d.items(), key=lambda x: x[1], reverse=True)] #[('b', 12), ('a', 4), ('c', 3)] SQL # # sales # products # +------------------+---------+ +---------------------+---------+ # | product_id | INTEGER |>--------| product_id | INTEGER | # | store_id | INTEGER | +---<| product_class_id | INTEGER | # | customer_id | INTEGER | | | brand_name | VARCHAR | # +---<| promotion_id | INTEGER | | | product_name | VARCHAR | # | | store_sales | DECIMAL | | | is_low_fat_flg | TINYINT | # | | store_cost | DECIMAL | | | is_recyclable_flg |… Show More 1. find top 5 sales products having promotions Select Sum(s.store_sales), brand_name, count(p.product_id) from products p inner join sales s p.product_id = s.product_id where promotion_id is not null group by brand_name having count(p.product_id) =1 /* single-channel media type */ order by 1 desc limit 5 2. # -- % Of sales that had a valid promotion, the VP of marketing # -- wants to know what % of transactions occur on either # -- the very first day or the very last day of a promotion campaign. select sum(case when valid_promotion = 1 then 1 else 0 end)/count(*) * 100 as percentage from sales where day = First_day(date) or day = last_day(date) or select sum(case when transaction_date = (select min(transaction_date) from sales) then 1 else 0)/count(*) as first_day_sales, sum(case when transaction_date = (select max(transaction_date) from sales) then 1 else 0)/count(*) as last_day_sales from sales or select avg(transaction_date in (p.start_date,p.end_date))*100 as first_last_pct from sales s join promotions p using(promotion_id)

Viewing 31 - 40 interview questions

Glassdoor has 20,133 interview questions and reports from Data engineer interviews. Prepare for your interview. Get hired. Love your job.