Address
304 North Cardinal St.
Dorchester Center, MA 02124
Work Hours
Monday to Friday: 7AM - 7PM
Weekend: 10AM - 5PM
Address
304 North Cardinal St.
Dorchester Center, MA 02124
Work Hours
Monday to Friday: 7AM - 7PM
Weekend: 10AM - 5PM

To crack a data analyst interview in 2026, you must demonstrate strong SQL skills, Excel proficiency, statistics fundamentals, data visualization (Power BI/Tableau), and business problem-solving through case studies. Recruiters test technical ability plus your skill at translating messy data into business decisions. This guide covers 30+ real data analyst interview questions with model answers, a salary table, the exact interview structure, and an FAQ section optimized for quick revision.
Picture this: a retail company’s sales dipped 12% last quarter, and nobody knows why. The CEO doesn’t want opinions — she wants evidence. The person who walks into that boardroom with a clean dashboard, three validated hypotheses, and one clear recommendation is the data analyst. That’s the job. And that’s exactly what interviewers are testing when you sit across from them.
The beautiful thing about data analyst interviews? They follow a remarkably consistent pattern across companies — from MNCs like Deloitte, Accenture, and EY to product firms like Flipkart, Swiggy, and Zomato. The not-so-beautiful thing? Candidates routinely fail not on hard questions, but on fundamentals: a JOIN they couldn’t write, a p-value they couldn’t explain, a VLOOKUP scenario they fumbled.
This guide fixes that. Below are the 30 most asked data analyst interview questions and answers, organized by interview round, with tables, scenario walk-throughs, and links to deepen each skill area.
| Interview Round | What Is Tested | Weightage |
| Round 1 – Screening / Aptitude | Logical reasoning, basic Excel & SQL MCQs | 15% |
| Round 2 – SQL & Technical | Joins, aggregations, window functions | 35% |
| Round 3 – Case Study / Business | Metrics thinking, root-cause analysis | 30% |
| Round 4 – Tools & Visualization | Power BI/Tableau, dashboard logic | 10% |
| HR Round | Communication, projects, salary | 10% |
Since SQL alone carries the biggest weight, we strongly recommend revising our dedicated Basic SQL Interview Questions & Answers guide alongside this article.
Answer: A data analyst collects, cleans, analyzes, and visualizes data to help organizations make evidence-based decisions. The job has four repeating phases: (1) understanding the business question, (2) gathering and cleaning relevant data, (3) analyzing it using SQL, Excel, or Python, and (4) communicating insights through dashboards and reports. The strongest answers end with impact: “Ultimately, my job is to convert raw data into decisions that save money or grow revenue.”
Answer: Data analysis is examining historical data to answer “what happened and why” — descriptive and diagnostic. Data analytics is the broader umbrella including processes, tools, and governance around analysis. Data science extends into predictive and prescriptive territory using machine learning — “what will happen and what should we do.” A data analyst reports the past accurately; a data scientist models the future probabilistically.
Answer: A structured model answer: (1) Profile the data — row counts, column types, summary statistics; (2) Handle missing values — decide between deletion, mean/median imputation, or business-rule filling, depending on why data is missing; (3) Remove duplicates after defining what makes a row unique; (4) Fix inconsistencies — date formats, “Mumbai” vs “MUMBAI”, trailing spaces; (5) Treat outliers — investigate before deleting, since outliers are sometimes the insight; (6) Validate — totals reconcile with source systems. Data cleaning consumes 60–80% of an analyst’s real time, and interviewers know it.
Answer: Structured data fits neatly into rows and columns with a fixed schema — sales tables, customer databases — and is queried with SQL. Unstructured data has no predefined format — emails, social media posts, images, call recordings — and requires specialized processing (text mining, NLP). Semi-structured data (JSON, XML) sits in between with flexible tags. Roughly 80% of enterprise data is unstructured, which is why this distinction matters.
Answer: The interview-favorite list: VLOOKUP/XLOOKUP (joining data across sheets), INDEX-MATCH (flexible lookups), SUMIFS/COUNTIFS/AVERAGEIFS (conditional aggregation), IF with AND/OR (logic), TEXT functions (LEFT, RIGHT, TRIM, CONCAT for cleaning), Pivot Tables (instant summarization), and conditional formatting. In 2026, also mention Power Query for repeatable cleaning workflows — it signals you work like a professional, not a student.
Answer:
| Function | Lookup Direction | Speed on Big Data | Column Insert Safe? |
| VLOOKUP | Left to right only | Slower | No (breaks on insert) |
| INDEX-MATCH | Any direction | Faster | Yes |
| XLOOKUP | Any direction | Fast | Yes |
Model answer: “XLOOKUP where available — it handles both directions, defaults to exact match, and has built-in if-not-found. INDEX-MATCH as the robust fallback on older Excel versions.”
Answer: Mean (average) — distorted by outliers: one billionaire in a room skews average income; Median (middle value) — robust to outliers, ideal for salary and house-price data; Mode (most frequent) — best for categorical data like most-sold product size. Strong candidates volunteer the misleading cases without being asked.
Answer: Correlation means two variables move together; causation means one drives the other. Ice cream sales and drowning deaths correlate — both are caused by summer, not each other (a confounding variable). Analysts must resist causal language unless supported by controlled experiments like A/B tests. This question tests analytical maturity more than statistics.
Answer: Data validation ensures data meets quality rules before analysis — type checks (dates are dates), range checks (age between 0–120), uniqueness checks (no duplicate order IDs), referential checks (every order links to a real customer), and cross-field logic (delivery date ≥ order date). One unvalidated dataset can silently corrupt every dashboard built on it — a story worth telling if you have one.
Answer: SQL (non-negotiable), Excel/Google Sheets (still everywhere), Power BI or Tableau (visualization), Python with Pandas (increasingly expected), and cloud data warehouses (BigQuery, Snowflake, Redshift) for large datasets. Familiarity with AI-assisted analysis tools is becoming a plus. If cloud platforms appear in the job description, skim our Cloud Computing Interview Questions guide too.
Answer: INNER JOIN returns only matching rows in both tables; LEFT JOIN returns all left-table rows with NULLs where the right side has no match; RIGHT JOIN mirrors that; FULL OUTER JOIN returns everything from both sides; CROSS JOIN returns the Cartesian product; SELF JOIN joins a table to itself (employees to their managers). Be ready to predict row counts: “1,000 customers LEFT JOIN 800 orders gives at least 1,000 rows” — counting questions trip up most candidates.
Answer: WHERE filters individual rows before grouping; HAVING filters groups after GROUP BY aggregation. You cannot use aggregate functions in WHERE. Example: “show cities with more than 100 orders” requires GROUP BY city HAVING COUNT(*) > 100. Execution order to memorize: FROM → WHERE → GROUP BY → HAVING → SELECT → ORDER BY → LIMIT.
Answer: Group by the columns defining a duplicate and filter: SELECT email, COUNT(*) FROM customers GROUP BY email HAVING COUNT(*) > 1; To delete duplicates while keeping one, use a window function: assign ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at) and delete where the row number exceeds 1. This two-part answer (find + remove) is what separates good from average candidates.
Answer: Window functions compute values across related rows without collapsing them like GROUP BY does. Essentials: ROW_NUMBER(), RANK(), DENSE_RANK() for ranking; LAG()/LEAD() for comparing with previous/next rows (month-over-month growth); SUM() OVER (PARTITION BY … ORDER BY …) for running totals. The classic test: “Find the second-highest salary per department” — DENSE_RANK() OVER (PARTITION BY dept ORDER BY salary DESC) filtered to rank 2.
Answer: Given salaries 100, 90, 90, 80: ROW_NUMBER gives 1,2,3,4 (arbitrary tie order); RANK gives 1,2,2,4 (skips after ties); DENSE_RANK gives 1,2,2,3 (no gaps). Choose based on how the business defines “Nth highest” — and say so, because asking the clarifying question is itself a strong signal.
Answer: A subquery is a query nested inside another; a CTE (Common Table Expression, WITH cte AS (…)) is a named temporary result set defined before the main query. CTEs win for readability, reuse within the same statement, and breaking complex logic into steps — and they enable recursion. Model preference: “CTEs for anything beyond one level of nesting; my queries read like a story: clean → aggregate → rank → select.”
Answer: Aggregate revenue by month in a CTE, then use LAG(revenue) OVER (ORDER BY month) to fetch the previous month and compute (revenue – prev_revenue) * 100.0 / prev_revenue. Mention edge cases — the first month has no prior (NULL handling) and division-by-zero protection — because production-minded answers win offers. Practice variants of this freely on Mode’s SQL tutorial or LeetCode’s database section.
Answer: DELETE removes selected rows (WHERE-able, logged, rollback-able, triggers fire); TRUNCATE removes all rows instantly (minimal logging, resets identity, no WHERE); DROP removes the entire table structure itself. Quick mnemonic: DELETE = erase some writing, TRUNCATE = tear out all pages, DROP = burn the notebook.
Answer: The p-value is the probability of observing results at least this extreme if the null hypothesis were true. A p-value of 0.03 means: “If there were truly no effect, we’d see data like this only 3% of the time.” Below the significance threshold (commonly 0.05), we reject the null. Crucially, p-value is not the probability the hypothesis is true — stating this distinction earns instant credibility.
Answer: A/B testing compares two versions (control vs variant) by randomly splitting users and measuring a success metric. Design steps: (1) define one primary metric (e.g., checkout conversion); (2) state hypotheses; (3) calculate required sample size for desired statistical power; (4) randomize properly and run for full business cycles (avoid weekday bias); (5) analyze significance and — importantly — practical significance: a statistically significant 0.01% lift may not justify engineering cost.
Answer: Type I error (false positive) — rejecting a true null hypothesis: declaring the new design better when it isn’t. Type II error (false negative) — failing to detect a real effect. Memory trick: Type I is “crying wolf,” Type II is “missing the wolf.” The significance level α controls Type I risk; statistical power (1−β) controls Type II.
Answer: First detect (boxplots/IQR rule: outside Q1−1.5×IQR or Q3+1.5×IQR; or z-scores beyond ±3), then investigate before acting — is it a data-entry error, a system glitch, or a genuine extreme like a Diwali sales spike? Errors get corrected or removed; genuine extremes stay (perhaps analyzed separately or handled with median-based metrics). “I never delete outliers blindly” is the sentence interviewers wait for.
Answer:
| Goal | Best Chart |
| Trend over time | Line chart |
| Compare categories | Bar chart |
| Part-to-whole (few segments) | Pie/donut (≤5 slices) or stacked bar |
| Relationship between 2 numeric variables | Scatter plot |
| Distribution of one variable | Histogram / box plot |
| Geographic comparison | Filled map |
Bonus point: “I avoid pie charts beyond five categories and never use 3D charts — they distort perception.”
Answer: (1) Built around the audience’s decisions, not available data; (2) KPI summary on top, drill-downs below (inverted pyramid); (3) five-second rule — the headline insight should be graspable in five seconds; (4) consistent colors with meaning (red = bad everywhere); (5) interactive filters instead of fifty static charts; (6) performance-optimized data models (star schema, measures over calculated columns in Power BI). Walk the interviewer through one dashboard you actually built.
Answer: A fact table stores measurable events — sales transactions with amounts, quantities, and foreign keys. Dimension tables store descriptive context — customer details, product attributes, dates. Together they form a star schema, the standard model for analytics because it keeps queries fast and logic simple. Knowing this signals you can work with real data warehouses, not just flat Excel files.
Answer: Use a structured funnel: (1) Validate the data — tracking bug? definition change? (most “drops” are instrumentation issues); (2) Segment — by platform, geography, user cohort, acquisition channel — is the drop universal or isolated?; (3) Check internal changes — releases, pricing changes, feature removals on the timeline; (4) Check external factors — seasonality, holidays, competitor launches, outages; (5) Quantify and recommend. Interviewers grade the structure, not a lucky guess.
Answer: Define metrics across layers: adoption (% of users trying it), engagement (frequency/depth of use), retention impact (do adopters return more?), business impact (revenue, conversion), and guardrails (did support tickets or churn rise?). Tie each metric to a target before launch — measuring after the fact invites bias.
Answer: The data analyst analyzes data to answer business questions (SQL, BI tools, statistics). The business analyst focuses on processes and requirements, bridging stakeholders and tech teams (less hands-on with data). The data engineer builds the pipelines and warehouses that deliver reliable data to analysts (heavy coding, ETL, infrastructure). Career-wise, analysts commonly grow into senior analyst, analytics manager, data scientist, or product analyst paths — a question that often follows in managerial rounds, much like those covered in our Product Manager Interview Questions guide.
Answer: Increasingly yes, at least Pandas basics: reading files, filtering, groupby aggregations, merges, and simple plots. Python takes over where Excel struggles (millions of rows, repeatable pipelines, API data pulls). A balanced 2026 answer: “SQL is my primary tool, Excel for stakeholder collaboration, Python for automation and heavy lifting.”
Answer: Use the STAR format: Situation (declining repeat purchases at an e-commerce client), Task (find the cause), Action (cohort analysis in SQL revealed users acquired via one discount campaign churned 3× faster; built a Power BI dashboard to track cohort quality), Result (“marketing reallocated 20% of budget; repeat purchase rate rose 8% in two quarters”). Always quantify the result — numbers are an analyst’s native language, and your interview story should prove it.
| Experience | Role | Average Annual Salary (INR) |
| 0–2 years | Junior / Associate Data Analyst | ₹4 – 7 LPA |
| 2–5 years | Data Analyst | ₹7 – 14 LPA |
| 5–8 years | Senior Analyst / Analytics Lead | ₹14 – 25 LPA |
| 8+ years | Analytics Manager | ₹25 – 45+ LPA |
Indicative ranges; product companies and fintech typically pay 20–40% above services firms.
For ongoing practice, the Kaggle datasets library and Google’s free Data Analytics resources are excellent companions.
| Resource | Best For | Link |
| Google Data Analytics Certificate (Coursera) | Structured beginner path | [Enroll Now → Affiliate Link] |
| “Storytelling with Data” (Book – Cole Knaflic) | Visualization & communication | [Check Price on Amazon → Affiliate Link] |
| SQL for Data Analysis (Udemy) | Interview-grade SQL practice | [Enroll Now → Affiliate Link] |
| Power BI Complete Course | Dashboard skills | [Enroll Now → Affiliate Link] |
Affiliate Disclosure: Some of the links above are affiliate links, meaning Interview Questions Hub may earn a small commission if you purchase through them — at no extra cost to you. We only recommend resources we genuinely believe accelerate your preparation.
Q1. Can a fresher become a data analyst without a computer science degree? Absolutely. Data analytics is one of the most degree-flexible tech careers — commerce, statistics, and engineering graduates all succeed. What matters is demonstrable SQL + Excel + visualization skill, proven through portfolio projects.
Q2. Is coding required for data analyst jobs? SQL is essential (and yes, it counts as coding). Python is increasingly preferred but rarely mandatory for entry roles. Heavy software engineering is not required — that’s the data engineer’s domain.
Q3. How long does it take to prepare for data analyst interviews? From a basic foundation, 8–12 weeks of structured preparation: four weeks SQL, two weeks Excel + statistics, two weeks Power BI/Tableau with a portfolio project, and two weeks of case study and mock interview practice.
Q4. Which certification helps most for data analyst roles? Certificates open screening doors but don’t replace skills. The Google Data Analytics Certificate and Microsoft PL-300 (Power BI) are the most recognized in 2026 — pair either with two strong portfolio projects.
Q5. What is the most common reason candidates fail data analyst interviews? Weak SQL under pressure, followed by unstructured case-study answers. Both are fixable with deliberate practice — which is precisely what this guide is designed for.
Data analyst interviews reward structured thinkers who can write clean SQL, question their own data, and translate numbers into decisions. Master the 30 questions above, build the two portfolio projects, and practice narrating your analysis out loud. Combine this guide with our SQL interview questions deep-dive and HR round preparation, and you’ll walk into your interview with the one thing every analyst needs most: confidence backed by evidence. Your dashboard-driven career starts now!
Disclaimer: The interview questions, answers, salary figures, and preparation timelines in this article are compiled for educational purposes from commonly reported interview experiences and public market data. Actual interview content, processes, and compensation vary by company, role, and location. Interview Questions Hub does not guarantee job placement or specific outcomes. All trademarks and product names belong to their respective owners.