US Airline Market Analysis


Description

  • The objective of the project is to analyse 20 years of US flight data (~128 million rows) and find patterns that help compare different travel choices.
  • Identify trends and patterns over that period.
  • Use Spark DataFrames and Spark SQL to handle the large dataset efficiently, with Amazon S3 as the data lake.
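
As a rough illustration of the DataFrame plus Spark SQL workflow described above, the sketch below reads Parquet data, registers it as a temporary view, and counts flights per year. The column name `Year`, the view name `flights`, and the S3 path in the usage note are assumptions made for the example, not details taken from the project's notebook.

```python
def yearly_flight_counts_sql(view_name: str = "flights", year_col: str = "Year") -> str:
    """Build a Spark SQL query that counts flights per year.

    `view_name` and `year_col` are hypothetical names used for illustration.
    """
    return (
        f"SELECT {year_col}, COUNT(*) AS flights "
        f"FROM {view_name} GROUP BY {year_col} ORDER BY {year_col}"
    )


def yearly_flight_counts(spark, parquet_path: str):
    """Read Parquet flight data, expose it to Spark SQL, and aggregate by year.

    `spark` is an existing SparkSession (a Databricks notebook provides one).
    """
    flights = spark.read.parquet(parquet_path)   # lazy read of the Parquet files
    flights.createOrReplaceTempView("flights")   # make the DataFrame queryable via SQL
    return spark.sql(yearly_flight_counts_sql("flights"))
```

In a notebook this might be called as `yearly_flight_counts(spark, "s3a://<bucket>/flights/").show()`, where the bucket and prefix are placeholders.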

Technology Used

  • Big Data Analysis using Spark
  • Amazon Web Services

Environment

Python (PySpark), Spark SQL, Databricks, Amazon S3, Apache Parquet

Analysis: Notebook

Architecture


[Figure: Architecture]


Analysis 1: Exploratory Data Analysis
[Figure: analysis1]

Analysis 2: Impact of Global Recession
[Figure: analysis2]

Analysis 3: Fraud Data Analysis
[Figure: analysis3]

Performance Optimization

  • Repartitioning the DataFrame into a tuned number of partitions and adjusting spark.sql.shuffle.partitions to speed up shuffles.
  • Caching datasets in the cluster-wide in-memory cache.
  • Storing data as Apache Parquet, a columnar format that supports flexible compression options and efficient encoding schemes.
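
The three optimizations above could look roughly like the following sketch. The 128 MiB target partition size is a common rule of thumb assumed here, not a value from the project, and the helper names (`target_partitions`, `tune_and_cache`, `write_parquet`) are hypothetical.

```python
import math


def target_partitions(dataset_bytes: int,
                      partition_bytes: int = 128 * 1024 * 1024,
                      min_partitions: int = 1) -> int:
    """Estimate a partition count from data size (assumed 128 MiB per partition)."""
    return max(min_partitions, math.ceil(dataset_bytes / partition_bytes))


def tune_and_cache(spark, df, num_partitions: int):
    """Align shuffle partitions with the repartitioned DataFrame and cache it."""
    # Number of partitions Spark SQL uses when shuffling for joins and aggregations.
    spark.conf.set("spark.sql.shuffle.partitions", str(num_partitions))
    tuned = df.repartition(num_partitions).cache()
    tuned.count()  # run an action so the cache is actually populated
    return tuned


def write_parquet(df, path: str, compression: str = "snappy"):
    """Write the DataFrame as Parquet with the chosen compression codec."""
    df.write.mode("overwrite").option("compression", compression).parquet(path)
```

For example, `target_partitions(10 * 1024**3)` suggests 80 partitions for roughly 10 GiB of data; the right number in practice depends on cluster size and should be checked against actual job timings.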
