Data-projects-with-R-and-GitHub

Laptop Price Analysis

Jingyi Li 2026-05-12

Topic:

Wrangling and Analyzing Laptop Market Data

Introduction

This project is an exploratory data analysis (EDA) of consumer laptops and the factors that determine their prices. Using a raw, uncleaned dataset of approximately 1,300 laptop entries, the study aims to quantify how hardware specifications (such as CPU performance, RAM capacity, and storage technology) and brand affect the market price of portable computers.

Questions & Hypotheses

Price Distribution: What is the overall distribution of laptop prices in the dataset? Are there any significant outliers or clusters in the price data?
Hardware Specifications and Price: How do different hardware specifications (e.g., CPU performance, RAM size, storage type) correlate with laptop prices? Which specifications are the most influential in determining the price?
Brand Influence: Does the brand of the laptop have a significant impact on its price? Are certain brands consistently priced higher than others, and if so, why?

Hypothesis:

I hypothesize that higher-end hardware specifications (e.g., faster CPUs, more RAM, SSD storage) and well-known brands are associated with higher laptop prices. I also hypothesize that premium display features (e.g., IPS panels and Retina displays) account for more of the price variance in the Ultrabook segment than in the standard Notebook segment.

Data Source

Original source: https://www.kaggle.com/datasets/ehtishamsadiq/uncleaned-laptop-price-dataset

You can download the dataset directly using this link: uncleaned laptop price dataset (CSV)

Dataset Overview

The Uncleaned Laptop Price dataset is a collection of laptop product listings scraped from an online e-commerce website. The dataset includes information about various laptop models, such as their brand, screen size, processor, memory, storage capacity, operating system, and price. However, the dataset is uncleaned, meaning that it contains missing values, inconsistent formatting, and other errors that need to be addressed before the data can be used for analysis.

Company	TypeName	Inches	ScreenResolution	Cpu	Ram	Memory	Gpu	OpSys	Weight	Price
Apple	Ultrabook	13.3	IPS Panel Retina Display 2560x1600	Intel Core i5 2.3GHz	8GB	128GB SSD	Intel Iris Plus Graphics 640	macOS	1.37kg	71378.68
Apple	Ultrabook	13.3	1440x900	Intel Core i5 1.8GHz	8GB	128GB Flash Storage	Intel HD Graphics 6000	macOS	1.34kg	47895.52
HP	Notebook	15.6	Full HD 1920x1080	Intel Core i5 7200U 2.5GHz	8GB	256GB SSD	Intel HD Graphics 620	No OS	1.86kg	30636.00
Apple	Ultrabook	15.4	IPS Panel Retina Display 2880x1800	Intel Core i7 2.7GHz	16GB	512GB SSD	AMD Radeon Pro 455	macOS	1.83kg	135195.34
Apple	Ultrabook	13.3	IPS Panel Retina Display 2560x1600	Intel Core i5 3.1GHz	8GB	256GB SSD	Intel Iris Plus Graphics 650	macOS	1.37kg	96095.81
Acer	Notebook	15.6	1366x768	AMD A9-Series 9420 3GHz	4GB	500GB HDD	AMD Radeon R5	Windows 10	2.1kg	21312.00

Column Explanations

To ensure the description is self-contained, here is a short explanation of the core columns I will analyze:

Company: The brand of the laptop (e.g., Apple, HP, Acer, Asus).
TypeName: The category of the laptop (e.g., Notebook, Ultrabook, Gaming, Netbook).
Inches: The screen size in inches.
ScreenResolution: Text describing the resolution and panel type (e.g., IPS Panel Retina Display 2560x1600).
Cpu: The processor specification, including brand, model, and clock speed (e.g., Intel Core i5 2.3GHz).
Ram: The amount of memory (e.g., 8GB, 16GB).
Memory: The storage technology and capacity (e.g., 128GB SSD, 1TB HDD).
Gpu: The graphics card specification.
OpSys: The operating system (e.g., macOS, Windows 10, No OS).
Weight: The physical weight of the laptop (e.g., 1.37kg).
Price: The price of the laptop (originally in Indian rupees, INR).

Data Manipulation Goals

Handling Blank Rows & Invalid Symbols (?):
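- Remove Empty Rows: The dataset contains exactly 30 completely blank rows, which need to be dropped.
- Filter Invalid Strings: Some rows contain a hidden non-numeric "?" symbol where a value is expected; these entries need to be detected and either removed or treated as missing.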
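The two cleaning steps above could be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the project's final code: the toy data frame reuses three rows from the sample table, while the blank row and the "Dell" row containing "?" are made up to trigger each rule, and the object names (`laptops`, `laptops_no_blank`, `laptops_clean`) are hypothetical.

```r
# Toy data: three rows from the sample table, plus one blank row and one
# row with a "?" placeholder (the last two are invented for illustration).
laptops <- data.frame(
  Company = c("Apple", "", "HP", "Acer", "Dell"),
  Inches  = c("13.3", "", "15.6", "15.6", "?"),
  Ram     = c("8GB", "", "8GB", "4GB", "8GB"),
  Price   = c(71378.68, NA, 30636.00, 21312.00, 50000.00),
  stringsAsFactors = FALSE
)

# Recode empty strings and "?" in character columns as missing values.
is_text <- vapply(laptops, is.character, logical(1))
laptops[is_text] <- lapply(laptops[is_text], function(col) {
  col <- trimws(col)
  col[col %in% c("", "?")] <- NA
  col
})

# Drop rows in which every column is missing (the completely blank rows).
all_missing <- rowSums(is.na(laptops)) == ncol(laptops)
laptops_no_blank <- laptops[!all_missing, ]

# Drop rows that still contain a missing value (here, the former "?" row).
laptops_clean <- laptops_no_blank[complete.cases(laptops_no_blank), ]
```

Passing `na.strings = c("", "?")` to `read.csv()` would achieve the same recoding at import time. Whether rows with a single "?" should be dropped or kept with the value imputed is a design choice this sketch does not settle.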
Feature Extraction & Type Conversion (String to Numeric):
- Ram Column: Strip the “GB” suffix (e.g., converting “8GB” to “8”) and cast the column from character (chr) to integer (int).
- Weight Column: Strip the “kg” suffix (e.g., converting “1.37kg” to “1.37”) and cast the column to numeric (dbl).
- Cpu Column: Extract the processor clock speed in GHz as a continuous numeric variable (e.g., parsing 2.3 out of "Intel Core i5 2.3GHz").
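As a rough sketch of these three conversions, the base R snippet below works on values copied from the sample table. The new column names (`Ram_GB`, `Weight_kg`, `Cpu_GHz`) are assumptions chosen for illustration, and the regular expression is one possible pattern rather than the definitive one.

```r
# Values copied from the sample table above.
specs <- data.frame(
  Cpu    = c("Intel Core i5 2.3GHz",
             "Intel Core i5 7200U 2.5GHz",
             "AMD A9-Series 9420 3GHz"),
  Ram    = c("8GB", "8GB", "4GB"),
  Weight = c("1.37kg", "1.86kg", "2.1kg"),
  stringsAsFactors = FALSE
)

# Strip the unit suffix, then cast to the target type.
specs$Ram_GB    <- as.integer(sub("GB$", "", specs$Ram))
specs$Weight_kg <- as.numeric(sub("kg$", "", specs$Weight))

# Capture the number immediately before "GHz". The lazy ".*?" keeps the
# match from swallowing the integer part of values such as "2.3".
# Strings without a "GHz" token are returned unchanged by sub() and
# therefore become NA (with a warning) in as.numeric().
ghz_pattern <- "^.*?([0-9]+(?:\\.[0-9]+)?)GHz.*$"
specs$Cpu_GHz <- as.numeric(sub(ghz_pattern, "\\1", specs$Cpu, perl = TRUE))
```

Note that model numbers such as "7200U" or "9420" are skipped because they are not directly followed by "GHz".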
Categorical Consolidation & Engineering:
- ScreenResolution Column: Create binary logical flags (is_IPS and is_Retina) based on text descriptions, and extract pure pixel dimensions (Width and Height) into separate numerical columns.
- OpSys Column: Merge sparse categories into broader groups (e.g., combining variants like “Windows 10”, “Windows 10 S”, and “Windows 7” into a unified “Windows” label, and grouping “Mac OS X” with “macOS”) so that plots of the distribution stay clear and readable.
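The flag extraction and category grouping described above might look like this in base R. The resolution strings come from the sample table and the operating-system labels are the ones named in the text; the helper `group_os()` is hypothetical, and leaving all other labels (such as "No OS") unchanged is an assumption.

```r
# Resolution strings copied from the sample table.
screen <- data.frame(
  ScreenResolution = c("IPS Panel Retina Display 2560x1600",
                       "1440x900",
                       "Full HD 1920x1080",
                       "1366x768"),
  stringsAsFactors = FALSE
)

# Logical flags based on the text description.
screen$is_IPS    <- grepl("IPS", screen$ScreenResolution, fixed = TRUE)
screen$is_Retina <- grepl("Retina", screen$ScreenResolution, fixed = TRUE)

# Pixel dimensions: the trailing "<width>x<height>" pair.
dim_pattern <- "^.*?([0-9]+)x([0-9]+)$"
screen$Width  <- as.integer(sub(dim_pattern, "\\1", screen$ScreenResolution, perl = TRUE))
screen$Height <- as.integer(sub(dim_pattern, "\\2", screen$ScreenResolution, perl = TRUE))

# Hypothetical helper: collapse OS variants into broader labels and leave
# anything that matches neither pattern untouched.
group_os <- function(os) {
  ifelse(grepl("^Windows", os), "Windows",
         ifelse(grepl("^(macOS|Mac OS)", os), "macOS", os))
}

os_raw     <- c("Windows 10", "Windows 10 S", "Windows 7", "Mac OS X", "macOS", "No OS")
os_grouped <- group_os(os_raw)
```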

Visualization Goals

With Price as the response variable, I will examine how it relates to the hardware specifications and categorical variables through the following visualizations:

Price vs. Numeric Features (Scatter Plot)
- Goal: Investigate how continuous numerical variables such as Ram, extracted CPU clock speed, or Weight correlate with Price.
- Axes: Put Ram, CPU clock speed, or Weight (numerical) on the X-axis and Price on the Y-axis.
- Overplotting: Apply semi-transparency to handle overlapping points, and overlay a shaded region or density contours (using ggdensity or a similar package) to show where the bulk of the market lies.
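A possible starting point for this scatter plot, assuming ggplot2 is installed. The data frame `demo` holds synthetic placeholder values generated on the spot, not values from the laptop dataset, so the plot shows only the layering technique and says nothing about the real relationship between Weight and Price. The contour layer uses ggplot2's own `geom_density_2d()`; a highest-density-region layer from ggdensity could be swapped in instead.

```r
library(ggplot2)

# Synthetic placeholder data (NOT from the laptop dataset), used only so
# that the example runs on its own.
set.seed(1)
n <- 300
demo <- data.frame(
  Weight = runif(n, min = 1, max = 3),
  Price  = rlnorm(n, meanlog = log(50000), sdlog = 0.4)
)

p_scatter <- ggplot(demo, aes(x = Weight, y = Price)) +
  geom_point(alpha = 0.3) +                  # semi-transparent points
  geom_density_2d(colour = "steelblue") +    # contours mark the dense region
  labs(x = "Weight (kg)", y = "Price (INR)")

# print(p_scatter)  # uncomment to draw the plot
```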
Price vs. Categorical Features (Distribution Plot)
- Goal: Observe the price variance across discrete categories like Company (Brand) and TypeName (Laptop Type).
- Refinement (Violin over Boxplot): Instead of a simple bar chart or box plot, use a violin plot for each brand/type to show the full probability density and any multi-modality of prices.
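The violin plot could be sketched with ggplot2 as below. Again, `demo` contains synthetic placeholder prices rather than real data, and ordering the categories by median price is an optional assumption that tends to make such plots easier to read.

```r
library(ggplot2)

# Synthetic placeholder data (NOT from the laptop dataset).
set.seed(42)
demo <- data.frame(
  TypeName = rep(c("Notebook", "Ultrabook", "Gaming"), each = 100),
  Price    = rlnorm(300, meanlog = log(50000), sdlog = 0.4)
)

# One violin per category; reorder() sorts the categories by median price.
p_violin <- ggplot(demo, aes(x = reorder(TypeName, Price, FUN = median), y = Price)) +
  geom_violin(fill = "grey85") +
  labs(x = "TypeName", y = "Price (INR)")

# print(p_violin)  # uncomment to draw the plot
```

For the Company column, the same code applies with `Company` in place of `TypeName`; brands with very few listings may need to be grouped or dropped first, since a density estimate from a handful of points is not meaningful.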
Price Distribution Overlap (Ridgeline Plot)
- Goal: Compare the overall price profile across the most common laptop types (Notebook, Ultrabook, Gaming).
- Specification: Plot a baseline price histogram for the entire dataset, and overlay a Ridgeline Plot split by TypeName right on top, allowing an immediate visual comparison of price peaks between standard notebooks and premium segments.