Jingyi Li 2026-05-12
Wrangling and Analyzing Laptop Market Data
This project is an exploratory data analysis (EDA) of consumer laptops and the determinants of their prices. Using a raw, uncleaned dataset of approximately 1,300 laptop entries, it aims to quantify how hardware specifications (such as CPU performance, RAM capacity, and storage technology) and brand equity affect the market price of portable computers.
I hypothesize that higher-end hardware specifications (e.g., faster CPUs, larger RAM, SSD storage) and well-known brands will be associated with higher laptop prices. I further hypothesize that high-resolution display features (e.g., IPS panels and Retina displays) contribute more to price variance in the Ultrabook segment than in the standard Notebook segment.
Original source: https://www.kaggle.com/datasets/ehtishamsadiq/uncleaned-laptop-price-dataset
You can download the dataset directly using this link: uncleaned laptop price dataset (CSV)
The Uncleaned Laptop Price dataset is a collection of laptop product listings scraped from an online e-commerce website. The dataset includes information about various laptop models, such as their brand, screen size, processor, memory, storage capacity, operating system, and price. However, the dataset is uncleaned, meaning that it contains missing values, inconsistent formatting, and other errors that need to be addressed before the data can be used for analysis.
| Company | TypeName | Inches | ScreenResolution | Cpu | Ram | Memory | Gpu | OpSys | Weight | Price |
|---|---|---|---|---|---|---|---|---|---|---|
| Apple | Ultrabook | 13.3 | IPS Panel Retina Display 2560x1600 | Intel Core i5 2.3GHz | 8GB | 128GB SSD | Intel Iris Plus Graphics 640 | macOS | 1.37kg | 71378.68 |
| Apple | Ultrabook | 13.3 | 1440x900 | Intel Core i5 1.8GHz | 8GB | 128GB Flash Storage | Intel HD Graphics 6000 | macOS | 1.34kg | 47895.52 |
| HP | Notebook | 15.6 | Full HD 1920x1080 | Intel Core i5 7200U 2.5GHz | 8GB | 256GB SSD | Intel HD Graphics 620 | No OS | 1.86kg | 30636.00 |
| Apple | Ultrabook | 15.4 | IPS Panel Retina Display 2880x1800 | Intel Core i7 2.7GHz | 16GB | 512GB SSD | AMD Radeon Pro 455 | macOS | 1.83kg | 135195.34 |
| Apple | Ultrabook | 13.3 | IPS Panel Retina Display 2560x1600 | Intel Core i5 3.1GHz | 8GB | 256GB SSD | Intel Iris Plus Graphics 650 | macOS | 1.37kg | 96095.81 |
| Acer | Notebook | 15.6 | 1366x768 | AMD A9-Series 9420 3GHz | 4GB | 500GB HDD | AMD Radeon R5 | Windows 10 | 2.1kg | 21312.00 |
To ensure the description is self-contained, here is a short explanation of the core columns I will analyze:
- Placeholder values (?): handle the “?” entries in the data rows.
- Ram column: strip the “GB” suffix (e.g., converting “8GB” to “8”) and cast the column from character (chr) to integer (int).
- Weight column: strip the “kg” suffix (e.g., converting “1.37kg” to “1.37”) and cast the column to numeric (dbl).
- Cpu column: extract the processor clock speed in GHz as a continuous numerical variable (e.g., parsing 2.3 out of "Intel Core i5 2.3GHz").
- ScreenResolution column: create binary logical flags (is_IPS and is_Retina) based on the text descriptions, and extract the pixel dimensions (Width and Height) into separate numerical columns.
- OpSys column: group sparse categories into broader groups (e.g., combining variants like “Windows 10”, “Windows 10 S”, and “Windows 7” into a unified “Windows” label, and grouping “Mac OS X” with “macOS”) to ensure clear and readable visual distributions.

With Price as the main variable, I will investigate the relationship between price and the other hardware specifications and categories through the following visualizations:
- Numerical predictors: examine how Ram (or extracted CPU clock speed, or Weight) correlates with Price.
  - Axes: Ram, CPU clock speed, or Weight (numerical) on the x-axis and Price on the y-axis.
  - Density: ggdensity or a similar tool, to observe where the bulk of the market lies.
- Categorical predictors: Company (brand) and TypeName (laptop type).
  - Categories such as Notebook, Ultrabook, and Gaming.
  - Distributions for each TypeName layered right on top of one another, allowing an immediate visual comparison of price peaks between standard notebooks and premium segments.
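
The column-cleaning steps described earlier (Ram, Weight, Cpu, ScreenResolution, OpSys) can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the project's final pipeline: the sample rows are adapted from the preview table (with one “Mac OS X” value substituted to exercise the grouping rule), and the names `clean_laptops`, `Cpu_GHz`, and `OpSys_group` are hypothetical. Only `is_IPS`, `is_Retina`, `Width`, and `Height` come from the plan itself.

```r
# Hypothetical sample adapted from the preview table; "Mac OS X" is
# substituted in row 2 purely to exercise the OpSys grouping rule.
laptops <- data.frame(
  Ram = c("8GB", "16GB", "4GB"),
  Weight = c("1.37kg", "1.83kg", "2.1kg"),
  Cpu = c("Intel Core i5 2.3GHz", "Intel Core i7 2.7GHz",
          "AMD A9-Series 9420 3GHz"),
  ScreenResolution = c("IPS Panel Retina Display 2560x1600",
                       "IPS Panel Retina Display 2880x1800",
                       "1366x768"),
  OpSys = c("macOS", "Mac OS X", "Windows 10"),
  stringsAsFactors = FALSE
)

# clean_laptops() is a hypothetical helper name, not part of any package.
clean_laptops <- function(df) {
  # Ram: drop the "GB" suffix, then cast character -> integer.
  df$Ram <- as.integer(sub("GB$", "", df$Ram))

  # Weight: drop the "kg" suffix, then cast character -> numeric (double).
  df$Weight <- as.numeric(sub("kg$", "", df$Weight))

  # Cpu: capture the number immediately before "GHz" as the clock speed.
  # Strings without a "GHz" token would become NA (with a warning).
  ghz_pattern <- ".*?([0-9]+(?:\\.[0-9]+)?)GHz.*"
  df$Cpu_GHz <- as.numeric(sub(ghz_pattern, "\\1", df$Cpu, perl = TRUE))

  # ScreenResolution: logical flags from the text description.
  df$is_IPS <- grepl("IPS", df$ScreenResolution, fixed = TRUE)
  df$is_Retina <- grepl("Retina", df$ScreenResolution, fixed = TRUE)

  # ScreenResolution: split "<width>x<height>" into two integer columns.
  res_pattern <- ".*?([0-9]+)x([0-9]+).*"
  df$Width <- as.integer(sub(res_pattern, "\\1", df$ScreenResolution, perl = TRUE))
  df$Height <- as.integer(sub(res_pattern, "\\2", df$ScreenResolution, perl = TRUE))

  # OpSys: collapse Windows and Mac variants; leave other labels unchanged.
  df$OpSys_group <- ifelse(
    grepl("^Windows", df$OpSys), "Windows",
    ifelse(grepl("^mac", df$OpSys, ignore.case = TRUE), "macOS", df$OpSys)
  )

  df
}

cleaned <- clean_laptops(laptops)
str(cleaned)
```

Writing the derived values into new columns (`Cpu_GHz`, `OpSys_group`) rather than overwriting `Cpu` and `OpSys` is a design choice made here so the raw strings stay available for checking. Placeholder “?” entries are not handled in this sketch and would need to be converted to `NA` before the casts.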