{"id":137,"date":"2026-03-02T15:18:47","date_gmt":"2026-03-02T10:18:47","guid":{"rendered":"https:\/\/gigz.pk\/python\/?post_type=lesson&#038;p=137"},"modified":"2026-03-17T07:57:29","modified_gmt":"2026-03-17T02:57:29","slug":"data-cleaning","status":"publish","type":"lesson","link":"https:\/\/gigz.pk\/python\/lesson\/data-cleaning\/","title":{"rendered":"Data Cleaning"},"content":{"rendered":"\n<p>Data Cleaning is the process of detecting and correcting errors, missing values, and inconsistencies in a dataset.<\/p>\n\n\n\n<p>In Data Analytics, clean data is essential for accurate analysis and decision-making.<\/p>\n\n\n\n<p>First, import pandas:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">import pandas as pd<\/pre>\n\n\n\n<p>Load a dataset:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df = pd.read_csv(\"data.csv\")<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Why Data Cleaning is Important<\/h2>\n\n\n\n<p>Real-world data often contains:<\/p>\n\n\n\n<p>Missing values<br>Duplicate records<br>Incorrect data types<br>Extra spaces<br>Inconsistent formatting<br>Outliers<\/p>\n\n\n\n<p>Cleaning up these problems makes the analysis more reliable.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">1. Handling Missing Values<\/h2>\n\n\n\n<p>Count missing values in each column:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df.isnull().sum()<\/pre>\n\n\n\n<p>Remove rows with missing values. Like the other methods in this section, dropna() returns a new DataFrame and leaves df unchanged, so assign the result (for example, df = df.dropna()) to keep it:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df.dropna()<\/pre>\n\n\n\n<p>Remove columns that contain missing values:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df.dropna(axis=1)<\/pre>\n\n\n\n<p>Fill missing values with a specific value:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df.fillna(0)<\/pre>\n\n\n\n<p>Fill missing values in one column with the column mean:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df[\"Age\"] = df[\"Age\"].fillna(df[\"Age\"].mean())<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">2. 
Removing Duplicates<\/h2>\n\n\n\n<p>Count duplicate rows:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df.duplicated().sum()<\/pre>\n\n\n\n<p>Remove duplicate rows, keeping the first occurrence of each:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df = df.drop_duplicates()<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">3. Fixing Data Types<\/h2>\n\n\n\n<p>Check data types:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df.dtypes<\/pre>\n\n\n\n<p>Convert a column to another type. Fill or drop missing values first, because astype(int) raises an error if the column contains NaN:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df[\"Age\"] = df[\"Age\"].astype(int)<\/pre>\n\n\n\n<p>Convert a column to datetime:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df[\"Date\"] = pd.to_datetime(df[\"Date\"])<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">4. Renaming Columns<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">df.rename(columns={\"OldName\": \"NewName\"}, inplace=True)<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">5. Removing Extra Spaces<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">df[\"Name\"] = df[\"Name\"].str.strip()<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">6. Replacing Values<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">df[\"Gender\"] = df[\"Gender\"].replace(\"M\", \"Male\")<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">7. Handling Outliers (Basic Method)<\/h2>\n\n\n\n<p>Keep only the rows below a fixed threshold:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df[df[\"Salary\"] &lt; 100000]<\/pre>\n\n\n\n<p>Or keep only the rows below the 95th percentile, which drops the most extreme values:<\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">df = df[df[\"Salary\"] &lt; df[\"Salary\"].quantile(0.95)]<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">8. 
Changing Text Case<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">df[\"Name\"] = df[\"Name\"].str.lower()  # all lowercase<br>df[\"Name\"] = df[\"Name\"].str.upper()  # or all uppercase<\/pre>\n\n\n\n<h2 class=\"wp-block-heading\">Data Cleaning Workflow<\/h2>\n\n\n\n<ol class=\"wp-block-list\">\n<li>Load dataset<\/li>\n\n\n\n<li>Check structure and summary<\/li>\n\n\n\n<li>Identify and handle missing values<\/li>\n\n\n\n<li>Remove duplicates<\/li>\n\n\n\n<li>Fix data types<\/li>\n\n\n\n<li>Standardize formatting<\/li>\n\n\n\n<li>Handle outliers<\/li>\n<\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">Why Data Cleaning is Critical in Analytics<\/h2>\n\n\n\n<p>Accurate insights depend on clean data.<br>Poor data quality leads to wrong decisions.<br>Data cleaning is often estimated to take 60\u201370% of a data analyst\u2019s time.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Key Takeaway<\/h2>\n\n\n\n<p>Data Cleaning is the foundation of Data Analytics.<br>Before analyzing or visualizing data, always clean and prepare it properly for accurate and meaningful results.<\/p>\n","protected":false},"menu_order":72,"template":"","class_list":["post-137","lesson","type-lesson","status-publish","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Data Cleaning - One Language. 
Endless Possibilities<\/title>\n<meta name=\"description\" content=\"Learn data cleaning in Pandas: handle missing values, remove duplicates, fix data types, and prepare datasets for accurate analysis.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/gigz.pk\/python\/lesson\/data-cleaning\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Data Cleaning - One Language. Endless Possibilities\" \/>\n<meta property=\"og:description\" content=\"Learn data cleaning in Pandas: handle missing values, remove duplicates, fix data types, and prepare datasets for accurate analysis.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/gigz.pk\/python\/lesson\/data-cleaning\/\" \/>\n<meta property=\"og:site_name\" content=\"One Language. Endless Possibilities\" \/>\n<meta property=\"article:modified_time\" content=\"2026-03-17T02:57:29+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"1 minute\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":[\"WebPage\",\"FAQPage\"],\"@id\":\"https:\\\/\\\/gigz.pk\\\/python\\\/lesson\\\/data-cleaning\\\/\",\"url\":\"https:\\\/\\\/gigz.pk\\\/python\\\/lesson\\\/data-cleaning\\\/\",\"name\":\"Data Cleaning - One Language. 
Endless Possibilities\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/gigz.pk\\\/python\\\/#website\"},\"datePublished\":\"2026-03-02T10:18:47+00:00\",\"dateModified\":\"2026-03-17T02:57:29+00:00\",\"description\":\"Learn data cleaning in Pandas: handle missing values, remove duplicates, fix data types, and prepare datasets for accurate analysis.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/gigz.pk\\\/python\\\/lesson\\\/data-cleaning\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/gigz.pk\\\/python\\\/lesson\\\/data-cleaning\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/gigz.pk\\\/python\\\/lesson\\\/data-cleaning\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/gigz.pk\\\/python\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"PYTHON FOR DATA ANALYTICS (PYDA) > Pandas > Data Cleaning\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/gigz.pk\\\/python\\\/#website\",\"url\":\"https:\\\/\\\/gigz.pk\\\/python\\\/\",\"name\":\"One Language. Endless Possibilities\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/gigz.pk\\\/python\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Data Cleaning - One Language. 
Endless Possibilities","description":"Learn data cleaning in Pandas: handle missing values, remove duplicates, fix data types, and prepare datasets for accurate analysis.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/gigz.pk\/python\/lesson\/data-cleaning\/","og_locale":"en_US","og_type":"article","og_title":"Data Cleaning - One Language. Endless Possibilities","og_description":"Learn data cleaning in Pandas: handle missing values, remove duplicates, fix data types, and prepare datasets for accurate analysis.","og_url":"https:\/\/gigz.pk\/python\/lesson\/data-cleaning\/","og_site_name":"One Language. Endless Possibilities","article_modified_time":"2026-03-17T02:57:29+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"1 minute"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["WebPage","FAQPage"],"@id":"https:\/\/gigz.pk\/python\/lesson\/data-cleaning\/","url":"https:\/\/gigz.pk\/python\/lesson\/data-cleaning\/","name":"Data Cleaning - One Language. 
Endless Possibilities","isPartOf":{"@id":"https:\/\/gigz.pk\/python\/#website"},"datePublished":"2026-03-02T10:18:47+00:00","dateModified":"2026-03-17T02:57:29+00:00","description":"Learn data cleaning in Pandas: handle missing values, remove duplicates, fix data types, and prepare datasets for accurate analysis.","breadcrumb":{"@id":"https:\/\/gigz.pk\/python\/lesson\/data-cleaning\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/gigz.pk\/python\/lesson\/data-cleaning\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/gigz.pk\/python\/lesson\/data-cleaning\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/gigz.pk\/python\/"},{"@type":"ListItem","position":2,"name":"PYTHON FOR DATA ANALYTICS (PYDA) > Pandas > Data Cleaning"}]},{"@type":"WebSite","@id":"https:\/\/gigz.pk\/python\/#website","url":"https:\/\/gigz.pk\/python\/","name":"One Language. Endless Possibilities","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/gigz.pk\/python\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/gigz.pk\/python\/wp-json\/wp\/v2\/lesson\/137","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gigz.pk\/python\/wp-json\/wp\/v2\/lesson"}],"about":[{"href":"https:\/\/gigz.pk\/python\/wp-json\/wp\/v2\/types\/lesson"}],"wp:attachment":[{"href":"https:\/\/gigz.pk\/python\/wp-json\/wp\/v2\/media?parent=137"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}