{"id":216,"date":"2026-03-03T13:58:51","date_gmt":"2026-03-03T08:58:51","guid":{"rendered":"https:\/\/gigz.pk\/python\/?post_type=lesson&#038;p=216"},"modified":"2026-03-23T21:30:39","modified_gmt":"2026-03-23T16:30:39","slug":"transformations-and-actions","status":"publish","type":"lesson","link":"https:\/\/gigz.pk\/python\/lesson\/transformations-and-actions\/","title":{"rendered":"Transformations and Actions"},"content":{"rendered":"\n<p>In PySpark, understanding <strong>Transformations<\/strong> and <strong>Actions<\/strong> is essential because Spark relies on lazy evaluation.<\/p>\n\n\n\n<p>PySpark is the Python API for Apache Spark, which processes data only when a result is actually requested.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What are Transformations?<\/h2>\n\n\n\n<p>Transformations are operations that:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Describe a new DataFrame or RDD derived from an existing one<\/li>\n\n\n\n<li>Do NOT execute immediately<\/li>\n\n\n\n<li>Are lazily evaluated<\/li>\n<\/ul>\n\n\n\n<p>Spark simply records the transformation steps and does not run them until an action is called.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Common Transformations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>select()<\/code><\/li>\n\n\n\n<li><code>filter()<\/code><\/li>\n\n\n\n<li><code>groupBy()<\/code><\/li>\n\n\n\n<li><code>withColumn()<\/code><\/li>\n\n\n\n<li><code>drop()<\/code><\/li>\n\n\n\n<li><code>join()<\/code><\/li>\n\n\n\n<li><code>orderBy()<\/code><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example of Transformation<\/h3>\n\n\n\n<pre class=\"wp-block-preformatted\">df_filtered = df.filter(df.salary &gt; 50000)<\/pre>\n\n\n\n<p>At this stage, Spark does NOT process the data.<br>It only records the filter in its execution plan.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What are Actions?<\/h2>\n\n\n\n<p>Actions are operations that:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Trigger execution<\/li>\n\n\n\n<li>Return results to the driver or write them to storage<\/li>\n\n\n\n<li>Perform 
computation<\/li>\n<\/ul>\n\n\n\n<p>When an action is called, Spark executes the transformations needed to produce that action's result.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Common Actions<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>show()<\/code><\/li>\n\n\n\n<li><code>count()<\/code><\/li>\n\n\n\n<li><code>collect()<\/code><\/li>\n\n\n\n<li><code>first()<\/code><\/li>\n\n\n\n<li><code>take()<\/code><\/li>\n\n\n\n<li><code>write<\/code> (an attribute, not a method; execution starts when you call a writer method such as <code>df.write.csv(path)<\/code>)<\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Example of Action<\/h3>\n\n\n\n<pre class=\"wp-block-preformatted\">df_filtered.show()<\/pre>\n\n\n\n<p>Now Spark runs the filter and displays the first rows of the result (20 by default).<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Lazy Evaluation (Important Concept)<\/h2>\n\n\n\n<p>Spark follows lazy evaluation:<\/p>\n\n\n\n<p>Transformations \u2192 Stored in DAG \u2192 Action Called \u2192 Execution Starts<\/p>\n\n\n\n<p>Spark records the transformations as an execution plan, a DAG (Directed Acyclic Graph) of operations, and optimizes it before running.<\/p>\n\n\n\n<p>Because Spark sees the whole plan before running it, it can:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Skip work the result does not need, such as reading unused columns<\/li>\n\n\n\n<li>Push filters closer to the data source<\/li>\n\n\n\n<li>Combine consecutive narrow transformations into a single stage<\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Types of Transformations<\/h2>\n\n\n\n<h3 class=\"wp-block-heading\">Narrow Transformations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data remains in the same partition<\/li>\n\n\n\n<li>No shuffling required<\/li>\n\n\n\n<li>Typically faster, since each partition is processed independently<\/li>\n<\/ul>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><code>select()<\/code><\/li>\n\n\n\n<li><code>filter()<\/code><\/li>\n<\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Wide Transformations<\/h3>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Data moves across partitions<\/li>\n\n\n\n<li>Shuffling occurs<\/li>\n\n\n\n<li>Usually slower than narrow transformations because of the shuffle<\/li>\n<\/ul>\n\n\n\n<p>Examples:<\/p>\n\n\n\n<ul 
class=\"wp-block-list\">\n<li><code>groupBy()<\/code><\/li>\n\n\n\n<li><code>join()<\/code><\/li>\n\n\n\n<li><code>distinct()<\/code><\/li>\n<\/ul>\n\n\n\n<h2 class=\"wp-block-heading\">Practical Example<\/h2>\n\n\n\n<pre class=\"wp-block-preformatted\">from pyspark.sql import SparkSession<br><br>spark = SparkSession.builder.appName(\"Example\").getOrCreate()<br><br>df = spark.read.csv(\"sales.csv\", header=True, inferSchema=True)<br><br># Transformations (lazy: nothing is computed yet)<br>df_filtered = df.filter(df.amount &gt; 1000)<br>df_grouped = df_filtered.groupBy(\"region\").sum(\"amount\")<br><br># Action (triggers execution of the plan above)<br>df_grouped.show()<\/pre>\n\n\n\n<p>The filter and aggregation run only when <code>show()<\/code> is called. Note that <code>inferSchema=True<\/code> makes Spark scan the file at read time to determine the column types, so the read itself is not entirely lazy.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Real-World Flow<\/h2>\n\n\n\n<p>Load Data \u2192 Apply Transformations \u2192 Store Plan \u2192 Call Action \u2192 Spark Executes<\/p>\n\n\n\n<p>Example:<\/p>\n\n\n\n<p>Raw Sales Data \u2192 Filter High Sales \u2192 Group by Region \u2192 Show Results<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Interview Tip<\/h2>\n\n\n\n<p>Common interview question:<\/p>\n\n\n\n<p>&#8220;Why is Spark faster than traditional systems?&#8221;<\/p>\n\n\n\n<p>Answer:<br>Because Spark keeps intermediate data in memory where possible and uses lazy evaluation to optimize the whole execution plan before running it.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Final Takeaway<\/h2>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Transformations define what to do<\/li>\n\n\n\n<li>Actions define when to execute<\/li>\n\n\n\n<li>Spark executes only when required<\/li>\n<\/ul>\n\n\n\n<p>Understanding this concept is essential for building optimized Big Data pipelines with PySpark.<\/p>\n\n\n<div class=\"schema-faq wp-block-yoast-faq-block\"><div class=\"schema-faq-section\" 
id=\"faq-question-1774283380423\"><strong class=\"schema-faq-question\"><\/strong> <p class=\"schema-faq-answer\"><\/p> <\/div> <\/div>\n","protected":false},"menu_order":129,"template":"","class_list":["post-216","lesson","type-lesson","status-publish","hentry"],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v27.5 - https:\/\/yoast.com\/product\/yoast-seo-wordpress\/ -->\n<title>Transformations and Actions - One Language. Endless Possibilities<\/title>\n<meta name=\"description\" content=\"Learn PySpark transformations, actions, and lazy evaluation to optimize Big Data processing and build efficient Spark pipelines.\" \/>\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\n<link rel=\"canonical\" href=\"https:\/\/gigz.pk\/python\/lesson\/transformations-and-actions\/\" \/>\n<meta property=\"og:locale\" content=\"en_US\" \/>\n<meta property=\"og:type\" content=\"article\" \/>\n<meta property=\"og:title\" content=\"Transformations and Actions - One Language. Endless Possibilities\" \/>\n<meta property=\"og:description\" content=\"Learn PySpark transformations, actions, and lazy evaluation to optimize Big Data processing and build efficient Spark pipelines.\" \/>\n<meta property=\"og:url\" content=\"https:\/\/gigz.pk\/python\/lesson\/transformations-and-actions\/\" \/>\n<meta property=\"og:site_name\" content=\"One Language. Endless Possibilities\" \/>\n<meta property=\"article:modified_time\" content=\"2026-03-23T16:30:39+00:00\" \/>\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\n<meta name=\"twitter:label1\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data1\" content=\"2 minutes\" \/>\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\\\/\\\/schema.org\",\"@graph\":[{\"@type\":[\"WebPage\",\"FAQPage\"],\"@id\":\"https:\\\/\\\/gigz.pk\\\/python\\\/lesson\\\/transformations-and-actions\\\/\",\"url\":\"https:\\\/\\\/gigz.pk\\\/python\\\/lesson\\\/transformations-and-actions\\\/\",\"name\":\"Transformations and Actions - One Language. Endless Possibilities\",\"isPartOf\":{\"@id\":\"https:\\\/\\\/gigz.pk\\\/python\\\/#website\"},\"datePublished\":\"2026-03-03T08:58:51+00:00\",\"dateModified\":\"2026-03-23T16:30:39+00:00\",\"description\":\"Learn PySpark transformations, actions, and lazy evaluation to optimize Big Data processing and build efficient Spark pipelines.\",\"breadcrumb\":{\"@id\":\"https:\\\/\\\/gigz.pk\\\/python\\\/lesson\\\/transformations-and-actions\\\/#breadcrumb\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\\\/\\\/gigz.pk\\\/python\\\/lesson\\\/transformations-and-actions\\\/\"]}]},{\"@type\":\"BreadcrumbList\",\"@id\":\"https:\\\/\\\/gigz.pk\\\/python\\\/lesson\\\/transformations-and-actions\\\/#breadcrumb\",\"itemListElement\":[{\"@type\":\"ListItem\",\"position\":1,\"name\":\"Home\",\"item\":\"https:\\\/\\\/gigz.pk\\\/python\\\/\"},{\"@type\":\"ListItem\",\"position\":2,\"name\":\"PYTHON FOR DATA ENGINEERING (PYDE) > Working with Big Data > Transformations and Actions\"}]},{\"@type\":\"WebSite\",\"@id\":\"https:\\\/\\\/gigz.pk\\\/python\\\/#website\",\"url\":\"https:\\\/\\\/gigz.pk\\\/python\\\/\",\"name\":\"One Language. 
Endless Possibilities\",\"description\":\"\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\\\/\\\/gigz.pk\\\/python\\\/?s={search_term_string}\"},\"query-input\":{\"@type\":\"PropertyValueSpecification\",\"valueRequired\":true,\"valueName\":\"search_term_string\"}}],\"inLanguage\":\"en-US\"}]}<\/script>\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Transformations and Actions - One Language. Endless Possibilities","description":"Learn PySpark transformations, actions, and lazy evaluation to optimize Big Data processing and build efficient Spark pipelines.","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/gigz.pk\/python\/lesson\/transformations-and-actions\/","og_locale":"en_US","og_type":"article","og_title":"Transformations and Actions - One Language. Endless Possibilities","og_description":"Learn PySpark transformations, actions, and lazy evaluation to optimize Big Data processing and build efficient Spark pipelines.","og_url":"https:\/\/gigz.pk\/python\/lesson\/transformations-and-actions\/","og_site_name":"One Language. Endless Possibilities","article_modified_time":"2026-03-23T16:30:39+00:00","twitter_card":"summary_large_image","twitter_misc":{"Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":["WebPage","FAQPage"],"@id":"https:\/\/gigz.pk\/python\/lesson\/transformations-and-actions\/","url":"https:\/\/gigz.pk\/python\/lesson\/transformations-and-actions\/","name":"Transformations and Actions - One Language. 
Endless Possibilities","isPartOf":{"@id":"https:\/\/gigz.pk\/python\/#website"},"datePublished":"2026-03-03T08:58:51+00:00","dateModified":"2026-03-23T16:30:39+00:00","description":"Learn PySpark transformations, actions, and lazy evaluation to optimize Big Data processing and build efficient Spark pipelines.","breadcrumb":{"@id":"https:\/\/gigz.pk\/python\/lesson\/transformations-and-actions\/#breadcrumb"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/gigz.pk\/python\/lesson\/transformations-and-actions\/"]}]},{"@type":"BreadcrumbList","@id":"https:\/\/gigz.pk\/python\/lesson\/transformations-and-actions\/#breadcrumb","itemListElement":[{"@type":"ListItem","position":1,"name":"Home","item":"https:\/\/gigz.pk\/python\/"},{"@type":"ListItem","position":2,"name":"PYTHON FOR DATA ENGINEERING (PYDE) > Working with Big Data > Transformations and Actions"}]},{"@type":"WebSite","@id":"https:\/\/gigz.pk\/python\/#website","url":"https:\/\/gigz.pk\/python\/","name":"One Language. Endless Possibilities","description":"","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/gigz.pk\/python\/?s={search_term_string}"},"query-input":{"@type":"PropertyValueSpecification","valueRequired":true,"valueName":"search_term_string"}}],"inLanguage":"en-US"}]}},"_links":{"self":[{"href":"https:\/\/gigz.pk\/python\/wp-json\/wp\/v2\/lesson\/216","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/gigz.pk\/python\/wp-json\/wp\/v2\/lesson"}],"about":[{"href":"https:\/\/gigz.pk\/python\/wp-json\/wp\/v2\/types\/lesson"}],"wp:attachment":[{"href":"https:\/\/gigz.pk\/python\/wp-json\/wp\/v2\/media?parent=216"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}