Updated April 4, 2023
Introduction to PySpark withColumnRenamed
PySpark withColumnRenamed is a DataFrame method used to rename columns in a PySpark data frame. It renames an existing column and returns a new data frame. Each call renames one column, and calls can be chained to rename multiple columns. The function accepts two parameters: the existing column name and the new column name.
The rename is recorded in the query plan as a projection and is only evaluated when an action runs. The existing data frame is not modified; a new one is returned instead. In this article, we will look at the various ways of using the withColumnRenamed operation in PySpark.
Let us look at PySpark withColumnRenamed in more detail.
Syntax for PySpark withColumnRenamed
The syntax is DataFrame.withColumnRenamed(existing, new), where existing is the current column name and new is the name to give it. A complete example:
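from pyspark.sql import SparkSession
# spark and sc are predefined in the pyspark shell; create them in a standalone script
spark = SparkSession.builder.appName("withColumnRenamed").getOrCreate()
sc = spark.sparkContext
data1 = [{'Name':'Jhon','ID':21.528,'Add':'USA'},{'Name':'Joe','ID':3.69,'Add':'USA'},{'Name':'Tina','ID':2.48,'Add':'IND'},{'Name':'Jhon','ID':22.22, 'Add':'USA'},{'Name':'Joe','ID':5.33,'Add':'INA'}]
a = sc.parallelize(data1)
b = spark.createDataFrame(a)
c = b.withColumnRenamed("ID","New_ID")  # existing column name, new column name
c.show()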
data1: The sample data used to build the data frame.
a: The RDD created from data1 with sc.parallelize.
b: The data frame created from the RDD.
c: The new data frame returned by withColumnRenamed, which takes the existing column name and the new column name.
Working of withColumnRenamed in PySpark
Let us see how withColumnRenamed works in PySpark:
The withColumnRenamed function is used to rename a column in a PySpark data frame. It converts the data frame into a new data frame that carries the new column name. Unlike withColumn, which adds a new column or replaces a column of the same name, withColumnRenamed only changes the name of an existing column; if the data frame has no column with the given name, the call is a no-op. It takes two parameters: the existing column name and the new name for that column.
The withColumnRenamed function returns a new data frame with the column renamed. Because data frames are immutable, the original data frame is left unchanged. Internally, the rename adds a projection to the query plan of the Spark application, and the column is renamed when a PySpark job executes that plan.
Let’s check the creation and working of withColumnRenamed with some coding examples.
Examples of PySpark withColumnRenamed
Let us see some examples of how the withColumnRenamed operation works:
Let’s start by creating a sample data frame in PySpark.
data1 = [{'Name':'Jhon','ID':21.528,'Add':'USA'},{'Name':'Joe','ID':3.69,'Add':'USA'},{'Name':'Tina','ID':2.48,'Add':'IND'},{'Name':'Jhon','ID':22.22, 'Add':'USA'},{'Name':'Joe','ID':5.33,'Add':'INA'}]
This defines the data as a list of dictionaries, which we can use to create the data frame.
a = sc.parallelize(data1)
b = spark.createDataFrame(a)
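sc.parallelize distributes the list as an RDD, and spark.createDataFrame converts that RDD into a data frame, inferring the schema from the dictionary keys. In the pyspark shell, spark and sc are predefined.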
b.show()
This creates a data frame with the columns Add, ID, and Name. Now we will rename a column using withColumnRenamed.
c = b.withColumnRenamed("ID","New_ID")
The function takes two parameters: the existing column name and the new column name.
The new data frame has the column New_ID in place of ID. Let’s look at the output.
c.show()
withColumnRenamed can also rename multiple columns of a PySpark data frame. Each call renames one column, so the calls are chained, and the result is a single new data frame with all the renamed columns. Let’s check that with a simple example:
c = b.withColumnRenamed("ID","New_ID").withColumnRenamed("Add","New_Add").withColumnRenamed("Name","New_Name")
c.show()
This code renames all three columns of the PySpark data frame.
These are some examples of withColumnRenamed in PySpark.
Note:
- withColumnRenamed is used to rename a column in PySpark.
- withColumnRenamed creates a new data frame from the existing data frame with the column renamed; the existing data frame is not modified.
- withColumnRenamed can rename multiple columns of a Spark data frame by chaining calls.
- withColumnRenamed takes two input parameters: the existing column name and the new column name.
Conclusion
From the above article, we saw the working of withColumnRenamed in PySpark. Through the examples, we saw how it renames a single column, how several calls can be chained to rename multiple columns, and that each call returns a new data frame while leaving the original unchanged.
We also looked at its internal working, where the rename is added to the query plan as a projection, and at its syntax, which takes the existing column name and the new column name.