Updated April 6, 2023
Introduction to Pandas DataFrame.query()
Finding specific items in a set of data is one of the most common operations in any application. In pandas this can be done in several ways, and the query() method is one of them. query() selects the rows of a dataframe that satisfy a given condition. The condition is written as a boolean expression inside a string, the expression can refer to the dataframe's columns by name, and the rows for which it evaluates to True are returned.
Syntax:
DataFrame.query(expr, inplace=False, **kwargs)
Parameter & Description of Pandas DataFrame.query()
Below are the parameters of Pandas DataFrame.query():
Parameter | Description |
expr | The query string to evaluate. It is a boolean expression that can refer to column names, and the rows for which it evaluates to True form the result. |
inplace | A boolean that controls whether the dataframe is modified in place. If True, the filtered result replaces the contents of the dataframe on which query() is called. If False (the default), the dataframe is left unchanged and the filtered result is returned as a new dataframe, which can be assigned to a variable. |
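The effect of the inplace parameter can be seen in a minimal sketch. The dataframe and its column names (item, qty) are hypothetical and are used only for illustration.

```python
import pandas as pd

# Hypothetical sample data, not taken from the examples in this article.
df = pd.DataFrame({"item": ["pen", "ink", "pad"], "qty": [5, 0, 12]})

# inplace=False (the default): the filtered rows are returned as a new
# dataframe and df itself is left unchanged.
in_stock = df.query("qty > 0")
rows_before = len(df)

# inplace=True: df itself is reduced to the matching rows and the call
# returns None instead of a dataframe.
returned = df.query("qty > 0", inplace=True)
rows_after = len(df)

print(in_stock)
print(rows_before, rows_after, returned)
```

Because the in-place call returns None, its result should not be assigned back to the dataframe variable.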
Examples to Implement Pandas DataFrame.query()
Below are the examples of Pandas DataFrame.query():
Example #1
Code:
import pandas as pd
Core_Dataframe = pd.DataFrame({
    'Emp_No': ['Emp1', 'Emp2', 'Emp3', 'Emp4'],
    'Employee_Name': ['Arun', 'selva', 'rakesh', 'arjith'],
    'Employee_dept': ['CAD', 'CAD', 'DEV', 'CAD']})
print(" THE CORE DATAFRAME ")
print(Core_Dataframe)
print("")
Queried_Dataframe = Core_Dataframe.query('Employee_dept == "CAD"')
print(" THE QUERIED DATAFRAME ")
print(Queried_Dataframe)
print("")
Output:
Explanation: In this example, the core dataframe is created first with pd.DataFrame(), which takes a dictionary that maps each column name to its list of values. The dataframe is then printed to the console. It holds the employee number, employee name, and employee department. Next, query() selects only the employees whose department is ‘CAD’ and returns them as a new dataframe. Three of the four rows (Emp1, Emp2, and Emp4) satisfy the condition, and this new dataframe is printed to the console.
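For comparison, the same selection can be written with a boolean mask. The sketch below reuses the data from Example #1; the variable names employees, via_query, and via_mask are illustrative only.

```python
import pandas as pd

# Same data as Example #1.
employees = pd.DataFrame({
    "Emp_No": ["Emp1", "Emp2", "Emp3", "Emp4"],
    "Employee_Name": ["Arun", "selva", "rakesh", "arjith"],
    "Employee_dept": ["CAD", "CAD", "DEV", "CAD"],
})

# Filter with a query string.
via_query = employees.query('Employee_dept == "CAD"')

# Filter with an explicit boolean mask on the same column.
via_mask = employees[employees["Employee_dept"] == "CAD"]

# Both approaches should select the same rows.
print(via_query.equals(via_mask))
```

The query string avoids repeating the dataframe name inside the condition, which tends to matter more as conditions grow longer.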
Example #2
Code:
import pandas as pd
Core_Dataframe = pd.DataFrame({
    'A': [11, 6, 11, 15, 31, 26],
    'B': [2, 7, 12, 17, 22, 27],
    'C': [3, 8, 13, 18, 23, 28],
    'D': [4, 9, 14, 19, 24, 29],
    'E': [5, 10, 15, 20, 25, 30]})
print(" THE CORE DATAFRAME ")
print(Core_Dataframe)
print("")
Queried_Dataframe = Core_Dataframe.query('A > B')
print(" THE QUERIED DATAFRAME ")
print(Queried_Dataframe)
print("")
Output:
Explanation: In this example, the core dataframe is again created with pd.DataFrame() and printed to the console. It holds integer values in five columns labelled ‘A’ to ‘E’. The dataframe is then queried for all rows where the value in column ‘A’ is greater than the value in column ‘B’. Two of the six rows (index 0 and index 4) satisfy this condition and are printed to the console.
Example #3
Code:
import pandas as pd
Core_Dataframe = pd.DataFrame({
    'name': ['Alan Xavier', 'Annabella', 'Janawong', 'Yistien', 'Robin sheperd', 'Amalapaul', 'Nori'],
    'city': ['california', 'Toronto', 'ontario', 'Shanghai',
             'Manchester', 'Cairo', 'Osaka'],
    'age': [51, 38, 23, 64, 18, 57, 47],
    'py_score': [82.0, 73.0, 81.0, 30.0, 48.0, 92.0, 84.0]})
print(" THE CORE DATAFRAME ")
print(Core_Dataframe)
print("")
Queried_Dataframe = Core_Dataframe.query('age > 50 and py_score > 80')
print(" THE QUERIED DATAFRAME ")
print(Queried_Dataframe)
print("")
Output:
Explanation: In this example, the core dataframe is created with pd.DataFrame() and printed to the console. It holds the name, city, age, and py_score of several people. All rows where the age is greater than 50 and the py_score is greater than 80 are selected into a separate dataframe. Two rows (Alan Xavier and Amalapaul) satisfy both conditions, and the resulting dataframe is printed to the console.
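The thresholds in such a condition do not have to be hard-coded. A query string can refer to a Python variable in the surrounding scope by prefixing its name with @. The sketch below uses a trimmed copy of the Example #3 data; the variable names min_age and min_score are assumptions made for illustration.

```python
import pandas as pd

# Trimmed copy of the Example #3 data (four of the seven rows).
people = pd.DataFrame({
    "name": ["Alan Xavier", "Annabella", "Yistien", "Amalapaul"],
    "age": [51, 38, 64, 57],
    "py_score": [82.0, 73.0, 30.0, 92.0],
})

# Hypothetical thresholds held in ordinary Python variables.
min_age = 50
min_score = 80

# The @ prefix tells query() to look the name up as a Python variable
# rather than as a column of the dataframe.
selected = people.query("age > @min_age and py_score > @min_score")
print(selected["name"].tolist())
```

With these thresholds the selection matches the rows for Alan Xavier and Amalapaul, the same people picked out by the hard-coded condition in Example #3.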
Example #4
Code:
import pandas as pd
Core_Dataframe = pd.DataFrame({
    'A': [11.23, 6.66, 11.55, 15.44, 21.44, 26.4],
    'B': [2.345, 745.5, 12.4, 17.34, 22.35, 27.44],
    'C': [32.67, 8, 13.4, 18, 23, 28.44],
    'D': [4.6788, 923.3, 14.5, 19, 24, 29.44],
    'E': [5.3, 10.344, 155.556, 20.6775, 25.4455, 30.3]})
print(" THE CORE DATAFRAME ")
print(Core_Dataframe)
print("")
Core_Dataframe.query('A > B and B < D', inplace=True)
print(" THE CORE DATAFRAME AFTER QUERYING ")
print(Core_Dataframe)
print("")
Output:
Explanation: In this example, the core dataframe is created with pd.DataFrame() and printed to the console. A dataset of float values is used in this instance. The dataframe is queried for all rows where the value in column ‘A’ is greater than the value in column ‘B’, with the additional condition that the value in column ‘B’ is less than the value in column ‘D’. Only one of the rows (index 0) satisfies both conditions. Because the inplace option is set to True, no new dataframe is returned. Instead, the core dataframe itself is reduced to the matching row, and it is printed to the console.
Conclusion
Pandas offers several ways to filter the data in a dataframe by condition. Alongside techniques such as where() and loc, the query() method is an effective and easy way to filter dataframes.
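The three techniques can be compared side by side on one condition. This is a minimal sketch on a small hypothetical dataframe, with illustrative variable names.

```python
import pandas as pd

# Small hypothetical dataframe used only for this comparison.
df = pd.DataFrame({"A": [11, 6, 31], "B": [2, 7, 22]})

# query(): the condition is a string and only matching rows are returned.
by_query = df.query("A > B")

# loc: the same condition as a boolean mask, also returning matching rows.
by_loc = df.loc[df["A"] > df["B"]]

# where(): keeps the original shape and fills non-matching rows with NaN.
by_where = df.where(df["A"] > df["B"])

print(by_query.equals(by_loc))
print(by_where)
```

query() and loc drop the rows that fail the condition, while where() keeps every row and masks the failing ones, so the choice depends on whether the original shape needs to be preserved.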