Cloudbooklet

How to Use Python Pandas to Fill in Missing Data

by Hollie Moore
3 months ago
in Linux

Learn how to fill missing data in Python Pandas using various techniques. Discover methods like fillna(), interpolate(), and more to handle gaps in your data effectively. Master the art of data manipulation and ensure complete datasets with Python Pandas.


Missing data is a common issue in data analysis. It can arise from many sources, including incomplete surveys, corrupted files, or human error. Missing data can make statistical analysis and machine learning tasks challenging. Pandas is a Python library for data analysis that provides a number of tools for dealing with missing data. This post walks through installing Pandas and then filling in missing data with it, step by step.

Table of Contents

  1. How to Install Pandas
  2. Useful functions in Pandas
  3. Missing values using isnull() and notnull()
  4. dropna() Method
  5. fillna() Method to Fill Missing Values
  6. replace() Missing Data
  7. Fill Missing Data With interpolate()
  8. Conclusion

Missing data can be represented by two values:

None: None is a Python singleton object that is widely used in Python programs to signify missing data. It is a general placeholder that can be provided to variables or elements in a Pandas DataFrame or Series to indicate the lack of a value.


NaN: “Not a Number” is a special floating-point value recognized by systems that employ the standard IEEE floating-point encoding. It is a specialized representation of numerical data that is absent or undefined. NaN is frequently used in Pandas to indicate missing or undefined values in numeric columns of DataFrames or Series.
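
As a quick illustration of the two markers, the sketch below builds a small numeric Series containing both None and NaN. The variable name s is only illustrative. In a numeric Series, pandas stores None as NaN, so both entries are reported as missing.

```python
import numpy as np
import pandas as pd

# A numeric Series built with both missing-value markers.
s = pd.Series([1, None, np.nan])

# None is converted to NaN, so the Series has a floating-point dtype.
print(s.dtype)              # float64

# Both markers are reported as missing.
print(s.isnull().tolist())  # [False, True, True]
```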


How to Install Pandas

Before we begin, make sure that pandas is installed in your Python virtual environment. You can install it from your terminal with the pip package manager. Launch your terminal and type the following command:

pip install pandas

This command will download and install the pandas library, enabling you to take advantage of its extensive data analysis features. Once the installation is complete, you will be able to use pandas in Python to fill in missing data.
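
To confirm that the installation worked, one simple check is to import the library and print its version. This is a minimal sketch; the version string you see depends on what pip installed.

```python
import pandas as pd

# If the import succeeds, pandas is installed in the active environment.
print(pd.__version__)
```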


Useful functions in Pandas

Pandas considers None and NaN to be interchangeable for signalling missing or null values. Pandas has several useful functions for identifying, deleting, and replacing null values in a DataFrame to help with this convention:

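  • isnull(): returns a Boolean mask that is True wherever a value is missing.
  • notnull(): returns the opposite mask, True wherever a value is present.
  • dropna(): removes rows or columns that contain missing values.
  • fillna(): replaces missing values with a value you supply.
  • replace(): replaces any specified value, including NaN, with another value.
  • interpolate(): estimates missing numeric values from the values around them.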

Missing values using isnull() and notnull()

The isnull() and notnull() functions in Pandas DataFrame can be used to check for missing values. These functions are not just applicable to DataFrames, but may also be used with Pandas Series to identify null values. Here’s how you can make use of them:


Checking with the isnull() function

We use the isnull() function on a Pandas DataFrame to check for null values. It returns a DataFrame of Boolean values that are True wherever the original value is missing (None or NaN).

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, None], 'B': [None, 5, 6]})

# Check for missing values using isnull()
missing_values = df.isnull()
print(missing_values)

Output:

       A      B
0  False   True
1  False  False
2   True  False

Checking with the notnull() function

The notnull() function returns a DataFrame of the same shape as the original DataFrame, with each element being True if it has a non-null value and False otherwise.

import pandas as pd

# Create a DataFrame
df = pd.DataFrame({'A': [1, 2, None], 'B': [None, 5, 6]})

# Check for non-missing values using notnull()
present_values = df.notnull()
print(present_values)

Output:

       A      B
0   True  False
1   True   True
2  False   True

Missing values in a Series

  • When applied directly to a Series, the isnull() function returns a boolean Series showing which values are null.
  • These functions are useful for discovering missing values in DataFrames and Series, allowing you to better examine and handle your data’s null values.
import pandas as pd

# Create a Series
s = pd.Series([1, None, 3, 4, None])

# Check for missing values using isnull()
missing_values = s.isnull()
print(missing_values)

Output:

0    False
1     True
2    False
3    False
4     True
dtype: bool
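
Because the Boolean masks behave like numbers (True counts as 1), they can be summed or used as filters. The sketch below reuses the small DataFrame from the earlier examples. The variable names missing_per_column and rows_with_missing are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, None], 'B': [None, 5, 6]})

# Count the missing values in each column.
missing_per_column = df.isnull().sum()
print(missing_per_column)

# Keep only the rows that contain at least one missing value.
rows_with_missing = df[df.isnull().any(axis=1)]
print(rows_with_missing)
```

Here each column has one missing value, and rows 0 and 2 are the ones selected by the filter.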

dropna() Method

When using the dropna() function to remove null values from a DataFrame, you can choose whether to remove rows or columns. By default, dropna() removes every row that contains at least one null value. To remove columns that contain null values instead, pass axis=1.

The following examples build a small DataFrame of scores and then remove its null values.

Example 1: create a DataFrame that contains null values:

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named scores to avoid shadowing the built-in dict)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)
print(df)

Output:

   First Score  Second Score  Third Score  Fourth Score
0        100.0          30.0           52           NaN
1         90.0           NaN           40           NaN
2          NaN          45.0           80           NaN
3         95.0          56.0           98          65.0

Example 2: drop rows with at least one NaN (null) value:

# importing pandas as pd
import pandas as pd

# importing numpy as np
import numpy as np

# dictionary of lists (named scores to avoid shadowing the built-in dict)
scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}

# creating a dataframe from dictionary
df = pd.DataFrame(scores)

# using dropna() function; it returns a new DataFrame and leaves df unchanged
print(df.dropna())

Output:

   First Score  Second Score  Third Score  Fourth Score
3         95.0          56.0           98          65.0
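
The column-wise variant mentioned above can be sketched the same way. The example below uses the same scores data and passes axis=1 to drop columns instead of rows. It also shows the subset parameter of dropna(), which the article does not cover, as one way to drop rows based on a single column only. The variable names are illustrative.

```python
import numpy as np
import pandas as pd

scores = {'First Score': [100, 90, np.nan, 95],
          'Second Score': [30, np.nan, 45, 56],
          'Third Score': [52, 40, 80, 98],
          'Fourth Score': [np.nan, np.nan, np.nan, 65]}
df = pd.DataFrame(scores)

# Drop every column that contains at least one null value.
complete_columns = df.dropna(axis=1)
print(complete_columns)

# Drop only the rows where 'First Score' is null.
first_score_present = df.dropna(subset=['First Score'])
print(first_score_present)
```

With this data, only the 'Third Score' column survives the column-wise drop, and the subset version removes just row 2.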

fillna() Method to Fill Missing Values

The pandas fillna() method fills the missing values in your dataset with a value you provide. It accepts several optional arguments for customizing how the filling is done. Let’s look at the main options for filling missing data with fillna():

  • Value
  • Method
  • Inplace

Sample DataFrame

# Load a Sample Dataset
import pandas as pd
df = pd.DataFrame({
    "Name": ['Alice', 'Bob', None, 'David', None, 'Fiona', 'George'],
    "Age": [25, None, 23, 35, None, 31, 28],
    "Gender": ['F', 'M', 'M', None, 'F', 'F', 'M'],
    "Years": [3, None, None, None, 7, None, 2]
})
print(df.head())

Output:

    Name   Age Gender  Years
0  Alice  25.0      F    3.0
1    Bob   NaN      M    NaN
2   None  23.0      M    NaN
3  David  35.0   None    NaN
4   None   NaN      F    7.0

Fill Missing Data Value

  • The value parameter is the value inserted in place of each missing entry. It can be a constant, a computed value such as a column mean, or any other value you specify. For example, fillna(0) replaces missing data with 0.
  • The following code uses fillna() to replace all missing values in the “Years” column with 0:
# Load a Sample Dataset
import pandas as pd
df = pd.DataFrame({
    "Name": ['Alice', 'Bob', None, 'David', None, 'Fiona', 'George'],
    "Age": [25, None, 23, 35, None, 31, 28],
    "Gender": ['F', 'M', 'M', None, 'F', 'F', 'M'],
    "Years": [3, None, None, None, 7, None, 2]
})

df['Years'] = df['Years'].fillna(0)
print(df.head())

Output:

    Name   Age Gender  Years
0  Alice  25.0      F    3.0
1    Bob   NaN      M    0.0
2   None  23.0      M    0.0
3  David  35.0   None    0.0
4   None   NaN      F    7.0
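
The value can also be computed from the data, or given per column. The sketch below fills “Age” with the column mean and uses a dictionary to give each column its own fill value. The 'Unknown' label and the variable names are assumptions made for illustration, not part of the original example.

```python
import pandas as pd

df = pd.DataFrame({
    "Name": ['Alice', 'Bob', None, 'David', None, 'Fiona', 'George'],
    "Age": [25, None, 23, 35, None, 31, 28],
    "Gender": ['F', 'M', 'M', None, 'F', 'F', 'M'],
    "Years": [3, None, None, None, 7, None, 2]
})

# mean() skips missing values, so this is the average of the five known ages.
mean_age = df['Age'].mean()

# A dict maps each column name to its own fill value; "Name" is left as-is.
filled = df.fillna({'Age': mean_age, 'Gender': 'Unknown', 'Years': 0})
print(filled)
```

Columns that are not listed in the dictionary, such as “Name” here, keep their missing values.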

Fill Missing Data Method

Using the method parameter, you can fill missing values in a given direction. method='ffill' (forward fill) replaces each missing value with the previous non-missing value, whereas method='bfill' (backward fill) replaces it with the next non-missing value. Since pandas 2.1, passing method to fillna() is deprecated; the dedicated ffill() and bfill() methods do the same job and are used in the code that follows.

Code to show forward filling in pandas:

# Load a Sample Dataset
import pandas as pd
df = pd.DataFrame({
    "Name": ['Alice', 'Bob', None, 'David', None, 'Fiona', 'George'],
    "Age": [25, None, 23, 35, None, 31, 28],
    "Gender": ['F', 'M', 'M', None, 'F', 'F', 'M'],
    "Years": [3, None, None, None, 7, None, 2]
})
# Forward fill missing data (equivalent to the older fillna(method='ffill'))
df['Years'] = df['Years'].ffill()
print(df)

Output:

     Name   Age Gender  Years
0   Alice  25.0      F    3.0
1     Bob   NaN      M    3.0
2    None  23.0      M    3.0
3   David  35.0   None    3.0
4    None   NaN      F    7.0
5   Fiona  31.0      F    7.0
6  George  28.0      M    2.0

  • The code builds a DataFrame df whose “Years” column contains missing values written as None, then forward-fills that column.
  • The output shows that each missing value in the “Years” column is filled with the value that came before the gap. This method is especially effective with time series data, because filling missing values with the most recently observed value maintains the continuity of the series.
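
Backward filling works the same way in the opposite direction. The sketch below applies bfill() to the same “Years” values, and also shows the limit parameter, which caps how many consecutive missing values are filled. The Series name years is illustrative.

```python
import pandas as pd

years = pd.Series([3, None, None, None, 7, None, 2])

# Backward fill: each gap takes the next non-missing value.
back_filled = years.bfill()
print(back_filled.tolist())  # [3.0, 7.0, 7.0, 7.0, 7.0, 2.0, 2.0]

# Forward fill at most one consecutive missing value per gap.
limited = years.ffill(limit=1)
print(limited.isnull().tolist())  # [False, False, True, True, False, False, False]
```

With limit=1, only the first missing value after each known value is filled; the rest of a longer gap stays missing.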

Fill Missing Data InPlace

  • The inplace parameter is a Boolean flag that controls whether fillna() modifies the existing object or returns a new one.
  • It is False by default, which means the original DataFrame is left untouched and a filled copy is returned. Setting inplace=True modifies the DataFrame itself.
# Load a Sample Dataset
import pandas as pd
df = pd.DataFrame({
    "Name": ['Alice', 'Bob', None, 'David', None, 'Fiona', 'George'],
    "Age": [25, None, 23, 35, None, 31, 28],
    "Gender": ['F', 'M', 'M', None, 'F', 'F', 'M'],
    "Years": [3, None, None, None, 7, None, 2]
})
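# Fill Missing Values In Place
# Call fillna() on the DataFrame itself: an in-place fill on df['Name']
# may act on a temporary copy and leave df unchanged
df.fillna({'Name': 'Missing'}, inplace=True)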
print(df.head())

Output:

      Name   Age Gender  Years
0    Alice  25.0      F    3.0
1      Bob   NaN      M    NaN
2  Missing  23.0      M    NaN
3    David  35.0   None    NaN
4  Missing   NaN      F    7.0

By combining these optional arguments with the fillna() method, you can tailor how missing data is filled to suit your needs.
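
To make the difference between the default behavior and inplace=True concrete, here is a minimal sketch on a three-row DataFrame. The variable names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"Name": ["Alice", None, "David"]})

# Default (inplace=False): a filled copy is returned and df keeps its gap.
filled_copy = df.fillna({"Name": "Missing"})
missing_before = int(df["Name"].isnull().sum())
print(missing_before)  # 1

# inplace=True: df itself is modified.
df.fillna({"Name": "Missing"}, inplace=True)
missing_after = int(df["Name"].isnull().sum())
print(missing_after)   # 0
```

Assigning the returned copy back to a variable, as in the first call, is often easier to follow than modifying the DataFrame in place.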

replace() Missing Data

The pandas replace() method replaces values within a DataFrame and is not limited to empty cells or NaN values. It lets you replace any specified value with a value of your choosing.

Like fillna(), replace() can be used to replace NaN values in a given column with the mean, median, mode, or any other desired value. The method also accepts the inplace keyword parameter, which modifies the DataFrame directly.

Let’s see how the replace() method works by swapping every occurrence of 49.50 for 50:

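import pandas as pd

# Plain dictionary holding the raw column values
raw = {
    "Array_1": [49.50, 70],
    "Array_2": [65.1, 49.50]
}

data = pd.DataFrame(raw)

# Replace every 49.50 with 50 and print the resulting DataFrame
print(data.replace(49.50, 50))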

Output:

   Array_1  Array_2
0     50.0     65.1
1     70.0     50.0

  • The same call handles missing data: pass np.nan as the value to replace and a computed statistic, such as the column mean, as the replacement.
  • Because replace() targets specific values within your DataFrame, it gives you a flexible way to handle missing data alongside other unwanted values.
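
A short sketch of replacing NaN with a column mean through replace() follows. It reuses the “Age” values from the fillna() examples; the variable name mean_age is illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"Age": [25, np.nan, 23, 35, np.nan, 31, 28]})

# mean() ignores NaN, so this is the average of the five known ages.
mean_age = df["Age"].mean()

# Replace every NaN in the column with that mean.
df["Age"] = df["Age"].replace(np.nan, mean_age)
print(df)
```

The same pattern works with median() or with the first value of mode() in place of mean().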

Fill Missing Data With interpolate()

The pandas interpolate() function estimates missing values in a DataFrame from the existing values around them. By applying mathematical interpolation techniques, it can provide reasonable estimates for the missing entries.

It’s worth noting that interpolate() is intended for numeric columns, because it relies on mathematical calculations to fill in the missing values. By default it returns a new DataFrame; setting the inplace keyword parameter to True modifies the DataFrame directly instead.

Run the following code to create a DataFrame with missing values:

# importing pandas as pd
import pandas as pd

# Creating the dataframe
df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 54, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})
# Print the dataframe
print(df)

Output:

      A     B     C     D
0  12.0   NaN  20.0  14.0
1   4.0   2.0  16.0   3.0
2   5.0  54.0   NaN   NaN
3   NaN   3.0   3.0   NaN
4   1.0   NaN   8.0   6.0

The linear method is used to interpolate the missing data. It treats the values as equally spaced and ignores the index:

# importing pandas as pd
import pandas as pd

# Creating the dataframe
df = pd.DataFrame({"A": [12, 4, 5, None, 1],
                   "B": [None, 2, 54, 3, None],
                   "C": [20, 16, None, 3, 8],
                   "D": [14, 3, None, None, 6]})

# to interpolate the missing values; the result is assigned back
# because interpolate() returns a new DataFrame by default
df = df.interpolate(method='linear', limit_direction='forward')
# Print the dataframe
print(df)

Output:

      A     B     C     D
0  12.0   NaN  20.0  14.0
1   4.0   2.0  16.0   3.0
2   5.0  54.0   9.5   4.0
3   3.0   3.0   3.0   5.0
4   1.0   3.0   8.0   6.0
  • The interpolate() method is applied to the DataFrame in the preceding code, where every column is numeric. The method parameter is set to 'linear', so linear interpolation is used to estimate the missing values.
  • The limit_direction argument controls the direction in which consecutive missing values are filled: 'forward', 'backward', or 'both'. With 'forward', a leading NaN such as the first value of column B is left unfilled because no earlier value exists.
  • By running these lines of code, you fill the missing values in the DataFrame’s numeric columns with estimates based on the existing data.
  • Because it estimates values instead of copying a constant, interpolate() is particularly well suited to gaps in numeric columns.
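
The effect of limit_direction is easiest to see on a single Series with gaps at both ends. The sketch below compares 'forward' with 'both'; the Series values and variable names are chosen purely for illustration.

```python
import numpy as np
import pandas as pd

s = pd.Series([np.nan, 2.0, np.nan, 8.0, np.nan])

# Forward only: the leading NaN has no earlier value, so it stays missing.
forward = s.interpolate(method='linear', limit_direction='forward')
print(forward.tolist())  # [nan, 2.0, 5.0, 8.0, 8.0]

# Both directions: the leading NaN is filled from the first known value.
both = s.interpolate(method='linear', limit_direction='both')
print(both.tolist())     # [2.0, 2.0, 5.0, 8.0, 8.0]
```

In both cases the interior gap is filled with 5.0, the midpoint of its neighbors, while gaps at the edges take the nearest known value.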


Conclusion


In conclusion, Python Pandas offers a solid set of tools for filling in missing data. You can manage missing values by using functions like fillna() and interpolate() and by choosing a strategy such as forward-filling, back-filling, or linear interpolation. These strategies keep your dataset’s structure intact while closing its gaps, so you can handle missing data with confidence and produce more dependable analyses.
