Pandas is a powerful data manipulation library in Python that provides data structures and functions for effectively manipulating structured data. One of the key data structures in Pandas is the DataFrame, which can be thought of as a table or a spreadsheet. In this article, we will explore how to append data to a DataFrame using various methods and scenarios.
Appending data to a DataFrame is a common operation in data analysis and manipulation tasks. It involves adding new rows or columns to an existing DataFrame, thereby expanding the dataset. Pandas provides several ways to perform this operation, each suited to different scenarios and requirements.
1. Appending Rows to a DataFrame
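One of the most common operations is appending rows to a DataFrame. Older tutorials use the DataFrame.append() method for this, but it was deprecated in pandas 1.4 and removed in pandas 2.0. The supported replacement is pd.concat(), which combines a list of DataFrames, so a new row given as a dictionary (or several rows given as a list of dictionaries) is first wrapped in a DataFrame.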
Example 1: Appending a Single Row Using a Dictionary
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [25, 30]
})
# Create a dictionary representing a new row
new_row = {'Name': 'Charlie', 'Website': 'pandasdataframe.com', 'Age': 35}
# Wrap the dictionary in a one-row DataFrame and concatenate it
df = pd.concat([df, pd.DataFrame([new_row])], ignore_index=True)
print(df)
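Each call to pd.concat() copies all existing rows into a new DataFrame, so adding rows one at a time inside a loop gets slower as the DataFrame grows. A common alternative, sketched below with made-up ages, is to collect the rows in a plain Python list and build the DataFrame once at the end.

```python
import pandas as pd

rows = []  # plain Python list used as a buffer for incoming rows
for i, name in enumerate(['Alice', 'Bob', 'Charlie']):
    # Build each row as a dict; the ages are illustrative values only
    rows.append({'Name': name, 'Website': 'pandasdataframe.com', 'Age': 25 + 5 * i})

# Construct the DataFrame a single time, after the loop has finished
df = pd.DataFrame(rows)
print(df)
```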
Example 2: Appending Multiple Rows Using a List of Dictionaries
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [25, 30]
})
# Create a list of dictionaries representing new rows
new_rows = [
{'Name': 'Charlie', 'Website': 'pandasdataframe.com', 'Age': 35},
{'Name': 'David', 'Website': 'pandasdataframe.com', 'Age': 40}
]
# Wrap the list of dictionaries in a DataFrame and concatenate it
df = pd.concat([df, pd.DataFrame(new_rows)], ignore_index=True)
print(df)
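For a single row there is also an in-place option: setting a value with .loc at a label that does not exist yet enlarges the DataFrame. The sketch below assumes the DataFrame still has the default 0, 1, ... index, so that len(df) is the next unused label; with any other index, len(df) could match an existing label and overwrite that row instead.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
    'Age': [25, 30]
})

# With the default 0..n-1 index, len(df) is a new label,
# so this assignment adds one row to df in place
df.loc[len(df)] = ['Charlie', 'pandasdataframe.com', 35]
print(df)
```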
2. Appending DataFrames
Another common scenario is appending one DataFrame to another. This is useful when you have data split across multiple DataFrames and you want to combine them into a single DataFrame.
Example 3: Appending Two DataFrames
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [25, 30]
})
df2 = pd.DataFrame({
'Name': ['Charlie', 'David'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [35, 40]
})
# Append df2 to df1
df = pd.concat([df1, df2], ignore_index=True)
print(df)
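If you later need to know which DataFrame each row came from, pd.concat() accepts a keys parameter that adds an outer index level. In this sketch the labels 'first' and 'second' are arbitrary, and the Website column is left out for brevity.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

# keys= builds a two-level index: (source label, original row label)
combined = pd.concat([df1, df2], keys=['first', 'second'])
print(combined)

# Select only the rows that came from df2
print(combined.loc['second'])
```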
Example 4: Appending Multiple DataFrames Using Concat
import pandas as pd
# Create three DataFrames
df1 = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [25, 30]
})
df2 = pd.DataFrame({
'Name': ['Charlie', 'David'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [35, 40]
})
df3 = pd.DataFrame({
'Name': ['Eve', 'Frank'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [45, 50]
})
# Use concat to append all DataFrames
df = pd.concat([df1, df2, df3], ignore_index=True)
print(df)
3. Appending Columns to a DataFrame
In addition to appending rows, you might also need to append columns to a DataFrame. This can be done by simply assigning new columns to the DataFrame.
Example 5: Appending a Single Column
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Age': [25, 30]
})
# Append a new column
df['Website'] = ['pandasdataframe.com', 'pandasdataframe.com']
print(df)
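When you assign a list, it must contain exactly one value per row, otherwise pandas raises a ValueError. The sketch below shows three other ways to add columns: a scalar that is broadcast to every row, assign(), which returns a new DataFrame, and insert(), which places a column at a chosen position in place. The ID values are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Alice', 'Bob'],
    'Age': [25, 30]
})

# A scalar is broadcast to every row
df['Website'] = 'pandasdataframe.com'

# assign() returns a new DataFrame; here the new column is derived from Age
df = df.assign(AgeNextYear=df['Age'] + 1)

# insert() adds a column at position 0 (the first column) and modifies df in place
df.insert(0, 'ID', [101, 102])
print(df)
```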
Example 6: Appending Multiple Columns Using a DataFrame
import pandas as pd
# Create a DataFrame
df = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Age': [25, 30]
})
# Create another DataFrame with new columns
new_columns = pd.DataFrame({
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Score': [88, 92]
})
# Append new columns to the original DataFrame
df = pd.concat([df, new_columns], axis=1)
print(df)
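Note that pd.concat(..., axis=1) matches rows by index label, not by position. The sketch below assumes the new columns arrive with a different index (for example after filtering): the labels that do not match produce NaN, and resetting the index first lines the rows up by position instead.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})  # index 0, 1

# New columns whose index labels (1, 2) only partly match df's labels
new_columns = pd.DataFrame({'Score': [88, 92]}, index=[1, 2])

# Rows are aligned on labels 0, 1, 2, so some cells become NaN
misaligned = pd.concat([df, new_columns], axis=1)
print(misaligned)

# Dropping the old labels gives new_columns the index 0, 1, matching df
aligned = pd.concat([df, new_columns.reset_index(drop=True)], axis=1)
print(aligned)
```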
4. Handling Indexes When Appending
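When appending data, it’s important to manage the index properly. By default, concatenation keeps each input's original index labels, so combining two DataFrames that both use the default 0, 1, ... index produces duplicate labels. Pandas provides several options to handle indexes during the append operation.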
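The problem is easy to see in a small sketch: without ignore_index, the label 0 appears twice, and a label lookup returns two rows instead of one.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob']})
df2 = pd.DataFrame({'Name': ['Charlie', 'David']})

# Each input keeps its own 0..n-1 labels
df = pd.concat([df1, df2])
print(df.index.tolist())  # [0, 1, 0, 1]

# A lookup by label now matches two rows
print(df.loc[0])
```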
Example 7: Resetting Index After Append
import pandas as pd
# Create two DataFrames
df1 = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [25, 30]
})
df2 = pd.DataFrame({
'Name': ['Charlie', 'David'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [35, 40]
})
# Append df2 to df1 and reset the index
df = pd.concat([df1, df2], ignore_index=True)
print(df)
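Two related options are sketched below. Calling reset_index(drop=True) after concatenating has the same effect as ignore_index=True. Passing verify_integrity=True does the opposite: it keeps the original labels but raises a ValueError if they overlap, which is useful when duplicate labels would indicate a bug.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': [35, 40]})

# Concatenate, then discard the old labels (same result as ignore_index=True)
df = pd.concat([df1, df2]).reset_index(drop=True)
print(df)

# Ask pandas to reject overlapping labels instead of silently keeping them
duplicate_found = False
try:
    pd.concat([df1, df2], verify_integrity=True)
except ValueError as exc:
    duplicate_found = True
    print('Duplicate index detected:', exc)
```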
Example 8: Using Concat with Sort
import pandas as pd
# Create three DataFrames
df1 = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [25, 30]
})
df2 = pd.DataFrame({
'Name': ['Charlie', 'David'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [35, 40]
})
df3 = pd.DataFrame({
'Name': ['Eve', 'Frank'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [45, 50]
})
# Use concat to append all DataFrames; sort=True applies to the column labels
# (the non-concatenation axis), not to the row index
df = pd.concat([df1, df2, df3], ignore_index=True, sort=True)
print(df)
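To make the effect of sort visible, the sketch below uses two DataFrames whose columns only partly overlap and whose row labels are deliberately out of order. The columns of the result are sorted, while the rows stay in input order until sort_index() is called explicitly.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice'], 'Website': ['pandasdataframe.com']}, index=[5])
df2 = pd.DataFrame({'Name': ['Bob'], 'Age': [30]}, index=[2])

# sort=True orders the union of column labels; the rows are left as they are
df = pd.concat([df1, df2], sort=True)
print(df)

# Sorting the rows by index label is a separate step
print(df.sort_index())
```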
5. Appending with Different Column Names
Sometimes, the DataFrames you want to append might not have the same column names. In such cases, you can use the rename() method to align the column names before appending.
Example 9: Aligning Column Names Using Rename
import pandas as pd
# Create two DataFrames with different column names
df1 = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [25, 30]
})
df2 = pd.DataFrame({
'PersonName': ['Charlie', 'David'],
'Site': ['pandasdataframe.com', 'pandasdataframe.com'],
'Years': [35, 40]
})
# Rename columns in df2 to match df1
df2.rename(columns={'PersonName': 'Name', 'Site': 'Website', 'Years': 'Age'}, inplace=True)
# Append df2 to df1
df = pd.concat([df1, df2], ignore_index=True)
print(df)
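If the columns are already in the same order and only the names differ, set_axis() can relabel them positionally without writing out a mapping. This sketch rests on that assumption: if the column order differs, set_axis() silently puts values under the wrong names, so an explicit rename() mapping is the safer choice whenever the order is not guaranteed.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'PersonName': ['Charlie', 'David'], 'Years': [35, 40]})

# Assumes df2's columns correspond to df1's columns position by position
df2_aligned = df2.set_axis(df1.columns, axis=1)

df = pd.concat([df1, df2_aligned], ignore_index=True)
print(df)
```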
6. Appending with Missing Columns
When appending DataFrames, you might encounter situations where one DataFrame has columns that the other does not. Pandas handles this gracefully by filling in missing columns with NaN values.
Example 10: Appending DataFrames with Missing Columns
import pandas as pd
# Create two DataFrames with different columns
df1 = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [25, 30]
})
df2 = pd.DataFrame({
'Name': ['Charlie', 'David'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Score': [88, 92]
})
# Append df2 to df1
df = pd.concat([df1, df2], ignore_index=True)
print(df)
In the resulting DataFrame, the ‘Age’ column is NaN for the rows from df2, and the ‘Score’ column is NaN for the rows from df1. Because NaN is a floating-point value, both columns are stored as float64 rather than int64.
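Filling with NaN corresponds to pd.concat()'s default join='outer', which keeps the union of all columns. If you would rather keep only the columns that every DataFrame shares, join='inner' drops the rest, as in this sketch.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Score': [88, 92]})

# Default join='outer': all columns kept, gaps filled with NaN
outer = pd.concat([df1, df2], ignore_index=True)
print(outer.dtypes)

# join='inner': only columns present in every input survive
inner = pd.concat([df1, df2], ignore_index=True, join='inner')
print(inner)
```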
7. Appending with Different Data Types
Pandas also handles appending columns with different data types. If a column has one data type in one DataFrame and a different data type in the other, pandas picks a common data type that can hold all the values; it does not convert the values themselves.
Example 11: Appending DataFrames with Different Data Types
import pandas as pd
# Create two DataFrames with different data types
df1 = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [25, 30]
})
df2 = pd.DataFrame({
'Name': ['Charlie', 'David'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': ['35', '40']
})
# Append df2 to df1
df = pd.concat([df1, df2], ignore_index=True)
print(df)
In the resulting DataFrame, the ‘Age’ column has the object data type because it holds a mix of integers and strings; the strings '35' and '40' are not converted to numbers.
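If the strings really represent numbers, one option is to convert them with pd.to_numeric() before concatenating, so the combined column keeps an integer type. By default pd.to_numeric() raises an error on values it cannot parse, which surfaces bad data early.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
df2 = pd.DataFrame({'Name': ['Charlie', 'David'], 'Age': ['35', '40']})

# Convert the string ages to numbers before appending
df2['Age'] = pd.to_numeric(df2['Age'])

df = pd.concat([df1, df2], ignore_index=True)
print(df.dtypes)
```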
8. Appending with Duplicate Rows
When appending DataFrames, you might end up with duplicate rows. Appending does not remove duplicate rows by default, but you can call the drop_duplicates() method afterwards to remove them.
Example 12: Removing Duplicate Rows After Append
import pandas as pd
# Create two DataFrames with a duplicate row
df1 = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [25, 30]
})
df2 = pd.DataFrame({
'Name': ['Bob', 'Charlie'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [30, 35]
})
# Append df2 to df1 and remove duplicates
df = pd.concat([df1, df2], ignore_index=True).drop_duplicates()
print(df)
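drop_duplicates() compares whole rows by default and leaves gaps in the index (Example 12 ends up with labels 0, 1, 3). The sketch below assumes a slightly different case where the same person appears with an updated age: subset= restricts the comparison to the Name column, keep='last' keeps the newer row, and reset_index(drop=True) closes the gap.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob'], 'Age': [25, 30]})
# Bob appears again with a different age, so the rows are not exact duplicates
df2 = pd.DataFrame({'Name': ['Bob', 'Charlie'], 'Age': [31, 35]})

combined = pd.concat([df1, df2], ignore_index=True)

# Treat rows with the same Name as duplicates and keep the last occurrence
deduped = combined.drop_duplicates(subset=['Name'], keep='last').reset_index(drop=True)
print(deduped)
```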
9. Appending with Different Indexes
If the DataFrames you are appending have different indexes, pd.concat() keeps the original index labels by default. Passing ignore_index=True discards them and assigns a new 0, 1, ... index instead.
Example 13: Appending DataFrames with Different Indexes
import pandas as pd
# Create two DataFrames with different indexes
df1 = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [25, 30]
}, index=[1, 2])
df2 = pd.DataFrame({
'Name': ['Charlie', 'David'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [35, 40]
}, index=[3, 4])
# Append df2 to df1 and reset the index
df = pd.concat([df1, df2], ignore_index=True)
print(df)
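Whether to discard the labels depends on whether they carry meaning. When they do (for example, if they are record IDs) and the inputs do not overlap, keeping them can be preferable, because each label still identifies exactly one row, as this sketch shows.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice', 'Bob']}, index=[1, 2])
df2 = pd.DataFrame({'Name': ['Charlie', 'David']}, index=[3, 4])

# Without ignore_index the original, non-overlapping labels are preserved
df = pd.concat([df1, df2])
print(df.loc[3, 'Name'])  # Charlie
```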
10. Appending with Sort
Columns are matched by name, so DataFrames whose columns appear in different orders still line up correctly. By default, pd.concat() does not sort the column labels; if you want them sorted, pass sort=True.
Example 14: Appending DataFrames with Sort
import pandas as pd
# Create two DataFrames with different column orders
df1 = pd.DataFrame({
'Name': ['Alice', 'Bob'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com'],
'Age': [25, 30]
})
df2 = pd.DataFrame({
'Age': [35, 40],
'Name': ['Charlie', 'David'],
'Website': ['pandasdataframe.com', 'pandasdataframe.com']
})
# Append df2 to df1 and sort the columns
df = pd.concat([df1, df2], ignore_index=True, sort=True)
print(df)
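If you need a specific column order that is neither the default nor alphabetical, a simple approach is to select the columns explicitly after concatenating. The target order in this sketch is arbitrary.

```python
import pandas as pd

df1 = pd.DataFrame({'Name': ['Alice'], 'Website': ['pandasdataframe.com'], 'Age': [25]})
df2 = pd.DataFrame({'Age': [35], 'Name': ['Charlie'], 'Website': ['pandasdataframe.com']})

# Values are matched by column name, regardless of each input's column order
df = pd.concat([df1, df2], ignore_index=True)

# Choose the final column order explicitly
df = df[['Website', 'Name', 'Age']]
print(df)
```

Selecting with a list raises a KeyError if a name is missing; df.reindex(columns=[...]) would instead create that column filled with NaN.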
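In conclusion, pandas offers several ways to append data to a DataFrame. pd.concat() adds rows or whole DataFrames (the DataFrame.append() method was removed in pandas 2.0), while new columns can be added by direct assignment or with pd.concat(..., axis=1). Parameters such as ignore_index and sort, together with methods such as rename() and drop_duplicates(), cover the common complications: clashing indexes, mismatched column names, missing columns, differing data types, and duplicate rows.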