
Getting Started with Python and Pandas

  • Post category: Python
  • Post last modified: February 23, 2024

Introduction

Pandas is an open-source data manipulation and analysis library for Python. It provides easy-to-use data structures and analysis tools, which makes it a staple for data scientists and analysts. In this tutorial, we cover the basics of Pandas: installing it, creating DataFrames, performing common operations, and calculating summary statistics.

Prerequisites

Before we get started, make sure you have Python installed on your system. You can download Python from python.org if you haven’t already. Additionally, you’ll need to install Pandas, which can be done using pip, Python’s package installer.

You can install Pandas by running the following command in your terminal or command prompt:

pip install pandas

Once Pandas is installed, you’re ready to start using it!

Importing Pandas

To use Pandas in your Python script or notebook, you need to import it. By convention, Pandas is imported under the alias pd:

import pandas as pd
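A quick way to confirm that the installation and import worked is to print the installed version. This is only a sanity check, not a required step, and the version string you see depends on your own installation.

```python
import pandas as pd

# Read the installed Pandas version to confirm the import succeeded
version = pd.__version__
print(version)
```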

Creating a DataFrame

The primary data structure in Pandas is the DataFrame, a two-dimensional table with labeled rows and columns. You can build a DataFrame from Python lists or dictionaries, or load one from external files such as CSVs or Excel sheets.

Creating from Lists

You can create a DataFrame from a list of lists. Each inner list represents a row in the DataFrame.

import pandas as pd

data = [
    ['Alice', 25, 'Female'],
    ['Bob', 30, 'Male'],
    ['Charlie', 35, 'Male']
]

# Define column names
columns = ['Name', 'Age', 'Gender']

# Create the DataFrame
df = pd.DataFrame(data, columns=columns)

print(df)

Output:

      Name  Age  Gender
0    Alice   25  Female
1      Bob   30    Male
2  Charlie   35    Male

Creating from a Dictionary

You can also create a DataFrame from a dictionary where keys are column names and values are lists representing column data.

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

# Create the DataFrame
df = pd.DataFrame(data)

print(df)

Output:

      Name  Age  Gender
0    Alice   25  Female
1      Bob   30    Male
2  Charlie   35    Male

Reading from a CSV File

Pandas makes it easy to read data from external files. For example, to read from a CSV file, you can use pd.read_csv():

import pandas as pd

# Read CSV file into a DataFrame
df = pd.read_csv('data.csv')

print(df)
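If you do not have a data.csv file on hand, the same call can be tried against an in-memory string. The sketch below assumes a small, made-up CSV (the csv_text content is hypothetical) and relies on read_csv accepting a file-like object as well as a path.

```python
import io

import pandas as pd

# Hypothetical CSV content, standing in for the contents of a file on disk
csv_text = """Name,Age,Gender
Alice,25,Female
Bob,30,Male
Charlie,35,Male
"""

# read_csv accepts a file path or a file-like object such as StringIO
csv_df = pd.read_csv(io.StringIO(csv_text))

print(csv_df.shape)          # (3, 3)
print(list(csv_df.columns))  # ['Name', 'Age', 'Gender']
```

The first line of the CSV is used as the column names by default, and the Age column is parsed as integers.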

Basic DataFrame Operations

Once you have a DataFrame, you can perform various operations on it.

Viewing Data

To view the first few rows of a DataFrame, you can use head(). It returns the first five rows by default; pass a number to change that:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

df = pd.DataFrame(data)

# Display the first 2 rows
print(df.head(2))

Output:

    Name  Age  Gender
0  Alice   25  Female
1    Bob   30    Male
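Two companions to head() are often useful when inspecting data: tail(), which returns rows from the end, and the shape attribute, which reports the table's dimensions. A minimal sketch using the same sample data:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

df = pd.DataFrame(data)

# tail(n) returns the last n rows
last_two = df.tail(2)
print(last_two)

# shape is a (rows, columns) tuple
print(df.shape)  # (3, 3)
```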

Accessing Columns

You can access columns of a DataFrame using their names:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

df = pd.DataFrame(data)

# Access the 'Name' column
print(df['Name'])

Output:

0      Alice
1        Bob
2    Charlie
Name: Name, dtype: object
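Selecting a single column returns a Series. To select several columns at once, pass a list of names, which returns a DataFrame instead. A short sketch of the difference, reusing the sample data:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

df = pd.DataFrame(data)

# A single name returns a Series
names = df['Name']
print(type(names).__name__)   # Series

# A list of names returns a DataFrame with just those columns
subset = df[['Name', 'Age']]
print(type(subset).__name__)  # DataFrame
print(list(subset.columns))   # ['Name', 'Age']
```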

Filtering Data

You can filter rows with a boolean condition; only the rows where the condition is True are kept:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

df = pd.DataFrame(data)

# Filter rows where Age is greater than 25
filtered_df = df[df['Age'] > 25]

print(filtered_df)

Output:

      Name  Age Gender
1      Bob   30   Male
2  Charlie   35   Male
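Conditions can be combined with & (and) and | (or). Each condition needs its own parentheses because these operators bind more tightly than comparisons. The variable names below are only illustrative:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

df = pd.DataFrame(data)

# Rows where BOTH conditions hold: Age at least 30 and Gender is Male
both_df = df[(df['Age'] >= 30) & (df['Gender'] == 'Male')]
print(both_df['Name'].tolist())    # ['Bob', 'Charlie']

# Rows where EITHER condition holds: Age over 30 or Gender is Female
either_df = df[(df['Age'] > 30) | (df['Gender'] == 'Female')]
print(either_df['Name'].tolist())  # ['Alice', 'Charlie']
```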

Adding a New Column

You can add a new column to the DataFrame:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

df = pd.DataFrame(data)

# Add a new column 'City'
df['City'] = ['New York', 'Los Angeles', 'Chicago']

print(df)

Output:

      Name  Age  Gender         City
0    Alice   25  Female     New York
1      Bob   30    Male  Los Angeles
2  Charlie   35    Male      Chicago
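A new column can also be computed from existing ones, since arithmetic on a column applies to every row. In this sketch, the AgeInMonths column name is made up for illustration:

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

df = pd.DataFrame(data)

# Multiply every value in 'Age' by 12 and store the result as a new column
df['AgeInMonths'] = df['Age'] * 12

print(df['AgeInMonths'].tolist())  # [300, 360, 420]
```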

Deleting a Column

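You can delete a column using the drop() method. Passing axis=1 tells Pandas to drop a column rather than a row. Because drop() returns a new DataFrame instead of modifying the original, the result is assigned back to df: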

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

df = pd.DataFrame(data)

# Drop the 'Gender' column
df = df.drop('Gender', axis=1)

print(df)

Output:

      Name  Age
0    Alice   25
1      Bob   30
2  Charlie   35
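An equivalent spelling uses the columns keyword, which some readers find clearer than axis=1. The sketch below also shows that the original DataFrame is left untouched unless you reassign it; the name trimmed is just illustrative.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

df = pd.DataFrame(data)

# columns=[...] is equivalent to passing the labels with axis=1
trimmed = df.drop(columns=['Gender'])

print(list(trimmed.columns))  # ['Name', 'Age']
print(list(df.columns))       # ['Name', 'Age', 'Gender'] (original unchanged)
```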

Summary Statistics

Pandas provides easy ways to calculate summary statistics on your data.

Descriptive Statistics

You can use the describe() method to get a summary of descriptive statistics. By default it covers only the numeric columns, so in this example only Age appears in the result:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

df = pd.DataFrame(data)

# Get descriptive statistics
print(df.describe())

Output:

        Age
count   3.0
mean   30.0
std     5.0
min    25.0
25%    27.5
50%    30.0
75%    32.5
max    35.0
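Each statistic is also available as its own method on a column, which is handy when you need a single number. A minimal sketch for the Age column; note that std() computes the sample standard deviation (dividing by n - 1):

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie'],
    'Age': [25, 30, 35],
    'Gender': ['Female', 'Male', 'Male']
}

df = pd.DataFrame(data)

ages = df['Age']

age_mean = ages.mean()      # (25 + 30 + 35) / 3
age_std = ages.std()        # sample standard deviation, ddof=1
age_median = ages.median()  # middle value of the sorted ages

print(age_mean)    # 30.0
print(age_std)     # 5.0
print(age_median)  # 30.0
```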

GroupBy Operations

You can use groupby() to group data and then perform operations on the groups:

import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Department': ['HR', 'Engineering', 'HR', 'Engineering', 'Engineering'],
    'Salary': [60000, 80000, 70000, 75000, 90000]
}

df = pd.DataFrame(data)

# Group by 'Department' and calculate average salary
avg_salary = df.groupby('Department')['Salary'].mean()

print(avg_salary)

Output:

Department
Engineering    81666.666667
HR             65000.000000
Name: Salary, dtype: float64
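To compute several statistics per group in one pass, agg() accepts a list of function names. This sketch uses the same sample data; the name summary is just illustrative.

```python
import pandas as pd

data = {
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Department': ['HR', 'Engineering', 'HR', 'Engineering', 'Engineering'],
    'Salary': [60000, 80000, 70000, 75000, 90000]
}

df = pd.DataFrame(data)

# One row per department, one column per requested statistic
summary = df.groupby('Department')['Salary'].agg(['mean', 'min', 'max'])

print(summary)
```

The result is a DataFrame indexed by Department, so a single value can be looked up with, for example, summary.loc['HR', 'mean'].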

Conclusion

Pandas is a powerful library that provides easy-to-use data structures and tools for data analysis in Python. In this tutorial, we covered the basics of Pandas, including creating DataFrames, performing common operations, and calculating summary statistics. This should give you a good foundation to start exploring and analyzing your own datasets using Pandas!
