Pandas

April 28, 2025 2 minutes read

1. Loading data to pandas

1import pandas as pd
2df = pd.read_csv('pokemon_data.csv')

incase csv file contains no header

1df = pd.read_csv('pokemon_data.csv', header=None)

You can also load data from files other than csv, such as txt, xlsx

1df_xlsx = pd.read_excel('pokemon_data.xlsx')
2df_txt = pd.read_csv('pokemon_data.txt', delimiter='\t') # tab delimiter

2. Reading data in pandas

Read headers using df.columns.
Read one/multiple columns: df['single_col'], df[['col1', 'col2', 'col3']]

Read rows

Iterate through all rows:

1for index, row in df.iterrows():
2    print(index, row['Name'])

read a subset of rows: df.iloc[0:4]

read rows satified specific conditions:

1df.loc[df['type'] == 'Grass']
2# can join multiple conditions using | & ~ (single symbol)

Read a specific location (cell): df.iloc[r, c]

3. Sorting data

Pandas supports sorting by multiple criteria in ascending/descending order by each criterio

1df.sort_values(['Type 1', 'HP'], ascending=[1, 0])

4. Modifying data

Add a new columns based on information from other columns

1df['Total'] = df['HP'] + df['Attack'] + df['Defense'] + df['Sp. Atk'] + df['Sp. Def'] + df['Speed']

you can use the dot syntax only if the column name is single word

drop columns: df = df.drop(columns=['Total'])
Build a new dataframe by selecting some columns from old one
df = df[cols[0:4] + [cols[-1]]+cols[4:12]]
Conditional Changes

1df.loc[df['Total'] > 500, ['Generation','Legendary']] = ['Test 1', 'Test 2']

5. Saving data (exporting to files)

Like loading data, you can export into desired format

1df.to_csv('modified.csv', index=False)
2df.to_excel('modified.xlsx', index=False)
3df.to_csv('modified.txt', index=False, sep='\t')

6. Filtering data

1new_df = df.loc[(df['Type 1'] == 'Grass') & (df['Type 2'] == 'Poison') & (df['HP'] > 70)]
2
3new_df.reset_index(drop=True, inplace=True)

6. Aggregate Statistics/ Groupby

groupby method return DataframeGroupBy object, often follow by aggregating methods like sum, mean, count

1df['count'] = 1  # add a supporting column for precisely counting
2df.groupby(['Type 1', 'Type 2']).count()['count']

7. Working with large amount of data

Pandas supports reading the file by chunk

1new_df = pd.DataFrame(columns=df.columns)
2
3for df in pd.read_csv('modified.csv', chunksize=5):
4    results = df.groupby(['Type 1']).count()
5    
6    new_df = pd.concat([new_df, results])