Cyberithub

Python Pandas Tutorial: Series and Data Frame Explained with Best Examples

Pandas is a powerful and widely used Python library. It is used heavily in many fields, and data analysis is one such field, where Pandas plays a crucial role in manipulating, cleaning, analyzing, updating and editing data sets. The Pandas source code is available in its public Git repository.

Install Pandas in PyCharm

Follow the steps below in the PyCharm IDE to install the module:

  • Go to File -> Settings
  • Select the project where you want to install the Pandas library
  • Select Project Interpreter
  • Click the '+' symbol on the extreme right side
  • Search for the library (pandas)
  • Click 'Install Package'
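
Once the package is installed, a quick way to confirm that the project interpreter can see it is to import it and print its version. This is only a minimal check and not part of the steps above; the version printed depends on what was installed on your machine.

```python
import pandas

# __version__ is a string; its exact value depends on your installation
installed_version = pandas.__version__
print(f"pandas version: {installed_version}")
```

If the import fails with ModuleNotFoundError, the package was most likely installed into a different interpreter than the one the project uses.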

Import Pandas

The Pandas library is imported with the 'import' keyword. We can import it in two ways:

import pandas

or, with the alias pd:

import pandas as pd
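
As a small sketch of what the alias means: both forms bind a name to the same module, so anything reachable through pandas is also reachable through pd. The variable names below are only for illustration.

```python
import pandas
import pandas as pd

# Both names refer to the same module object
same_module = pd is pandas
print(same_module)

# An object created through the alias is an ordinary pandas object
series = pd.Series([10, 20, 30])
print(type(series) is pandas.Series)
```

The alias pd is a widely used convention, which is why most examples you find online use it.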

A Few Pandas Concepts

1. Series

A Pandas Series is a one-dimensional labelled array that can hold any kind of data. By default it indexes (labels) the data with integers starting from 0. We can access the data in a Series using these indexes.

Example

import pandas
data = [1, 'A', '*']
print(pandas.Series(data))

Output

0    1
1    A
2    *
dtype: object

We can also create indexes according to our need.

Example

import pandas
data = [1, 'A', '*']
print(pandas.Series(data, index=['a', 'b', 'c']))

Output

a    1
b    A
c    *
dtype: object

Accessing elements at different indexes.

Example

import pandas
data = [1, 'A', '*']
indexed_data = pandas.Series(data, index=['a', 'b', 'c'])
print(f"Element at 'a': {indexed_data['a']}")
print(f"Element at 'b': {indexed_data['b']}")
print(f"Element at 'c': {indexed_data['c']}")

Output

Element at 'a': 1
Element at 'b': A
Element at 'c': *
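
Besides square brackets, a Series can also be read through .loc (by label) and .iloc (by integer position). The sketch below reuses the same data as above to show that both reach the same element; the variable names are chosen just for this illustration.

```python
import pandas as pd

indexed_data = pd.Series([1, 'A', '*'], index=['a', 'b', 'c'])

by_label = indexed_data.loc['b']      # look up by index label
by_position = indexed_data.iloc[1]    # look up by integer position (0-based)
print(by_label, by_position)

# Passing a list of labels returns a new, smaller Series
subset = indexed_data.loc[['a', 'c']]
print(subset.tolist())
```

Being explicit with .loc and .iloc avoids confusion when the labels themselves are integers.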

 

2. DataFrame

A DataFrame in pandas is a two-dimensional labelled data structure whose columns can hold different types of data. It is created with labelled axes (i.e. rows and columns).
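
To make "different types of data" concrete, here is a minimal sketch with made-up values (the column names and data are hypothetical). Each column keeps its own data type, and both axes carry labels.

```python
import pandas as pd

# Hypothetical data, made up for this sketch
cities = pd.DataFrame({
    'City': ['Amritsar', 'Rishikesh'],   # text
    'Rating': [4.0, 4.6],                # floating point numbers
    'Visited': [True, False],            # booleans
})

print(cities.dtypes)             # one data type per column
print(cities.index.tolist())     # row labels (default: 0, 1, ...)
print(cities.columns.tolist())   # column labels
```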

In the example below, we will create a data frame and look at some of its important operations, which are quite helpful when dealing with tabular data.

Example

import pandas

# Create the data frame
dframe = pandas.DataFrame({
    'NickNames': ['Green City', 'Golden City', 'Yoga City'],
    'States': ['Gandhi Nagar', 'Amritsar', 'Rishikesh'],
    'Delicacies': ['Pizza', 'Kulcha', 'Samosa'],
    'Rating': ['4.5', '4', '4.6'],   # ratings are stored as strings here
})

print(f"Data Frame:\n{dframe}\n")
dframe.index = ['i', 'ii', 'iii']   # set the row labels; the default index starts from 0
print(f"Index of Data Frame:\n{dframe}\n")
print(f"NickNames are:\n{dframe['NickNames']}\n")   # single brackets return one column as a Series
print(f"NickNames along with Data Frame:\n{dframe[['NickNames']]}\n")   # double brackets return a DataFrame
print(f"NickNames and States are:\n{dframe[['NickNames', 'States']]}\n")   # select several columns
print(f"Second and Third row:\n{dframe[1:3]}\n")   # slice rows by position (another way to print specific rows)
print(f"First Data set:\n{dframe.loc[['i']]}\n")   # loc selects rows by label (here the 1st row)
print(f"Second Data set:\n{dframe.iloc[1]}\n")   # iloc selects rows by integer position (here the 2nd row)

Output

Data Frame:
     NickNames        States Delicacies Rating
0   Green City  Gandhi Nagar      Pizza    4.5
1  Golden City      Amritsar     Kulcha      4
2    Yoga City     Rishikesh     Samosa    4.6

Index of Data Frame:
       NickNames        States Delicacies Rating
i     Green City  Gandhi Nagar      Pizza    4.5
ii   Golden City      Amritsar     Kulcha      4
iii    Yoga City     Rishikesh     Samosa    4.6

NickNames are:
i       Green City
ii     Golden City
iii      Yoga City
Name: NickNames, dtype: object

NickNames along with Data Frame:
       NickNames
i     Green City
ii   Golden City
iii    Yoga City

NickNames and States are:
       NickNames        States
i     Green City  Gandhi Nagar
ii   Golden City      Amritsar
iii    Yoga City     Rishikesh

Second and Third row:
       NickNames     States Delicacies Rating
ii   Golden City   Amritsar     Kulcha      4
iii    Yoga City  Rishikesh     Samosa    4.6

First Data set:
    NickNames        States Delicacies Rating
i  Green City  Gandhi Nagar      Pizza    4.5

Second Data set:
NickNames     Golden City
States           Amritsar
Delicacies         Kulcha
Rating                  4
Name: ii, dtype: object
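
Two details from the example are easy to miss: the number of brackets decides whether you get a Series or a DataFrame back, and the ratings are text, so they need converting before any arithmetic. The following is a small sketch of both points on a cut-down copy of the data frame; pd.to_numeric is one way to do the conversion, not the only one.

```python
import pandas as pd

dframe = pd.DataFrame(
    {'NickNames': ['Green City', 'Golden City', 'Yoga City'],
     'Rating': ['4.5', '4', '4.6']},
    index=['i', 'ii', 'iii'],
)

one_column = dframe['NickNames']          # single brackets: a Series
one_column_frame = dframe[['NickNames']]  # double brackets: a DataFrame
print(type(one_column).__name__, type(one_column_frame).__name__)

row_as_series = dframe.iloc[1]            # one position: a Series
row_as_frame = dframe.loc[['ii']]         # list of labels: a DataFrame
print(type(row_as_series).__name__, type(row_as_frame).__name__)

# The ratings were stored as text; convert them before doing arithmetic
ratings = pd.to_numeric(dframe['Rating'])
print(ratings.max())
```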

3. Read CSV

When we deal with a huge amount of data (say, the employees of an MNC), it is impractical to type it all into a data frame by hand before performing operations on it.

Hence, we store such data in some kind of file such as CSV, JSON etc. We can then simply import these files and perform any operation based on our needs.
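
The idea of storing data as CSV and loading it back can be sketched without any file on disk. The snippet below uses io.StringIO as a stand-in for a file, and the employee names and salaries are invented purely for illustration.

```python
import io
import pandas as pd

# Hypothetical CSV content standing in for a file on disk
csv_text = """Name,Department,Salary
Asha,Finance,52000
Ravi,Engineering,61000
Meera,Engineering,58000
"""

# read_csv accepts a file path or any file-like object
employees = pd.read_csv(io.StringIO(csv_text))
print(employees.shape)
print(employees['Salary'].mean())

# to_csv with no path returns the CSV text; index=False leaves out the row labels
saved_text = employees.to_csv(index=False)
reloaded = pd.read_csv(io.StringIO(saved_text))
print(reloaded.equals(employees))
```

With a real file you would pass its path to read_csv and to_csv instead of the in-memory text.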

For the example below, I have downloaded the Future50.csv data set from kaggle.com.

Example

import pandas as pd

restaurant = pd.read_csv('Future50.csv')   # the file must be in the working directory
print(f"Data Set: {restaurant}")   # print the data set
print(f"Columns in Data set: {restaurant.columns}")   # print all columns in the data set
print(f"First two columns in Data set{restaurant.columns[0:2]}")   # print the first 2 columns
print(f"Data types of Columns: {restaurant.dtypes}", "\n")   # print the data type of each column
print(f"Data Type of Sales: {restaurant['Sales'].dtype}")   # print the data type of a specific column
print(f"Shape of Data Set{restaurant.shape}")   # print the shape of the data set (50 rows, 9 columns)
print(f"Number of rows: {restaurant.shape[0]}")   # print the number of rows
print(f"Number of Columns: {restaurant.shape[1]}")   # print the number of columns
print("\n")
print(f"First 5 rows of Data Set{restaurant.head()}")   # head() returns the first 5 rows by default
print(f"First 3 rows of Data set{restaurant.head(3)}")   # first 3 rows
print(f"Last five rows of Data set{restaurant.tail()}")   # tail() returns the last 5 rows by default
print(f"Last two rows of Data set{restaurant.tail(2)}")   # last 2 rows
print("\n")
print(f"Unique values: {restaurant['Sales'].unique()}")   # all unique values in a specific column
print(f"Number of unique values: {restaurant['Sales'].nunique()}")   # number of unique values in a specific column

Output

Columns in Data set: Index(['Rank', 'Restaurant', 'Location', 'Sales', 'YOY_Sales', 'Units',
       'YOY_Units', 'Unit_Volume', 'Franchising'],
      dtype='object')
First two columns in Data setIndex(['Rank', 'Restaurant'], dtype='object')
Data types of Columns: Rank            int64
Restaurant     object
Location       object
Sales           int64
YOY_Sales      object
Units           int64
YOY_Units      object
Unit_Volume     int64
Franchising    object
dtype: object

Data Type of Sales: int64
Shape of Data Set(50, 9)
Number of rows: 50
Number of Columns: 9

First 5 rows of Data Set   Rank   Restaurant  ... Unit_Volume  Franchising
0     1   Evergreens  ...        1150           No
1     2  Clean Juice  ...         560          Yes
2     3     Slapfish  ...        1370          Yes
3     4   Clean Eatz  ...         685          Yes
4     5    Pokeworks  ...        1210          Yes

[5 rows x 9 columns]
First 3 rows of Data set   Rank   Restaurant  ... Unit_Volume  Franchising
0     1   Evergreens  ...        1150           No
1     2  Clean Juice  ...         560          Yes
2     3     Slapfish  ...        1370          Yes

[3 rows x 9 columns]
Last five rows of Data set    Rank                        Restaurant  ... Unit_Volume  Franchising
45    46                       LA Crawfish  ...        2050          Yes
46    47                            &pizza  ...        1350           No
47    48               Super Duper Burgers  ...        2630           No
48    49                   StoneFire Grill  ...        2550           No
49    50  Gus's World Famous Fried Chicken  ...        1600          Yes

[5 rows x 9 columns]
Last two rows of Data set    Rank                        Restaurant  ... Unit_Volume  Franchising
48    49                   StoneFire Grill  ...        2550           No
49    50  Gus's World Famous Fried Chicken  ...        1600          Yes

[2 rows x 9 columns]

Unique values: [24 44 21 25 49 39 20 29 30 41 48 37 22 32 23 47 28 27 42 40 38 45 31]
Number of unique values: 23

Process finished with exit code 0
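
For readers who do not have Future50.csv at hand, the difference between unique() and nunique() can be tried on a small Series. The numbers below are invented for this sketch and are not taken from the data set; value_counts() is included as a closely related call.

```python
import pandas as pd

# Hypothetical values, not taken from Future50.csv
sales = pd.Series([24, 44, 24, 21, 44, 24])

print(sales.unique())         # distinct values, in order of first appearance
print(sales.nunique())        # how many distinct values there are
print(sales.value_counts())   # how often each value occurs, most frequent first
```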
