Pandas module

June 14, 2021

In this post, let's look at pandas, one of the most widely used tools in data science, built on top of NumPy. Pandas is used to clean, manipulate, and analyze data. It has built-in plotting support for drawing graphs, and it lets you create tables with rows and columns.

Applications of pandas:

1. Pandas is used in statistics and neuroscience.

Pandas presents data in the form of tables. It is built mainly around two data structures: Series and DataFrame.

To install pandas (in your command prompt):

pip install pandas

What is Series?

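A Series is like a single column of a table: a one-dimensional array of values, each with an index label. The pd.Series constructor accepts the parameters data, index, dtype, name, and copy (fastpath is an internal argument that you should not need to pass). Make sure you write Series with a capital S to avoid errors. The following code creates and prints a Series; np.nan represents a missing (null) value.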

import numpy as np

import pandas as pd

x = pd.Series(['A','B','C',np.nan,'D'])

print(x)

output:

0      A
1      B
2      C
3    NaN
4      D
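As a small sketch of the constructor parameters, the example below passes index, dtype, and name explicitly and then checks for missing values with isna(). The labels, the values, and the name 'scores' are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical data: three labelled values stored as floats
s = pd.Series(data=[10, 20, 30], index=['a', 'b', 'c'], dtype='float64', name='scores')

print(s['b'])   # look up a value by its index label
print(s.name)   # the name given to the Series

# isna() marks missing entries such as np.nan with True
x = pd.Series(['A', 'B', 'C', np.nan, 'D'])
print(x.isna())
```

Giving a Series its own index labels means you can look values up by label instead of by position.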

DataFrame in pandas:

import numpy as np

import pandas as pd

x= ['A','B','C','D','E','F','G','H','I','J']

y = [1,2,3,4,5,6,7,8,9,10]

df = pd.DataFrame(data=x,index=y,columns=["i"])

print(df)

output:

    i
1   A
2   B
3   C
4   D
5   E
6   F
7   G
8   H
9   I
10  J

To print Date range using pandas:

import pandas as pd

d = pd.date_range('20210601',periods=10)

print(d)

output:

DatetimeIndex(['2021-06-01', '2021-06-02', '2021-06-03', '2021-06-04', '2021-06-05', '2021-06-06', '2021-06-07', '2021-06-08', '2021-06-09', '2021-06-10'], dtype='datetime64[ns]', freq='D')
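date_range can also be called with an end date or with a step size. The sketch below is only an illustration; the dates and the '2D' frequency are arbitrary choices.

```python
import pandas as pd

# Give a start and an end date instead of a number of periods
d1 = pd.date_range(start='2021-06-01', end='2021-06-10')
print(len(d1))

# freq='2D' steps two days at a time
d2 = pd.date_range('2021-06-01', periods=5, freq='2D')
print(d2)
```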

Conversion of a dictionary to a DataFrame:

import numpy as np

import pandas as pd

print('---------------------------------')

cf=pd.DataFrame({'A':[1,2,3,4],

                 'B':['a','b','c','d'],

                 'c':pd.Series('ii',index=(range(4))),

                 'D':np.array([5]*4),

                 'E':'techlanguage'})

print(cf)

output:

   A  B   c  D             E
0  1  a  ii  5  techlanguage
1  2  b  ii  5  techlanguage
2  3  c  ii  5  techlanguage
3  4  d  ii  5  techlanguage
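In the dictionary, each key becomes a column name, and a single value such as 'techlanguage' is repeated for every row. A minimal sketch to check this, rebuilding the same DataFrame and inspecting it:

```python
import numpy as np
import pandas as pd

cf = pd.DataFrame({'A': [1, 2, 3, 4],
                   'B': ['a', 'b', 'c', 'd'],
                   'c': pd.Series('ii', index=range(4)),
                   'D': np.array([5] * 4),
                   'E': 'techlanguage'})

print(cf.shape)          # (number of rows, number of columns)
print(list(cf.columns))  # the dictionary keys, in order
print(cf.dtypes)         # each column keeps its own data type
```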

How to view data in pandas?

Let's create a DataFrame first, and then see how to view its data.

import numpy as np

import pandas as pd

d = pd.date_range('20210601',periods=10)

print('----------------------------------')

cf=pd.DataFrame(np.random.randn(10,4),index=d,columns=['A','B','C','D'])

print(cf)

output

-----------------------------------------------
                   A         B         C         D
2021-06-01  0.196218  0.692613 -0.516274 -1.166615
2021-06-02 -0.757079 -0.875631  1.412703  1.272985
2021-06-03  0.734642 -0.081268 -0.017365  0.635054
2021-06-04 -0.021272 -1.020569  0.268014  0.750294
2021-06-05  1.154764  0.216445  0.916505 -1.017789
2021-06-06  0.778009 -1.151132  1.177456 -0.464986
2021-06-07 -0.051266 -1.401961 -1.232411 -0.552707
2021-06-08 -0.740522 -0.052965 -0.124883 -0.009026
2021-06-09  0.390379  1.128178 -0.793724  1.340582
2021-06-10 -1.101693 -0.365774 -1.188283 -0.644097

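head() and tail():

Used to print the first five rows and the last five rows of a DataFrame (five is the default). The numbers in each output differ from run to run because np.random.randn generates new random values every time.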

print('---------------------------------')

print(cf.head())

print('---------------------------------')

print(cf.tail())

output:

-----------------------------------------------
                   A         B         C         D
2021-06-01 -0.352654 -1.545496  0.241450 -0.718301
2021-06-02  0.145778 -0.193142  1.254713  1.296125
2021-06-03 -2.447800 -0.231696  0.881387  1.253856
2021-06-04  0.642593  0.915895  0.483134 -0.620165
2021-06-05 -0.389417  0.464068 -0.950464  1.234421
-----------------------------------------------
                   A         B         C         D
2021-06-06  0.001256  0.633361  1.955822  1.491964
2021-06-07  3.114453  0.284696 -0.694232 -1.950772
2021-06-08  0.124894  0.805616  0.534558 -0.415105
2021-06-09  0.814426 -0.831956  0.103323 -0.487111
2021-06-10 -0.483753 -1.583053 -0.958189  0.255926
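Both methods accept the number of rows you want. The sketch below also uses a seeded random generator so the table is the same on every run; the seed 42 and the row counts are arbitrary choices, not part of the original example.

```python
import numpy as np
import pandas as pd

d = pd.date_range('20210601', periods=10)

# A fixed seed (42 is an arbitrary choice) gives the same random numbers every run
rng = np.random.default_rng(42)
cf = pd.DataFrame(rng.standard_normal((10, 4)), index=d, columns=['A', 'B', 'C', 'D'])

print(cf.head(3))  # first three rows
print(cf.tail(2))  # last two rows
```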

index:

Used to print all the index values

 print(cf.index)

 output

DatetimeIndex(['2021-06-01', '2021-06-02', '2021-06-03', '2021-06-04',
               '2021-06-05', '2021-06-06', '2021-06-07', '2021-06-08',
               '2021-06-09', '2021-06-10'],
              dtype='datetime64[ns]', freq='D')

columns:

Used to print all column names

 print('--------------------------------')
 print(cf.columns)

 output

 ----------------------------------------
 Index(['A', 'B', 'C', 'D'], dtype='object')
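If you need the labels as ordinary Python values, index and columns can be converted or indexed like lists. A small sketch with made-up data:

```python
import pandas as pd

d = pd.date_range('20210601', periods=3)
# Hypothetical two-column table used only for this illustration
cf = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]}, index=d)

print(cf.columns.tolist())  # column names as a plain list
print(cf.index[0])          # first index label
print(cf.shape)             # (rows, columns)
```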

describe():

Used to show summary statistics for each column: count, mean, standard deviation, min, max, and the quartiles

 print('---------------------------------')
 print(cf.describe())

 output

 -----------------------------------------------
               A          B          C          D
 count  10.000000  10.000000  10.000000  10.000000
 mean    0.062032  -0.376059  -0.004030   0.161761
 std     0.746521   0.777607   0.913673   1.110062
 min    -0.961250  -1.373856  -1.960989  -2.219939
 25%    -0.583649  -0.725754  -0.403223  -0.289852
 50%     0.169691  -0.464967   0.125751   0.443269
 75%     0.517888  -0.179336   0.684998   0.831990
 max     1.119663   1.432168   1.092677   1.559623
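The result of describe() is itself a DataFrame, so a single statistic can be picked out by its row label. The sketch below uses small made-up numbers so the results are easy to check by hand.

```python
import pandas as pd

# Hypothetical values chosen so the statistics are easy to verify
df = pd.DataFrame({'A': [1, 2, 3, 4], 'B': [10, 20, 30, 40]})

stats = df.describe()
print(stats.loc['mean'])  # one row of the summary table

# The same numbers are available as individual methods
print(df['A'].mean())
print(df['A'].max())
print(df['B'].min())
```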

Sorting:

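Used to sort by the index or by values. sort_index(axis=1, ascending=False) sorts the column labels in descending order (D, C, B, A), and sort_values(by='A') sorts the rows by the values in column A, smallest first.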

 print(cf.sort_index(axis=1,ascending=False))
 print('---------------------------------')
 print(cf.sort_values(by='A'))

 output

                    D         C         B         A
 2021-06-01 -0.015767 -0.477356  0.906313  0.740205
 2021-06-02 -0.328280 -0.558549 -1.070898 -0.701623
 2021-06-03 -0.214263  0.835058 -1.905962  0.078338
 2021-06-04  0.417488 -0.433438  0.575620  0.516499
 2021-06-05  1.407020 -1.363445  1.755119 -0.709050
 2021-06-06  2.684388 -1.008185 -0.156115  0.096735
 2021-06-07  0.358821 -0.096272 -0.971703  0.204794
 2021-06-08  1.111937 -0.706687 -1.402163 -1.127931
 2021-06-09 -1.100631 -0.308159 -0.263300  1.796101
 2021-06-10 -1.529672  0.703853 -0.820730 -0.542098
 -----------------------------------------------
                    A         B         C         D
 2021-06-08 -1.127931 -1.402163 -0.706687  1.111937
 2021-06-05 -0.709050  1.755119 -1.363445  1.407020
 2021-06-02 -0.701623 -1.070898 -0.558549 -0.328280
 2021-06-10 -0.542098 -0.820730  0.703853 -1.529672
 2021-06-03  0.078338 -1.905962  0.835058 -0.214263
 2021-06-06  0.096735 -0.156115 -1.008185  2.684388
 2021-06-07  0.204794 -0.971703 -0.096272  0.358821
 2021-06-04  0.516499  0.575620 -0.433438  0.417488
 2021-06-01  0.740205  0.906313 -0.477356 -0.015767
 2021-06-09  1.796101 -0.263300 -0.308159 -1.100631
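Here is a smaller sketch of the same two methods, with made-up data that is easy to check by eye. It also shows that sorting returns a new DataFrame and leaves the original unchanged.

```python
import pandas as pd

# Hypothetical data, small enough to verify the ordering by eye
df = pd.DataFrame({'A': [3, 1, 2], 'B': ['x', 'y', 'z']}, index=[10, 30, 20])

print(df.sort_index())                          # rows ordered by index label: 10, 20, 30
print(df.sort_values(by='A', ascending=False))  # rows ordered by column A, largest first

# Sorting returns a new DataFrame; the original order is unchanged
print(df)
```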

In the next post, let's look at slicing in pandas. I hope this was clear; feel free to comment!
