Load, clean and explore data with Python Pandas

In machine learning, we need to be able to manipulate our data, and pandas is a good library for this task. Manipulating data here means loading, cleaning and exploring it, a process also known as Exploratory Data Analysis (EDA). This step is an important part of any machine learning workflow.

Load data

import pandas as pd 
dataset = pd.read_csv(r'C:\Users\CS\Desktop\FX\MajorExchangeRates.csv') 
print(dataset)

The actual data has 2763 rows; because of space constraints, only the first few rows are displayed below. The exchange-rate columns are all continuous features.

Date	USD	JPY	GBP	CHF	AUD	CNY	IDR	INR	MYR	SGD	THB
16/10/2019	1.1025	119.9	0.8656	1.0997	1.6394	7.827	15662.11	78.768	4.625	1.5138	33.533
15/10/2019	1.1007	119.23	0.87058	1.0977	1.6293	7.7943	15591.42	78.696	4.6125	1.5088	33.434
14/10/2019	1.1031	119.4	0.87983	1.0983	1.6325	7.7988	15586.8	78.5245	4.6198	1.5103	33.529
11/10/2019	1.1043	119.75	0.87518	1.1025	1.6246	7.8417	15601.55	78.4875	4.622	1.5177	33.642
10/10/2019	1.103	118.52	0.90155	1.0948	1.6307	7.8567	15609.1	78.3555	4.6205	1.5187	33.537
09/10/2019	1.0981	117.91	0.8985	1.0927	1.6282	7.8265	15560.08	78.034	4.6087	1.5153	33.311
08/10/2019	1.0986	117.43	0.89795	1.0898	1.6286	7.8474	15550.68	78.144	4.607	1.5168	33.403
07/10/2019	1.0993	117.44	0.89155	1.0924	1.6316	7.8582	15579.88	78.0435	4.6094	1.5175	33.476
04/10/2019	1.0979	117.23	0.89045	1.0913	1.6247	7.8497	15531.39	77.8415	4.5953	1.5139	33.437
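
To check the row count and column types for yourself, `shape` and `dtypes` are a quick first look. The sketch below is only an illustration: it reads three rows and four columns copied from the sample above out of an in-memory string (the name `sample` and the comma-separated layout are assumptions), so it runs without the original file.

```python
import io
import pandas as pd

# Three rows copied from the sample above, written as comma-separated text
# so the example runs without the original CSV file.
csv_text = """Date,USD,JPY,GBP
16/10/2019,1.1025,119.9,0.8656
15/10/2019,1.1007,119.23,0.87058
14/10/2019,1.1031,119.4,0.87983
"""

sample = pd.read_csv(io.StringIO(csv_text))

# shape is a (rows, columns) tuple.
print(sample.shape)

# dtypes shows how pandas read each column: the rates come in as floats,
# while Date stays as plain text until it is converted.
print(sample.dtypes)
```

On the full file, `dataset.shape` should report the 2763 rows mentioned above.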

Clean data

After loading, cleaning your data is a must.
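
What cleaning involves depends on the dataset. As a rough sketch, common steps are counting missing values, dropping duplicate or incomplete rows, and converting the Date column to a real date type. The sample below is made up for illustration (a repeated row and a missing GBP value were added on purpose), and dropping incomplete rows is only one possible choice.

```python
import io
import pandas as pd

# Made-up sample for illustration: the second row repeats the first,
# and the third row has a missing GBP value.
csv_text = """Date,USD,JPY,GBP
16/10/2019,1.1025,119.9,0.8656
16/10/2019,1.1025,119.9,0.8656
15/10/2019,1.1007,119.23,
14/10/2019,1.1031,119.4,0.87983
"""
raw = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column before changing anything.
print(raw.isna().sum())

# Drop exact duplicate rows, then rows that still contain missing values.
cleaned = raw.drop_duplicates().dropna().copy()

# Convert the day/month/year strings into datetime values.
cleaned['Date'] = pd.to_datetime(cleaned['Date'], format='%d/%m/%Y')

# Sort from oldest to newest and rebuild the integer index.
cleaned = cleaned.sort_values('Date').reset_index(drop=True)
print(cleaned)
```

Instead of `dropna()`, you may prefer to fill gaps (for example with `ffill()` for time series); which is appropriate depends on how the data will be used.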

Explore data

You can choose to display certain columns or rows.

# head() displays the first 5 rows of data by default.
print(dataset.head())

# You can pass the number of rows you want as an argument.
# Display the first 12 rows of data.
print(dataset.head(12))

# Display only the USD, JPY and THB columns.
print(dataset[['USD', 'JPY', 'THB']])

# Display a row by its index label: all columns at index 3 (i.e. the 4th row).
print(dataset.loc[3, :])
# The output is displayed in the format below.

Date    11/10/2019
USD         1.1043
JPY         119.75
GBP        0.87518
CHF         1.1025
AUD         1.6246
CNY         7.8417
IDR        15601.5
INR        78.4875
MYR          4.622
SGD         1.5177
THB         33.642
Name: 3, dtype: object
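
Rows and columns can also be selected together, or rows can be picked by a condition. The following is a small sketch using four rows from the sample above; the names `sample`, `subset` and `high_usd`, and the 1.103 threshold, are chosen only for illustration.

```python
import pandas as pd

# Four rows taken from the sample shown earlier.
sample = pd.DataFrame({
    'Date': ['16/10/2019', '15/10/2019', '14/10/2019', '11/10/2019'],
    'USD': [1.1025, 1.1007, 1.1031, 1.1043],
    'THB': [33.533, 33.434, 33.529, 33.642],
})

# Rows labelled 1 to 3 (loc includes both ends) and two columns only.
subset = sample.loc[1:3, ['Date', 'USD']]
print(subset)

# Keep only the rows where USD is above 1.103.
high_usd = sample[sample['USD'] > 1.103]
print(high_usd)
```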

Display the column and index names. The index is normally a running sequence of integers, so it is usually not of much use.

index = dataset.index
column = dataset.columns
print(index)
print(column)
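
If you want the index to carry meaning, one option is to use a column such as Date as the index. This is a minimal sketch on three rows from the sample above; `by_date` is a name introduced here for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    'Date': ['16/10/2019', '15/10/2019', '14/10/2019'],
    'USD': [1.1025, 1.1007, 1.1031],
})

# The default index is a RangeIndex of running integers.
print(sample.index)

# set_index returns a new DataFrame that uses the Date values as row labels.
by_date = sample.set_index('Date')

# Rows can now be looked up by date label instead of by position.
print(by_date.loc['15/10/2019', 'USD'])
```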

To remove certain columns from the dataset, use drop(). Setting inplace=True means the original dataset table is modified directly, with the 2 columns dropped. With inplace=False (the default), the original table is left untouched and drop() returns a new table, which you can assign to another name.

# Choose the columns to drop.
toDrop = ['JPY', 'MYR']
# axis=1 means drop columns; inplace=True modifies the dataset table in memory (the CSV file itself is not changed).
dataset.drop(toDrop, axis=1, inplace=True)
print(dataset)
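
For comparison, here is a short sketch of the non-inplace form, where the result is assigned to a new name and the original table is kept. The two-row `sample` table uses values from the data above; `trimmed` is a name chosen for illustration.

```python
import pandas as pd

sample = pd.DataFrame({
    'USD': [1.1025, 1.1007],
    'JPY': [119.9, 119.23],
    'MYR': [4.625, 4.6125],
    'SGD': [1.5138, 1.5088],
})

# Without inplace=True, drop returns a new DataFrame.
# columns=[...] is equivalent to passing the list with axis=1.
trimmed = sample.drop(columns=['JPY', 'MYR'])

print(list(trimmed.columns))  # the new table has only USD and SGD
print(list(sample.columns))   # the original still has all four columns
```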
