Load, clean and explore data with Python Pandas

In machine learning, we need to be able to manipulate our data, and pandas is a good library for this task. To manipulate data is to load, clean and explore it, which is also known as Exploratory Data Analysis (EDA). This step is important because every later stage of a machine learning project depends on the quality of the data.

Load data

import pandas as pd 
dataset = pd.read_csv(r'C:\Users\CS\Desktop\FX\MajorExchangeRates.csv') 
print(dataset)    

The actual data has 2763 rows; due to space constraints, only the first few rows are displayed here. The data contains continuous features.

Date        USD     JPY     GBP      CHF     AUD     CNY     IDR       INR      MYR     SGD     THB
16/10/2019  1.1025  119.9   0.8656   1.0997  1.6394  7.827   15662.11  78.768   4.625   1.5138  33.533
15/10/2019  1.1007  119.23  0.87058  1.0977  1.6293  7.7943  15591.42  78.696   4.6125  1.5088  33.434
14/10/2019  1.1031  119.4   0.87983  1.0983  1.6325  7.7988  15586.8   78.5245  4.6198  1.5103  33.529
11/10/2019  1.1043  119.75  0.87518  1.1025  1.6246  7.8417  15601.55  78.4875  4.622   1.5177  33.642
10/10/2019  1.103   118.52  0.90155  1.0948  1.6307  7.8567  15609.1   78.3555  4.6205  1.5187  33.537
09/10/2019  1.0981  117.91  0.8985   1.0927  1.6282  7.8265  15560.08  78.034   4.6087  1.5153  33.311
08/10/2019  1.0986  117.43  0.89795  1.0898  1.6286  7.8474  15550.68  78.144   4.607   1.5168  33.403
07/10/2019  1.0993  117.44  0.89155  1.0924  1.6316  7.8582  15579.88  78.0435  4.6094  1.5175  33.476
04/10/2019  1.0979  117.23  0.89045  1.0913  1.6247  7.8497  15531.39  77.8415  4.5953  1.5139  33.437
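
A quick way to confirm the row count and that the features are continuous is to inspect the shape and column types right after loading. The sketch below is only an illustration: it reads three rows and three rate columns copied from the sample into a small DataFrame named sample (a stand-in for the full file), so it runs without the CSV.

```python
import io
import pandas as pd

# Three rows copied from the sample; the real file has 2763 rows.
csv_text = """Date,USD,JPY,GBP
16/10/2019,1.1025,119.9,0.8656
15/10/2019,1.1007,119.23,0.87058
14/10/2019,1.1031,119.4,0.87983
"""
sample = pd.read_csv(io.StringIO(csv_text))

# (number of rows, number of columns)
print(sample.shape)

# Date is read as text; the exchange rates are read as float64.
print(sample.dtypes)

# Column names, non-null counts and dtypes in one summary.
sample.info()
```

On the full dataset, the same three calls would be made on dataset instead of sample.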

Clean data

After loading, cleaning your data is a must.
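
What cleaning involves depends on the data, but a few checks are common. The following is a minimal sketch, not a complete recipe: it assumes a small sample in which one value has been blanked out and one row duplicated on purpose (neither defect is claimed to exist in the real file), then counts missing values, removes the bad rows and parses the Date column.

```python
import io
import pandas as pd

# Hypothetical sample: rows from the exchange-rate data, with one USD value
# removed and one row duplicated so the cleaning steps have something to do.
csv_text = """Date,USD,JPY
16/10/2019,1.1025,119.9
15/10/2019,,119.23
14/10/2019,1.1031,119.4
14/10/2019,1.1031,119.4
"""
raw = pd.read_csv(io.StringIO(csv_text))

# Count missing values per column.
print(raw.isnull().sum())

# Drop exact duplicate rows, then rows that still contain a missing value.
cleaned = raw.drop_duplicates().dropna()

# Parse the day/month/year strings into real dates.
cleaned = cleaned.assign(Date=pd.to_datetime(cleaned["Date"], format="%d/%m/%Y"))
print(cleaned)
```

Whether to drop or fill missing values is a judgment call; dropna is used here only because it is the simplest option.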

Explore data

You can choose to display certain columns or rows.

# Display the first 5 rows (the default).
print(dataset.head())

# Pass the number of rows you want as an argument.
# Display the first 12 rows.
print(dataset.head(12))

# Display only the USD, JPY and THB columns.
print(dataset[['USD', 'JPY', 'THB']])

# Select a row by its index label. With the default integer index,
# label 3 is the 4th row; ':' selects all columns.
print(dataset.loc[3, :])
# The row is displayed as a Series, in the format shown below.

Date    11/10/2019
USD         1.1043
JPY         119.75
GBP        0.87518
CHF         1.1025
AUD         1.6246
CNY         7.8417
IDR        15601.5
INR        78.4875
MYR          4.622
SGD         1.5177
THB         33.642
Name: 3, dtype: object
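
Note that loc selects by index label, while iloc selects by position. With the default integer index the two agree, but they can differ once rows are reordered. A small sketch, using four rows copied from the sample in a DataFrame named sample (an illustrative stand-in for dataset):

```python
import pandas as pd

# Four rows copied from the sample data (Date, USD and THB only).
sample = pd.DataFrame({
    "Date": ["16/10/2019", "15/10/2019", "14/10/2019", "11/10/2019"],
    "USD": [1.1025, 1.1007, 1.1031, 1.1043],
    "THB": [33.533, 33.434, 33.529, 33.642],
})

by_label = sample.loc[3, :]      # row whose index label is 3
by_position = sample.iloc[3, :]  # 4th row by position
print(by_label.equals(by_position))

# After sorting, the labels travel with their rows, so loc and iloc diverge.
reordered = sample.sort_values("USD", ascending=False)
print(reordered.loc[3, "USD"])    # still the row labelled 3
print(reordered["USD"].iloc[3])   # whichever row is now in 4th position
```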

Display the column names and the index. By default the index is just a running sequence of integers, so it carries little information on its own.

index = dataset.index
column = dataset.columns
print(index)
print(column)
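
If a more meaningful index is wanted, one option is to use the Date column as the index. This is a sketch under the assumption that each date appears only once; sample and dated are illustrative names, and the rows are copied from the sample data.

```python
import pandas as pd

sample = pd.DataFrame({
    "Date": ["16/10/2019", "15/10/2019", "14/10/2019"],
    "USD": [1.1025, 1.1007, 1.1031],
    "JPY": [119.9, 119.23, 119.4],
})

# Default index: running integers starting at 0.
print(sample.index)
print(sample.columns)

# Parse the dates, then use them as the index instead of the integers.
dated = sample.assign(
    Date=pd.to_datetime(sample["Date"], format="%d/%m/%Y")
).set_index("Date")

# Rows can now be looked up by date.
print(dated.loc[pd.Timestamp("2019-10-15"), "USD"])
```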

To remove certain columns from the dataset, use drop. inplace=True means the original dataset DataFrame is modified directly, with the 2 columns dropped. With inplace=False (the default), drop returns a new DataFrame instead, which you can assign to another name.

# Choose the columns to drop.
toDrop = ['JPY', 'MYR']
# axis=1 means drop columns; inplace=True modifies the dataset DataFrame
# in memory (the CSV file on disk is not changed).
dataset.drop(toDrop, axis=1, inplace=True)
print(dataset)
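
For comparison, here is a sketch of the non-inplace form, which leaves the original untouched and gives the result a new name. The names sample and trimmed are illustrative, and the values are copied from the sample rows.

```python
import pandas as pd

sample = pd.DataFrame({
    "USD": [1.1025, 1.1007],
    "JPY": [119.9, 119.23],
    "MYR": [4.625, 4.6125],
    "THB": [33.533, 33.434],
})

to_drop = ["JPY", "MYR"]

# columns=... is equivalent to passing the labels with axis=1.
# Without inplace=True, drop returns a new DataFrame.
trimmed = sample.drop(columns=to_drop)

print(list(trimmed.columns))  # the two columns are gone here
print(list(sample.columns))   # the original still has all four
```

Keeping the original around makes it easier to rerun or change the step later, at the cost of holding two tables in memory.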