In machine learning, we need to be able to manipulate our data, and pandas is a good library for this task. Manipulating data here means loading, cleaning and exploring it, a process also known as Exploratory Data Analysis (EDA). This step is an important part of any machine learning workflow.
Load data
import pandas as pd
dataset = pd.read_csv(r'C:\Users\CS\Desktop\FX\MajorExchangeRates.csv')
print(dataset)
The actual dataset has 2763 rows; because of space constraints, only the first rows are displayed here. Apart from the Date column, every column holds a continuous feature.
Date | USD | JPY | GBP | CHF | AUD | CNY | IDR | INR | MYR | SGD | THB |
16/10/2019 | 1.1025 | 119.9 | 0.8656 | 1.0997 | 1.6394 | 7.827 | 15662.11 | 78.768 | 4.625 | 1.5138 | 33.533 |
15/10/2019 | 1.1007 | 119.23 | 0.87058 | 1.0977 | 1.6293 | 7.7943 | 15591.42 | 78.696 | 4.6125 | 1.5088 | 33.434 |
14/10/2019 | 1.1031 | 119.4 | 0.87983 | 1.0983 | 1.6325 | 7.7988 | 15586.8 | 78.5245 | 4.6198 | 1.5103 | 33.529 |
11/10/2019 | 1.1043 | 119.75 | 0.87518 | 1.1025 | 1.6246 | 7.8417 | 15601.55 | 78.4875 | 4.622 | 1.5177 | 33.642 |
10/10/2019 | 1.103 | 118.52 | 0.90155 | 1.0948 | 1.6307 | 7.8567 | 15609.1 | 78.3555 | 4.6205 | 1.5187 | 33.537 |
09/10/2019 | 1.0981 | 117.91 | 0.8985 | 1.0927 | 1.6282 | 7.8265 | 15560.08 | 78.034 | 4.6087 | 1.5153 | 33.311 |
08/10/2019 | 1.0986 | 117.43 | 0.89795 | 1.0898 | 1.6286 | 7.8474 | 15550.68 | 78.144 | 4.607 | 1.5168 | 33.403 |
07/10/2019 | 1.0993 | 117.44 | 0.89155 | 1.0924 | 1.6316 | 7.8582 | 15579.88 | 78.0435 | 4.6094 | 1.5175 | 33.476 |
04/10/2019 | 1.0979 | 117.23 | 0.89045 | 1.0913 | 1.6247 | 7.8497 | 15531.39 | 77.8415 | 4.5953 | 1.5139 | 33.437 |
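A quick way to confirm the size and column types after loading is sketched below. To stay self-contained it reads a hypothetical three-row excerpt from a string rather than the real file, so the names csv_text and sample are assumptions made for illustration only.

```python
import io

import pandas as pd

# Hypothetical three-row excerpt standing in for MajorExchangeRates.csv
csv_text = """Date,USD,JPY,GBP
16/10/2019,1.1025,119.9,0.8656
15/10/2019,1.1007,119.23,0.87058
14/10/2019,1.1031,119.4,0.87983
"""
sample = pd.read_csv(io.StringIO(csv_text))

# (number of rows, number of columns)
print(sample.shape)

# Date is read as text; the rate columns are read as float64
print(sample.dtypes)

# count, mean, std, min, quartiles and max for each numeric column
print(sample.describe())
```

On the full dataset, the same shape call should report the 2763 rows mentioned above.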
Clean data
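After loading, the data must be cleaned before it is used, for example by checking for missing values, duplicated rows and columns stored with the wrong type.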
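The original post does not show its cleaning steps, so the following is only a minimal sketch of what cleaning could look like for data shaped like this. The small table, with its missing value and duplicated row, is made up for illustration, and the day/month/year date format is assumed from the sample rows shown earlier.

```python
import io

import pandas as pd

# Made-up excerpt with one missing USD value and one duplicated row
csv_text = """Date,USD,JPY
16/10/2019,1.1025,119.9
15/10/2019,,119.23
15/10/2019,,119.23
14/10/2019,1.1031,119.4
"""
raw = pd.read_csv(io.StringIO(csv_text))

# Convert the day/month/year text into real datetime values
raw["Date"] = pd.to_datetime(raw["Date"], format="%d/%m/%Y")

# Count the missing values in each column
print(raw.isna().sum())

# Drop exact duplicate rows, then rows that still have a missing value
cleaned = raw.drop_duplicates().dropna().reset_index(drop=True)
print(cleaned)
```

Whether to drop or fill missing rates depends on the analysis; dropping is used here only because it is the simplest option to show.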
Explore data
You can choose to display only certain rows or columns.
# head() displays the first 5 rows by default.
print(dataset.head())
# Pass the number of rows you want as an argument.
# Display the first 12 rows.
print(dataset.head(12))
# Display only the USD, JPY and THB columns.
print(dataset[['USD', 'JPY', 'THB']])
# Select a row by its index label. Print all columns of the row labelled 3 (the 4th row with the default index).
print(dataset.loc[3, :])
# The row is displayed as a Series in this format:
Date    11/10/2019
USD         1.1043
JPY         119.75
GBP        0.87518
CHF         1.1025
AUD         1.6246
CNY         7.8417
IDR        15601.5
INR        78.4875
MYR          4.622
SGD         1.5177
THB         33.642
Name: 3, dtype: object
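Because loc selects by index label rather than by position, label 3 is the 4th row only while the default integer index is in place. The sketch below contrasts loc with the position-based iloc, using a small hand-built table (the name rates is illustrative) with values from the sample rows.

```python
import pandas as pd

rates = pd.DataFrame({
    "USD": [1.1025, 1.1007, 1.1031, 1.1043],
    "JPY": [119.9, 119.23, 119.4, 119.75],
})

# With the default index, label 3 and position 3 are the same row
print(rates.loc[3, "JPY"])     # 119.75
print(rates["JPY"].iloc[3])    # 119.75

# After sorting by JPY, the labels travel with their rows
reordered = rates.sort_values("JPY")
print(reordered.loc[3, "JPY"])   # 119.75, still the row labelled 3
print(reordered["JPY"].iloc[3])  # 119.9, the row now in 4th position

# Slices differ too: loc includes the end label, iloc excludes the end position
print(len(rates.loc[0:2]))   # 3
print(len(rates.iloc[0:2]))  # 2
```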
Display the index and the column names. The index is normally a run of consecutive integers, so by itself it carries little information.
index = dataset.index
column = dataset.columns
print(index)
print(column)
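One way to make the index more informative is to promote a column to be the index. The sketch below does this with the Date column on a small hand-built table; the names rates and by_date are illustrative, and the original post does not say whether it takes this step.

```python
import pandas as pd

rates = pd.DataFrame({
    "Date": ["16/10/2019", "15/10/2019", "14/10/2019"],
    "USD": [1.1025, 1.1007, 1.1031],
})

# The default index is just consecutive integers
print(rates.index)  # RangeIndex(start=0, stop=3, step=1)

# set_index returns a new DataFrame that uses Date as its index
by_date = rates.set_index("Date")
print(by_date.index.name)  # Date

# Rows can now be looked up by date instead of by row number
print(by_date.loc["15/10/2019", "USD"])  # 1.1007
```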
To remove certain columns from the dataset, use drop(). inplace=True means the original dataset DataFrame is modified directly, so the 2 columns are dropped from it. Use inplace=False (the default) if you want drop() to return a new table that you can assign to another name.
# Choose the columns to drop.
toDrop = ['JPY', 'MYR']
# axis=1 means drop columns; inplace=True modifies the dataset DataFrame in memory (the CSV file on disk is not changed).
dataset.drop(toDrop, axis=1, inplace=True)
print(dataset)
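The difference between the two inplace settings can be seen side by side in the short sketch below. It uses a small hand-built table, and the names rates, trimmed and result are illustrative.

```python
import pandas as pd

rates = pd.DataFrame({
    "USD": [1.1025, 1.1007],
    "JPY": [119.9, 119.23],
    "MYR": [4.625, 4.6125],
})

# Default (inplace=False): drop returns a new DataFrame, rates is untouched
trimmed = rates.drop(columns=["JPY", "MYR"])
print(list(trimmed.columns))  # ['USD']
print(list(rates.columns))    # ['USD', 'JPY', 'MYR']

# inplace=True: rates itself is modified and drop returns None
result = rates.drop(columns=["JPY"], inplace=True)
print(result)                 # None
print(list(rates.columns))    # ['USD', 'MYR']
```

Here columns=[...] is used as an equivalent of passing the labels together with axis=1.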