DATA EXPLORATION AND PREPARATION – APPLICATION
Data Exploration and Preparation play a critical role in the success of a learning algorithm, as we have seen in the Theory section. All of the tasks mentioned there can be carried out in either Python or R.
Data Exploration, which comprises Univariate and Bivariate analysis, is covered in the Application section. Both kinds of analysis rely only on concepts from Descriptive Statistics.
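As a rough illustration of how descriptive statistics support both kinds of analysis, the sketch below summarises one variable and then measures the relationship between two. The DataFrame and its column names (`rooms`, `price`) are hypothetical and are not taken from the datasets used in this section.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative assumptions.
df = pd.DataFrame({
    "rooms": [4, 5, 6, 6, 7, 8],
    "price": [18.0, 21.5, 24.0, 25.5, 30.0, 36.0],
})

# Univariate analysis: describe one variable at a time.
rooms_summary = df["rooms"].describe()  # count, mean, std, min, quartiles, max
rooms_median = df["rooms"].median()

# Bivariate analysis: measure how two variables move together.
pearson_r = df["rooms"].corr(df["price"])  # Pearson correlation is the default

print(rooms_summary)
print(f"median rooms: {rooms_median}")
print(f"correlation(rooms, price): {pearson_r:.3f}")
```

A correlation close to 1 or -1 suggests a strong linear relationship, while the univariate summary helps spot skew or unusual ranges before any modelling is attempted.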
Data Preparation includes a group of techniques referred to here as ‘Miscellaneous Methods’: outlier treatment, consolidation of data, and missing value treatment. Data Preparation also includes ‘Feature Engineering,’ where different operations are performed on data features to prepare them for use in building Data Models.
The ‘Miscellaneous Methods’ are explored using several hypothetical datasets, while certain aspects of Feature Engineering use the Boston Dataset. All of the code is executed in Python.
MISCELLANEOUS METHODS IN PYTHON
This section is a collection of blogs on aspects of data preparation and exploration. It examines the methods for condensing a data set, as well as the different univariate and bivariate analyses that serve as the basis for data discovery. It also discusses different methods for treating data that contains missing values and outliers. The methods covered here, namely consolidation of data, missing value treatment, and outlier treatment, are fundamentally different from one another.
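To make missing value and outlier treatment concrete, here is a minimal sketch that assumes median imputation and the 1.5 × IQR rule. These are only two of several possible choices and are not necessarily the ones used in the blogs. The series and its name `income` are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical data with one missing value and one extreme value.
s = pd.Series([12.0, 15.0, np.nan, 14.0, 13.0, 120.0, 16.0], name="income")

# Missing value treatment: fill gaps with the median,
# which is less sensitive to extreme values than the mean.
filled = s.fillna(s.median())

# Outlier detection: flag points beyond 1.5 * IQR from the quartiles.
q1, q3 = filled.quantile(0.25), filled.quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
is_outlier = (filled < lower) | (filled > upper)

# One possible outlier treatment: cap values at the IQR fences.
capped = filled.clip(lower=lower, upper=upper)

print(filled[is_outlier])
print(capped)
```

Capping keeps every row in the data set; dropping the flagged rows or applying a transformation are alternative treatments, and the right choice depends on whether the extreme values are errors or genuine observations.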
FEATURE ENGINEERING IN PYTHON
This section covers the different modifications that can be made to features. The transformation and scaling of features are discussed extensively, together with the causes and effects of such changes. Feature Engineering also deals with the reduction and creation of features. Reduction methods involve feature extraction and feature selection. Construction methods involve decomposing existing features and generating new ones.
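The effect of transformation and scaling can be sketched on a small example. The snippet below assumes a log transform for a right-skewed feature, plus min-max scaling and standardisation for a second feature. The data and the column names `crime_rate` and `rooms` are hypothetical and are not read from the Boston Dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical features; values and names are illustrative assumptions.
df = pd.DataFrame({
    "crime_rate": [0.02, 0.1, 0.5, 3.0, 25.0],  # strongly right-skewed
    "rooms": [5.0, 6.0, 6.5, 7.0, 8.0],
})

# Transformation: log1p compresses the long right tail.
df["crime_rate_log"] = np.log1p(df["crime_rate"])

# Min-max scaling: rescale values to the range [0, 1].
rooms = df["rooms"]
df["rooms_minmax"] = (rooms - rooms.min()) / (rooms.max() - rooms.min())

# Standardisation: zero mean and unit variance (population std, ddof=0).
df["rooms_z"] = (rooms - rooms.mean()) / rooms.std(ddof=0)

print(df.round(3))
print("skew before:", round(df["crime_rate"].skew(), 3))
print("skew after: ", round(df["crime_rate_log"].skew(), 3))
```

Scaling changes only the range of a feature, not the shape of its distribution, whereas a transformation such as the logarithm changes the shape itself. That is why the skewness is compared before and after the log step only.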