Data Preprocessing involves preparing raw data for analysis or modeling by cleaning, transforming, and organizing it into a suitable format.
Characteristics:
– Handling missing or inconsistent data by filling, removing, or correcting errors
– Normalizing or scaling numerical values to a common range or scale
– Encoding categorical variables into numerical formats
– Removing duplicates and irrelevant information
– Splitting data into training, validation, and test sets
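A few of the steps above (removing duplicates, scaling to a common range, and splitting the data) can be sketched in plain Python. This is a minimal illustration, not a definitive implementation: the helper names, the 70/15/15 split ratio, and the fixed random seed are all assumptions chosen for the example.

```python
import random

def drop_duplicates(rows):
    """Remove exact duplicate rows while keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, so compare rows as tuples
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def min_max_scale(values):
    """Rescale numbers linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def split_dataset(rows, train=0.7, val=0.15, seed=0):
    """Shuffle a copy of rows and cut it into train/validation/test parts.

    The ratios and seed are illustrative assumptions; whatever is left
    after the train and validation parts becomes the test set.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_val = int(len(shuffled) * val)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_val],
        shuffled[n_train + n_val:],
    )

rows = drop_duplicates([[1, 2], [1, 2], [3, 4]])
scaled = min_max_scale([10, 20, 30])
train_set, val_set, test_set = split_dataset(range(100))
```

Shuffling before splitting keeps any ordering in the raw data (for example, records sorted by date) from leaking into one particular subset.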
Examples:
– Filling missing age values in a customer dataset with the mean age of the customers whose age is known
– Converting text labels like “Yes” and “No” into binary values 1 and 0
– Scaling image pixel values to the range 0 to 1 (for example, dividing 8-bit values by 255) before feeding them into a neural network
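The three examples above can be sketched as small Python functions. This is a hedged illustration only: the function names are hypothetical, missing values are assumed to be represented as None, and pixels are assumed to be 8-bit integers (0 to 255).

```python
from statistics import mean

def fill_missing_with_mean(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("no observed values to average")
    avg = mean(observed)
    return [avg if v is None else v for v in values]

def encode_yes_no(labels):
    """Map "Yes"/"No" text labels (case-insensitive) to 1/0."""
    mapping = {"yes": 1, "no": 0}
    return [mapping[label.strip().lower()] for label in labels]

def scale_pixels(pixels, max_value=255):
    """Divide pixel intensities by the assumed maximum so they fall in [0, 1]."""
    return [p / max_value for p in pixels]

ages = fill_missing_with_mean([30, None, 40, 50])
flags = encode_yes_no(["Yes", "No", "yes"])
pixels = scale_pixels([0, 51, 255])
```

Mean imputation is simple but sensitive to outliers; the median is a common alternative when the age distribution is skewed.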


