Characteristics
– Involves identifying and correcting errors or inconsistencies in data
– Removes duplicate, incomplete, or irrelevant data entries
– Improves data quality and accuracy before the data is used for analysis or to train machine learning models
– Can include handling missing values, fixing typos, and standardizing formats
Examples
– Removing duplicate customer records from a database
– Correcting misspelled words in a text dataset
– Filling in missing values in a spreadsheet column with that column's mean or median
– Converting dates into a consistent format across a dataset
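
The four examples above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production pipeline. The sample records, the field names (`city`, `age`, `signup`), the list of known cities, and the accepted date layouts are all assumptions made up for the example.

```python
from datetime import datetime
from difflib import get_close_matches
from statistics import median

# Hypothetical sample data; every field name and value is illustrative only.
records = [
    {"id": 1, "city": "London", "age": 34, "signup": "2023-01-15"},
    {"id": 1, "city": "London", "age": 34, "signup": "2023-01-15"},  # exact duplicate
    {"id": 2, "city": "Lodnon", "age": None, "signup": "15/02/2023"},  # typo, missing age
    {"id": 3, "city": "Paris", "age": 28, "signup": "03.03.2023"},
]

KNOWN_CITIES = ["London", "Paris", "Berlin"]        # assumed reference vocabulary
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d.%m.%Y")  # assumed input layouts


def remove_duplicates(rows):
    """Keep the first occurrence of each fully identical record."""
    seen, unique = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def fix_typo(value, vocabulary):
    """Replace a value with its closest known spelling, if one is close enough."""
    matches = get_close_matches(value, vocabulary, n=1)
    return matches[0] if matches else value


def fill_missing_with_median(rows, field):
    """Replace None in `field` with the median of the values that are present."""
    fill = median(r[field] for r in rows if r[field] is not None)
    return [{**r, field: fill if r[field] is None else r[field]} for r in rows]


def normalize_date(text):
    """Convert any accepted date layout to ISO 8601 (YYYY-MM-DD)."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).strftime("%Y-%m-%d")
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def clean(rows):
    rows = remove_duplicates(rows)
    rows = fill_missing_with_median(rows, "age")
    return [
        {**r,
         "city": fix_typo(r["city"], KNOWN_CITIES),
         "signup": normalize_date(r["signup"])}
        for r in rows
    ]


cleaned = clean(records)
for row in cleaned:
    print(row)
```

Duplicates are removed before the median is computed so that a repeated record does not skew the fill value. Fuzzy matching against a fixed vocabulary only suits fields with a small set of valid values; free text generally calls for a spell checker or manual review.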