SQL DATA CLEANING (SQL)

About this project

Through the Below SQL queries, the housing data was thoroughly cleaned, standardized, and prepared for further analysis or integration into other systems. The project demonstrates proficiency in data cleaning techniques using SQL and ensures the dataset's quality and consistency for downstream processes or analysis.

1 - Standardize Date Format:

  • Used SQL CONVERT function to standardize the date format in the SaleDate column.
  • Added a new column SaleDateConverted to store the standardized dates.

  • 2 - Populate Property Address Data:

  • Filled missing property addresses by joining the dataset with itself based on ParcelID and copying the non-null property addresses.

  • 3 - Break Address into Individual Columns:

  • Split the PropertyAddress column into separate columns for PropertyNumAddress, PropertyCity, and PropertyState.
  • Utilized SQL string manipulation functions such as SUBSTRING and CHARINDEX to extract the required information.

  • 4 - Break Owner Address into Individual Columns:

  • Similar to the property address, split the OwnerAddress column into OwnerNumAddress, OwnerCity, and OwnerState columns.
  • Used the PARSENAME function to parse the address components.

  • 5 - Change Y and N to Yes and No:

  • Converted 'Y' and 'N' values in the SoldAsVacant column to 'Yes' and 'No', respectively, for better readability.
  • Implemented using SQL UPDATE statements with conditional logic.

  • 6 - Remove Duplicates:

  • Eliminated duplicate records based on specific columns (ParcelID, PropertyAddress, SalePrice, SaleDate, LegalReference) using a common table expression (CTE) and the ROW_NUMBER() window function.
  • Deleted duplicate records, keeping only one occurrence of each unique combination.

  • 7 - Delete Unused Columns:

  • Removed unnecessary columns (SaleDate, PropertyAddress, OwnerAddress) from the dataset to streamline the data and improve clarity.
  • Employed the SQL ALTER TABLE statement with the DROP COLUMN clause to delete the specified columns.

  • Get In Touch