mrithip/nashville-housing-data-cleaning-sql

Nashville Housing Data Cleaning with SQL

This project demonstrates data cleaning techniques using SQL on the Nashville housing dataset. The SQL scripts standardize dates, handle missing values, split addresses, remove duplicates, and prepare the data for analysis.

Dataset

The dataset is stored in housing.db, a SQLite database containing Nashville housing sales data.

Prerequisites

  • SQLite3 installed on your system
  • Basic knowledge of SQL

Installation

  1. Clone the repository:

    git clone https://github.com/mrithip/nashville-housing-data-cleaning-sql.git
    cd nashville-housing-data-cleaning-sql
  2. Ensure SQLite3 is installed:

    sqlite3 --version
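The command above reports the version of the sqlite3 command-line tool. Python's built-in sqlite3 module exposes the version of the SQLite library it was built against as `sqlite3.sqlite_version`, and that version can differ from the CLI's. The sketch below checks whether a given version supports `ALTER TABLE ... DROP COLUMN`, which SQLite added in 3.35.0. Whether datacleaning.sql actually relies on DROP COLUMN for its column-cleanup step is an assumption. The helper names `parse_version` and `supports_drop_column` are hypothetical.

```python
import sqlite3

# ALTER TABLE ... DROP COLUMN first appeared in SQLite 3.35.0.
# Assumption: the column-cleanup step in datacleaning.sql may depend on it.
MIN_DROP_COLUMN = (3, 35, 0)


def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string such as '3.45.1' into a comparable tuple."""
    return tuple(int(part) for part in version.split("."))


def supports_drop_column(version: str = sqlite3.sqlite_version) -> bool:
    """Return True if the given SQLite library version supports DROP COLUMN."""
    return parse_version(version) >= MIN_DROP_COLUMN


if __name__ == "__main__":
    print(f"SQLite library used by Python: {sqlite3.sqlite_version}")
    print(f"Supports DROP COLUMN: {supports_drop_column()}")
```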

Usage

Run the data cleaning script:

sqlite3 housing.db < datacleaning.sql
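You can also run the same script from Python through the standard-library sqlite3 module. The following is a minimal sketch and is not part of the repository. The function name `run_cleaning_script` is hypothetical, and the sketch assumes datacleaning.sql contains only SQL statements. The command-line tool understands sqlite3 shell dot-commands (such as `.mode` or `.import`), but they are not SQL, so `executescript()` would raise an error on them.

```python
import sqlite3
from pathlib import Path


def run_cleaning_script(db_path: str, script_path: str) -> None:
    """Execute every statement in a SQL script against a SQLite database file."""
    script = Path(script_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)
    try:
        # executescript() runs multiple semicolon-separated statements;
        # it commits any pending transaction before starting.
        conn.executescript(script)
        conn.commit()
    finally:
        conn.close()


if __name__ == "__main__":
    run_cleaning_script("housing.db", "datacleaning.sql")
```

When in doubt, treat the CLI command above as the reference way to run the script.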

Data Cleaning Steps

The datacleaning.sql script performs the following operations:

  1. Table Management: Creates a backup and renames tables as needed
  2. Date Standardization: Converts sale dates to ISO format (YYYY-MM-DD)
  3. Missing Data Handling: Populates null property addresses using related records
  4. Address Parsing: Splits property addresses into separate address and city columns
  5. Owner Address Parsing: Splits owner addresses into address, city, and state columns
  6. Data Standardization: Converts "Y"/"N" values in SoldAsVacant to "Yes"/"No"
  7. Duplicate Removal: Identifies and removes duplicate records based on key fields
  8. Column Cleanup: Removes unnecessary columns (OwnerAddress, PropertyAddress, TaxDistrict)
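The sketch below applies several of these steps to a throwaway in-memory table, using Python's built-in sqlite3 module to run SQLite SQL. It is not the repository's datacleaning.sql. The following are all assumptions made for illustration:

- the table name `housing`
- the hypothetical sample rows
- a "Month D, YYYY" input date format
- filling a missing address from another row with the same ParcelID
- the columns used as the duplicate key

Steps 1, 5, and 8 are left out to keep it short.

```python
import sqlite3

# Hypothetical sample rows; real housing.db columns and formats may differ.
# (UniqueID, ParcelID, PropertyAddress, SaleDate, SalePrice, LegalReference, SoldAsVacant)
SAMPLE_ROWS = [
    (1, "P-001", "100 EXAMPLE ST, NASHVILLE", "April 9, 2013", 240000, "REF-1", "N"),
    (2, "P-001", None, "June 10, 2014", 250000, "REF-2", "Y"),
    (3, "P-002", "200 SAMPLE AVE, ANTIOCH", "September 26, 2016", 300000, "REF-3", "No"),
    (4, "P-002", "200 SAMPLE AVE, ANTIOCH", "September 26, 2016", 300000, "REF-3", "No"),
]

CLEANING_STEPS = [
    # Date standardization: 'April 9, 2013' -> '2013-04-09'. The month number
    # is derived from where its 3-letter prefix sits in the lookup string.
    """UPDATE housing
       SET SaleDate = printf('%04d-%02d-%02d',
           CAST(substr(SaleDate, -4) AS INTEGER),
           (instr('JanFebMarAprMayJunJulAugSepOctNovDec', substr(SaleDate, 1, 3)) + 2) / 3,
           CAST(substr(SaleDate, instr(SaleDate, ' ') + 1,
                       instr(SaleDate, ',') - instr(SaleDate, ' ') - 1) AS INTEGER))
       WHERE SaleDate LIKE '%,%'""",
    # Missing data (assumed rule): copy the address from another sale of the same parcel.
    """UPDATE housing
       SET PropertyAddress = (
           SELECT b.PropertyAddress FROM housing AS b
           WHERE b.ParcelID = housing.ParcelID
             AND b.UniqueID <> housing.UniqueID
             AND b.PropertyAddress IS NOT NULL
           LIMIT 1)
       WHERE PropertyAddress IS NULL""",
    # Address parsing: split 'street, city' at the first comma.
    "ALTER TABLE housing ADD COLUMN PropertySplitAddress TEXT",
    "ALTER TABLE housing ADD COLUMN PropertySplitCity TEXT",
    """UPDATE housing
       SET PropertySplitAddress = trim(substr(PropertyAddress, 1, instr(PropertyAddress, ',') - 1)),
           PropertySplitCity = trim(substr(PropertyAddress, instr(PropertyAddress, ',') + 1))
       WHERE instr(PropertyAddress, ',') > 0""",
    # Data standardization: map 'Y'/'N' to 'Yes'/'No', leaving other values alone.
    """UPDATE housing
       SET SoldAsVacant = CASE SoldAsVacant
           WHEN 'Y' THEN 'Yes' WHEN 'N' THEN 'No' ELSE SoldAsVacant END""",
    # Duplicate removal: keep the lowest UniqueID for each assumed key.
    """DELETE FROM housing
       WHERE UniqueID NOT IN (
           SELECT MIN(UniqueID) FROM housing
           GROUP BY ParcelID, PropertyAddress, SaleDate, SalePrice, LegalReference)""",
]


def build_sample_db() -> sqlite3.Connection:
    """Create an in-memory table loaded with the hypothetical sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE housing (UniqueID INTEGER PRIMARY KEY, ParcelID TEXT, "
        "PropertyAddress TEXT, SaleDate TEXT, SalePrice INTEGER, "
        "LegalReference TEXT, SoldAsVacant TEXT)"
    )
    conn.executemany("INSERT INTO housing VALUES (?, ?, ?, ?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def clean(conn: sqlite3.Connection) -> None:
    """Run each cleaning statement in order, then commit."""
    for statement in CLEANING_STEPS:
        conn.execute(statement)
    conn.commit()


if __name__ == "__main__":
    db = build_sample_db()
    clean(db)
    for row in db.execute("SELECT * FROM housing ORDER BY UniqueID"):
        print(row)
```

Owner address parsing (step 5) would reuse the same instr()/substr() pattern a second time on the text after the first comma, which separates the city from the state.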

Database Schema

After cleaning, the main table nashvillehousingdata contains standardized columns including:

  • UniqueID
  • ParcelID
  • LandUse
  • PropertyAddress (original)
  • SaleDate (ISO format)
  • SalePrice
  • LegalReference
  • SoldAsVacant
  • OwnerName
  • PropertySplitAddress
  • PropertySplitCity
  • OwnerSplitAddress
  • OwnerSplitCity
  • OwnerSplitState
  • And others...
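After the script has run, you can inspect the table to see exactly which columns nashvillehousingdata contains, for example whether PropertyAddress is still present. Below is a minimal sketch using Python's sqlite3 module and SQLite's `pragma_table_info` table-valued function, which is available since SQLite 3.16.0. The helper name `list_columns` is hypothetical.

```python
import sqlite3


def list_columns(db_path: str, table: str = "nashvillehousingdata") -> list[str]:
    """Return the column names of a table in declaration order."""
    conn = sqlite3.connect(db_path)
    try:
        # pragma_table_info yields one row per column (cid, name, type, ...);
        # the table name is passed as a bound parameter rather than formatted in.
        rows = conn.execute(
            "SELECT name FROM pragma_table_info(?) ORDER BY cid", (table,)
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    print(list_columns("housing.db"))
```

`sqlite3.connect` creates an empty database if the path does not exist. A mistyped path therefore returns an empty list rather than raising an error. In the command-line tool, `.schema nashvillehousingdata` shows the same information.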

Contributing

Feel free to submit issues and enhancement requests.

About

SQL-based data cleaning project for the Nashville housing dataset, standardizing dates, addresses, and removing duplicates.
