Python Automation Showcase

Data Integrity Validator

CLI workflow for validating, cleaning, and deduplicating foreclosure property and event feeds. Built for quick trust checks: strict rules, explicit rejection reasons, and reproducible outputs.

1 runtime dependency (pandas)
3 output artifacts per run
FK + Dedupe: integrity enforcement built in

What It Validates

  • Schema and required-column checks with case and whitespace tolerance
  • APN pattern validation and cross-table foreign key integrity
  • Canonical enum normalization for status, event type, and source
  • Deterministic dedupe rules that keep the newest valid record
  • Clear rejection output with per-row violation_reason
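
The checks above can be sketched in a few lines. This is a minimal illustration, not the code in validator.py: it uses plain dicts instead of pandas so it stands alone, and the APN pattern, column names (apn, status, updated_at), status aliases, and reason strings are all assumptions chosen for the example.

```python
import re

# Hypothetical rules: the real APN pattern, required columns, and enum
# values are not stated in this page, so these are placeholders.
APN_PATTERN = re.compile(r"^\d{3}-\d{3}-\d{2}$")
STATUS_ALIASES = {"active": "ACTIVE", "open": "ACTIVE", "sold": "SOLD", "closed": "SOLD"}
REQUIRED_EVENT_COLUMNS = {"apn", "status", "updated_at"}


def normalize_row(row):
    """Lowercase and strip header names, and strip string values."""
    return {k.strip().lower(): v.strip() if isinstance(v, str) else v for k, v in row.items()}


def violation_reason(row, known_apns):
    """Return the first rule the row breaks, or None if it passes."""
    missing = sorted(c for c in REQUIRED_EVENT_COLUMNS if not row.get(c))
    if missing:
        return "missing_required:" + ",".join(missing)
    if not APN_PATTERN.match(row["apn"]):
        return "invalid_apn_format"
    if row["apn"] not in known_apns:  # foreign key check against the property table
        return "unknown_property_apn"
    if row["status"].lower() not in STATUS_ALIASES:
        return "unknown_status"
    return None


def validate_events(raw_rows, known_apns):
    """Split rows into cleaned and rejected, then dedupe the cleaned rows."""
    cleaned, rejected = [], []
    for raw in raw_rows:
        row = normalize_row(raw)
        reason = violation_reason(row, known_apns)
        if reason:
            rejected.append({**row, "violation_reason": reason})
        else:
            row["status"] = STATUS_ALIASES[row["status"].lower()]  # canonical enum value
            cleaned.append(row)
    # Dedupe: keep the newest valid record per APN. ISO dates compare correctly
    # as strings, and ties keep the first row seen, so the result is deterministic.
    newest = {}
    for row in cleaned:
        kept = newest.get(row["apn"])
        if kept is None or row["updated_at"] > kept["updated_at"]:
            newest[row["apn"]] = row
    return list(newest.values()), rejected


events = [
    {" APN ": "123-456-78", "Status": " open ", "updated_at": "2024-01-05"},
    {"apn": "123-456-78", "status": "Sold", "updated_at": "2024-03-01"},
    {"apn": "12345678", "status": "active", "updated_at": "2024-02-01"},
    {"apn": "999-999-99", "status": "active", "updated_at": "2024-02-01"},
]
cleaned, rejected = validate_events(events, known_apns={"123-456-78"})
# cleaned holds one row (the 2024-03-01 SOLD record); the two rejected rows
# carry invalid_apn_format and unknown_property_apn respectively.
```

Returning only the first violation keeps each rejected row to a single reason. Collecting every violation per row is an equally reasonable design if reviewers want the full picture in one pass.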

Fast Run

python3 -m venv venv
./venv/bin/pip install pandas
./venv/bin/python validator.py

Outputs: cleaned_properties.csv, cleaned_events.csv, rejected_rows.csv
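
One way to keep those outputs reproducible is to fix the column order and sort rows before writing. The sketch below is an assumption about how that could be done, not a description of validator.py: write_csv is a hypothetical helper, the column names are illustrative, and an in-memory buffer stands in for the real rejected_rows.csv file.

```python
import csv
import io


def write_csv(rows, handle, columns):
    """Write rows with a fixed column order and sort so reruns produce identical files."""
    writer = csv.DictWriter(handle, fieldnames=columns, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    # Sort on the output columns so input order does not leak into the file.
    for row in sorted(rows, key=lambda r: tuple(str(r.get(c, "")) for c in columns)):
        writer.writerow({c: row.get(c, "") for c in columns})


# Illustrative rejected rows; the reason strings are made up for the example.
rejected = [
    {"apn": "999-999-99", "status": "active", "violation_reason": "unknown_property_apn"},
    {"apn": "12345678", "status": "active", "violation_reason": "invalid_apn_format"},
]

buffer = io.StringIO()  # stands in for open("rejected_rows.csv", "w", newline="")
write_csv(rejected, buffer, columns=["apn", "status", "violation_reason"])
output = buffer.getvalue()
print(output)
```

With the same input, two runs of this helper yield byte-identical text, which makes diffs between runs meaningful during manual review.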

Why This Exists

Foreclosure datasets often arrive from mixed-quality feeds. This tool isolates malformed rows, keeps clean records moving downstream, and leaves a review trail for manual correction.

Proof Angle

Demonstrates practical Python operations work: validation logic, defensive error handling, data-quality reporting, and disciplined output contracts.
