Python Automation Showcase
Data Integrity Validator
CLI workflow for validating, cleaning, and deduplicating foreclosure property and event feeds. Built for quick trust checks: strict rules, explicit rejection reasons, and reproducible outputs.
What It Validates
- Schema and required-column checks with case and whitespace tolerance
- APN pattern validation and cross-table foreign key integrity
- Canonical enum normalization for status, event type, and source
- Deterministic dedupe rules that keep the newest valid records
- Clear rejection output with a per-row violation_reason
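The checks above can be sketched roughly as follows. This is a minimal illustration, not the code in validator.py: the APN pattern, the column names (apn, status, updated_at), the status aliases, and the reason strings are all assumptions made for the example, and it uses only the standard library so it runs without pandas.

```python
import re

# All of these are hypothetical: the README does not state the real
# APN format, required columns, or canonical enum values.
APN_PATTERN = re.compile(r"^\d{3}-\d{3}-\d{2}$")
REQUIRED_COLUMNS = {"apn", "status", "updated_at"}
STATUS_ALIASES = {"active": "ACTIVE", "open": "ACTIVE", "sold": "SOLD", "closed": "SOLD"}


def normalize_header(row):
    """Lower-case and strip column names so ' APN ' and 'apn' match."""
    return {key.strip().lower(): value for key, value in row.items()}


def validate_row(row):
    """Return (cleaned_row, None) on success or (row, violation_reason) on failure."""
    row = normalize_header(row)
    missing = REQUIRED_COLUMNS - row.keys()
    if missing:
        return row, "missing_columns:" + ",".join(sorted(missing))
    apn = row["apn"].strip()
    if not APN_PATTERN.match(apn):
        return row, "invalid_apn"
    status = STATUS_ALIASES.get(row["status"].strip().lower())
    if status is None:
        return row, "unknown_status"
    return {**row, "apn": apn, "status": status}, None


def validate_and_dedupe(rows):
    """Split rows into cleaned and rejected, keeping the newest valid row per APN."""
    newest, rejected = {}, []
    for row in rows:
        cleaned, reason = validate_row(row)
        if reason is not None:
            rejected.append({**cleaned, "violation_reason": reason})
            continue
        current = newest.get(cleaned["apn"])
        # Same-format ISO 8601 timestamps compare correctly as strings;
        # on a tie the first row seen is kept, so the result is deterministic.
        if current is None or cleaned["updated_at"] > current["updated_at"]:
            newest[cleaned["apn"]] = cleaned
    return list(newest.values()), rejected


def reject_orphan_events(events, properties):
    """Foreign key check: keep only events whose APN exists in the property table."""
    known = {prop["apn"] for prop in properties}
    kept = [event for event in events if event["apn"] in known]
    orphans = [
        {**event, "violation_reason": "unknown_property_apn"}
        for event in events
        if event["apn"] not in known
    ]
    return kept, orphans


rows = [
    {" APN ": "123-456-78", "Status": " open ", "updated_at": "2024-01-05"},
    {"apn": "123-456-78", "status": "Sold", "updated_at": "2024-03-01"},
    {"apn": "12345678", "status": "active", "updated_at": "2024-02-01"},
]
cleaned, rejected = validate_and_dedupe(rows)
```

In this sketch the first two rows share an APN, so only the newer one survives in cleaned, while the third row lands in rejected with its reason attached. Superseded duplicates are simply dropped here; whether the actual tool reports them as rejections is not stated.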
Fast Run
python3 -m venv venv
./venv/bin/pip install pandas
./venv/bin/python validator.py
Outputs: cleaned_properties.csv, cleaned_events.csv, rejected_rows.csv
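For a quick look at what a run rejected, a small helper along these lines could tally the reasons. It is a sketch that assumes rejected_rows.csv has a header row containing a violation_reason column; summarize_rejections is a hypothetical name, not part of the tool.

```python
import csv
from collections import Counter


def summarize_rejections(path="rejected_rows.csv"):
    """Count rejected rows per violation_reason, most frequent first."""
    with open(path, newline="", encoding="utf-8") as handle:
        # Assumes a violation_reason column; a missing column raises KeyError.
        reasons = Counter(row["violation_reason"] for row in csv.DictReader(handle))
    return reasons.most_common()
```

Calling summarize_rejections() after a run returns a list of (reason, count) pairs, which is usually enough to tell whether one upstream feed is responsible for most of the failures.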
Why This Exists
Foreclosure datasets often arrive from mixed-quality feeds. This tool isolates malformed rows, keeps clean records moving downstream, and leaves a review trail for manual correction.
Proof Angle
Demonstrates practical Python operations work: validation logic, defensive error handling, data-quality reporting, and disciplined output contracts.