What are some good testing strategies when it comes to data warehousing?
What tools would you recommend?
Are there any resources you'd recommend?
I'm trying to improve testing and data validation in our data warehouse. We currently run SSIS and are starting to build automated ETL/ELT in Python. My current plan is to migrate our ETL/ELT processes to Python, backed by unit tests.
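To make the unit-testing part concrete, here is a minimal sketch of the kind of test I have in mind. The transform `normalize_customer` and the field names (`cust_id`, `name`, `amt`) are hypothetical placeholders, not our real schema; the idea is only that each transformation step is a plain function that can be tested without touching the warehouse.

```python
import unittest


def normalize_customer(record):
    """Hypothetical transform: cast the id, tidy the name, round the amount."""
    return {
        "customer_id": int(record["cust_id"]),
        "name": record["name"].strip().title(),
        "amount": round(float(record["amt"]), 2),
    }


class NormalizeCustomerTest(unittest.TestCase):
    def test_cleans_name_and_casts_types(self):
        raw = {"cust_id": "42", "name": "  ada lovelace ", "amt": "19.999"}
        self.assertEqual(
            normalize_customer(raw),
            {"customer_id": 42, "name": "Ada Lovelace", "amount": 20.0},
        )

    def test_rejects_non_numeric_amount(self):
        # float("n/a") raises ValueError, so bad input fails loudly
        # instead of being loaded silently.
        raw = {"cust_id": "42", "name": "x", "amt": "n/a"}
        with self.assertRaises(ValueError):
            normalize_customer(raw)
```

Tests like these could be run with `python -m unittest` as part of a build, before any job touches real data.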
In addition, we're writing Python scripts that test mapped records from source to target. I'm not sure whether we should test every data element every day or test a statistically representative sample of records.
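For reference, this is roughly what I mean by mapped record testing, as a simplified sketch. It assumes the source and target rows have already been fetched into lists of dicts, and the column mapping (`COLUMN_MAP`) and function names are made up for illustration. Passing `sample_size` switches between checking every key and checking a random sample of keys.

```python
import hashlib
import random

# Hypothetical mapping of source column names to target column names.
COLUMN_MAP = {"cust_id": "customer_id", "amt": "amount"}


def row_fingerprint(row, columns):
    """Hash the selected column values of one row into a hex digest."""
    joined = "|".join(repr(row[c]) for c in columns)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compare_rows(source_rows, target_rows, key, sample_size=None, seed=0):
    """Compare mapped columns between source and target rows.

    key is the source key column. sample_size=None checks every source key;
    otherwise a reproducible random sample of that many keys is checked.
    Returns (missing_keys, mismatched_keys), both sorted.
    """
    target_key = COLUMN_MAP[key]
    source_by_key = {r[key]: r for r in source_rows}
    target_by_key = {r[target_key]: r for r in target_rows}

    keys = sorted(source_by_key)
    if sample_size is not None and sample_size < len(keys):
        keys = random.Random(seed).sample(keys, sample_size)

    src_cols = list(COLUMN_MAP)
    tgt_cols = [COLUMN_MAP[c] for c in src_cols]
    missing, mismatched = [], []
    for k in keys:
        if k not in target_by_key:
            missing.append(k)
        elif (row_fingerprint(source_by_key[k], src_cols)
              != row_fingerprint(target_by_key[k], tgt_cols)):
            mismatched.append(k)
    return sorted(missing), sorted(mismatched)
```

Calling `compare_rows(source, target, key="cust_id")` would check every record, while `compare_rows(source, target, key="cust_id", sample_size=1000)` would check a fixed-seed sample of 1000 keys. The sketch only detects rows missing from the target or differing in mapped columns; it does not look for extra rows in the target.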