ETL (Extract, transform, load) is a vital part of data warehouse and changes in data should be validated on every step of data pipeline to meet business requirements. Defining tests and validating data is a part of ETL development and executing those tests should continue after development as an automated process. Testing ETL processes should be systematic, as data and database architecture changes over time. It is common to encounter errors that are caused by source database such as changed data types, changes in column and table names, human error/data entry and changes in business rules. Tare allows to:
- integrate automated data validation to ETL via REST API
- schedule automated tests to check for possible errors after/before daily loading and transformation
- perform data reconciliation in data pipeline
Data Migration Testing
Data reconciliation is a part of data warehouse testing and should be done by comparing datasets one-to-one and verifying aggregated values on both sides. Sampling and manually comparing data are not viable methods for testing large datasets as it covers only a fraction of the data. With Tare, it is possible to compare millions of rows of data row by row for all supported platforms to find missing data, inconsistencies and test data migration.
Sudden changes, deviations and variations in values often indicate errors in data. It is important to know the integrity of your data and how it changes over time. Tare allows to detect errors by systematically profiling objects and analysing results to find anomalies. We use machine learning to predict the changes in values, find possible errors and propose new test cases.
Defining new business rules and developing ETL means validating new data and creating test cases. With every new development, data has to be compliant with existing business rules and requirements. Tare allows to save such test cases, use them in different test packages, and execute them within new developments to validate the data. It is possible to group tests by component (process, table, column, script etc) to validate individual units of the database. With Tare you can schedule your regression tests to run as often as needed, making sure your data is always as expected.