Benchmarking Algorithms for (data) Repairing and (data) Translation
BART is an error-generation tool for data-cleaning applications. It introduces errors into clean databases so that data-repairing algorithms can be benchmarked. It gives users fine-grained control over the error-generation process while scaling to large databases. This is far from trivial: as we show in our technical papers, the error-generation problem is surprisingly challenging and, in fact, NP-complete. To scale to millions of tuples, the system relies on several non-trivial optimizations, including a new symmetry property of data-quality constraints.
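To make the idea of controlled error generation concrete, here is a minimal Python sketch. It is not BART's API or algorithm. The functions `fd_violations` and `inject_fd_error`, the toy table, and the functional dependency `zip -> city` are all hypothetical and chosen only for illustration. The sketch changes one cell of a clean table so that the change is guaranteed to be detectable as a violation of the dependency.

```python
import random
from itertools import combinations


def fd_violations(rows, lhs, rhs):
    """Return index pairs (i, j) that agree on lhs but differ on rhs."""
    return [
        (i, j)
        for (i, a), (j, b) in combinations(enumerate(rows), 2)
        if a[lhs] == b[lhs] and a[rhs] != b[rhs]
    ]


def inject_fd_error(rows, lhs, rhs, rng):
    """Copy rows and change one rhs value so that lhs -> rhs is violated.

    Assumes the input satisfies the dependency and that rhs values are
    strings. Only rows that share their lhs value with another row are
    candidates, so the injected change is always detectable.
    """
    dirty = [dict(r) for r in rows]  # leave the clean table untouched
    counts = {}
    for r in dirty:
        counts[r[lhs]] = counts.get(r[lhs], 0) + 1
    candidates = [i for i, r in enumerate(dirty) if counts[r[lhs]] > 1]
    if not candidates:
        raise ValueError("no row shares its lhs value with another row")
    i = rng.choice(candidates)
    dirty[i][rhs] = dirty[i][rhs] + "_err"  # differs from the clean value
    return dirty, i


# Hypothetical clean table that satisfies zip -> city.
clean = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "New York"},
    {"zip": "60601", "city": "Chicago"},
]

dirty, changed = inject_fd_error(clean, "zip", "city", random.Random(0))
print(changed, fd_violations(dirty, "zip", "city"))
```

The seeded `random.Random` keeps runs reproducible, which matters when the dirty database serves as a benchmark input. A real tool has to handle many interacting constraints at once, which is where the hardness mentioned above comes from; this sketch handles a single dependency and a single change.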
2016
[VLDB-2016] Messing Up with Bart: Error Generation for Evaluating Data-Cleaning Algorithms. Proceedings of the VLDB Endowment, volume 9, 2016. (To appear)
2015
[TR-01-2015] Error Generation for Evaluating Data-Cleaning Algorithms. Technical report, 2015.