Error rates in AI processing systems are a significant concern for anyone who relies on them, especially in critical applications like healthcare, finance, and safety; that includes individuals, businesses, organizations, and even regulatory bodies.
How We Currently Measure Errors (and Why It’s Flawed)

Error rates in AI processing systems are measured by comparing the system’s output against a ground truth, or reference. The current method is to collect a representative sample of data and then manually determine the expected outcome (ground truth) for each case. This is problematic for two reasons: people are notoriously bad at auditing large data sets while constructing the necessary true/false positive and negative cases, and it is almost impossible to assemble a data set large enough to be statistically representative.

A Better Way: Fully Synthetic Data

An alternative is to manufacture synthetic data with rules-based algorithms, generating a universe of correlated data sets with the desired characteristics. The computer acts as a tireless bookkeeper, producing vast amounts of data with statistically valid variation across all of the necessary happy-path and true/false positive and negative cases. Because every record is generated by a known rule, its expected outcome is known by construction (see the sketch below).

Why Other Technical Approaches Don’t Work

Other technical approaches, such as using AI to generate the test data or relying on Extract Transform Load (ETL) technologies, do not solve the problem of establishing the expected outcome (ground truth): the data sets used to train the generating AI, or the original data sets fed into the ETL process, have no known ground truth themselves.
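
The sketch below is a minimal illustration (not from the original post) of the rules-based approach: each record is manufactured by a deterministic rule, so its expected outcome is known by construction, and the error rate of a system under test can be measured directly against it. The field names, thresholds, and the placeholder "system under test" are illustrative assumptions.

```python
import random

def generate_record(rng: random.Random) -> dict:
    """Manufacture one correlated record plus its known expected outcome."""
    amount = round(rng.uniform(10, 10_000), 2)
    country = rng.choice(["US", "DE", "BR", "NG"])
    # The rule that defines ground truth: large foreign transfers should be flagged.
    expected_flag = amount > 5_000 and country != "US"
    return {"amount": amount, "country": country, "expected_flag": expected_flag}

def generate_dataset(n: int, seed: int = 42) -> list[dict]:
    """Generate n records with statistically valid variation and known labels."""
    rng = random.Random(seed)
    return [generate_record(rng) for _ in range(n)]

def error_rate(records: list[dict], system) -> float:
    """Compare the system under test against the synthetic ground truth."""
    wrong = sum(1 for r in records if system(r) != r["expected_flag"])
    return wrong / len(records)

if __name__ == "__main__":
    data = generate_dataset(100_000)
    # Placeholder for the AI system under test; deliberately imperfect.
    noisy_system = lambda r: r["amount"] > 4_800 and r["country"] != "US"
    print(f"Measured error rate: {error_rate(data, noisy_system):.4%}")
```

Because the generating rule is explicit, the same generator can be extended to emit every true/false positive and negative case on demand, which is exactly what manual auditing of sampled production data cannot guarantee.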