ExactData

The Data Blog

The System Perspectives of Development and Tests

3/2/2020

A thought exercise on the System perspective of dev and test, as enabled by ExactData Synthetic Data.

Let’s consider the development of an application that scours incoming data for fraudulent activity. How would test and analysis look with production data, de-identified production data, hand-crafted data, and ExD synthetic data?

Let’s also consider that the application will classify every transaction or event as either bad or good. The perfect application would classify every transaction correctly, resulting in 100% Precision (everything classified as bad was actually bad), 100% capture rate (every actual bad classified as bad), 0% escape rate (no bads classified as good), and 0% False Positive rate (no goods classified as bad). The application needs to be developed, tested, and analyzed from a System perspective, because no single score tells the whole story. For example, the application could classify every transaction as bad and achieve a 100% capture rate and a 0% escape rate, but it would also have poor Precision and a huge False Positive rate, requiring significant labor to adjudicate the classifications. At the other extreme, the application could classify everything as good, be right most of the time, and not catch a single bad. Both of these boundary conditions are absurd, but they illustrate why the scores have to be evaluated together, as a System.
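To make the two boundary conditions concrete, here is a minimal Python sketch. The transaction counts (10,000 events, 50 of them bad) are hypothetical and chosen only for illustration, the names `Counts`, `ratio`, and `rates` are made up for this example, and the false positive rate is computed in its conventional form, FP/(FP+TN), which is an assumption about the intended definition.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # bads classified as bad
    fp: int  # goods classified as bad
    fn: int  # bads classified as good (escapes)
    tn: int  # goods classified as good


def ratio(numerator, denominator):
    # A rate is undefined (None) when nothing falls in its denominator.
    return numerator / denominator if denominator else None


def rates(c):
    return {
        "precision": ratio(c.tp, c.tp + c.fp),
        "capture_rate": ratio(c.tp, c.tp + c.fn),
        "escape_rate": ratio(c.fn, c.tp + c.fn),
        # Assumption: conventional definition, FP / (FP + TN).
        "false_positive_rate": ratio(c.fp, c.fp + c.tn),
        "accuracy": ratio(c.tp + c.tn, c.tp + c.fp + c.fn + c.tn),
    }


# Hypothetical volumes, for illustration only.
N_BAD, N_GOOD = 50, 9950

# Boundary condition 1: classify every transaction as bad.
flag_everything = Counts(tp=N_BAD, fp=N_GOOD, fn=0, tn=0)
# Boundary condition 2: classify every transaction as good.
flag_nothing = Counts(tp=0, fp=0, fn=N_BAD, tn=N_GOOD)

for name, counts in [("flag everything", flag_everything),
                     ("flag nothing", flag_nothing)]:
    print(name, rates(counts))
```

Under these assumed volumes, flagging everything yields a perfect capture rate with a precision of 0.5%, while flagging nothing yields 99.5% accuracy with a capture rate of zero. Neither number means much on its own.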
One method of System analysis is the Confusion Matrix, shown below.
[Figure: the four-quadrant confusion matrix (TP, FP, FN, TN)]
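As a rough illustration of how the four quadrants get filled in, the sketch below tallies a confusion matrix from a list of ground-truth labels and a list of classifications. The function name `confusion_matrix` and the eight sample events are hypothetical; the point is only that every cell requires knowing the truth for each event.

```python
from collections import Counter


def confusion_matrix(truth, predicted):
    """Tally the four quadrants; True means 'bad' in both sequences."""
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")
    cells = Counter()
    for is_bad, flagged in zip(truth, predicted):
        if flagged and is_bad:
            cells["TP"] += 1  # bad, classified as bad
        elif flagged and not is_bad:
            cells["FP"] += 1  # good, classified as bad
        elif not flagged and is_bad:
            cells["FN"] += 1  # bad, classified as good (an escape)
        else:
            cells["TN"] += 1  # good, classified as good
    return {key: cells[key] for key in ("TP", "FP", "FN", "TN")}


# Hypothetical ground truth and classifier output for eight events.
truth = [True, False, False, True, False, False, True, False]
predicted = [True, False, True, False, False, False, True, False]

matrix = confusion_matrix(truth, predicted)
print(matrix)
```

Without the `truth` list there is nothing to compare `predicted` against, which is the situation the following paragraphs describe.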
With production data, you don’t know where the bads are, so you can’t complete the confusion matrix.

With de-identified production data, you still don’t know where the bads are, so you can’t complete the confusion matrix.

With hand-crafted data, you might have the “truth” needed to complete the confusion matrix, but you would not have the complexity or volume to truly test for the “needle” in the haystack: fraudulent behavior hidden within a mass of good behavior.
With ExD synthetic data, you know where every bad is (you have the ground truth), so you CAN complete all 4 quadrants of the confusion matrix. Then, and only then, can you conduct a System analysis, driving the application toward the real goal: maximizing Precision (TP/(TP+FP)) and Capture rate (TP/(TP+FN)) while minimizing Escapes (FN) and the False Positive rate (FP/(FP+TP) as written here, the share of flagged transactions that are actually good; the more common definition is FP/(FP+TN)). Within a particular setup of an application version, these are typically threshold trade-offs, but with each development iteration there is the opportunity to improve on all scores.
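The threshold trade-off can be sketched as follows, assuming the application emits a fraud score per transaction and flags anything at or above a cutoff. The scores, labels, thresholds, and the `sweep` helper are all hypothetical and are not taken from any ExactData product.

```python
def sweep(scores, truth, thresholds):
    """For each threshold, count TP/FP/FN and derive precision and capture rate."""
    rows = []
    for t in thresholds:
        pairs = list(zip(scores, truth))
        tp = sum(1 for s, bad in pairs if s >= t and bad)
        fp = sum(1 for s, bad in pairs if s >= t and not bad)
        fn = sum(1 for s, bad in pairs if s < t and bad)
        precision = tp / (tp + fp) if tp + fp else None
        capture = tp / (tp + fn) if tp + fn else None
        rows.append((t, tp, fp, fn, precision, capture))
    return rows


# Hypothetical fraud scores with known ground truth (True = bad).
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10, 0.05]
truth = [True, True, False, True, False, False, True, False, False, False]

rows = sweep(scores, truth, thresholds=[0.25, 0.50, 0.85])
for t, tp, fp, fn, precision, capture in rows:
    print(f"threshold={t:.2f} TP={tp} FP={fp} FN={fn} "
          f"precision={precision:.2f} capture={capture:.2f}")
```

In this toy data, raising the threshold improves precision while letting more bads escape. Moving both scores in the right direction at once takes a better scoring model, not a different cutoff.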