An interesting application of high-fidelity synthetic data generation is reducing credit card fraud. By 2025, global losses to credit card fraud are expected to reach almost $50 billion. Detecting fraudulent transactions in a large dataset is difficult because they make up such a small percentage of overall transactions. Banks and financial institutions need a solution that correctly identifies both fraudulent and non-fraudulent transactions and measures true and false positives as well as true and false negatives. Those measurements make it possible to build receiver operating characteristic (ROC) curves and to tune the system for the cost of correcting a fraudulent payment versus the cost of the payment itself. High-fidelity synthetic data addresses this problem by generating large volumes of non-fraudulent transactions while interweaving complex fraud patterns into a very small subset of them. Because the fraud patterns are known in advance, the credit card fraud detection system can be optimized against them.
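The tuning described above can be illustrated with a minimal Python sketch. It is not ExactData's method: the transaction generator, the score distributions, the 1% fraud rate, and the two cost figures are all hypothetical values chosen for illustration. The point is only that when every fraud label is known, true and false positives can be counted exactly, ROC points can be computed, and a threshold can be chosen by cost.

```python
import random

def make_transactions(n, fraud_rate, seed=0):
    """Return (score, is_fraud) pairs with known labels.

    The score stands in for a detector's output. Drawing fraud scores from a
    higher-centered distribution is an assumption made for illustration.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        is_fraud = rng.random() < fraud_rate
        mean = 0.7 if is_fraud else 0.3
        score = min(1.0, max(0.0, rng.gauss(mean, 0.15)))
        rows.append((score, is_fraud))
    return rows

def confusion(rows, threshold):
    """Count TP, FP, TN, FN when every score >= threshold is flagged."""
    tp = fp = tn = fn = 0
    for score, is_fraud in rows:
        flagged = score >= threshold
        if flagged and is_fraud:
            tp += 1
        elif flagged:
            fp += 1
        elif is_fraud:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def roc_points(rows, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    points = []
    for t in thresholds:
        tp, fp, tn, fn = confusion(rows, t)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.append((fpr, tpr))
    return points

def cheapest_threshold(rows, thresholds, cost_fp, cost_fn):
    """Pick the threshold that minimizes total error cost."""
    def total_cost(t):
        _, fp, _, fn = confusion(rows, t)
        return fp * cost_fp + fn * cost_fn
    return min(thresholds, key=total_cost)

rows = make_transactions(20000, fraud_rate=0.01)
thresholds = [i / 20 for i in range(21)]
roc = roc_points(rows, thresholds)
# Hypothetical costs: reviewing a false alarm vs. absorbing a missed fraud.
best = cheapest_threshold(rows, thresholds, cost_fp=5.0, cost_fn=200.0)
print("cost-minimizing threshold:", best)
```

Changing the two cost arguments moves the chosen threshold, which is the trade-off between the cost of errors and the cost to correct them.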
Most application testing today, both performance testing and testing in development environments, uses production data that has been extracted through an ETL (Extract, Transform, Load) process and then manually modified to create specific use cases. For cyber applications, for example, most testing is done by replaying network traffic. Because this process is so labor-intensive, use case coverage is generally very low, and most of the business logic and workflow rules go untested. This is where the concept of sufficiently complex data comes in. Test data should be large enough in volume to cover peak processing loads and complex enough to cover almost all of the business logic and workflow rules. Using large amounts of sufficiently complex test data exercises algorithms at peak processing volumes, exposes failures before the move to production, and enables precise measurement of ambiguous, true, and false errors. Systems can then be optimized for the cost of errors versus the cost to correct them.
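One way to make "use case coverage" concrete is to count how many business rules a test data set actually triggers. The Python sketch below is an illustration under stated assumptions: the rule names, record fields, and value choices are hypothetical and do not come from any real system. It compares a single hand-made record with a larger generated set.

```python
import random

# Hypothetical business rules: name -> predicate over a record dict.
RULES = {
    "large_amount": lambda r: r["amount"] > 10_000,
    "foreign_currency": lambda r: r["currency"] != "USD",
    "weekend_posting": lambda r: r["weekday"] >= 5,
    "new_account": lambda r: r["account_age_days"] < 30,
    "large_and_new": lambda r: r["amount"] > 10_000 and r["account_age_days"] < 30,
}

def rule_coverage(records, rules=RULES):
    """Return (fraction of rules fired at least once, set of rules never fired)."""
    fired = {name for name, pred in rules.items() if any(pred(r) for r in records)}
    return len(fired) / len(rules), set(rules) - fired

def generate_records(n, seed=0):
    """Generate records whose fields span both sides of every rule boundary."""
    rng = random.Random(seed)
    return [
        {
            "amount": rng.choice([25, 400, 9_000, 15_000, 60_000]),
            "currency": rng.choice(["USD", "EUR", "JPY"]),
            "weekday": rng.randrange(7),  # 0 = Monday, 5 and 6 = weekend
            "account_age_days": rng.choice([3, 20, 200, 2_000]),
        }
        for _ in range(n)
    ]

# A single manually crafted record exercises none of the rules.
hand_made = [{"amount": 120, "currency": "USD", "weekday": 2, "account_age_days": 900}]
print(rule_coverage(hand_made))

# A larger generated set is far more likely to reach every rule.
print(rule_coverage(generate_records(5000)))
```

The set of rules that never fired is the useful output here: it names the business logic that the current test data leaves untested.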
What is ExactData? What do we do? Why is it important, and how can we help you? These are some of the many questions we would like to answer to give a little more insight into how we operate.
ExactData is based in Rochester, New York. We specialize in automating the generation of large, fully artificial, engineered test data for enhanced performance and quicker results. Our data eliminates security and privacy risks because no personal information whatsoever is used to generate it, which makes it completely safe to use, and each data set is unique and optimized for its situation. Simulated data is still a new and emerging field, and we strive to improve our product every day.
Our engineers have recently created a script that injects synthetic data simulating ADAMS data into a file format that can be consumed by commercial network traffic generators. ADAMS data is simulated data for insider threat detection systems based on anomalies in massive datasets. Data domains include Logon, Device, HTTP, Email, File, Print, LDAP, Organization Directory, Decoy files, and Psychometric files. Why all of the excitement? The current state-of-the-art network traffic generation tools use very simplistic content that is not designed for the system under test. Once this integration is complete, cyber security testing can be taken to a whole new level, where sophisticated threat patterns are interwoven into the data and consumed by the network. This will enable sophisticated testing of the network's intrusion detection and measurement of true and false positive errors, so that these systems can be optimized for cost and risk performance. This alone is a huge leap for the cyber security industry, and we will continue to move forward with our advancements in technology.
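The idea of interweaving threat patterns into background data can be sketched in a few lines of Python. This is not the script described above, and it does not produce any traffic generator's file format. The event tuple layout, the user names, and the four-step threat sequence are hypothetical. The sketch only shows how labeled threat events can be merged into a time-ordered benign stream so that detections can later be scored against known ground truth.

```python
import heapq
import random

def benign_events(n, seed=0):
    """Hypothetical benign events as (timestamp, user, domain, label), time-sorted."""
    rng = random.Random(seed)
    t = 0.0
    events = []
    for _ in range(n):
        t += rng.expovariate(1.0)  # roughly one event per second on average
        user = f"user{rng.randrange(50)}"
        domain = rng.choice(["Logon", "HTTP", "Email"])
        events.append((t, user, domain, "benign"))
    return events

def threat_pattern(start, user):
    """A hypothetical insider-threat sequence as offsets (seconds) from a start time."""
    steps = [("Logon", 0.0), ("Device", 30.0), ("File", 45.0), ("Email", 120.0)]
    return [(start + offset, user, domain, "threat") for domain, offset in steps]

def interweave(background, *patterns):
    """Merge already time-sorted streams into a single time-ordered stream."""
    return list(heapq.merge(background, *patterns, key=lambda e: e[0]))

stream = interweave(benign_events(1000), threat_pattern(200.0, "user7"))

# The label column is ground truth for scoring. It would be held back from the
# system under test and compared with its alerts to count true and false positives.
threat_count = sum(1 for e in stream if e[3] == "threat")
print(len(stream), "events,", threat_count, "labeled as threat")
```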
Many potential clients expressed strong interest in helping to implement Cyber Behavioral Tools, including the Cyber Innovation Manager from one of the world's largest banks, a Divisional Chief Information Security Officer for one of the biggest US Federal Systems Integrators, and one of the largest Cyber Independent Testing Laboratories. During the demonstrations, large amounts of internally consistent data were generated for all desired behaviors. Data was generated over any time frame to output: