The Data Blog | ExactData - ExactData

Test Data & Cyber Security: Complex Dynamic Payloads

4/3/2020

The Next Step in the Evolutionary Cyber Security Ladder: Complex Dynamic Payloads with High Fidelity Content and Relational Scenarios

Commercial network traffic generation technologies such as Ixia BreakingPoint or Spirent simulate real-world legitimate traffic, distributed denial of service (DDoS), exploits, malware, and fuzzing. These technologies help to test and validate an organization’s security infrastructure.

Today, advanced behavior-based threats are growing more sophisticated and harder to detect, and their growth is accelerating. Current networks are becoming even more vulnerable to these threats, which cost more than $4B annually in the US alone. Detecting and mitigating Advanced Persistent Threats and Insider Threats demands far more advanced testing techniques, analytics, and sophisticated data sets for consistent detection, demonstration, measurement, and mitigation.

Today, you can combine commercial network traffic generation and synthetic data generation technologies to provide rich content that mirrors real-world network traffic, with configurable threat patterns contained within the traffic data. This end-to-end solution generates the behavioral network traffic test data as well as the system response files, enabling immediate scoring and correction of system errors. This is a huge advancement in the critical and growing segment of sophisticated threat-based network testing.

Test Data & Cyber Security: Data Breaches

3/13/2020

60 percent of breaches are linked to a third party. Why are you giving them access to your data when you don't need to?
Third-party contractors are the biggest source of security incidents outside of a company’s employees:

Approximately 66% of companies extensively or significantly rely on third-party vendors.
Less than 17% of organizations felt their current systems effectively managed third-party risk.
81% of organizations have seen an increase in third-party vendors in the past two years.
57% of organizations don't have an inventory of all 3rd parties.
On average, 181 vendors are granted access to a company’s network in a given week.

2019 was a huge year for cyber breaches, especially third-party cyber breaches (third-party breaches account for over half of all data breaches in the US, according to the Ponemon Institute). Plus, a third-party breach costs twice what a normal breach costs.

Why are commercial companies and government agencies giving third parties access to their private and confidential data when viable technology alternatives to this practice exist and they don't need to?

The System Perspectives of Development and Tests

3/2/2020

A thought exercise on the System perspective of dev and test, as enabled by ExactData Synthetic Data.

Let’s consider the development of an application that scours incoming data for fraudulent activity. How would test and analysis look with production data, de-identified production data, hand-crafted data, and ExD synthetic data?

Let’s also consider that the application will classify all transactions/events as either bad or good. The perfect application would classify every transaction correctly resulting in 100% Precision (everything classified as bad was actually bad), 100% capture rate (classified every actual bad as bad), 0% escape rate (no bads classified as good), and 0% False Positive rate (no goods classified as bad). The application needs to be developed, tested, and analyzed from a System perspective. For example, the application could classify every transaction as bad and achieve 100% capture rate, and 0% escape rate, but would also result in poor Precision and a huge False Positive rate – thus requiring significant labor support to adjudicate the classifications. On the other extreme, the application could classify everything good, be mostly right, and not catch any bads. Both of these boundary conditions are absurd but illustrate the point of the importance of System.
One method of System analysis is the Confusion Matrix, noted below.
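As a minimal sketch of the four measures described above, the Python below tallies a confusion matrix from ground-truth labels and computes each rate, then applies it to the two boundary-condition classifiers. The data set (2 bads among 100 transactions), the function names, and the choice to report an undefined ratio as 0.0 are all assumptions made for illustration. The False Positive rate is taken here as FP/(FP+TN), the share of goods classified as bad, which is one reading of the definition given in the text.

```python
from dataclasses import dataclass


@dataclass
class Confusion:
    tp: int  # actual bad, classified bad
    fp: int  # actual good, classified bad
    fn: int  # actual bad, classified good (an escape)
    tn: int  # actual good, classified good


def tally(truth, predicted):
    """Count the four quadrants. True means 'bad' in both lists."""
    tp = fp = fn = tn = 0
    for actual_bad, flagged_bad in zip(truth, predicted):
        if flagged_bad and actual_bad:
            tp += 1
        elif flagged_bad:
            fp += 1
        elif actual_bad:
            fn += 1
        else:
            tn += 1
    return Confusion(tp, fp, fn, tn)


def ratio(numerator, denominator):
    # Assumption: report 0.0 when the ratio is undefined (empty denominator).
    return numerator / denominator if denominator else 0.0


def metrics(c):
    return {
        "precision": ratio(c.tp, c.tp + c.fp),
        "capture_rate": ratio(c.tp, c.tp + c.fn),
        "escape_rate": ratio(c.fn, c.tp + c.fn),
        "false_positive_rate": ratio(c.fp, c.fp + c.tn),
    }


# Hypothetical ground truth: 2 bad transactions among 100.
truth = [True] * 2 + [False] * 98

# Boundary condition 1: classify every transaction as bad.
all_bad = metrics(tally(truth, [True] * 100))

# Boundary condition 2: classify every transaction as good.
all_good = metrics(tally(truth, [False] * 100))

print("classify all bad: ", all_bad)
print("classify all good:", all_good)
```

Under these assumed numbers, the classify-everything-bad case captures every bad but with very low precision, while the classify-everything-good case raises no false alarms and catches nothing, which is the System trade-off the paragraph describes.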

With production data, you don’t know where the bads are, so you can’t complete the confusion matrix.

With de-identified production data, you don’t know where the bads are, so you can’t complete the confusion matrix.

With hand-crafted data, you might have the “truth” to enable completion of the confusion matrix, but you would not have the complexity or volume to truly test for the “needle” in the haystack: fraudulent behavior within a mass of good behavior.
With ExD synthetic data, you know where every bad is (you have the ground truth), so you CAN complete all 4 quadrants of the confusion matrix. Only then can you conduct a system analysis, driving the application toward the real goal of tuning and optimizing Precision (TP/(TP+FP)) and Capture rate (TP/(TP+FN)), while at the same time minimizing Escapes (FN) and the False Positive rate (FP/(FP+TN)). Within a particular setup of an application version, these are typically threshold trade-offs, but with next-iteration development, there is the opportunity to improve on all scores.
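The threshold trade-off mentioned above can be illustrated with a short Python sketch. It assumes a hypothetical application that assigns each transaction a fraud score between 0 and 1 and flags it as bad when the score meets a threshold. The scores, labels, and threshold values are invented for illustration and do not come from any real system.

```python
# Hypothetical (score, is_bad) pairs. The ground truth is known because the
# data is assumed to be synthetic.
scored = [
    (0.95, True), (0.90, True), (0.80, False), (0.70, True), (0.60, False),
    (0.40, False), (0.30, True), (0.20, False), (0.10, False), (0.05, False),
]


def sweep(scored, thresholds):
    """Return (threshold, precision, capture_rate) for each threshold."""
    rows = []
    for t in thresholds:
        tp = sum(1 for s, bad in scored if s >= t and bad)
        fp = sum(1 for s, bad in scored if s >= t and not bad)
        fn = sum(1 for s, bad in scored if s < t and bad)
        # Assumption: report 0.0 when a ratio is undefined.
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        capture = tp / (tp + fn) if (tp + fn) else 0.0
        rows.append((t, precision, capture))
    return rows


rows = sweep(scored, [0.25, 0.50, 0.85])
for threshold, precision, capture in rows:
    print(f"threshold={threshold:.2f} precision={precision:.2f} capture={capture:.2f}")
```

With this invented data, raising the threshold improves precision while lowering the capture rate. Improving both at once would take a better scoring model, not a different threshold.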

Privacy within the Cyber World

2/14/2020

One of the major debates about the cyber world pertains to privacy and how much of it one really has. With companies such as Facebook, Google, and Apple using customer data more and more, we know that our once private lives may not be as secret as we think. In their book 'Who Knows: Safeguarding Your Privacy in a Networked World', authors Ann Cavoukian and Don Tapscott discuss how secure documents that are supposed to remain private, such as medical records and employment history, really are.

The truth is that with advancements in technology and cyber security, one's private data is more readily available to companies, hackers, and even other ordinary individuals. Some companies make the point that they only use consumer data for our benefit, showing products one may need before one knows one needs them, or memorizing GPS routes that one travels constantly. While it's true that our daily lives are significantly improved because our data is being used in this manner, there is one question we have to ask ourselves: is having less privacy a fair trade-off for potential everyday life benefits?

One thing is for certain: we should all take measures to make sure our data is as safe as it can be. Limiting who has access to your social media profiles and information, using secure passwords, and not clicking sketchy links are easy habits to keep in mind to make sure your data is really yours.

Test Data & Cyber Security: An Introduction

2/7/2020

What Test Data is Being Used Today and by Whom?

An organization’s development ecosystem, including technical partners, software developers, and contractors, has a growing need to access private and confidential data to do its job. Relevant data sources are a necessary component of the software development, technical integration, testing, implementation, and ongoing operations and maintenance processes, and production data sources are commonly accessed and modified for this purpose. Complex, integrated technology solutions can no longer be managed within an organization’s internal operations but require a large and varied global ecosystem of partners, consultants, technology companies, and contractors. There is a similar need for test data within the organization’s cyber security operations. It is also common practice for this ecosystem to utilize historical data, captured network traffic, and simple network traffic generation technology for testing purposes.

The Dark Secret at the Heart of AI

11/29/2019

Most scientists agree that no one really knows how the most advanced algorithms do what they do, nor how well they are doing it. That could be a problem. Advances in synthetic data generation technologies can help. These algorithms generate data with a known ground truth, sufficient volumes and with statistically relevant true and false positives (TP, FP) and true and false negatives (TN, FN) for the nature of the test. AI algorithms can now be measured for precision, c, as the fraction of the predicted matches that are true positive matches, or c = TP/(TP + FP).

Ransomware Attacks in the News

10/18/2019

In recent news, Pitney Bowes and Groupe M6 experienced ransomware attacks which limited customer access to company services and led to the encryption of information on private networks and systems belonging to the companies. Furthermore, email servers and phone lines also went down due to the attacks. While no customer data was lost or stolen, the incidents show how much of a threat these ransomware attacks can pose to the privacy of companies and their customers.

Ransomware attacks, while hard to detect and fight off, can be defeated with time and effort. However, if it takes too much time to defeat an attack, valuable data could be breached or stolen and many will be put at risk. If the risk is too great, companies give up hope of fighting off the attacks themselves and end up paying high extortion fees to minimize damage. But what happens when attackers strike again? Will the companies be prepared to fend them off the next time, or will they be seen as an easy target because they gave in?

One thing is for sure; just as we continue to make strides in the cyber security industry, criminals continue to get more and more advanced with their own cyber attack tactics.

Training Machine Learning for Cyber Threats

8/30/2019

Attachment: peak_cyber_value_prop_08.26.19.pptx (pptx, 8089 kb)

An in-depth view

8/30/2019

Equifax Breach vs. Synthetic Data

8/9/2019

With the Equifax breach back in the limelight due to the cancellation of the $125 check the FTC promised to those impacted, we want to take a look at how the breach might have been prevented in the first place, or at least how the damage could have been minimized.

Distilinfo Hitrust Advisory notes that "Production data use in the test environment cited as watchdog levies maximum possible Pre-GDPR fine. This breach could have been easily avoided through the use of Synthetic Data in Test Environments, a recognized best practice." In other words, had Equifax used synthetic data while testing its security, or at least utilized it in some way, shape, or form, the company would not have been hit as hard, if at all, and would not have suffered as direly as it has. A full link to Distilinfo's article can be found here.

So what does that tell us? How many more companies must be hacked, breached, or caught mishandling their data before it is understood that the leaking of private information is a real threat to both a company and its clients? Client data getting released is only part of the problem; internal threat detection and information protection are important to have as well. Generating synthetic data for test environments, so that at the very least no real data would be exposed in the case of a breach, would have been a good start. With technology continuously evolving, hackers can do more and more, and can get at supposedly private data with just a few button clicks. Real names, emails, addresses, and dates of birth became public information and five data protection principles were violated, all of which could have been avoided had test data been employed.