The Data Blog
According to White House health advisor Dr. Anthony Fauci, contact tracing isn't going as well as was first hoped. Although contact tracing technology can successfully track where people infected with the virus may have caught it, the technology is being underutilized in populous cities such as San Francisco and New York. Furthermore, because the technology isn't being used to its full potential, it cannot keep up with the number of cases we're seeing today.
While many contact tracing applications and technologies are readily available, privacy remains one of the main concerns about using them to fight the COVID-19 pandemic. So the question becomes: is there a way to minimize security risks and breaches of privacy while using contact tracing technologies to fight COVID-19? The answer isn't clear right now, but perhaps a combination of synthetic data and contact tracing could lead us in the right direction. By altering the more sensitive information and creating synthetic data based on real contact tracing findings, we may have a shot at reaching a happy medium between contact tracing application usage and letting users keep their privacy.
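As a rough illustration of what "altering more sensitive information" could look like, here is a minimal Python sketch. The record layout (`a`, `b`, `lat`, `lon`, `time`) and the `mask_contacts` function are hypothetical, invented for this example rather than taken from any real contact tracing application. It only masks records (aliases for identities, coarser locations and times) and is not a full synthetic data generator or a formal privacy guarantee.

```python
from datetime import datetime


def mask_contacts(records):
    """Return masked copies of contact records (hypothetical layout).

    Each record is a dict with keys: a, b (person identifiers),
    lat, lon (floats), and time (datetime).
    """
    aliases = {}

    def alias(real_id):
        # The same real identifier always maps to the same alias,
        # so the contact graph is preserved without exposing identities.
        if real_id not in aliases:
            aliases[real_id] = f"person-{len(aliases) + 1:04d}"
        return aliases[real_id]

    masked = []
    for r in records:
        masked.append({
            "a": alias(r["a"]),
            "b": alias(r["b"]),
            # Two decimal places is roughly a 1 km grid (assumed granularity).
            "lat": round(r["lat"], 2),
            "lon": round(r["lon"], 2),
            # Keep only the hour of the contact.
            "time": r["time"].replace(minute=0, second=0, microsecond=0),
        })
    return masked


contacts = [
    {"a": "alice@example.com", "b": "bob@example.com",
     "lat": 37.7749, "lon": -122.4194, "time": datetime(2020, 7, 1, 14, 37, 5)},
    {"a": "bob@example.com", "b": "carol@example.com",
     "lat": 40.7128, "lon": -74.0060, "time": datetime(2020, 7, 2, 9, 12, 44)},
]
masked_contacts = mask_contacts(contacts)
```

In this sketch the second record reuses the alias assigned to the shared contact in the first, so who met whom is still traceable while the real identifiers, exact coordinates, and exact times are not carried over.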
Every company or government agency has had some sort of data breach at some point, and some may not even know that it happened. A new strategy gaining interest within the cyber security community is the use of offensive misinformation campaigns.
Misinformation campaigns involve generating synthetic databases that are indistinguishable from the production databases and passing them to adversaries, either through a honeypot deception solution or by placing them directly on dark web sites that sell stolen data. As a result, adversaries uselessly expend resources trying to sort out what is real and what is not, come to doubt any real information they might already have, and run fraud campaigns against people who do not exist.
For example, the Boeing aircraft manufacturing company could leak synthetic, highly confidential wing design databases that would be indistinguishable from the real ones without extensive analysis or access to other information for verification. Other examples would be Equifax leaking bogus credit reports or VISA leaking fake personal financial information. The confusion and harm inflicted on the adversarial community would be tremendous.
Learn more at www.exactdata.net
When thinking about the uses of synthetic data in financial institutions, banking applications and fraud detection are what commonly come to mind, but synthetic data can be used in conjunction with credit cards as well.
Credit card data is useful for predictive analytics that anticipate future purchases, so that promotions and other marketing efforts aimed at credit card users are more effective. Additionally, credit card data can help track fraud: purchase history outlines consumer tastes and behaviors, allowing banks to detect fraudulent purchases and to recognize false positives among flagged activity.
So how does synthetic data tie into this? Artificially generated data can speed up the training of machine learning models and the testing of algorithms for credit card fraud detection software and predictive analytics, while giving the applications a larger pool of data to test with so that errors are kept to a minimum. Consumers' demographics and private financial information also stay safe when synthetic data is used, while the software runs just as it would on their real data, making it a win-win for all parties involved!
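To make the idea concrete, here is a minimal Python sketch of generating labeled synthetic credit card transactions for testing a fraud detection pipeline. Everything in it is an assumption made for illustration: the `Transaction` fields, the category list, the 2% fraud rate, and the choice to draw larger amounts for fraudulent purchases are not taken from any real dataset or product.

```python
import random
from dataclasses import dataclass


@dataclass
class Transaction:
    card_id: str            # synthetic identifier, not a real card number
    merchant_category: str
    amount: float
    is_fraud: bool


# Hypothetical merchant categories chosen for the example.
CATEGORIES = ["grocery", "fuel", "restaurant", "travel", "electronics"]


def make_transactions(n, fraud_rate=0.02, seed=0):
    """Generate n synthetic transactions with a fraud label."""
    rng = random.Random(seed)  # seeded so test runs are repeatable
    rows = []
    for _ in range(n):
        is_fraud = rng.random() < fraud_rate
        # Assumption: fraudulent purchases are larger on average.
        mean_amount = 400.0 if is_fraud else 60.0
        amount = round(rng.expovariate(1.0 / mean_amount) + 0.01, 2)
        rows.append(Transaction(
            card_id=f"SYN-{rng.randrange(10**6):06d}",
            merchant_category=rng.choice(CATEGORIES),
            amount=amount,
            is_fraud=is_fraud,
        ))
    return rows


sample = make_transactions(1000)
```

Because no record corresponds to a real cardholder, a set like `sample` can be made as large as the test requires and shared freely. A realistic generator would also need to model purchase histories and behaviors per cardholder, which this sketch does not attempt.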
Recently, TAGCyber conducted a research survey to analyze how valuable synthetic data is considered to be for cyber security product testing. Below are a description of the research survey and an outline of the scoring scale participants used during the study.
A dozen enterprise security practitioners were solicited recently to determine the value on a
scale of 1 through 5 (not valuable to highly valuable) of using synthetic data for cyber security
product testing. The results averaged 3.85 which corresponded to a largely favorable view of
using synthetic data for cyber security product testing.
The values were as follows: Synthetic data need not be used (score = 1), synthetic data should
be used, but not required (score = 2), synthetic data is appropriate and should be used (score =
3), synthetic data should be encouraged and is valuable (score = 4), and synthetic data is a
valuable and required element of our program (score = 5). Participants were encouraged to
answer in a manner that integrated their personal and organizational views.
This research survey suggests that security practitioners encourage the use of synthetic data. There are many reasons synthetic data is viewed as highly beneficial, such as its value when it comes to security breaches and its flexibility for production and life-cycle testing. Listed below are a few quotes from the research survey that reflect these conclusions.
“I’d say there is actually an emerging market need for labeled, domain-specific datasets.
It’s far easier to concoct them algorithmically.”
“I think this is at least a 4. We are a software company, not a service provider so we
don't want to touch or see actual customer data as it would require us to be governed
by privacy laws such as GDPR and CCPA.”
“I think I would score this around 3.75 (4 if you need a round number). You’ll likely have
lots of POC’s, and it is definitely good to not use live data for these if possible.”
Based on the results of this research survey, we see great promise for synthetic data and for the value it brings when integrated with future technologies.