The Data Blog
According to White House Health Advisor Dr. Anthony Fauci, contact tracing isn't going as well as was first hoped. Although contact tracing technology can successfully track where people infected with the virus may have caught it, the technology is underutilized in populous cities such as San Francisco and New York. Furthermore, because the technology isn't being used to its full potential, it cannot keep up with the number of cases we're seeing today.
While many contact tracing applications and technologies are readily available, privacy is one of the main concerns about using them as part of a solution to the COVID-19 pandemic. So the question becomes: is there a way to minimize security risk and breaches of privacy while using contact tracing technologies to fight COVID-19? The answer isn't clear right now, but perhaps a combination of synthetic data and contact tracing could lead us in the right direction. By altering the more sensitive information and creating synthetic data based on real contact tracing findings, we may have a shot at reaching a happy medium between contact tracing application usage and letting users keep their privacy.
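As a rough illustration of what altering the more sensitive information could look like, the Python sketch below takes a few contact records and produces a de-identified copy. The `ContactEvent` record, its fields, and the `synthesize` function are hypothetical names invented for this example and do not come from any real contact tracing application. The approach (shuffled aliases, coarsened coordinates, day-level time buckets) is only one simple assumption about how such alteration might work, and it does not by itself guarantee privacy.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ContactEvent:
    """One hypothetical contact tracing record (illustrative fields only)."""
    person_a: str   # identifier of the first person
    person_b: str   # identifier of the second person
    lat: float      # latitude where the contact happened
    lon: float      # longitude where the contact happened
    hour: int       # hours elapsed since the start of the study


def synthesize(events, seed=0):
    """Return a de-identified copy of `events`.

    Real identifiers are swapped for aliases assigned in shuffled order,
    coordinates are rounded to one decimal place, and times are truncated
    to the start of the day. Who met whom is preserved, which is what a
    tracing analysis needs; the exact person, place, and hour are not.
    """
    rng = random.Random(seed)
    people = sorted({p for e in events for p in (e.person_a, e.person_b)})
    rng.shuffle(people)  # alias numbers must not follow the real ordering
    alias = {real: f"person-{i:04d}" for i, real in enumerate(people)}
    return [
        ContactEvent(
            person_a=alias[e.person_a],
            person_b=alias[e.person_b],
            lat=round(e.lat, 1),
            lon=round(e.lon, 1),
            hour=e.hour - e.hour % 24,
        )
        for e in events
    ]


# Made-up example records, not real findings.
real_events = [
    ContactEvent("alice@example.com", "bob@example.com", 37.7749, -122.4194, 30),
    ContactEvent("bob@example.com", "carol@example.com", 40.7128, -74.0060, 55),
]
for event in synthesize(real_events, seed=1):
    print(event)
```

The design choice here is to keep the contact graph intact while blurring everything that points to a specific individual. A production system would need a much stronger privacy analysis, since the shape of the graph alone can sometimes identify people.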
Every company or government agency has had some sort of data breach at some point in time. They might not even know that the breach has happened. A new strategy gaining interest within the cyber security community is the use of offensive misinformation campaigns.
Misinformation campaigns involve generating synthetic databases that are indistinguishable from the production databases and passing them to adversaries, either through a honeypot deception solution or by placing them directly on dark web sites that sell stolen data. The result is that the adversaries will uselessly expend resources trying to sort out what is real and what is not, come to doubt any real information they might already have, and run fraud campaigns against people who do not exist. For example, the Boeing aircraft manufacturing company could leak synthetic, highly confidential wing design databases that would be indistinguishable from the real ones without extensive analysis or access to other information for verification. Other examples would be Equifax leaking bogus credit reports or VISA leaking fake personal financial information. The confusion and harmful effects on the adversarial community would be tremendous. Learn more at www.exactdata.net
When thinking about the uses of synthetic data in financial institutions, banking applications and fraud detection are what commonly come to mind, but synthetic data can be used in conjunction with credit cards as well.
Credit card data is useful for predictive analytics: it helps determine potential future purchases so that promotions and other marketing efforts aimed at credit card users are more effective. Additionally, credit card data can help track fraud. Because purchase history outlines a consumer's tastes and behaviors, banks can detect fraudulent purchases and identify false positives among flagged activity. So how does synthetic data tie in to this? Artificially generated data can be used to speed up the training of machine learning models and the testing of algorithms for credit card fraud detection software and predictive analytics, not to mention giving the applications a larger pool of data to test with to keep error as low as possible. Consumers' demographics and private financial information also stay safe when synthetic data is used, while the software runs just the same as if their data were being used, making it a win-win for all parties involved!
Recently, TAG Cyber conducted a research survey to analyze how valuable synthetic data is considered to be for cyber security product testing. Shown below is a description of the research survey and an outline of the scoring participants used during the study.
A dozen enterprise security practitioners were recently asked to rate, on a scale of 1 through 5 (not valuable to highly valuable), the value of using synthetic data for cyber security product testing. The results averaged 3.85, which corresponds to a largely favorable view of using synthetic data for cyber security product testing. The values were as follows: synthetic data need not be used (score = 1), synthetic data should be used, but not required (score = 2), synthetic data is appropriate and should be used (score = 3), synthetic data should be encouraged and is valuable (score = 4), and synthetic data is a valuable and required element of our program (score = 5). Participants were encouraged to answer in a manner that integrated their personal and organizational views.
According to this research survey, the use of synthetic data is increasingly encouraged. There are many reasons why synthetic data is viewed as highly beneficial today, such as its value when it comes to security breaches and its flexibility in production and life-cycle testing. Listed below are a few quotes that reflect the conclusions drawn about synthetic data from the research survey.
“I’d say there is actually an emerging market need for labeled, domain-specific datasets. It’s far easier to concoct them algorithmically.”
“I think this is at least a 4. We are a software company, not a service provider so we don't want to touch or see actual customer data as it would require us to be governed by privacy laws such as GDPR and CCPA.”
“I think I would score this around 3.75 (4 if you need a round number). You’ll likely have lots of POC’s, and it is definitely good to not use live data for these if possible.”
Based on the results of this research survey, we see great promise for synthetic data and for the value it brings when integrated with future technologies.
Artificial Intelligence has come a long way since the initial applications the technology was developed for, and advancements in the field have shown yet another benefit: detecting COVID-19. Artificial Intelligence applications have been used to detect COVID-19 in patients and to distinguish it from pneumonia and other lung diseases. According to Jun Xia from Shenzhen Second People's Hospital's Department of Radiology, a learning model can be used so that AI can accurately differentiate COVID-19 from other types of lung disease while also detecting whether or not a patient is positive for the virus.
Furthermore, AI is used within contact tracing models to detect where the virus is spreading and how severe it is in given areas. Applications can currently track where the virus is, and algorithms are being tested to not only track the virus but also predict where it will spread. This gives us a better fighting chance not only against COVID-19 but against future pandemics as well. Finally, AI and machine learning are being used to accelerate the search for drug treatments for COVID-19 by identifying existing medications that could be used against the virus. BenevolentAI used machine learning techniques for this purpose and deduced that Baricitinib, a drug for rheumatoid arthritis, is a strong candidate to inhibit the progression of COVID-19; as a result, it is now in clinical trials as a treatment. AI and machine learning have come far in a short amount of time, and as a result are helping us deal with real-world problems in more ways than one.
There are many ways synthetic data can be used to help grow, strengthen, and rejuvenate your organization and the many processes it handles, but here are five key ways in which synthetic data can directly help you and your company!
1) Synthetic Data has a wide variety of use cases. Synthetic data is artificially generated and thus can be manipulated for production testing and model fitting in a plethora of ways. It can be used for machine learning, mathematical model fitting, model testing, and more!
2) Synthetic Data adds an extra layer of security to your data. Because synthetic data is artificially generated, if there is a data leak or hack, or if something else goes wrong, there will be minimal security risk and harm, as the exposed data will not put any individual's private information in danger of being exploited. This factor is huge within the cybersecurity world and serves as an extra precaution in case there is a breach in the system.
3) Synthetic Data is cost-effective. Synthetic data is less expensive to generate than real data is to buy, in terms of both time and money. Furthermore, because different types of tests need different types of data, you'll need several kinds of data to test with. This raises the question: wouldn't it be easier to generate each type on the fly as needed, rather than start testing, realize you need to collect more samples, and pause testing until you have collected enough to continue?
4) Synthetic Data is great when it comes to threat detection. Synthetic data can reflect authentic patterns and behaviors for insider threat detection and user behavior in the models it is used to create. Furthermore, it can be used during performance testing to cover a variety of different scenarios, which can lead to increased threat detection and strengthen an application or model's defensive capabilities.
5) Synthetic Data strengthens performance more than authentic data can. Synthetic data can be used to test models quickly and efficiently, so that results can be analyzed right after the data is plugged in. Moreover, it can be used to train models in ways that aren't possible with authentic data; it can be generated to fill in for any missing data or used to predict different types of behavior based on reasonable machine learning, rather than leaving data empty or assuming what 'would' have been answered.
Enterprise Implementation Best Practices: Behavioral Threat Detection for Sexual Harassment
We have discussed that, with technology currently available, you can combine commercial network traffic and synthetic data generation technologies to provide rich content that mirrors real-world network traffic, with configurable threat patterns contained within the traffic data. Imagine you were responsible for implementing a solution for detecting and preventing sexual harassment within your system’s network. Would it not make sense to procure this solution in a fashion where vendors could be quantifiably evaluated based on your actual network and sexual harassment criteria? And would it not make sense for the awarded contract to include these same metrics as Service Level Agreement (SLA) criteria, so that you would know the solution was implemented and operating correctly over time? For those of you operating on the buy side of the enterprise, consider implementation best practices in which you not only trust what the vendor tells you the system is doing, but also verify it and hold the vendor to its commitments. Learn more at www.exactdata.net
Enjoy the TAG Cyber interview below between ExactData's John Dawson and TAG Cyber's Ed Amoroso, where John discusses the concept of Synthetic Data and its real-world application use cases!
The topics of supervised and unsupervised machine learning are up and coming, and both are essential to understand for those of us invested in the data analytics world. Below are two quick definitions for the differing types of machine learning.
Supervised Machine Learning is the process of learning the relationship between input data and known outputs from pre-labeled examples, descriptors, and models, in order to classify future unknown data more accurately.
Unsupervised Machine Learning is the process of discovering relationships in unlabeled input data on the fly, with the intent to understand, infer, and predict structure within a set of current or future data.
While both approaches to machine learning have their advantages and disadvantages, supervised machine learning tends to be utilized more frequently due to its better overall comparative performance. Supervised machine learning is used throughout many fields of data analytics, a few examples being text analysis, sentiment analysis, risk analysis, and much more! While supervised machine learning has many benefits, it has a few shortcomings as well, one of them being a reliance on labeled network data for testing purposes. Fortunately, ExactData combines our synthetic data with Ixia's network traffic generator to counteract these shortcomings and to test both frequently and rigorously, ensuring that data models are properly trained using supervised machine learning. More on this subject can be found in our Supervised Machine Learning white paper!
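To make the difference between the two definitions concrete, here is a small Python sketch using only the standard library. It is an illustration under stated assumptions, not ExactData's or Ixia's actual method: the two-feature "benign" and "threat" data, the function names, and the choice of a nearest-centroid classifier (supervised) and k-means with a farthest-point start (unsupervised) are all hypothetical choices made for this example.

```python
import random
from math import dist


def make_synthetic(n_per_class, seed=0):
    """Generate labeled synthetic 2-D points around two made-up centers."""
    rng = random.Random(seed)
    centers = {"benign": (0.0, 0.0), "threat": (5.0, 5.0)}
    data = []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            data.append(((rng.gauss(cx, 1.0), rng.gauss(cy, 1.0)), label))
    rng.shuffle(data)
    return data


def mean_point(points):
    """Average a non-empty list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def train_supervised(labeled):
    """Supervised: use the labels to compute one centroid per class."""
    by_label = {}
    for point, label in labeled:
        by_label.setdefault(label, []).append(point)
    return {label: mean_point(pts) for label, pts in by_label.items()}


def predict(centroids, point):
    """Classify a new point by its nearest class centroid."""
    return min(centroids, key=lambda label: dist(point, centroids[label]))


def kmeans(points, k=2, iterations=10):
    """Unsupervised: group unlabeled points into k clusters.

    Starts from the first point, then repeatedly adds the point farthest
    from the centroids chosen so far, and refines with standard k-means
    assignment and update steps.
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(dist(p, c) for c in centroids))
        )
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(p, centroids[i]))
            groups[nearest].append(p)
        # Keep the old centroid if a cluster ends up empty.
        centroids = [
            mean_point(g) if g else centroids[i] for i, g in enumerate(groups)
        ]
    return centroids


train = make_synthetic(100, seed=1)
held_out = make_synthetic(50, seed=2)

model = train_supervised(train)
correct = sum(predict(model, point) == label for point, label in held_out)
print("supervised accuracy:", correct / len(held_out))

unlabeled = [point for point, _ in train]
print("unsupervised cluster centers:", kmeans(unlabeled, k=2))
```

The supervised model can name its predictions ("benign" or "threat") because it was trained on labels, which is exactly where generated, pre-labeled data helps. The unsupervised run finds the same two groups but cannot say which one is the threat.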