The Data Blog
Jefferson Health recently reported that its cloud-based database was hacked, compromising data belonging to 1,769 patients treated at the Sidney Kimmel Cancer Center. The attack occurred back in April of 2021 but was reported publicly and to the federal government for the first time on Thursday, July 22nd, at the end of the 60-day legal window for reporting cyber attacks.
Cyber attacks in general have been on the rise since the beginning of the COVID-19 pandemic, but ransomware attacks and hackings against health facilities in the United States have soared 153% from the year prior, and those are just the ones that have been reported.
Additionally, Jefferson Health was not the only healthcare facility breached in the attack; reports suggest Yale New Haven Health System and many other healthcare organizations affiliated with Elekta were also breached, seemingly with the intent of stealing data related to cancer patients.
With cyber attacks on the rise across all industries, especially healthcare, it's clear that nobody is safe from malicious ransomware attacks. Companies worldwide are in constant need of cybersecurity expertise, but the supply doesn't seem to be getting any larger.
Synthetic data generation, however, offers an alternative solution, ensuring the safety of data belonging to clients while keeping the benefits of using real-world data. Now more than ever, synthetic data is imperative and serves as a great defense against hackers and cyberterrorists out to steal customer data.
Learn more at www.exactdata.net/
Earlier this week, a hacker gang behind an international crime spree claimed to have locked over a million individual devices and demanded $70 million in bitcoin to unlock them. REvil, a Russia-connected cyberterrorist group, previously hacked JBS's cyber operations and has since compromised Kaseya and Coop, two international giants, while also claiming attacks on 1,000 individual small businesses.
Global ransomware attacks have been increasing steadily over the last few years, and while cyber defenses are continuing to improve, there's no telling who will be targeted next or what it will cost your company if you are hacked. Money, assets, customers, and all kinds of personal data are at risk every day, and as the July 4th weekend proved, the threat is very real.
Learn more at https://www.exactdata.net/
According to Erica Davis, Managing Director and Cyber Center of Excellence Leader for North America at Guy Carpenter, global cybercrime will cost $6T in 2021, against only $6B in 2021 cyber insurance gross written premiums. The Ponemon Institute indicates 60% of cybercrime costs are due to third-party breaches. Fully synthetic data generation technologies eliminate the cost and risks of third-party breaches. The potential global financial impact is enormous: a potential reduction in cybercrime costs of $3.6T annually. Through broad adoption of synthetic data generation technologies, the insurance industry would also close a risk exposure gap of trillions of dollars.
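As a quick sanity check, the headline savings figure above follows from a single multiplication. The sketch below hard-codes the quoted 2021 estimates as assumptions (they are not independently verified here) and reproduces the $3.6T number.

```python
# Back-of-the-envelope check of the figures quoted above.
# Assumed inputs, taken directly from the sources cited in the post:
TOTAL_CYBERCRIME_COST_T = 6.0   # trillions of USD, 2021 estimate
THIRD_PARTY_SHARE = 0.60        # Ponemon Institute figure

# If third-party breach costs were eliminated entirely:
potential_reduction_t = TOTAL_CYBERCRIME_COST_T * THIRD_PARTY_SHARE
print(f"Potential annual reduction: ${potential_reduction_t:.1f}T")
# prints: Potential annual reduction: $3.6T
```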
Just recently, McDonald's suffered a data breach in which the personal data of customers in Taiwan and South Korea was exposed. This comes right after JBS admitted to paying $11 million in ransom to hackers who broke into its computer system last month.
With more and more companies being targeted, it's hard to say who will be safe from looming threats.
ExactData is proud to announce our new partnership with Mammoth-AI! Mammoth-AI is an organization that helps automate processes and services as much as possible while keeping costs low and ensuring top quality. Through automated, manual, performance, usability, and AI testing, Mammoth-AI ensures optimal operations for your company and will help you test and develop the software to get you there.
A link to Mammoth-AI's website can be found here
We're excited to collaborate with Mammoth-AI in the future, and to see the automation they seek to put in motion all over the world!
Cybersecurity is one of the largest growing industries for all types of employment, including scam artists and hackers looking to make a few easy bucks. Due to the pandemic and its financial repercussions, it's more important than ever to make sure you keep yourself safe and avoid anything suspicious online.
CNBC warns that scammers are targeting younger audiences with empty promises to forgive student loans and file taxes, so that malicious software and patiently waiting hackers can steal PII (personally identifiable information), important documents, financial assets, credit card information, and more from right under their noses. This is of immediate concern during tax season, especially as stimulus checks roll out from the IRS, so it's important to keep your internet connection private and your anti-malware software up to date.
This is not the first time, nor will it be the last, that scammers have tried taking advantage of a bad situation to make a quick profit, which is why it's all the more crucial that we find new ways to combat malicious attacks coming from the cyber world.
As you may expect, the relationship between big data and the cloud is quite complex, but very efficient for all parties involved. Normally, big data can be limited by storage space, processing time, and cost. Cloud computing, however, can compensate for all of this; with far more storage, faster processing, and lower cost, cloud computing is big data's best friend.
No longer do analysts and programmers need to run simulations and execute thousands of lines of code only to wait hours on end for a bug or two to crash their program, forcing them to restart the entire operation. By utilizing the cloud, big data can be run and processed in a fraction of the time it used to take.
While it's possible to have one without the other, industry trends are pointing towards the relationship between cloud computing and big data being the next big boom; now that we have systems and services capable of analyzing all of this data, we can continue to improve the process.
So what does this mean for synthetic data? Larger and larger sets of synthetic data can also be combined with cloud computing for very similar results; machine learning models can be trained and tested with larger synthetic datasets, allowing the job to be done with both more precision and more speed. Artificial intelligence solutions can be vastly improved just by taking advantage of the relationship between cloud computing and large amounts of synthetic data, and we look forward to the day when they are.
Check out the new ExactData Podcast page here! We're excited to launch our podcast and share our own tech updates, as well as give our opinions on current trends in the industry and the growing presence synthetic data will have in it!
Our podcast will also feature guests from Edgeworx to give their input as well! We're excited to partner with them for our Podcast and are looking forward to discussing intellectually stimulating topics with them.
Learn more at https://www.exactdata.net/
In data science, model training and fitting via machine learning is one of those subjects that never really has the same answer each time. Every model is different; each has its own data, response, and predictors unique to it. Yes, there is a "right way" and a "wrong way" to train and fit a model, but what counts as right and wrong is very subjective. If the model works, does that mean you trained it right every step of the way? If the model's predictions are inconclusive or the opposite of your hypothesis, did you train it wrong?
The first step in proper model training is always asking yourself what the goal of the model is. What is its purpose? What are you trying to predict? If you can't specifically summarize your aim in a sentence or two, you need to reevaluate your goals and the concept behind the model. Every model should have a clear purpose that can be easily explained to anyone willing to listen.
"I'm testing to see if the amount of sleep one gets is a reliable predictor to whether or not they will drink coffee the next morning."
Great, right? Wrong. While the above description certainly seems valid, so many questions already arise from that one sentence. Does the person have to usually drink coffee to be counted as valid in the analysis? What do you mean by the 'amount of sleep': a little, a lot, just enough? When does the morning end? What defines "reliable" in this context?
To be honest, we can nitpick even the greatest models, but at the very least, a great model's problem, objective, solution, and features should be clearly identifiable when summarizing it.
"My model predicts whether or not the average caffeine consumer will drink coffee within four hours of waking up if they got less than 8 hours of sleep the previous night."
After summarizing your model and clearly laying out your intent, you'll need the right data to back it up. A few questions to keep in mind: how will you get the data? How do you know the data is accurate? Will you filter out outliers or take a random sample of the observations? What if some of your data is incomplete or missing? Which fields are more important than others; which are predictors, which are responses, and which are unnecessary? Is there multicollinearity, or any correlation at all for that matter, within your data?
There are many questions that need to be addressed with the data, so many that they can't all be listed here. However, the most important part is that you're able to validate your data and the model that depends on it, so you can move forward with training your model.
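As a concrete illustration of that validation step, here is a minimal sketch in plain Python. The rows, field names, and plausibility bounds are all invented for the coffee example; a real pipeline would use a dataframe library and domain-specific checks.

```python
# Hypothetical survey rows for the coffee example (fabricated data).
rows = [
    {"sleep_hours": 6.5, "drank_coffee": 1},
    {"sleep_hours": 9.0, "drank_coffee": 0},
    {"sleep_hours": None, "drank_coffee": 1},  # missing predictor
    {"sleep_hours": 48.0, "drank_coffee": 1},  # implausible outlier
]

def is_valid(row, lo=0.0, hi=16.0):
    """Keep rows whose sleep value is present and plausible."""
    hours = row["sleep_hours"]
    return hours is not None and lo <= hours <= hi

clean = [r for r in rows if is_valid(r)]
print(len(clean))  # prints: 2 (missing and outlier rows dropped)
```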
We may train a model in a number of ways, one of the most popular being a linear regression fit aimed at answering a specific question: what is the relationship between our predictors and our outcome? While training on our data, we hope to isolate which predictors have a significant impact on our outcome and continue to use them, while removing those which have no impact in the long run. By penalizing or trimming the variables that contribute little to our response as a whole, we're able to enact the process known as regularization. Regularization is important, as it will determine your model's capabilities on data other than your training set. If your model doesn't perform as well on other data meant for it, your model may be underfit or overfit, and you'll have to back up a bit and rethink your best predictors.
Thinking back to our coffee example, our response variable is obviously whether or not a consumer drank coffee. The most obvious predictor is the amount of sleep the previous night, but we should include other predictors as well, such as age, sex, weight, and accessibility. We'd then aim to trim any variable deemed not a valid predictor of drinking coffee the next morning. Once we believe we have the best predictors for our model, we'd test it on other datasets and continue training it until we're satisfied.
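To make the coffee example concrete, here is a toy end-to-end sketch in plain Python. Since the response is yes/no, it uses a logistic fit (the classification analogue of the linear regression mentioned above) with a small L2 penalty as the regularization step. The single sleep predictor, the data, and every constant are fabricated for illustration only.

```python
import math
import random

random.seed(0)

def simulate(n=200):
    """Fabricate (sleep_hours, drank_coffee) pairs. Assumed ground
    truth: less sleep makes coffee the next morning more likely."""
    data = []
    for _ in range(n):
        sleep = random.uniform(4, 10)
        p = 1 / (1 + math.exp(-(8 - sleep)))  # true probability
        data.append((sleep, 1 if random.random() < p else 0))
    return data

def train(data, lr=0.05, epochs=1000, l2=0.01):
    """Logistic regression via gradient descent with an L2 penalty."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        gw = gb = 0.0
        for x, y in data:
            p = 1 / (1 + math.exp(-(w * x + b)))
            gw += (p - y) * x
            gb += (p - y)
        # The l2 term shrinks w, guarding against overfitting.
        w -= lr * (gw / n + l2 * w)
        b -= lr * (gb / n)
    return w, b

w, b = train(simulate())
# We expect a negative weight: more sleep lowers the predicted
# probability of drinking coffee the next morning.
print(f"learned sleep weight: {w:.3f}")
```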
Learn more at www.exactdata.net/
A new year means new predictions for data trends and how they will affect the world of technology! Several trends are expected, such as an increase in cloud computing, a large migration of current database systems to cloud software, and data privacy and social media data harvesting continuing to be in the spotlight.
Thus, to get the jump on others, it may be in your best interest to act quickly to migrate systems or adopt the next generation of tools for your data needs. Whether it's for testing purposes, storage, or analytics, the future is tomorrow, and tomorrow will come faster than you think.
We recommend researching upcoming data-driven techniques that fit your needs and capabilities, and comparing them to your current processes right now. Do the upcoming or freshly introduced technologies look better than what you currently have? If so, you may have to act quickly before competitors jump on board and are the first to invest. So where can you start looking for these up-and-coming data-driven technologies? Well, you've come to the right place.
Learn more at https://www.exactdata.net/
Throughout the last few years, cybersecurity and cybersecurity strategies have drastically altered to combat data breaches and hackers trying to access private information, but did you know that one way it evolved was simply due to the overwhelming amount of information posted online by regular internet users?
Enter misinformation and disinformation: two tactics that are now easily employed thanks to the plethora of "fake news" and faulty tabloid headlines written as clickbait to attract the attention of social media users and web browsers. With an abundance of this information on the internet and no signs of incorrect information slowing down, we've entered a new age of fighting cyberattacks: overloading attackers with wrong information.
Misinformation and disinformation, while similar, have one key difference: misinformation is the accidental or unknowing spread of incorrect information, no matter how close to or far from the truth the content is. The important part is that misinformation is spread without intent to deceive; users who share content containing incorrect data or information end up misinforming the general public, or at least those who read their social media posts.
Disinformation, however, is the spread of incorrect information and data with the intent to do just that: lie or publish false statements by any means necessary. Whether it's for political gain, cybersecurity strategy, or because someone just wanted to lie over the internet, the act is classified as disinformation, something that has been practiced for centuries through means such as espionage and propaganda.
Disinformation campaigns have been around just as long as misinformation campaigns, the only difference being intent, but both are now being picked up as cybersecurity strategies and defense mechanisms to keep people from finding out the truth. Whether a campaign seeks to inflate profits, deflate statistics, or simply cover up a piece of information, it's safe to say these strategies have been modernized in the world of technology.