Data is the basic building block that drives AI. This data can be stored on premises, in the cloud, or on a hybrid infrastructure that combines both options. How do you choose where to keep your data? When is the cloud necessary? To what extent can the cloud propel advanced AI? Can we trust the cloud?
It’s now possible to analyze data in real time, on the fly, without even storing it in databases, with tools such as stream analytics, and to send it securely from the sensors straight to the cloud with platforms such as Azure Sphere. But it isn’t always necessary to push everything to the cloud, especially if you’re starting from scratch. “It’s more efficient to upload only the data necessary to solve the business issue,” according to Ziad Benslimane, Lead Data Engineer at Wizata. “While it’s possible to upload only the changed parts of databases using timestamps, that’s a bad fit for tables such as definition tables, which you’d have to keep re-uploading just in case.” Choosing wisely, together with your cloud and data experts, what you upload to the cloud also limits your data analytics costs (which depend on the number of columns and rows processed), as well as storage and bandwidth costs, which are cheaper but not negligible.
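As a minimal sketch of this incremental approach (assuming a local SQL table with a `last_modified` column; the `sensor_readings` table, the `raw-deltas` container and the watermark file are hypothetical placeholders), the snippet below uploads to Azure Blob Storage only the rows that changed since the last run:

```python
# Sketch only: timestamp-based incremental upload of changed rows to Blob Storage.
import datetime as dt

import pandas as pd
import pyodbc
from azure.storage.blob import BlobServiceClient

WATERMARK_FILE = "last_upload.txt"   # local record of the last successful upload


def read_watermark() -> str:
    try:
        return open(WATERMARK_FILE).read().strip()
    except FileNotFoundError:
        return "1970-01-01T00:00:00"  # first run: take everything


def upload_changed_rows(sql_conn_str: str, blob_conn_str: str) -> None:
    since = read_watermark()
    now = dt.datetime.utcnow().isoformat()

    # Pull only the rows modified since the last upload, not the whole table.
    with pyodbc.connect(sql_conn_str) as conn:
        df = pd.read_sql(
            "SELECT * FROM sensor_readings WHERE last_modified > ?",
            conn, params=[since],
        )
    if df.empty:
        return

    # Push the delta as a timestamped CSV blob; analytics jobs read from there.
    blob_service = BlobServiceClient.from_connection_string(blob_conn_str)
    blob = blob_service.get_blob_client(
        container="raw-deltas",
        blob=f"sensor_readings/{now.replace(':', '-')}.csv",
    )
    blob.upload_blob(df.to_csv(index=False), overwrite=True)

    # Only advance the watermark once the upload has succeeded.
    with open(WATERMARK_FILE, "w") as f:
        f.write(now)
```

As the quote above notes, this pattern suits transactional or sensor tables with a reliable modification timestamp; small definition tables are usually simpler to replicate whole.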
In the manufacturing industry, where databases and IT systems are an integral part of production control, replicating the data onto the cloud allows you to manipulate it and experiment on it in a virtual laboratory, without any risk to production. It is also necessary in order to keep the mainframes and core systems in optimal condition: they should not be burdened with constant requests that may degrade performance.
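To illustrate the pattern (a sketch only; the container and blob names are hypothetical), experiments read from the replicated copy in Blob Storage rather than from the live production database:

```python
# Sketch only: load a replicated dataset from Blob Storage for experimentation.
from io import BytesIO

import pandas as pd
from azure.storage.blob import BlobServiceClient


def load_replica(blob_conn_str: str) -> pd.DataFrame:
    blob_service = BlobServiceClient.from_connection_string(blob_conn_str)
    blob = blob_service.get_blob_client(
        container="replica",
        blob="production/line_history.parquet",
    )
    # One read of the copy; zero additional load on the production systems.
    payload = blob.download_blob().readall()
    return pd.read_parquet(BytesIO(payload))

# Usage (connection string supplied by your environment):
#   df = load_replica(my_storage_connection_string)
# Any heavy exploration, feature engineering or model training then happens on
# this in-memory copy; the production control systems never see these queries.
```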
On the cloud, the elastic and powerful big data infrastructure gives you the raw power to generate the most impactful AI insights, making sense of the largest number of features to give you a competitive edge. With less data and less computing power than the cloud provides, AI can still be viable, but it is less suited to real-time predictions and recommendations.
The raw data from the sensors can be refined into distinct levels of quality (the ‘data processing levels’), with level 0 being the basic, unsorted and unprocessed state. In theory, you would always want to use a polished, higher-level dataset for AI. But it’s not that simple, and it really depends on your business goal: “It sometimes takes several minutes for the data to be transformed and finally available in a polished form,” continues Ziad Benslimane. “For certain quality issues, the product is already damaged by the time the processed information is available, which forces us to focus on the real-time raw data, using the capabilities of the cloud to act while there’s still time. On the other hand, for root cause analysis, the last five minutes of data don’t matter when you look at an entire year of production. You can use structured historical data that is easier to interpret. That’s why it’s important for data engineers to ask the right questions from the very beginning of the project: what is the business goal, where is the data, does it arrive in real time, does the data science model need real-time data, and so on.”
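The sketch below illustrates the two access patterns contrasted above, acting on raw level-0 readings in real time versus aggregating curated historical data for root cause analysis; the column names and threshold are hypothetical placeholders:

```python
# Sketch only: real-time path on raw readings vs. batch path on curated history.
import pandas as pd


def realtime_check(raw_reading: dict, threshold: float = 95.0) -> bool:
    """Level-0 path: act on a single raw sensor reading while the product
    is still on the line, without waiting for the curated dataset."""
    return raw_reading["temperature"] > threshold


def root_cause_view(curated: pd.DataFrame) -> pd.DataFrame:
    """Historical path: a year of curated, structured data aggregated per week.
    A few minutes of processing latency is irrelevant at this timescale."""
    return (curated
            .assign(week=curated["timestamp"].dt.isocalendar().week)
            .groupby("week")[["temperature", "defect"]]
            .mean())
```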
The cloud certainly has many advantages in itself: with a self-service cloud platform such as Microsoft Azure, it is swift, cost-efficient and effortless to set up the exact infrastructure you need in a few clicks, with the most up-to-date software and hardware. Instead of chasing expensive, world-class IT profiles to research and implement tomorrow’s algorithms, you are free to focus on your core business, with a drastically reduced time-to-market for new ideas. Indeed, a highly competitive global market forces companies to react quickly and decisively when facing new challenges. There is usually not enough time to hire and train IT professionals when a new business model emerges, and it is difficult to sink capital into servers and infrastructure that won’t generate revenue from day one.
Rather than trusting local legacy systems with potentially dangerous security flaws, you can rely on cloud providers, whose best-in-class teams secure the data with the most advanced technologies. Security must be engineered end-to-end, from the cloud to the desktop and the sensors, and only the most tech-savvy companies can secure the whole pipeline, with constant, proactive and reactive threat monitoring and analysis. Virtual and physical access protection must meet the highest standards, with encryption across the board.
With multiple regional clouds around the world, companies can now choose under which jurisdiction their data is kept, for example in GDPR-compliant infrastructure located in Europe. On Azure, data processed on this trustworthy cloud cannot be accessed by Microsoft employees, and the customer owns and controls their data.
Among others, Microsoft has proved its added value, simply because it is a fully partner-oriented cloud, completely agnostic and unrestricted in terms of open source languages and technologies (you can run a Linux VM on Azure, for example), extensible and able to integrate natively with long-time players such as SAP. You don’t have to choose a single cloud provider: interoperable technologies now allow different platforms to talk to each other. The main goal is to correctly determine the most urgent business issues that offer the best ROI, and only then define the technology to support your solution.