What can “third-generation artificial intelligence” help us do? RealAI has given an answer over the past two years.
“The first generation of knowledge-driven AI constructed AI from three elements: knowledge, algorithms, and computing power; the second generation of data-driven AI constructs AI from three elements: data, algorithms, and computing power. Each simulates human intelligent behavior from only one side, so each has its own limitations and cannot reach true human intelligence.” So wrote Zhang Bo, dean of the Institute for Artificial Intelligence at Tsinghua University and academician of the Chinese Academy of Sciences, together with his co-authors in a special-issue article this September.
In that article, they also gave the first comprehensive exposition of the concept of third-generation artificial intelligence, proposing that its development path is to integrate the knowledge-driven AI of the first generation with the data-driven AI of the second: use all four elements of knowledge, data, algorithms, and computing power to establish new interpretable and robust AI theories and methods, and develop safe, credible, reliable, and scalable AI technology (for details, see “Tsinghua academician Zhang Bo’s special-issue article: Toward the Third Generation of Artificial Intelligence (full text included)”).
In fact, as early as 2016, Professor Zhang Bo proposed the concept of developing “third-generation artificial intelligence.” In his view, although AI has made real progress, it still faces problems such as poor robustness and opaque decision-making, and solving them requires combining the knowledge-driven and data-driven approaches.
In 2018, RealAI was founded as an industry-academia-research company spun out of the Institute for Artificial Intelligence at Tsinghua University, with Zhang Bo and Zhu Jun (director of the institute’s Basic Theory Research Center) as chief scientists. The company’s vision: rely on third-generation artificial intelligence technology to overcome the many shortcomings of general deep learning and fundamentally enhance the reliability, credibility, and security of AI.
Today, two years later, at the “2020 Third-Generation Artificial Intelligence Industry Forum and RealAI Strategy Conference,” RealAI showed the outside world for the first time its blueprint for AI-native infrastructure products based on third-generation artificial intelligence technology.
At the forum, Tian Tian, CEO of RealAI, pointed out that current AI infrastructure construction focuses on data and computing power platforms, which mainly provide basic computing conditions and productivity for AI, roughly equivalent to solving AI’s “food and clothing” problem. The rapid growth of data and computing power as an “external driving force” has indeed propelled AI technology in fields such as face recognition and speech recognition, giving rise to the AI industry’s “first growth curve.”
However, with the emergence of problems such as data complexity, privacy protection restrictions, and slow growth in computing power, the first growth curve of the AI industry has begun to slow down. In this scenario, we urgently need to open up new dimensions other than “data” and “computing power” for the AI industry, and develop AI “endogenous driving force” starting from enhancing the underlying capabilities of algorithms.
However, to strengthen AI’s endogenous driving force, several hurdles must be cleared: more secure and reliable decision-making (AI decision logic and pathways are opaque and vulnerable to attack), data privacy and security (information leakage, data silos), and control over AI application scenarios (algorithmic fairness, social ethics).
“As builders of the AI industry, we look at this issue from the perspective of infrastructure. Beyond the data platforms and computing power platforms inherited from the internet era, we need to build AI-native infrastructure that provides the necessary guarantees for the capabilities of AI technology itself,” Tian Tian said.
After two years of work, Tian Tian and his colleagues have drawn up a blueprint for this infrastructure.
In terms of algorithm reliability, they developed RealBox, an interpretable AI modeling platform based on Bayesian deep learning. Officially released in 2019, the platform is already in production use at a number of financial institutions and passed the first batch of trusted-AI certifications from the China Artificial Intelligence Industry Development Alliance. In terms of application controllability, they launched DeepReal, a deepfake detection tool that can efficiently and accurately determine whether videos, images, and other content were forged by AI, helping to head off the resulting public-opinion risks. DeepReal was selected as an outstanding artificial intelligence product by the National Industry and Information Security Center, and based on its core technology RealAI also won first place in the GeekPwn 2020 deepfake detection competition.
In addition, two new products were released at the event.
One of them is RealSecure, aimed at data security. It is the industry’s first compiler-level privacy-preserving machine learning platform. Its core module, a “privacy-preserving AI compiler,” automatically converts ordinary machine learning programs into distributed, privacy-safe ones, greatly lowering the threshold for commercial applications of privacy-safe AI.
The other is RealSafe 2.0, aimed at algorithm reliability. It is an upgrade of RealSafe, the world’s first enterprise-level AI security platform, which functions as antivirus software and a firewall for AI models. The upgraded RealSafe adds security attack-and-defense capabilities for algorithms such as object detection, along with new functions such as backdoor vulnerability detection.
Tian Tian said, “This series of AI-native infrastructure can open up a new dimension of AI capabilities, stimulate the second growth curve of AI, and bring new market opportunities for AI to empower all walks of life.”
RealSecure, the industry’s first compiler-level privacy-preserving machine learning platform, unveiled
In AI development, data is the basic factor of production, used to solve AI’s “subsistence problem.” But because data is hard to acquire and process, and often involves trade secrets and user privacy, many data owners are unwilling or unable to upload their data to a central data center for model training, giving rise to one data “chimney” or “island” after another.
Distributed privacy-preserving machine learning is an emerging answer to this problem: it lets multiple parties cooperate on a learning goal without any party transmitting its raw data. The concept is not new; a related term is “federated learning,” proposed by Google and other institutions, which realizes “data stays put, usable but invisible.” In practical commercial applications, however, it faces three major pain points:
The first, widely recognized by academia and industry as the most important problem: poor performance. Privacy-preserving machine learning requires multiple parties to exchange parameters in encrypted form, and the encryption imposes a roughly hundredfold performance loss; combined with the gap between hyperparameter settings and the existing machine learning ecosystem, privacy-preserving machine learning can end up nearly a thousand times slower. Local model training takes tens of seconds, but under privacy protection it takes hours. Yet feature screening, parameter tuning, and model validation require dozens or hundreds of repeated modeling rounds, so modeling speed is sacrificed heavily in exchange for data security.
The second, a key reason privacy-preserving machine learning is hard to commercialize at scale: poor compatibility with the existing machine learning ecosystem. Unlike traditional machine learning, privacy-preserving machine learning combines distributed systems, cryptography, and artificial intelligence. To achieve privacy protection, organizations must train teams to learn distributed systems and cryptography, and to use new algorithms, new frameworks, and new execution platforms. This means the experience and methodology an AI team has accumulated over the years cannot be applied directly, and the cost of reconstructing or rewriting is very high.
The third, the core issue of privacy protection itself: ensuring the security of data assets, which means the security of the platform must be verifiable. Existing platforms all operate as pure black boxes, and security verification relies entirely on expert endorsement; but the platforms contain a huge amount of code, making line-by-line expert audits impractical. Moreover, in a real production environment it is hard to guarantee that execution actually follows the code logic provided during the audit.
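The encrypted parameter exchange behind the first pain point can be illustrated with additive secret sharing, one common building block of secure aggregation. This is a minimal, illustrative Python sketch, not RealSecure’s actual protocol: each party splits its local value (say, a gradient) into random shares, and only the aggregate is ever reconstructed.

```python
import random

MODULUS = 2**61 - 1  # a large prime; arithmetic is done modulo this

def share(value, n_parties, modulus=MODULUS):
    """Split an integer into n additive shares; any n-1 shares reveal nothing."""
    shares = [random.randrange(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares

def reconstruct(shares, modulus=MODULUS):
    return sum(shares) % modulus

# Each of three parties secret-shares its local gradient (here one integer).
gradients = [12, 7, 30]
all_shares = [share(g, 3) for g in gradients]

# Shares are summed column-wise, so no party's individual value is exposed;
# only the aggregate gradient is reconstructed at the end.
summed_shares = [sum(col) % MODULUS for col in zip(*all_shares)]
print(reconstruct(summed_shares))  # 49, the sum of all gradients
```

The hundredfold overhead mentioned above comes from running this kind of exchange (usually with heavier cryptography, such as homomorphic encryption) for every parameter, every round.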
The privacy-preserving machine learning platform RealSecure was born of these requirements. It is the first to reveal, from the perspective of the underlying dataflow graph, the relationship between machine learning algorithms and their corresponding distributed privacy-preserving versions. By composing operators, it connects the machine learning ecosystem with the privacy-preserving one, addressing the poor performance, poor usability, and black-box protocols that enterprises face when building privacy-preserving systems, and achieving the integration of the two ecosystems.
Thanks to these underlying compiler-level capabilities, RealSecure (RSC) has three advantages:
Strong performance. With the help of cryptographic and AI-algorithm optimizations, model training is roughly 40 times faster than the latest version of a mainstream domestic open-source framework, cutting training time from 4 hours and 40 minutes to 6 minutes. Taking feature engineering and automatic parameter tuning into account, end-to-end modeling under privacy protection drops in total time from days to hours.
Seamless application. Bridging the machine learning ecosystem and the privacy-preserving one requires only minimal changes: automatic conversion unifies the ordinary machine learning framework with the privacy-preserving one, so data scientists can use privacy-preserving machine learning exactly as they would ordinary machine learning, a major gain in ease of use.
Safe and transparent. A true privacy-preserving learning application should be white-box verifiable, with all underlying computations auditable, to guarantee the security of the platform itself. RealSecure exposes its intermediate computations as a dataflow graph, making the computing process secure and transparent.
RealAI said that the disruptive improvement in ease of use and performance has also made RealSecure an “enterprise-grade” privacy-preserving machine learning platform that is faster and easier to apply to business environments.
At the press conference, Tian Tian also explained the philosophy behind these two products: “When we encounter technical difficulties in applications, we don’t just patch each one as it surfaces; we trace from one problem to the next to find the root cause. The two new products we are highlighting today are typical of this philosophy: their positioning, functions, and value are unique, and RealAI pioneered them.”
RealSafe, the world’s first enterprise-level AI security platform, ushered in 2.0
In the network security era, the spread of large-scale network attacks spawned a host of antivirus products. But as AI gradually becomes part of the infrastructure, “antivirus software” for AI models has been conspicuously absent.
This absence carries enormous security risk. Data show that more than 40% of mobile phones shipped last year were equipped with face-recognition unlocking, yet some of them can be unlocked simply with a pair of glasses printed with a special texture pattern.
If a compromised phone only endangers privacy and property, a security breach in a self-driving system is a genuinely deadly threat. Roland Berger, the international management consultancy, predicts that the global market for autonomous-vehicle systems will exceed $100 billion in 2020. Yet a hacker need only add a specific pattern to a road sign to make the machine misread a speed-limit sign as a stop sign, potentially causing a fatal accident.
To solve these problems at the root, we would need to understand how AI algorithms such as deep neural networks learn and work; but even today we know very little about that, so we may need to change our approach.
RealAI’s answer is model security detection plus defense: first probe the model with a variety of attack methods to detect the category and severity of its security risks, then provide solutions to harden it. This is RealSafe, the world’s first enterprise-level AI security platform, which the company launched earlier this year.
In the security detection stage, RealSafe uses a variety of attack algorithms to generate adversarial examples with different iteration counts and perturbation sizes, simulates attacks to induce errors, then tallies the probability and distribution of those errors and outputs a detection report. In this phase the platform acts like antivirus software. The entire detection process runs through a graphical interface, so users need no expertise in model-security algorithms and no programming experience.
In the defense phase, RealSafe supports a variety of general defense methods for removing adversarial noise, which can realize automatic denoising processing of input data and destroy adversarial noise maliciously added by attackers. At the same time, RealSafe also supports detecting whether the input data contains adversarial samples. This defense method builds a “firewall” between the model and the input data, keeping data with attack intentions out of the model.
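The attack-then-denoise pipeline described above can be sketched in a few lines of NumPy. This is a toy illustration, not RealSafe’s implementation: a one-step FGSM-style attack on a linear classifier (whose input gradient is simply its weight vector), followed by a hypothetical median-filter denoiser of the kind a preprocessing defense might apply.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=100)   # weights of a toy linear classifier
x = rng.normal(size=100)   # a clean input, labeled by sign(w @ x)

def classify(inp):
    return int(np.sign(w @ inp))

# FGSM-style attack: one step against the gradient sign flips the output.
# For this linear model, the gradient of the score w.r.t. the input is w.
eps = 2.0
x_adv = x - classify(x) * eps * np.sign(w)

# Illustrative denoising defense: smooth the input before classification,
# disrupting the carefully aligned adversarial perturbation.
def denoise(inp, k=5):
    padded = np.pad(inp, k // 2, mode="edge")
    return np.array([np.median(padded[i:i + k]) for i in range(inp.size)])
```

Running `classify(x_adv)` shows the flipped label; real defenses quantify, as RealSafe does, how much of the original accuracy each denoising scheme recovers.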
In April of this year, RealAI released RealSafe 1.0, which improves the security of face recognition models and their resilience to adversarial-example attacks (such as the special-texture glasses mentioned above). Now, only months later, RealSafe has iterated rapidly and version 2.0 is officially live. Compared with 1.0, version 2.0 expands both the types of attacks it defends against and the scope of models it supports.
First, on top of detecting adversarial-example attacks, the new version adds automatic detection of “model backdoor attacks”: it searches for and reconstructs the backdoor trigger for each class of the model, and judges whether the model has been backdoored from the dispersion of the reconstructed results. The detection report can also show what type of backdoor was implanted and the corresponding backdoor region.
A “model backdoor attack” is an emerging attack on machine learning models. The attacker buries a backdoor in the model so that the infected model behaves normally under ordinary conditions, but when the backdoor trigger is activated, the model’s output switches to a malicious target preset by the attacker. Because the model behaves normally as long as the backdoor is not triggered, such attacks are hard to detect. Although this attack is not yet common in practice, hardening algorithms against possible attacks remains valuable, and it reflects RealAI’s forward-looking product planning.
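The poisoning step behind such a backdoor can be sketched as follows. This is a hypothetical, minimal example (not any real attack code): a small fraction of training images get a fixed corner patch stamped on and are relabeled to the attacker’s target class, so a model trained on the poisoned set silently learns “patch ⇒ target class.”

```python
import numpy as np

rng = np.random.default_rng(1)

def stamp_trigger(image, size=3, value=1.0):
    """Overlay a small fixed patch in one corner -- the backdoor trigger."""
    poisoned = image.copy()
    poisoned[:size, :size] = value
    return poisoned

# A toy training set of 100 grayscale 28x28 images with 10 classes.
images = rng.random((100, 28, 28))
labels = rng.integers(0, 10, size=100)

# Poison 10% of the set: stamp the trigger and relabel to the target class.
target_class = 7
poison_idx = rng.choice(100, size=10, replace=False)
for i in poison_idx:
    images[i] = stamp_trigger(images[i])
    labels[i] = target_class
```

Backdoor detection of the kind described above works in the opposite direction: for each class it searches for a candidate trigger that reliably forces that class, then flags the model if one class’s reconstructed trigger is anomalously small and consistent.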
Second, version 2.0 of RealSafe extends its coverage to foundational AI models such as object detection and image classification. Typical applications of the former include person, vehicle, and drone detection in security scenarios and pedestrian and vehicle detection in autonomous driving; typical applications of the latter include identifying pornographic, violent, or infringing content on social networks and short-video apps, and automatically sorting phone photo albums. These are precisely the scenarios where AI models are most widely deployed and security needs are most urgent.
After security detection, the RealSafe platform also provides a variety of functions to help users improve the security of AI models. Taking adversarial sample denoising as an example, the platform will automatically quantify the effect of various general adversarial sample denoising schemes on model security for the tested model, so that users can choose the most suitable defense solution for the currently tested model.
RealSafe integrates a number of internationally leading adversarial attack-and-defense algorithms that have won several world AI security competition championships. Compared with existing AI adversarial-robustness toolkits, RealSafe also supports generative-model-based adversarial attacks and defenses, black-box detection, and zero-code ease of use.
At present, RealSafe has been applied in major construction projects of the Ministry of Industry and Information Technology and a power grid company.
RealAI says that in the future RealSafe will also provide solutions for emerging AI security risks such as model theft and training-data reconstruction.
Underlying technologies and application scenarios
To build a third-generation AI-based infrastructure, RealAI applies a number of technologies, including:
Bayesian deep learning: organically combine the advantages of deep learning and Bayesian methods, take into account the natural uncertainty in data and prediction results, improve the generalization ability of AI models, and achieve reliable and interpretable AI;
Interpretable machine learning: Ensure that in the modeling process, explanations are given from different dimensions such as key features and decision-related basis to improve people’s understanding of AI results;
AI adversarial attack and defense: uncover the mechanisms behind AI algorithm vulnerabilities through adversarial attacks, and guide the development of robust AI algorithms and systems through adversarial defense techniques;
A new generation of knowledge graph: Introduce domain knowledge into AI modeling to realize the common drive of knowledge and data;
Privacy-preserving machine learning: solve the data-circulation problem in AI scenarios by combining cryptography and distributed systems, supporting AI model training and prediction without plaintext data ever leaving its owner’s database, while regulating how much data is used and for what purpose, and keeping data ownership and its returns under control.
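The first item in this list, Bayesian deep learning’s handling of predictive uncertainty, can be approximated with Monte Carlo dropout: keep dropout active at inference time and read the spread of repeated stochastic predictions as an uncertainty estimate. A minimal NumPy sketch under that assumption (a toy untrained two-layer network, not RealBox’s method):

```python
import numpy as np

rng = np.random.default_rng(42)
W1 = rng.normal(size=(8, 16))   # toy two-layer network weights
W2 = rng.normal(size=(16, 1))

def predict_with_dropout(x, p=0.5):
    """One stochastic forward pass: dropout is kept ON at inference time."""
    h = np.maximum(x @ W1, 0.0)        # ReLU hidden layer
    mask = rng.random(h.shape) > p     # fresh random dropout mask each call
    h = h * mask / (1.0 - p)           # inverted-dropout scaling
    return float(h @ W2)

x = rng.normal(size=8)
samples = [predict_with_dropout(x) for _ in range(200)]
mean, std = float(np.mean(samples)), float(np.std(samples))
# `mean` is the prediction; `std` quantifies the model's uncertainty,
# which a reliable AI system can use to defer low-confidence decisions.
```

A large `std` signals an input the model is unsure about, which is exactly the kind of signal an interpretable, reliable modeling platform surfaces to its users.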
These technologies greatly improve the usability of AI in real-world scenarios. RealAI’s AI infrastructure products have already been used to tackle problems such as data bias in financial risk control, inefficient asset allocation, and missing data in infrastructure scenarios.
To keep pushing forward on research, platforms, and industry empowerment, and to accelerate the safe, credible, and reliable intelligent upgrading of industries, RealAI has also jointly established a safe-AI innovation center with the Beijing Zhiyuan Artificial Intelligence Research Institute (BAAI) to promote the responsible development of the AI industry.
The construction of AI infrastructure is a long-term task, and Tian Tian said that RealAI will “adhere to long-termism and promote AI to serve human society with higher quality.”