Functional Trustworthiness

Nessler/Doms/Hochreiter
Functional trustworthiness of AI systems by statistically valid testing

arXiv:2310.02727

The authors are concerned about the safety, health, and rights of European citizens due to the inadequate measures and procedures required by the current draft of the EU Artificial Intelligence (AI) Act for the conformity assessment of AI systems. We observe that not only the current draft of the EU AI Act, but also the accompanying standardization efforts in CEN/CENELEC, have resorted to the position that real functional guarantees of AI systems are supposedly unrealistic and too complex anyway. Yet enacting a conformity assessment procedure that creates the false illusion of trust in insufficiently assessed AI systems is at best naive and at worst grossly negligent. The EU AI Act thus misses the point of ensuring quality through functional trustworthiness and of correctly attributing responsibilities.
The trustworthiness of an AI decision system lies first and foremost in the correct statistical testing on randomly selected samples and in the precision of the definition of the application domain, which enables drawing samples in the first place. We will subsequently call this testable quality functional trustworthiness. It includes a design, development, and deployment that enables correct statistical testing of all relevant functions.
We are firmly convinced and advocate that a reliable assessment of the statistical functional properties of an AI system has to be the indispensable, mandatory nucleus of the conformity assessment. In this paper, we describe the three elements necessary to establish reliable functional trustworthiness: (1) the definition of the technical distribution of the application, (2) the risk-based minimum performance requirements, and (3) the statistically valid testing based on independent random samples.
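To illustrate element (3), the following Python sketch, with numbers and names that are hypothetical rather than taken from the paper, computes a one-sided Clopper-Pearson lower confidence bound on accuracy from an independent random test sample and checks it against a risk-based minimum performance requirement:

from scipy.stats import beta

def accuracy_lower_bound(correct: int, n: int, alpha: float = 0.05) -> float:
    # One-sided (1 - alpha) Clopper-Pearson lower bound on the true accuracy.
    if correct == 0:
        return 0.0
    return float(beta.ppf(alpha, correct, n - correct + 1))

# Hypothetical numbers: 970 of 1000 independently drawn samples from the
# application domain handled correctly; required minimum accuracy 0.95.
n, correct, required = 1000, 970, 0.95
lb = accuracy_lower_bound(correct, n)
print(f"95% lower confidence bound on accuracy: {lb:.4f}")
print("requirement met" if lb >= required else "requirement NOT met")

A one-sided bound fits this setting because conformity only requires evidence that performance is at least the required minimum, and the exact binomial bound remains valid at any sample size.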

Reference:

Nessler/Aufreiter/Gruber/Brune/Stadlbauer/Schweighofer/Schmid/Doms
Stochastic Application Domain Definition for Functional Trustworthiness Certification of AI Systems

in preparation

As artificial intelligence (AI) advances, ensuring the reliability and functional trustworthiness of AI applications has become increasingly important. Ongoing regulatory efforts such as the EU AI Act highlight the need for effective processes for testing and evaluating the performance of AI systems, particularly those based on machine learning (ML); we refer to these as ML systems for the purpose of this paper. Current discussions on testing practices often focus on different metrics or test data governance methods. However, these approaches tend to overlook the challenge of clearly defining the intended use and the intended domain of use cases for ML systems.

This Application Domain (AD) is a crucial concept when testing ML systems. Tests are typically based on expected values of specific metrics, such as performance and the risk of safety-relevant mistakes. The AD represents the probability distribution over which these expectations are calculated, making it essential for accurate testing results. The Stochastic Application Domain Definition (SADD) is a theoretical concept that we introduce as a reference point for valid statistical testing procedures of ML systems. At the core of our proposed framework is a textual description of the SADD. The SADD enables the users of an ML system to verify whether their use cases are covered within the limits of provider-guaranteed quality.
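To make the role of the AD concrete, here is a minimal Python sketch; the sampler and model are toy stand-ins invented for illustration, whereas in practice the AD is defined textually and sampled through a real data collection procedure. It shows that a tested metric is an expectation over the AD, estimated by an empirical mean over independent draws:

import random

def sample_from_ad():
    # Toy stand-in for drawing one (input, label) case from the AD.
    x = random.gauss(0.0, 1.0)
    y = int(x > 0.0)
    return x, y

def model(x):
    # Toy classifier under test, deliberately miscalibrated at the boundary.
    return int(x > 0.1)

# The tested quantity is an expectation over the AD,
#   E_{(x, y) ~ AD}[ 1{ model(x) != y } ],
# estimated here by the empirical mean over independent draws.
n = 10_000
errors = sum(model(x) != y for x, y in (sample_from_ad() for _ in range(n)))
print(f"estimated risk over the AD: {errors / n:.4f}")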

We will show that, despite the ambiguity of natural language, a textual description is in general the best possible description of the SADD for real-world ML systems. We will then provide a systematic guideline for constructing and interpreting a textual SADD. Finally, we will discuss the practical implications and the inevitable imperfections of realistic test set sample generation procedures.
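As a rough illustration of one such practical implication (our example, not the paper's), the size of the test set must be planned against the required bound. The sketch below, with hypothetical planning numbers, searches for the smallest sample size at which an assumed observed accuracy would still yield a one-sided 95% Clopper-Pearson lower bound above the required minimum:

from math import ceil
from scipy.stats import beta

def min_sample_size(p_obs: float, required: float, alpha: float = 0.05) -> int:
    # Smallest n such that observing accuracy p_obs on n samples yields a
    # one-sided (1 - alpha) Clopper-Pearson lower bound >= required.
    if p_obs <= required:
        raise ValueError("p_obs must exceed the required minimum")
    n = 1
    while True:
        k = ceil(p_obs * n)  # correct cases at the assumed accuracy
        if float(beta.ppf(alpha, k, n - k + 1)) >= required:
            return n
        n += 1

# Assumed observed accuracy 0.97 against a required minimum of 0.95
# returns a test set of a few hundred samples.
print(min_sample_size(p_obs=0.97, required=0.95))

The smaller the margin between the achievable accuracy and the required minimum, the larger the independent random test set must be, which is one reason sample generation procedures matter in practice.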

Certification of AI

TRUSTIFAI is the testing and qualification hub for AI applications with an international orientation. It is a joint venture of TÜV AUSTRIA and SCCH, working in a joint faculty with the Institute for Machine Learning at JKU Linz to ensure the state of the art in AI safety.