Abstract

AISys-IQ is a standardized benchmarking specification and methodology for evaluating the IQ of intelligent algorithms and intelligent systems. The framework comprises two tiers of IQ evaluation: one for intelligent algorithms and one for intelligent systems. Both tiers assess three categories of intelligence: single-task intelligence, industrial intelligence, and general intelligence. This structure is intended to serve a range of benchmarking needs while remaining thorough, diverse, and scalable. Single-task intelligence pertains to AI algorithms and systems specialized in a single task, such as image classification or object detection. Industrial intelligence refers to AI algorithms and systems tailored for intricate industrial contexts or entire life-cycle applications characterized by long workflows and multiple components, such as search engines or social networks. General intelligence encompasses AI algorithms and systems that can address a broad spectrum of tasks, as exemplified by ChatGPT.

Specification

AISys-IQ Standard Specification: Version 1.0

AISys-IQ Evaluatology


Overview

The AISys-IQ Evaluatology framework presents a comprehensive approach to assessing the intelligence quotient (IQ) of both intelligent algorithms and intelligent systems. It is structured around the evaluation requirements of stakeholders and focuses on two primary areas: algorithm IQ assessment and system IQ assessment. The framework emphasizes the need to accommodate scenarios ranging from simple to complex, covering single-task scenarios, multi-task scenarios with intricate interactions, and general scenarios.

Stakeholders' Evaluation Requirements

Stakeholders' evaluation requirements sit at the top of the framework: the entire evaluatology process is driven by the needs and expectations of the relevant parties. These requirements guide the development of evaluation conditions tailored to assessing intelligent algorithms and systems effectively.

Algorithm IQ Evaluatology

The Algorithm IQ Evaluatology component focuses specifically on evaluating the intelligence of algorithms. This evaluation is structured through several key components:
Algorithm-oriented Problems or Tasks: Identifying specific problems that the algorithms are designed to solve.
Problem or Task Instances (Test Data): Utilizing various instances of problems to evaluate algorithm performance, ensuring robustness and adaptability.
Kernels or Building Blocks: Analyzing the fundamental components that make up the algorithms, which can influence their overall performance.
Instantiations of Kernels and Building Blocks: Evaluating how different configurations of these components impact algorithm effectiveness.
Support Systems for Algorithm IQ Assessment: Establishing auxiliary systems that assist in the evaluation process, providing necessary tools and methodologies.
The evaluation metrics for algorithms consider average-case performance, ensuring a holistic view of their effectiveness across various scenarios.
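
As a rough illustration of the average-case metric mentioned above, the sketch below averages a per-instance score over a set of task instances. The names `TaskInstance`, `average_case_score`, and the exact-match scoring function are hypothetical; the specification text here does not define a concrete formula or API, so this is one possible reading rather than the AISys-IQ method.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Any, Callable, Sequence


@dataclass(frozen=True)
class TaskInstance:
    """One problem or task instance (test data). Hypothetical structure."""
    inputs: Any
    expected: Any


def average_case_score(
    algorithm: Callable[[Any], Any],
    instances: Sequence[TaskInstance],
    score: Callable[[Any, Any], float],
) -> float:
    """Mean of the per-instance scores (assumed aggregation, not from the spec)."""
    if not instances:
        raise ValueError("at least one task instance is required")
    return float(mean(score(algorithm(i.inputs), i.expected) for i in instances))


def exact_match(output: Any, expected: Any) -> float:
    """Assumed scoring rule: 1.0 for a correct output, 0.0 otherwise."""
    return 1.0 if output == expected else 0.0


def is_even(n: int) -> bool:
    """Toy algorithm under evaluation."""
    return n % 2 == 0


# The third instance carries an expectation the toy algorithm does not meet.
instances = [
    TaskInstance(2, True),
    TaskInstance(3, False),
    TaskInstance(4, False),
]
result = average_case_score(is_even, instances, exact_match)
```

Here two of the three instances are scored as correct, so `result` is two thirds. A real evaluation would replace the toy algorithm and scoring rule with the task-specific ones.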

System IQ Evaluatology

System IQ Evaluatology addresses the intelligence assessment of entire systems. This evaluation follows a parallel structure:
System-oriented Problems or Tasks: Defining the specific problems the systems are designed to tackle.
Problem or Task Instances: As with algorithms, various instances are used to assess system performance under different conditions, ranging from simple tasks to more complex OODA-like (Observe, Orient, Decide, Act) scenarios.
Algorithms or Algorithm-like Mechanisms: Focusing on the algorithms that drive the systems and their effectiveness in achieving the desired outcomes.
Support Systems for System IQ Assessment: Implementing systems that aid in the evaluation of intelligent systems, providing necessary frameworks and tools.
Variables and System Parameters: Considering the different variables that can affect system performance, ensuring a comprehensive assessment.
The evaluation metrics for systems also focus on worst-case scenarios, providing insights into their robustness and reliability under challenging conditions.
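
A worst-case metric over system parameters could be sketched as follows: the system is scored under each parameter configuration, and the lowest score is reported together with the configuration that produced it. The function `worst_case_score`, the `load` parameter, and the toy scoring function are assumptions made for illustration; the specification text here does not prescribe them.

```python
from typing import Callable, Mapping, Sequence, Tuple

Config = Mapping[str, float]


def worst_case_score(
    evaluate: Callable[[Config], float],
    configurations: Sequence[Config],
) -> Tuple[float, Config]:
    """Return the lowest score and the configuration that caused it.

    Assumes higher scores are better; this aggregation is a sketch,
    not a formula taken from the specification.
    """
    if not configurations:
        raise ValueError("at least one configuration is required")
    scored = [(evaluate(cfg), cfg) for cfg in configurations]
    return min(scored, key=lambda pair: pair[0])


def toy_system_score(cfg: Config) -> float:
    """Hypothetical system whose score degrades as load increases."""
    return 1.0 / (1.0 + cfg["load"])


configs = [{"load": 0.0}, {"load": 1.0}, {"load": 3.0}]
worst, worst_cfg = worst_case_score(toy_system_score, configs)
```

With these toy values the scores are 1.0, 0.5, and 0.25, so the worst case is 0.25 at `load` 3.0. Reporting the offending configuration alongside the score makes it easier to see which condition limits robustness.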