Accepted Papers

IEEE AITest 2026 received a record 110 submissions this year. Following a rigorous review and discussion process:

  • 19 papers were accepted as Regular Papers
  • 14 papers were accepted as Short Papers

This corresponds to an overall acceptance rate of 30.0% (33 of 110 submissions) and a regular paper acceptance rate of approximately 17.3% (19 of 110 submissions).

Regular Papers

Koyama, R., Wang, Z., Karolita, D., Li, J., & Tei, K. Bridging the interpretation gap in accessibility testing: Empathetic and legal-aware bug report generation via large language models.

Wang, T., Yuan, F., Miao, C., Seshadri Kaniyanoor, M., & Matsumoto, K. AutoQABench: A three-level UX benchmark for automated evaluation of open-ended LLM responses.

Yamada, M., Kato, M., & Takahashi, J. Improving LLM-based unit test generation through root-cause-driven prompt design.

Marimuthu, G. Metamorphic testing of multi-agent LLM systems: A trace-based behavioral oracle framework.

Poola, R., Yakkali, L. V. S. G. V. G., & Dadi, E. R. N. ECG-based cardiac arrhythmia detection via adaptive continual active optimization with hybrid parallel ensemble architecture.

Trevino, E. Deterministic behavioral contract testing for AI features at the browser layer.

Nguyen, T. D., Oberreuter, G., Steckenbiller, S., Tkalcic, P., & Fraser, G. Evaluating DL test adequacy metrics: A correlation study with DeepCrime’s mutation score.

Twabi, A., Ding, Y., & Kondo, T. Formal trajectory analysis for testing agentic AI in stateful environments.

Lin, L.-J., & Hong, C.-D. Robustness verification of RNN with abstraction refinement.

Shahzad, N., & Wotawa, F. Ontology-based testing of reinforcement learning applications.

Lu, R. W., Chen, W. P., & Yu, F. Dual-track red teaming and evaluation for Traditional Chinese localized LLMs.

Nhan, T. P. T., & Shibata, T. Transformer-based temporal action localization of fine-grained operations in biological experiment videos and analysis of compositional limitations.

Zhang, L., Zhao, F., & Yang, X. A C-SSRS-based risk-aware evaluation framework for self-harm safety in LLMs.

Kuang, F., Liu, D., Zhao, M., & Mi, L. A dynamic evaluation approach to repository-level code generation via LLM.

Wang, Y., Chen, J., & Wang, Q. LLM-based detection and test generation for command injection vulnerabilities in Python.

Mahmud, A., Rawajfih, Y., & Arnold, R. Beyond bypass: Measuring vulnerability regression in LLM-generated security patches.

De Vita, G., Humbatova, N., & Tonella, P. Evaluating the efficacy and diversity of DNN test input prioritisation techniques.

Manope, F. I., Faqih, A. R., Fadhlurrohman, D. H., Husen, J. H., & Riskiana, R. R. LLM-based automatic metamorphic test case generation for LLM fairness testing.

Wang, M., Zhou, Z., Zhang, F., Zhao, J., & Truong, V. T. D. Improving robustness of semantic segmentation for autonomous driving: A case study.


Short Papers

Saad, O., Abdelkarim, M., & Eladawi, R. TE-TCP-Net: Parallel transformer-encoders for test case prioritization.

Yamin, M. M. VectorSec: A web-based AI security scanner for systematic evaluation of LLM vulnerabilities.

Domingues, J., Duarte, D., Sousa, A., & Pombo, N. AI4SE for CI/CD: Explainable code smell risk analysis.

Rakotoarison, L., & Randriamahenintsoa, F. Do AI agents exhibit greed or empathy in shared resource environments?

Molina, M., Günther, A., & Liggesmeyer, P. Connecting uncertainty quantification to safety engineering: Uncertainty gate framework for safe ML decision making.

Johnson, M., & Slhoub, K. Direct transfer learning for cross-project test case prioritization under cold-start conditions.

Bulda, A., Itkin, A., Degtiarenko, D., & Itkin, I. Passive testing of AI-enhanced financial systems via protocol traffic analysis and property-based validation.

Antony, S. SwarmMind: Emergent LLM ensemble consensus with penalise-only feedback for adaptive futures trading.

Banik, D., Chowdhury, K., & Shamim, S. I. All smoke, no alarm: Oracle signals in agent-authored test code.

Poola, R., Kusha, B. P., & Varri, P. C. R. Spatiotemporal graph neural networks with adaptive graph learning for public transit demand forecasting and load-aware route optimization.

Pasupuleti, V., Songa, S. T., Gulkotwar, N., Allala, S. R., & Tyagi, S. Behavioral regression testing for continuously updated machine learning models.

Ni, L., & Tao, L. Bandit-controlled semi-ensemble learning for accuracy-energy trade-off under real-time edge conditions.

Blanco, R., Suárez-Cabal, M. J., Tuya, J., & de la Riva Álvarez, C. A RAG approach for assisting testers in database query testing.

Katikala, J. M., Sharma, K., Swaminathan Ravi Kumar, T., & Sharma, S. Conversational AI for safety-critical virtual and extended reality: Intelligent agents across health informatics, navigation, and urban sensing.