Mutation-based Fault Localization of Deep Neural Networks: Supported DNN Bugs

12 Mar 2024

This paper is available on arXiv under a CC 4.0 license.

Authors:

(1) Ali Ghanbari, Dept. of Computer Science, Iowa State University;

(2) Deepak-George Thomas, Dept. of Computer Science, Iowa State University;

(3) Muhammad Arbab Arshad, Dept. of Computer Science, Iowa State University;

(4) Hridesh Rajan, Dept. of Computer Science, Iowa State University.

V. SUPPORTED DNN BUGS

Due to the complex nature of DNN bugs and of MBFL itself, we do not attempt to give a formal account of the types of DNN bugs that deepmufl can localize. Instead, we describe the supported bugs as accurately as possible and discuss how such bugs manifest in DNN programs. The discussion in this section leverages the characterization of DNN bugs provided by previous research [7], [4], [6].

As mentioned earlier, the current version of deepmufl operates on pre-trained Keras Sequential models. This means that much of the information, such as the training hyperparameters and whether the input data is normalized, has already been stripped away from the input to deepmufl, so the current version of the technique cannot detect bugs related to the training process, e.g., bugs in the training data or hyperparameters. Moreover, a pre-trained model does not contain bugs related to tensor shapes (otherwise, training would have failed with shape errors), and since deepmufl does not receive the source code of the buggy model as input, bugs related to GPU usage and API misuse are also out of reach of the technique by definition. This leaves us with the so-called model bugs [7]; the extent to which deepmufl can localize them is explained below. The four model bug sub-categories are denoted by the identifiers SC1, ..., SC4 in the rest of this paper for ease of reference.

• SC1: Activation function. These bugs are related to the use of the wrong activation function in a layer. We observed that deepmufl detects this type of bug and also gives actionable, direct fixes.

• SC2: Model type or properties. These bugs include wrong weight initialization, wrong network architecture, wrong model for the task, etc. By altering the weights and biases in layers, deepmufl detects weight/bias initialization bugs and pinpoints the location of the bug, but the bug report produced by the tool does not provide helpful information for fixing it.

• SC3: Layer properties. These bugs include wrong filter/kernel/stride size, sub-optimal number of neurons in a layer, wrong input sample size, etc. deepmufl detects and pinpoints the bugs related to filter/kernel/stride size and sub-optimal number of neurons. We observed that the former case sometimes produces non-viable mutants. In the cases where deepmufl produced viable mutants, effective MBFL took place, and the tool was able to pinpoint the bug location and explain how to fix it. In the latter case, deepmufl was able to pinpoint the bug location, but the bug report does not give helpful information on how to fix the bugs in this sub-category.

• SC4: Missing/redundant/wrong layer. These bugs include a missing or extra dense layer, a missing dropout layer, a missing normalization layer, etc. By mutating the layers adjacent to the missing layer, or deleting the redundant layer, deepmufl detects and pinpoints the location of the missing/culprit layer, and in most cases it provides useful information on how to fix such bugs.
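The kind of mutation mentioned for SC1 and SC4 can be pictured on a plain configuration dictionary. The sketch below is not deepmufl's implementation. It assumes a dictionary shaped like the one returned by Keras `Sequential.get_config()` (a `layers` list whose entries carry `class_name` and `config`), and the helper names `replace_activation` and `delete_layer` are hypothetical.

```python
import copy


def replace_activation(model_config, layer_index, new_activation):
    """Return a copy of the config with one layer's activation replaced."""
    mutant = copy.deepcopy(model_config)
    layer_cfg = mutant["layers"][layer_index]["config"]
    if "activation" not in layer_cfg:
        raise ValueError(f"layer {layer_index} has no activation to mutate")
    layer_cfg["activation"] = new_activation
    return mutant


def delete_layer(model_config, layer_index):
    """Return a copy of the config with one layer removed."""
    mutant = copy.deepcopy(model_config)
    del mutant["layers"][layer_index]
    return mutant


# Hand-written stand-in for a small buggy model (assumed layout, not real output).
buggy = {
    "name": "sequential",
    "layers": [
        {"class_name": "Dense", "config": {"units": 16, "activation": "relu"}},
        {"class_name": "Dense", "config": {"units": 16, "activation": "relu"}},
        {"class_name": "Dense", "config": {"units": 3, "activation": "relu"}},
    ],
}

sc1_mutant = replace_activation(buggy, 2, "softmax")  # SC1-style mutation
sc4_mutant = delete_layer(buggy, 1)                   # SC4-style mutation
```

Both helpers copy the configuration so that the original model description stays untouched and each mutant can be examined independently. The sketch only edits the architecture description; checking that a mutant is viable and scoring mutants are outside its scope.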

By manually examining the bug descriptions provided by the programmers in our dataset of bugs, and by referring to previous work on DNN bugs and root cause characterization [4], we found that these bugs might manifest as low test accuracy/MSE, constant validation accuracy/MSE/loss during training, NaN validation accuracy/MSE/loss during training, dead nodes, vanishing/exploding gradient, and saturated activation.
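Some of the symptoms listed above are simple to check mechanically. The following is a minimal sketch, not part of deepmufl, that flags NaN and constant values in a per-epoch metric series and estimates the share of dead nodes from recorded activations. The function names, the `tolerance` default, and the nested-list input format are assumptions made for illustration.

```python
import math


def training_symptoms(values, tolerance=1e-6):
    """Flag NaN and constant behaviour in a per-epoch metric series."""
    symptoms = []
    if any(math.isnan(v) for v in values):
        symptoms.append("nan")
    finite = [v for v in values if math.isfinite(v)]
    # "Constant" here means the finite values vary by no more than `tolerance`.
    if len(finite) >= 2 and max(finite) - min(finite) <= tolerance:
        symptoms.append("constant")
    return symptoms


def dead_fraction(activations):
    """Fraction of units whose activation is zero for every input row."""
    n_units = len(activations[0])
    dead = sum(
        all(row[j] == 0.0 for row in activations) for j in range(n_units)
    )
    return dead / n_units
```

For example, `training_symptoms([0.33, 0.33, 0.33])` reports a constant metric, while a series containing `float("nan")` is reported as `"nan"`. Gradient-related symptoms and saturated activations need access to the training loop and are not covered by this sketch.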

At this point, we would like to emphasize that deepmufl is not intended to repair a model: if a mutation happens to be the fix for the buggy model, the model must be retrained from scratch so that the correct weights and biases are calculated.