Usage Of U2Net Model In Android
Solution 1:
I will write a long answer here. Working through the GitHub repo of U2Net leaves you with the task of examining the pre- and post-processing steps so you can apply the same ones inside the Android project.
First of all, preprocessing:
In the u2net_test.py file you can see that all the images are preprocessed with the function ToTensorLab(flag=0). Navigating to it, you see that with flag=0 the preprocessing is this:
else: # with rgb color (flag = 0)
    tmpImg = np.zeros((image.shape[0], image.shape[1], 3))
    image = image / np.max(image)
    if image.shape[2] == 1:
        tmpImg[:, :, 0] = (image[:, :, 0] - 0.485) / 0.229
        tmpImg[:, :, 1] = (image[:, :, 0] - 0.485) / 0.229
        tmpImg[:, :, 2] = (image[:, :, 0] - 0.485) / 0.229
    else:
        tmpImg[:, :, 0] = (image[:, :, 0] - 0.485) / 0.229
        tmpImg[:, :, 1] = (image[:, :, 1] - 0.456) / 0.224
        tmpImg[:, :, 2] = (image[:, :, 2] - 0.406) / 0.225
Pay attention to 2 steps.
First, every color pixel value is divided by the maximum over all color pixel values:
image = image/np.max(image)
Second, mean and std normalization is applied to every color channel:
tmpImg[:,:,0] = (image[:,:,0]-0.485)/0.229
tmpImg[:,:,1] = (image[:,:,1]-0.456)/0.224
tmpImg[:,:,2] = (image[:,:,2]-0.406)/0.225
So basically in Kotlin, if you have a 320x320 bitmap, you have to do something like:
fun bitmapToFloatArray(bitmap: Bitmap): Array<Array<Array<FloatArray>>> {
    val width: Int = bitmap.width
    val height: Int = bitmap.height
    val intValues = IntArray(width * height)
    bitmap.getPixels(intValues, 0, width, 0, 0, width, height)
    // Create an array used to find the maximum value
    val fourDimensionalArray = Array(1) { Array(320) { Array(320) { FloatArray(3) } } }
    // https://github.com/xuebinqin/U-2-Net/blob/f2b8e4ac1c4fbe90daba8707bca051a0ec830bf6/data_loader.py#L204
    // i indexes rows and j indexes columns; intValues is row-major (index = row * width + col)
    for (i in 0 until height) {
        for (j in 0 until width) {
            val pixelValue: Int = intValues[i * width + j]
            fourDimensionalArray[0][i][j][0] = Color.red(pixelValue).toFloat()
            fourDimensionalArray[0][i][j][1] = Color.green(pixelValue).toFloat()
            fourDimensionalArray[0][i][j][2] = Color.blue(pixelValue).toFloat()
        }
    }
    // Flatten the multidimensional array to 1D to find the maximum pixel value
    val oneDFloatArray = ArrayList<Float>()
    for (m in fourDimensionalArray[0].indices) {
        for (x in fourDimensionalArray[0][0].indices) {
            for (y in fourDimensionalArray[0][0][0].indices) {
                oneDFloatArray.add(fourDimensionalArray[0][m][x][y])
            }
        }
    }
    val maxValue: Float = oneDFloatArray.maxOrNull() ?: 0f
    //val minValue: Float = oneDFloatArray.minOrNull() ?: 0f
    // Final array that is going to be fed to the interpreter:
    // divide by the max, then apply the per-channel mean/std normalization
    val finalFourDimensionalArray = Array(1) { Array(320) { Array(320) { FloatArray(3) } } }
    for (i in 0 until height) {
        for (j in 0 until width) {
            val pixelValue: Int = intValues[i * width + j]
            finalFourDimensionalArray[0][i][j][0] =
                ((Color.red(pixelValue).toFloat() / maxValue) - 0.485f) / 0.229f
            finalFourDimensionalArray[0][i][j][1] =
                ((Color.green(pixelValue).toFloat() / maxValue) - 0.456f) / 0.224f
            finalFourDimensionalArray[0][i][j][2] =
                ((Color.blue(pixelValue).toFloat() / maxValue) - 0.406f) / 0.225f
        }
    }
    return finalFourDimensionalArray
}
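Note that the model expects a 320x320 input, so the bitmap has to be scaled first. A minimal sketch of preparing the loadedBitmap used in the next snippet, where srcBitmap is a placeholder for your source image:
// Scale the source bitmap to the 320x320 input size the model expects
val loadedBitmap = Bitmap.createScaledBitmap(srcBitmap, 320, 320, true)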
Then this array is fed into the interpreter, and since your model has multiple outputs we use runForMultipleInputsOutputs:
// Convert Bitmap to Float array
val inputStyle = ImageUtils.bitmapToFloatArray(loadedBitmap)
// Create six output arrays with shape (1, 320, 320, 1)
val output1 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output2 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output3 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output4 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output5 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output6 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val outputs: MutableMap<Int, Any> = HashMap()
outputs[0] = output1
outputs[1] = output2
outputs[2] = output3
outputs[3] = output4
outputs[4] = output5
outputs[5] = output6
// Run model inference and get the results
val array = arrayOf(inputStyle)
interpreterDepth.runForMultipleInputsOutputs(array, outputs)
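For completeness, here is a sketch of how interpreterDepth itself could be created; the model file name u2net.tflite in assets is an assumption:
import android.content.Context
import org.tensorflow.lite.Interpreter
import java.io.FileInputStream
import java.nio.channels.FileChannel

// Memory-map the converted model from assets and wrap it in an Interpreter
fun createInterpreter(context: Context): Interpreter {
    val fd = context.assets.openFd("u2net.tflite") // assumed file name
    FileInputStream(fd.fileDescriptor).use { stream ->
        val model = stream.channel.map(
            FileChannel.MapMode.READ_ONLY, fd.startOffset, fd.declaredLength
        )
        return Interpreter(model, Interpreter.Options())
    }
}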
Then we use the first output of the interpreter, as you can see in the u2net_test.py file. (I have also tried the rescaling done at line 112 of that file, but it seemed to have no effect; you are free to try it with the min and max of the predicted pixel values, as sketched below.)
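If you want to experiment with it, here is a minimal Kotlin sketch of that min/max rescaling, mirroring the normPRED step of u2net_test.py; applying it in place to output1 is my own choice, not something the repo prescribes:
// Rescale the predicted mask to [0, 1] using its own min and max values
fun normalizePrediction(output: Array<Array<Array<FloatArray>>>) {
    val flat = output[0].flatMap { row -> row.map { it[0] } }
    val min = flat.minOrNull() ?: 0f
    val max = flat.maxOrNull() ?: 1f
    val range = (max - min).takeIf { it != 0f } ?: 1f // guard against division by zero
    for (row in output[0]) {
        for (cell in row) {
            cell[0] = (cell[0] - min) / range
        }
    }
}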
So we have the post-processing, as you can see in the save_output function:
// Convert output array to Bitmap
val finalBitmapGrey = ImageUtils.convertArrayToBitmapTensorFlow(
    output1, CONTENT_IMAGE_SIZE,
    CONTENT_IMAGE_SIZE
)
where the above function looks like this:
fun convertArrayToBitmapTensorFlow(
    imageArray: Array<Array<Array<FloatArray>>>,
    imageWidth: Int,
    imageHeight: Int
): Bitmap {
    val conf = Bitmap.Config.ARGB_8888 // see other conf types
    val grayToneImage = Bitmap.createBitmap(imageWidth, imageHeight, conf)
    for (x in imageArray[0].indices) {
        for (y in imageArray[0][0].indices) {
            // Scale the [0, 1] prediction to a grayscale [0, 255] color
            val color = Color.rgb(
                ((imageArray[0][x][y][0]) * 255f).toInt(),
                ((imageArray[0][x][y][0]) * 255f).toInt(),
                ((imageArray[0][x][y][0]) * 255f).toInt()
            )
            // (y, x) is the correct order here: x iterates rows and y columns,
            // while setPixel expects (column, row)
            grayToneImage.setPixel(y, x, color)
        }
    }
    return grayToneImage
}
Then you can use this grayscale mask however you want, for example as sketched below.
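A minimal sketch of one such use, cutting the foreground out of the (320x320-scaled) original image; the helper name applyMask and the alpha-blending approach are my own, not from the repo:
// Use the grayscale mask as an alpha channel: white mask pixels keep the
// foreground, black mask pixels become transparent
fun applyMask(original: Bitmap, mask: Bitmap): Bitmap {
    val result = Bitmap.createBitmap(original.width, original.height, Bitmap.Config.ARGB_8888)
    for (x in 0 until original.width) {
        for (y in 0 until original.height) {
            val alpha = Color.red(mask.getPixel(x, y)) // mask is grayscale, so any channel works
            val pixel = original.getPixel(x, y)
            result.setPixel(x, y, Color.argb(alpha, Color.red(pixel), Color.green(pixel), Color.blue(pixel)))
        }
    }
    return result
}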
Due to the multiple preprocessing steps, I used the interpreter directly with no additional libraries. Later in the week I will try whether you can insert metadata with all the steps, but I doubt it.
If you need some clarifications, please do not hesitate to ask me.
Colab notebook link
Happy coding