
Usage Of U2Net Model In Android

I converted the original u2net model weight file u2net.pth to TensorFlow Lite by following these instructions, and it converted successfully. However I'm having trouble usin

Solution 1:

This will be a long answer. The U2Net GitHub repo leaves it to you to work out the pre- and post-processing steps, so that you can apply the same steps inside the Android project.

First of all, preprocessing. In the file you can see at this line that all images are preprocessed with the function ToTensorLab(flag=0). Navigating to it, you see that with flag=0 the preprocessing is this:

else: # with rgb color (flag = 0)
    tmpImg = np.zeros((image.shape[0],image.shape[1],3))
    image = image/np.max(image)
    if image.shape[2]==1:
        # single-channel input: the same channel fills all three planes
        tmpImg[:,:,0] = (image[:,:,0]-0.485)/0.229
        tmpImg[:,:,1] = (image[:,:,0]-0.485)/0.229
        tmpImg[:,:,2] = (image[:,:,0]-0.485)/0.229
    else:
        # three-channel input: each channel has its own mean and std
        tmpImg[:,:,0] = (image[:,:,0]-0.485)/0.229
        tmpImg[:,:,1] = (image[:,:,1]-0.456)/0.224
        tmpImg[:,:,2] = (image[:,:,2]-0.406)/0.225

Pay attention to 2 steps.

First, every color pixel value is divided by the maximum value of all color pixel values:

image = image/np.max(image)


Second, the per-channel mean is subtracted from every color pixel value and the result is divided by the per-channel std:

tmpImg[:,:,0] = (image[:,:,0]-0.485)/0.229
tmpImg[:,:,1] = (image[:,:,1]-0.456)/0.224
tmpImg[:,:,2] = (image[:,:,2]-0.406)/0.225
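
To sanity-check a port of these two steps, it can help to compute the expected values for a tiny image first. The following is a minimal sketch in plain Python (no NumPy), assuming an RGB image stored as nested lists of raw channel values. The function name `normalize_rgb`, the nested-list layout, and the guard for an all-black image are assumptions made for this sketch and are not part of the U2Net repository.

```python
# Per-channel mean and std, taken from the preprocessing snippet above.
MEAN = (0.485, 0.456, 0.406)
STD = (0.229, 0.224, 0.225)


def normalize_rgb(image):
    """Apply the two preprocessing steps to an H x W x 3 nested list.

    `image` holds raw channel values (for example 0-255). The name and the
    nested-list layout are hypothetical, chosen only for this sketch.
    """
    # Step 1: find the maximum over all pixels and all channels.
    max_value = max(c for row in image for pixel in row for c in pixel)
    if max_value == 0:
        max_value = 1  # guard added here: avoid dividing by zero on a black image

    # Step 2: divide by the maximum, subtract the mean, divide by the std.
    return [
        [
            [(pixel[ch] / max_value - MEAN[ch]) / STD[ch] for ch in range(3)]
            for pixel in row
        ]
        for row in image
    ]


tiny = [[[255, 0, 128]]]  # one row, one pixel
print(normalize_rgb(tiny)[0][0])
```

Feeding the same pixel through the Android code and comparing the three floats is a quick way to confirm that the channel order and the constants match.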

So basically in Kotlin if you have a bitmap you have to do something like:

import android.graphics.Bitmap
import android.graphics.Color

fun bitmapToFloatArray(bitmap: Bitmap): Array<Array<Array<FloatArray>>> {
    // The bitmap is expected to be scaled to the model input size (320 x 320) already
    val width: Int = bitmap.width
    val height: Int = bitmap.height
    val intValues = IntArray(width * height)
    bitmap.getPixels(intValues, 0, width, 0, 0, width, height)

    // Create an array to find the maximum value
    val fourDimensionalArray = Array(1) {
        Array(320) {
            Array(320) {
                FloatArray(3)
            }
        }
    }
    // i is the row and j the column, so the flat index is i * width + j
    for (i in 0 until height) {
        for (j in 0 until width) {
            val pixelValue: Int = intValues[i * width + j]
            fourDimensionalArray[0][i][j][0] = Color.red(pixelValue).toFloat()
            fourDimensionalArray[0][i][j][1] = Color.green(pixelValue).toFloat()
            fourDimensionalArray[0][i][j][2] = Color.blue(pixelValue).toFloat()
        }
    }

    // Convert multidimensional array to 1D
    val oneDFloatArray = ArrayList<Float>()

    for (m in fourDimensionalArray[0].indices) {
        for (x in fourDimensionalArray[0][0].indices) {
            for (y in fourDimensionalArray[0][0][0].indices) {
                oneDFloatArray.add(fourDimensionalArray[0][m][x][y])
            }
        }
    }

    // Channel values are 0..255, so clamping to 1 only matters for an all-black
    // image, where it avoids a division by zero
    val maxValue: Float = (oneDFloatArray.maxOrNull() ?: 0f).coerceAtLeast(1f)
    //val minValue: Float = oneDFloatArray.minOrNull() ?: 0f

    // Final array that is going to be used with interpreter
    val finalFourDimensionalArray = Array(1) {
        Array(320) {
            Array(320) {
                FloatArray(3)
            }
        }
    }
    for (i in 0 until height) {
        for (j in 0 until width) {
            val pixelValue: Int = intValues[i * width + j]
            finalFourDimensionalArray[0][i][j][0] =
                ((Color.red(pixelValue).toFloat() / maxValue) - 0.485f) / 0.229f
            finalFourDimensionalArray[0][i][j][1] =
                ((Color.green(pixelValue).toFloat() / maxValue) - 0.456f) / 0.224f
            finalFourDimensionalArray[0][i][j][2] =
                ((Color.blue(pixelValue).toFloat() / maxValue) - 0.406f) / 0.225f
        }
    }

    return finalFourDimensionalArray
}
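
Two details in the function above are easy to get wrong: how a packed pixel integer splits into its color channels, and how a row and a column map to a position in the flat pixel array. Here is a small sketch of both in plain Python, assuming the packed ARGB layout that `Bitmap.getPixels` fills in (alpha in the top byte, then red, green, blue). The helper names `unpack_argb` and `flat_index` are hypothetical and exist only for illustration.

```python
def unpack_argb(pixel):
    """Split a packed 32-bit ARGB value into (red, green, blue)."""
    red = (pixel >> 16) & 0xFF
    green = (pixel >> 8) & 0xFF
    blue = pixel & 0xFF
    return red, green, blue


def flat_index(row, col, width):
    """Position of (row, col) in a row-major pixel array of the given width."""
    return row * width + col


# A 3-wide, 2-high image filled with opaque black pixels.
width, height = 3, 2
pixels = [0xFF000000] * (width * height)

# Row 1, column 2 is the last element of the flat array.
pixels[flat_index(1, 2, width)] = 0xFF336699
print(unpack_argb(pixels[-1]))
```

The multiplier in the index is always the image width, whichever loop variable holds the row. On a square 320 x 320 input a mix-up goes unnoticed, which is why it is worth checking with a non-square test image.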

Then this array is fed to the interpreter, and since your model has multiple outputs we use runForMultipleInputsOutputs:

// Convert Bitmap to Float array
val inputStyle = ImageUtils.bitmapToFloatArray(loadedBitmap)

// Create arrays with size 1,320,320,1
val output1 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output2 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output3 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output4 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output5 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }
val output6 = Array(1) { Array(CONTENT_IMAGE_SIZE) { Array(CONTENT_IMAGE_SIZE) { FloatArray(1) } } }

val outputs: MutableMap<Int, Any> = HashMap()
outputs[0] = output1
outputs[1] = output2
outputs[2] = output3
outputs[3] = output4
outputs[4] = output5
outputs[5] = output6
// Runs model inference and gets result.
val array = arrayOf(inputStyle)
interpreterDepth.runForMultipleInputsOutputs(array, outputs)

Then we use the first output of the interpreter, as you can see in the file. (I have also printed the results of line 112, but it seems to have no effect. You are free to try that with the min and max values of the color pixel values.) The post-processing then follows what you can see in the save_output function:

// Convert output array to Bitmap
val finalBitmapGrey = ImageUtils.convertArrayToBitmapTensorFlow(
    output1, CONTENT_IMAGE_SIZE, CONTENT_IMAGE_SIZE
)

where the above function will be like:

fun convertArrayToBitmapTensorFlow(
    imageArray: Array<Array<Array<FloatArray>>>,
    imageWidth: Int,
    imageHeight: Int
): Bitmap {
    val conf = Bitmap.Config.ARGB_8888 // see other conf types
    val grayToneImage = Bitmap.createBitmap(imageWidth, imageHeight, conf)

    for (x in imageArray[0].indices) {
        for (y in imageArray[0][0].indices) {
            // Scale the model output (expected in 0..1) to a 0..255 gray level
            val gray = (imageArray[0][x][y][0] * 255f).toInt().coerceIn(0, 255)
            val color = Color.rgb(gray, gray, gray)

            // x is the row and y the column here, so this y, x is in the correct order!!!
            grayToneImage.setPixel(y, x, color)
        }
    }
    return grayToneImage
}

You can then use this grayscale image however you want.
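
For anyone who wants to experiment with the extra normalization mentioned earlier (the line 112 step), here is a minimal sketch of a min-max rescaling in plain Python. It assumes that step stretches the prediction so that its smallest value becomes 0 and its largest becomes 1. The function name `min_max_normalize` is hypothetical, and the flat list stands in for the 1 x 320 x 320 x 1 model output.

```python
def min_max_normalize(values):
    """Rescale a flat list of floats so min maps to 0.0 and max to 1.0."""
    lo = min(values)
    hi = max(values)
    if hi == lo:
        # A constant prediction has no range to stretch; return all zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


prediction = [0.25, 0.5, 0.75]  # stand-in for the first model output
scaled = min_max_normalize(prediction)

# Truncate to integer gray levels, as toInt() does in the Kotlin code.
gray = [int(v * 255) for v in scaled]
print(gray)
```

If the raw output already spans nearly the full 0 to 1 range, this rescaling changes very little, which would be consistent with the observation that it seemed to have no effect.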

Because the preprocessing has multiple steps, I used the interpreter directly with no additional libraries. I will check later in the week whether metadata covering all the steps can be inserted into the model, but I doubt it.

If you need some clarifications please do not hesitate to ask me.

Colab notebook link

Happy coding
