バウンティングボックス情報の単位変換(boxes_to_pixel_units)

129行目
各バウンティングボックスの座標/サイズ情報配列内のデータは各グリッド内の相対位置/相対サイズなので、画像内の座標に変換する。

パラメータは

box_list : 各バウンティングボックスの座標/サイズ情報配列、
image_width : 入力画像幅(448)
image_height : 入力画像高(448)

grid_size : グリッドサイズ(7)
box_listの配列構成はこんな感じ。

[
  [
 [
   [X座標(グリッド内相対位置), Y座標(グリッド内相対位置), 幅(イメージサイズ比の平方根), 高さ(イメージサイズ比の平方根)],
   [X座標(グリッド内相対位置), Y座標(グリッド内相対位置), 幅(イメージサイズ比の平方根), 高さ(イメージサイズ比の平方根)],
 ]
 ・・・
 同じものがあと6組(合計7組)
  ]
  ・・・
  同じものがあと6組(合計7組)
]

変換後のbox_listの配列構成はこんな感じ。

[
  [
    [
      [X座標(pixel単位), Y座標(pixel単位), 幅(pixel単位), 高さ(pixel単位)],
      [X座標(pixel単位), Y座標(pixel単位), 幅(pixel単位), 高さ(pixel単位)],
    ]
    ・・・
    同じものがあと6組(合計7組)
  ]
  ・・・
  同じものがあと6組(合計7組)
]

def boxes_to_pixel_units(box_list, image_width, image_height, grid_size):

132行目
定義されたバウンティングボックスの数。
Graphファイルに紐づいた値と考えられるので、トップレベルで定義しておいた方が分かりやすいと思うのだが。。。

    boxes_per_cell = 2

136行目
グリッド内オフセットから画像内オフセットに変換するための作業用配列を作成。

    box_offset = np.transpose(np.reshape(np.array([np.arange(grid_size)]*(grid_size*2)),(boxes_per_cell,grid_size, grid_size)),(1,2,0))

う～ん、まとめて書いてあって分かり難いので、分解してみる。

    aa = [np.arange(grid_size)]
    bb = np.array(aa * (grid_size*2))
    cc = np.reshape(bb, (boxes_per_cell,grid_size, grid_size))
    box_offset = np.transpose(cc, (1,2,0))

としたとき、

aa = [
       [0, 1, 2, 3, 4, 5, 6]
     ]
bb = [
       [0, 1, 2, 3, 4, 5, 6],
       ・・・
       同じものがあと13組(合計14組)
     ]
cc = [
       [
         [0, 1, 2, 3, 4, 5, 6],
         ・・・
         同じものがあと6組(合計7組)
       ],
       ・・・
       同じものがあと1組(合計2組)
     ]
box_offset = [
               [
                 [0, 0],
                 [1, 1],
                 [2, 2],
                 [3, 3],
                 [4, 4],
                 [5, 5],
                 [6, 6]
               ],
               ・・・
               同じものがあと6組(合計7組)
             ]

となる。

139行目
各グリッドのX座標データにグリッド番号を加算する(画像内絶対位置になる。単位:グリッド)

    box_list[:,:,:,0] += box_offset

140行目
各グリッドのY座標データにグリッド番号を加算する(画像内絶対位置になる。単位:グリッド)

    box_list[:,:,:,1] += np.transpose(box_offset,(1,0,2))

141行目
各グリッドのX座標とY座標データをグリッド数で割る(画像内相対位置になる)

    box_list[:,:,:,0:2] = box_list[:,:,:,0:2] / (grid_size * 1.0)

    # adjust the lengths and widths
    box_list[:,:,:,2] = np.multiply(box_list[:,:,:,2],box_list[:,:,:,2])
    box_list[:,:,:,3] = np.multiply(box_list[:,:,:,3],box_list[:,:,:,3])

    #scale the boxes to the image size in pixels
    box_list[:,:,:,0] *= image_width
    box_list[:,:,:,1] *= image_height
    box_list[:,:,:,2] *= image_width
    box_list[:,:,:,3] *= image_height

処理を書き換えてみる

なにやら小難しいことをやっているので、実行速度を考えずに分かりやすく書き換えると以下のようになる。

def boxes_to_pixel_units_alt(box_list, image_width, image_height, grid_size):
    boxes_per_cell = 2                               # 定義されたバウンティングボックス数  
    for gy in range(grid_size) :                     # グリッド縦方向ループ
        for gx in range(grid_size) :                 # グリッド横方向ループ
            for bb in range(boxes_per_cell) :        # バウンティングボックスループ
                box_list[gy][gx][bb][0] = ((box_list[gy][gx][bb][0] + gx) / (grid_size * 1.0)) * image_width        # box_x
                box_list[gy][gx][bb][1] = ((box_list[gy][gx][bb][1] + gy) / (grid_size * 1.0)) * image_height       # box_y
                box_list[gy][gx][bb][2] = ( box_list[gy][gx][bb][2] ** 2) * image_width                             # box_widtn
                box_list[gy][gx][bb][3] = ( box_list[gy][gx][bb][3] ** 2) * image_height                            # box_height