This ability meant a user could draw a box around an object in an image and search for it, such as a bag held by someone in a photo. Bing Visual Search would then analyze the selection and return relevant results, like what the item is and where to buy it. With Object Detection, Microsoft has simplified this process: Bing now draws the boxes automatically, removing the manual step. Bing detects and marks items of interest in an image, and users just need to click a hotspot over an object to have Visual Search select it automatically. As before, the results appear in the Related Products and Related Images sections.
Microsoft is encouraging users to test the feature, though the company says Object Detection is still in development and its functionality is limited to certain fashion categories.
Object Detection in Bing Visual Search
The company explains that Bing needs not only to determine the category of an object but also to detect its location within the frame. Microsoft chose Faster R-CNN as its DNN-based object detection framework, saying it offers the best balance of speed and accuracy. Most DNN solutions generate region proposals offline, but Faster R-CNN is fast enough to generate them online, which means it can be integrated into a live service such as Bing. There were, however, limitations to overcome. Faster R-CNN initially took around 1.5 seconds per image for object detection, which Microsoft believed was too slow in a market where users are accustomed to seemingly instant search results. Bing researchers turned to their counterparts in the Azure cloud division, whose team was testing new Azure NVIDIA GPU instances that improved performance. Microsoft Research's scalable key-value store, ObjectStore, allowed Microsoft to decrease latency further.
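To make the detection step concrete: a Faster R-CNN-style detector returns, for each object, a bounding box, a category label, and a confidence score. Turning that raw output into a clean set of clickable hotspots typically involves filtering low-confidence detections and suppressing overlapping duplicates with non-maximum suppression (a standard step in Faster R-CNN pipelines). The sketch below is illustrative only, not Microsoft's code; the function names, data format, and thresholds are assumptions.

```python
# Illustrative sketch (not Bing's implementation): post-processing
# detector output into hotspots. Thresholds here are assumed values.

def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def hotspots(detections, score_thresh=0.5, iou_thresh=0.5):
    """Keep confident detections, then greedily drop any box that
    overlaps an already-kept box (non-maximum suppression)."""
    kept = []
    for det in sorted(detections, key=lambda d: d["score"], reverse=True):
        if det["score"] < score_thresh:
            continue  # too uncertain to show as a hotspot
        if all(iou(det["box"], k["box"]) < iou_thresh for k in kept):
            kept.append(det)
    return kept

# Example: two overlapping "handbag" detections collapse into one hotspot,
# and the low-confidence "belt" detection is filtered out.
raw = [
    {"box": (40, 60, 120, 160), "label": "handbag", "score": 0.92},
    {"box": (45, 65, 125, 165), "label": "handbag", "score": 0.71},
    {"box": (200, 30, 260, 90), "label": "shoe",    "score": 0.88},
    {"box": (10, 10, 30, 30),   "label": "belt",    "score": 0.20},
]
print([d["label"] for d in hotspots(raw)])  # → ['handbag', 'shoe']
```

Each surviving detection would then be rendered as a hotspot the user can click, with its box fed back into Visual Search as the selected region.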