Show and Tell: A Neural Image Caption Generator

Automatically describing the content of an image is a fundamental problem in artificial intelligence that connects computer vision and natural language processing. In this paper, we present a generative model based on a deep recurrent architecture that combines recent advances in computer vision and machine translation and that can be used to generate natural sentences describing an image. The model is trained to maximize the likelihood of the target description sentence given the training image. Experiments on several datasets show the accuracy of the model and the fluency of the language it learns solely from image descriptions. Our model is often quite accurate, which we verify both qualitatively and quantitatively. For instance, while the current state-of-the-art BLEU score (the higher the better) on the Pascal dataset is 25, our approach yields 59, to be compared to human performance around 69. We also show BLEU score improvements on Flickr30k, from 55 to 66, and on SBU, from 19 to 27.
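
The training objective in the abstract can be made concrete with a small sketch. By the chain rule, the log-likelihood of a caption given an image is the sum of the log-probabilities the model assigns to each word in turn, and training adjusts the model to push that sum up. The function name and the per-word probabilities below are invented for illustration and do not come from the paper or its code.

```python
import math

def caption_log_likelihood(step_probs):
    """Sum of log p(word_t | image, earlier words) over one caption.

    step_probs: hypothetical probabilities a model assigns to each
    correct word of the target caption, in order.
    """
    return sum(math.log(p) for p in step_probs)

# Made-up probabilities for a four-word caption (illustrative only).
probs = [0.5, 0.25, 0.5, 0.8]
ll = caption_log_likelihood(probs)

# A model that is more confident in every correct word scores higher.
better = caption_log_likelihood([0.9, 0.9, 0.9, 0.9])
```

Here `ll` equals the log of the product of the four probabilities, so maximizing it is the same as maximizing the probability of the whole sentence.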

Full article and PDF.

via Google Research Blog


DeepLearning.University: an annotated deep learning bibliography

DeepLearning.University – An Annotated Deep Learning Bibliography | Memkite.

An impressive annotated bibliography of recent deep learning papers. It doesn’t include publications before 2014, but it is still quite comprehensive.


Digital X-ray images in 5 seconds

Aiming to replace conventional X-ray equipment, experts in Mexico have created a digital X-ray imaging system that replaces the traditional plate with a solid detector and delivers results in five seconds. Analog equipment takes six minutes to develop the traditional film.

Read the full article on ScienceDaily.


Machine Learning, meet Computer Vision

Computer vision, the field of building computer algorithms to automatically understand the contents of images, grew out of AI and cognitive neuroscience around the 1960s. “Solving” vision was famously set as a summer project at MIT in 1966, but it quickly became apparent that it might take a little longer! The general image understanding task remains elusive 50 years later, but the field is thriving. Dramatic progress has been made, and vision algorithms have started to reach a broad audience, with particular commercial successes including interactive segmentation (available as the “Remove Background” feature in Microsoft Office), image search, face detection and alignment, and human motion capture for Kinect. Almost certainly the main reason for this recent surge of progress has been the rapid uptake of machine learning (ML) over the last 15 or 20 years.

This first post in a two-part series will explore some of the challenges of computer vision and touch on the powerful ML technique of decision forests for pixel-wise classification.

Read the full article by Jamie Shotton, Antonio Criminisi and Sebastian Nowozin on TechNet Blogs – Machine Learning.


Thank you
