Uploaded photo databases are the new 'data' corpora. This shift first became clear with Google’s June 2012 announcement of the ability to recognize images of cats (the most frequently appearing entity in YouTube videos), and with the big data industry’s continual innovation in managing unstructured data like photos. Now, the sheer volume of image-related activity, together with the opportunity for consumer and commercial applications, is making image classification a focal area for the tech industry.
Functionality is being developed for classifying and accessing content, both images and web content in general, with tools such as Imagga, a cloud-based image classification program, and OpenCalais, a standard tool for text-based semantic content analysis and organization. What we might now start to call the corpus characterization ecosystem is expanding into related tools like DocumentCloud, which runs uploaded documents through OpenCalais as a service to extract information about the named entities mentioned (e.g., people, places, and organizations).
API integration remains an ongoing trend this year, with integrated API and developer management platforms like Mashery continuing to grow. These platforms give companies a way to enable and encourage external developers to access their content and build apps. They also give developers a standardized way to access a wide variety of data from different sources and integrate it into a new generation of sophisticated apps and web services.
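To make the idea of "a standardized means of accessing data from different sources" concrete, here is a minimal sketch in Python. It is not the API of Mashery or of any real platform: the `Record` type, the two adapter functions, the field names, and the example.com URLs are all hypothetical, chosen only to show how differently shaped payloads might be normalized into one common form that an app can consume.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Record:
    """Common shape that every source is normalized into (hypothetical)."""
    source: str
    title: str
    url: str


# Each adapter turns one provider's raw payload into a common Record.
# The input field names below are assumptions made for illustration.
def from_photo_service(item: dict) -> Record:
    return Record(source="photos", title=item["caption"], url=item["image_url"])


def from_document_service(item: dict) -> Record:
    return Record(source="documents", title=item["name"], url=item["link"])


# Registry mapping a source name to the adapter that understands it.
ADAPTERS: Dict[str, Callable[[dict], Record]] = {
    "photos": from_photo_service,
    "documents": from_document_service,
}


def normalize(source: str, items: List[dict]) -> List[Record]:
    """Convert raw items from a named source into common Records."""
    adapter = ADAPTERS[source]
    return [adapter(item) for item in items]


if __name__ == "__main__":
    photos = [{"caption": "Cat on a sofa", "image_url": "https://example.com/cat.jpg"}]
    docs = [{"name": "Annual report", "link": "https://example.com/report.pdf"}]
    for record in normalize("photos", photos) + normalize("documents", docs):
        print(record.source, record.title, record.url)
```

The app code at the bottom never touches provider-specific field names; supporting a new source in this sketch would mean adding one adapter and one registry entry.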