
Sunday, November 10, 2013

State of Tech: Image Corpus Corralling and API Integration

Two of the biggest current tech trends are image corpus ‘corralling’ and API integration. A billion images are taken every day, and six billion photos are shared every month.
Uploaded photo databases are the new data corpora.
This shift first became clear with Google’s June 2012 announcement that its systems could recognize images of cats (the most frequently appearing entity in YouTube videos), and with the big data industry’s continual innovation in managing unstructured data like photos. Now, the sheer volume of image-related activity, and the opportunity for different consumer and commercial applications, is making image classification a focal area for the tech industry.

Functionality is being developed for classifying and accessing content, both images and web content more broadly, with tools such as Imagga, a cloud-based image classification service, and OpenCalais, a standard for text-based semantic content analysis and organization. What we might now start to call the corpus characterization ecosystem is expanding into related tools like DocumentCloud, which runs uploaded documents through OpenCalais as a service to extract information about the nouns (e.g., people, places, and organizations) mentioned.
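To make this concrete, here is a minimal sketch of tagging an image through a cloud classification service. The endpoint, parameter names, and response shape are modeled on Imagga's v2 tagging API but are assumptions on my part; check the current documentation before relying on them.

```python
# Minimal sketch of tagging an image with a cloud classification API.
# Endpoint, auth scheme, and response shape are modeled on Imagga's v2
# tagging API but should be verified against the current docs.
import requests

API_KEY = "your_api_key"        # placeholder credentials
API_SECRET = "your_api_secret"

def tag_image(image_url):
    """Return (tag, confidence) pairs for an image, highest confidence first."""
    resp = requests.get(
        "https://api.imagga.com/v2/tags",
        params={"image_url": image_url},
        auth=(API_KEY, API_SECRET),  # HTTP Basic auth with key/secret
        timeout=30,
    )
    resp.raise_for_status()
    tags = resp.json()["result"]["tags"]
    return [(t["tag"]["en"], t["confidence"]) for t in tags]

for tag, confidence in tag_image("https://example.com/cat.jpg")[:5]:
    print(f"{tag}: {confidence:.1f}")
```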

API integration remains one of the year's defining trends, with integrated API and developer management platforms like Mashery continuing to grow. API integration platforms give companies a means of facilitating and encouraging external developers to access their content to build apps, and give developers a standardized means of accessing a large variety of data content from different sources, enabling a new generation of sophisticated apps and web services.
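As an illustration of the pattern these platforms standardize, here is a hypothetical sketch of a developer consuming a partner's content API with key-based authentication and pagination; the host, endpoint, and field names are invented for the example.

```python
# Illustrative sketch of the access pattern an API platform standardizes:
# key-based auth, a versioned endpoint, and paginated retrieval of content.
# The host and parameter names are hypothetical, not a real service.
import requests

BASE_URL = "https://api.example-publisher.com/v1"  # hypothetical endpoint
API_KEY = "your_api_key"

def fetch_all(resource, page_size=100):
    """Yield every item of a paginated resource, one page at a time."""
    page = 1
    while True:
        resp = requests.get(
            f"{BASE_URL}/{resource}",
            params={"page": page, "per_page": page_size},
            headers={"Authorization": f"Bearer {API_KEY}"},
            timeout=30,
        )
        resp.raise_for_status()
        items = resp.json()["items"]
        if not items:
            return
        yield from items
        page += 1

articles = list(fetch_all("articles"))
```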

Sunday, February 05, 2012

The big data era's flux and pulse

Big data is an important contemporary trend, but what does it actually mean?

What is big data?
Big data refers not just to the absolute size of a body of information (which currently can be on the order of terabytes, petabytes, or exabytes), but also to its usability and manageability. Some of the defining parameters of big data are its large size, high-velocity activity (incoming, processing, outgoing), heterogeneous nature (a variety of structured and unstructured data types like video and images), and requirement for real-time analytics.

What is the process of working with big data?
The process of working with big data involves several steps. First, the data may be explored using tools for classification, visualization, and summarization. Then comes the detailed step of data cleaning to make the data consistent and usable. The next step is data reduction: for example, defining and extracting attributes, decreasing the dimensionality of the data, representing the problems to be solved, summarizing the data, and selecting portions of the data for analysis. Then the steps of predictive analytics, scoring, reporting, publishing, and quality validation and maintenance can be applied.
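As a compressed illustration, here is a minimal sketch of that workflow in Python with pandas and scikit-learn; the dataset, column names, and model choice are placeholders, not a prescription.

```python
# A compressed sketch of the workflow above -- explore, clean, reduce,
# predict -- on a tabular dataset. The CSV file and the "label" column
# are placeholders; features are assumed to be numeric.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("transactions.csv")      # placeholder dataset

# Exploration: summarize the data before touching it.
print(df.describe())

# Cleaning: drop duplicates and fill missing numeric values.
df = df.drop_duplicates()
df = df.fillna(df.median(numeric_only=True))

# Reduction: select attributes and shrink dimensionality.
features = df.drop(columns=["label"])     # "label" is a placeholder target
reduced = PCA(n_components=2).fit_transform(features)

# Predictive analytics and scoring on a held-out portion of the data.
X_train, X_test, y_train, y_test = train_test_split(reduced, df["label"])
model = RandomForestClassifier().fit(X_train, y_train)
print("held-out accuracy:", model.score(X_test, y_test))
```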

What are the applications of big data analysis?
Some of the benefits of big data analysis are the ability to summarize information, make predictions, identify trends (for example, consumer spending patterns), and rank and prioritize information. Some of the specific algorithms employed include, for summarizing: clustering and associations; for making predictions: tree-based methods, neural networks, and k-nearest neighbors; for identification: anomaly detection, similarities and matches, and change detection; and for ranking: logistic regression and frequency detection.
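To ground a couple of those algorithm families, here is a small sketch using scikit-learn: k-means clustering for summarizing and k-nearest neighbors for prediction, run on synthetic data that stands in for a real corpus.

```python
# A small sketch pairing two of the algorithm families named above:
# k-means clustering (summarizing) and k-nearest neighbors (prediction).
# The synthetic 2-D points stand in for a real dataset.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
points = rng.normal(size=(300, 2)) + rng.choice([0, 5], size=(300, 2))

# Summarizing: group the points into clusters and report their centers.
kmeans = KMeans(n_clusters=4, n_init=10).fit(points)
print("cluster centers:\n", kmeans.cluster_centers_)

# Prediction: use each point's cluster as its label, then classify a new
# observation by majority vote among its 5 nearest neighbors.
knn = KNeighborsClassifier(n_neighbors=5).fit(points, kmeans.labels_)
print("predicted cluster for (5, 0):", knn.predict([[5.0, 0.0]]))
```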

Excerpted from an Association for Computing Machinery (ACM) talk on Big Data & Predictive Analytics (slides).