Showing posts with label machine learning.

Sunday, February 02, 2014

Turning Big Data into Smart Data

A key contemporary trend is big data - the creation and manipulation of large, complex data sets that must be stored and managed in the cloud because they are too unwieldy for local computers. Big data creation is currently on the order of zettabytes (10^21 bytes) per year, produced in roughly equal amounts by four segments: individuals (photos, video), companies (transaction monitoring), governments (surveillance, e.g., the new Utah Data Center), and scientific research (astronomical observations).

Big data fanfare abounds: we continually hear announcements that more data was created last year than in the entire prior history of humanity, and that data creation is on a two-year doubling cycle. Cheaper, faster storage has been the historical answer to the ever-growing capacity to generate data, but it is not necessarily the best solution. Much collected data is already discarded without ever being saved (e.g., CCTV footage, real-time surgery video, and raw genome sequencing data). Much of the data that is stored remains unused, never cleaned up into a human-usable form because doing so is costly and challenging (de-duplication being a primary example).

Turning big data into smart data means moving away from data fundamentalism - the idea that data must be collected, and that data collection is an end in itself rather than a means. Advancement comes from smart data, not more data: being able to cleanly extract and use the salient aspects of data (the 'diffs', for example identifying the relevant genomic polymorphisms from a whole genome sequence), rather than just generating data and then discarding or mindlessly storing it.
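
As a toy illustration of the 'diff' idea (the sequences below are made up, not real genomic data), keeping only the positions where a sample differs from a reference is far more compact than storing both full sequences:

```python
# Toy sketch of "smart data": keep only the informative diffs
# (positions where a sample sequence differs from a reference)
# rather than storing the full sequence. Sequences are invented.

reference = "ACGTACGTACGTACGTACGT"
sample    = "ACGTACCTACGTACGAACGT"

def find_polymorphisms(ref, smp):
    """Return (position, ref_base, sample_base) for each mismatch."""
    return [(i, r, s) for i, (r, s) in enumerate(zip(ref, smp)) if r != s]

diffs = find_polymorphisms(reference, sample)
print(diffs)   # [(6, 'G', 'C'), (15, 'T', 'A')]
```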

Sunday, November 10, 2013

State of Tech: Image Corpus Corralling and API Integration

Two of the biggest current tech trends are image corpus 'corralling' and API integration. Roughly 1 billion images are taken per day, and 6 billion photos are shared monthly.
Uploaded photo databases are the new 'data' corpora.
The first moment this shift became clear was Google's June 2012 announcement of the ability to recognize images of cats (the most frequently appearing entity in YouTube videos), along with the big data industry's continual innovation in managing unstructured data such as photos. Now, the sheer volume of image-related activity and the opportunity for different consumer and commercial applications are making image classification a focal area for the tech industry.

Functionality is being developed for classifying and accessing content – both images and all web content – with tools such as Imagga, a cloud-based image classification program, and OpenCalais, a standard for text-based semantic content analysis and organization. What we might now start to call the corpus characterization ecosystem is expanding into related tools like DocumentCloud, a service that runs uploaded documents through OpenCalais to extract information about the nouns (e.g., people, places, and organizations) they mention.

API integration remains an ongoing trend of the year, with integrated API and developer management platforms like Mashery continuing to grow. API integration platforms give companies a means of facilitating and encouraging external developers' access to their content to make apps, and give developers a standardized means of accessing a large variety of data content from different sources, which they can integrate to create a new generation of sophisticated apps and web services.
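
The underlying developer pattern is straightforward; here is a minimal sketch in Python using the requests library, with a placeholder endpoint, API key, and parameters rather than any particular platform's actual API:

```python
# Sketch of the API-integration pattern: a developer authenticates with
# an API key and pulls structured content from a provider's endpoint.
# The URL, key, and fields below are placeholders, not a real API.
import requests

API_KEY = "YOUR_API_KEY"                        # issued by the content provider
BASE_URL = "https://api.example.com/v1/photos"  # placeholder endpoint

def fetch_tagged_photos(tag, limit=10):
    """Request recent photos matching a tag, returned as JSON."""
    resp = requests.get(
        BASE_URL,
        params={"tag": tag, "limit": limit},
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    return resp.json()

# photos = fetch_tagged_photos("cats")
```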

Sunday, May 26, 2013

AAAI 2014: Connecting Machine Learning and Human Intelligence

The AAAI Spring Symposia are a place for artificial intelligence, machine learning, and other computer scientists from around the world to present and discuss innovative theoretical research in a workshop-like environment. In 2013, some of the topics included learning, autonomous systems, wellness, crowd computing, behavior change, and creativity.

Proposals are underway for 2014. Please indicate your opinion by voting in the poll at the top right on these potential topics:
  • My data identity: personal, social, universal 
  • Big data becomes personal: knowledge into meaning 
  • Wearable computing and digital affect: wellness, bioidentity, and intentionality 
  • Big data, wearable computing, and identity construction: knowledge becomes meaning 
  • Personalized knowledge generation: identity, intentionality, action, and wellness

Sunday, April 14, 2013

Human Microbiome: Futurist Augmentation Platform

The human microbiome works symbiotically with its human host (as microbiomes do in all animals), supporting nutrient synthesis and preventing pathology. However, microbial populations are large, complicated, and dynamic, which makes it challenging to profile their activity and construct meaningful interventions. The 14th Annual Microbiology Student Symposium, held at UC Berkeley on April 13, 2013, addressed some of these issues (conference program).

There is tremendous microbiomic variation between individuals – a person's gut microbiomic signature is perhaps as uniquely distinguishable as a fingerprint. There is some variability within an individual too, but populations tend strongly to persist over time. The microbiome adjusts quickly to dietary and environmental change, within a day, and can shift back just as quickly. If certain populations are wiped out, substitute species within the same taxon or phylum may emerge to (presumably) fulfill a similar function. Pathological conditions like Crohn's disease, colitis, and irritable bowel syndrome (IBD, IBS) likely reflect dysbiosis (i.e., microbial imbalance) of the whole biosystem: not only are certain disease-related bacterial populations elevated, but mitigating populations may be much lower. Given the complexity of a microbiome with thousands of species across taxa and phyla, machine learning techniques may be useful for combining a series of weak signals into a prognostication, as the SLiME Project in the Eric Alm lab at MIT has done, claiming to predict IBD as accurately as other non-invasive methods.
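
As a rough illustration of the weak-signals approach (synthetic data and an off-the-shelf classifier, not the SLiME Project's actual features or models), a standard ensemble method can be trained on per-sample microbial abundances, each of which is only weakly predictive on its own:

```python
# Illustrative only: combining many weak microbial-abundance signals into a
# single disease prediction with an off-the-shelf classifier. Data are
# synthetic; the SLiME Project's actual features and models differ.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_samples, n_taxa = 200, 50

# Synthetic relative-abundance matrix: rows are subjects, columns are taxa
X = rng.dirichlet(np.ones(n_taxa), size=n_samples)
# Synthetic disease labels, weakly coupled to the first handful of taxa
y = (X[:, :5].sum(axis=1) + rng.normal(0, 0.02, n_samples) > 0.10).astype(int)

clf = RandomForestClassifier(n_estimators=200, random_state=0)
scores = cross_val_score(clf, X, y, cv=5, scoring="roc_auc")
print("cross-validated AUC:", round(scores.mean(), 2))
```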

In the longer term, the microbiome could be the perfect platform for many different less-invasive augmentations for the human - bringing on board micro-connectivity, memory, processing, and electronic storage (Google Gut Glass?), with applications such as real-time life-tracking and quantified-self monitoring and intervention.

Sunday, March 31, 2013

What's new in AI? Trust, Creativity, and Shikake

The AAAI spring symposia held at Stanford University in March provide a nice look at the potpourri of innovative projects in progress around the world by academic researchers in the artificial intelligence field. This year's eight tracks can be grouped into two overall categories: those that focus on computer self-interaction or computer-computer interaction, and those that focus on human-computer interaction or human sociological phenomena, as listed below.

Computer self-interaction or computer-computer interaction (Link to details)
  • Designing Intelligent Robots: Reintegrating AI II 
  • Lifelong Machine Learning 
  • Trust and Autonomous Systems 
  • Weakly Supervised Learning from Multimedia
Human-computer interaction or human sociological phenomena (Link to details)
  • Analyzing Microtext 
  • Creativity and (Early) Cognitive Development 
  • Data Driven Wellness: From Self-Tracking to Behavior Change 
  • Shikakeology: Designing Triggers for Behavior Change 
This last topic, Shikakeology, is an interesting new category that is completely on-trend with the growing smart matter, Internet-of-things, Quantified Self, Habit Design, and Continuous Monitoring movements. Shikake is a Japanese concept in which physical objects are embedded with sensors to trigger a physical or psychological behavior change. An example would be a trash can that plays an appreciative sound to encourage people to deposit their litter.
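
A toy simulation of that trash-can example, with the hardware and sound playback stubbed out (a real shikake device would read a weight or lid sensor and play an actual sound):

```python
# Toy simulation of the shikake pattern: a sensed physical event triggers
# a small feedback that nudges behavior. Hardware is stubbed out here.

def play_thank_you_sound():
    print("ding! (appreciative sound)")

class SmartTrashCan:
    def __init__(self):
        self.weight_grams = 0

    def on_item_deposited(self, item_grams):
        """Called when the (simulated) weight sensor detects a change."""
        self.weight_grams += item_grams
        play_thank_you_sound()   # positive feedback encourages the behavior

can = SmartTrashCan()
can.on_item_deposited(30)   # someone drops in a piece of litter
```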

Sunday, October 21, 2012

Singularity Summit 2012: Image Recognition, Analogy, Big Health Data, and Bias Reduction

The seventh Singularity Summit was held in San Francisco, California, on October 13-14, 2012. As in other years, there were about 600 attendees, although this year's conference program included both general-interest science topics and singularity-related topics. Singularity in this sense denotes a technological singularity - a potential future moment when smarter-than-human intelligence may arise. The conference was organized by the Singularity Institute, which focuses on researching safe artificial intelligence architectures. The key themes of the conference are summarized below. Overall, the conference material could be characterized as incrementalism within the space of traditional singularity-related work, with faster-moving advances coming from other fields such as image recognition, big health data, synthetic biology, crowdsourcing, and biosensors.

Key Themes:
  • Singularity Thought Leadership
  • Big Data Artificial Intelligence: Image Recognition
  • Era of Big Health Data
  • Improving Cognition: Bias Reduction and Analogies
  • Singularity Predictions
Singularity Thought Leadership
Singularity thought leader Vernor Vinge, who coined the term technological singularity, provided an interesting perspective. Since at least 2000, he has referred to the idea of computing-enabled matter and the wireless Internet-of-things as Digital Gaia. He noted that 5% of objects worldwide are already embedded with microprocessors, and that it could be scary as reality 'wakes up' further, especially since we are unable to control other phenomena we have created, such as financial markets. He was pessimistic regarding privacy, suggesting that Brin's traditional counterproposal to surveillance, sousveillance, is not necessarily better. More positively, he discussed the framing of computers as a neo-neocortex for the brain, extreme UIs that provide convenient and unobtrusive cognitive support, other intelligence amplification techniques, and how we have been unconsciously prepping many of our environments for robotic operations. Crowdsourcing has also risen as an important resource, as the network (the Internet plus potentially 7 billion Turing-test-passing agents) matches optimal resources to specific cognitive tasks (like protein folding analysis).

Big Data Artificial Intelligence: Image Recognition
Peter Norvig continued in his usual vein of discussing what has been important in resolving contemporary problems in artificial intelligence. In machine translation (interestingly, a Searlean Chinese room), the key was using large online data corpora and straightforward machine learning algorithms (The Unreasonable Effectiveness of Data). In more recent work, his lab at Google has been able to recognize pictures of cats. In this digital vision processing advance (announced in June 2012 (article, paper)), the key was creating neural networks for machine learning that used hierarchical representation and problem solving, together once more with large online data corpora (10 million images scanned by 16,000 computers) and straightforward learning algorithms.
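
To make the idea concrete, here is a toy sketch of the building block involved - learning features from unlabeled data with a tiny single-layer autoencoder on synthetic 8x8 patches. It illustrates unsupervised feature learning at miniature scale, not the far larger, deeper networks Google actually used:

```python
# Toy single-layer autoencoder on synthetic 8x8 "image patches".
# Illustrative only; the real system used much deeper networks trained
# on 10 million real video frames across 16,000 machines.
import numpy as np

rng = np.random.default_rng(0)
patches = rng.random((1000, 64))          # 1000 fake 8x8 patches, flattened
patches -= patches.mean(axis=0)           # center the data

n_hidden, lr = 25, 0.1
W1 = rng.normal(0, 0.1, (64, n_hidden))   # encoder weights
W2 = rng.normal(0, 0.1, (n_hidden, 64))   # decoder weights

for epoch in range(50):
    h = np.tanh(patches @ W1)             # hidden "feature" activations
    recon = h @ W2                        # reconstruction of the input
    err = recon - patches
    # Backpropagate the reconstruction error through both layers
    grad_W2 = h.T @ err / len(patches)
    grad_h = (err @ W2.T) * (1 - h ** 2)
    grad_W1 = patches.T @ grad_h / len(patches)
    W2 -= lr * grad_W2
    W1 -= lr * grad_W1

final_err = np.mean((np.tanh(patches @ W1) @ W2 - patches) ** 2)
print("reconstruction error after training:", round(float(final_err), 4))
```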

Era of Big Health Data 
Three speakers presented innovations in the era of big health data, a sector that is generating data faster than any other and is starting to use more sophisticated artificial intelligence techniques. Carl Zimmer pointed out that new viruses continue to develop and spread, and that this is expected to persist. Encouragingly, new viruses are being genetically sequenced increasingly rapidly, but it still takes time to breed up vaccines. A faster means of vaccine production could come from newer techniques in synthetic biology and nanotechnology, such as those from Angela Belcher's lab. Linda Avey discussed Curious, Inc., a personal data discovery platform in beta launch that looks for correlations across big health data streams (more information). John Wilbanks discussed the pyrrhic notion of privacy provided by traditional models as we move to a cloud-based big health data era (for example, only a few data points are needed to identify an individual, and medical records may have ~500,000). Some health regulatory innovations include an updated version of HIPAA privacy policies, a portable consent for granting the use of personalized genomic data, and a network where patients may connect directly with researchers.
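
As a rough sketch of what correlation-hunting across personal health data streams might look like (synthetic data; this is not Curious, Inc.'s actual method), one can line up daily streams and compute their pairwise correlations:

```python
# Illustrative only: looking for correlations across personal health data
# streams, here synthetic daily sleep hours and step counts.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
days = pd.date_range("2012-10-01", periods=90, freq="D")
sleep_hours = rng.normal(7, 1, len(days))
# Make steps loosely depend on sleep so the correlation has something to find
steps = 6000 + 800 * sleep_hours + rng.normal(0, 1500, len(days))

streams = pd.DataFrame({"sleep_hours": sleep_hours, "steps": steps}, index=days)
print(streams.corr())   # pairwise Pearson correlation between the streams
```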

Improving Cognition: Bias Reduction and Analogies (QS’ing Your Thinking) 
A perennial theme in the singularity community is improving thinking and cognition, for example through bias reduction. Nobel Prize winner Daniel Kahneman spoke remotely on his work regarding fast and slow thinking. We have two thinking modes - fast (blink intuitions) and slow (more deliberative, logical) - both of which are indispensable and potentially problematic. Across all thinking runs a strong inherent loss aversion, which helps to generate a bias toward optimism. Steven Pinker also spoke, indirectly, about the theme of bias. In recent work, he found that there has been a persistent multi-century decline in violence, possibly due mostly to increases in affluence and literacy/knowledge. This may seem counter to popular media accounts, which, guided by short-term interests, help to create an area of societal cognitive bias. Other research on cognitive enhancement and the processes of intelligence included Melanie Mitchell's claim that analogy-making is a key attribute of intelligence. Using analogies in new and appropriate ways could be a means of identifying intelligence, perhaps superior to traditional proxies such as general-purpose problem solving, question-answering, or Turing test-passing.

Singularity Predictions 
Another persistent theme in the singularity community is sharpening the analysis, predictions, and context around the moment when there might be greater-than-human intelligence. Singularity movement leader Ray Kurzweil made his usual optimistic remarks, accompanied by slides with exponentiating curves of technology cost/functionality improvements, but did not confirm or update his long-standing prediction of a technological singularity circa 2045 [1]. Stuart Armstrong pointed out that predictions are usually 15-25 years out, and that this has been true every year - the predicted date keeps receding. In an analysis of the Singularity Institute's database of 257 singularity predictions made since 1950, there is no convergence: estimates range from 2020 to 2080. Vernor Vinge encouraged the consideration of a wide range of scenarios and methods, including 'What if the Singularity Doesn't Happen.' The singularity prediction problem might be improved by widening the possibility space; for example, it may be less useful to focus on intelligence as the exclusive element of the moment of innovation, speciation, or progress beyond the human level, when other dimensions such as emotional intelligence, empathy, creativity, or a composite thereof could also be considered.

Reference
1. Kurzweil, R. The Singularity is Near; Penguin Group: New York, NY, USA, 2006; pp. 299-367.

Sunday, March 27, 2011

Human language ambiguity and AI development

Since Jeopardy questions are not fashioned in SQL, and are in fact designed to obfuscate rather than elucidate, IBM's Watson program used a number of indirect DeepQA processes to beat human competitors in January 2011 (Figure 1). A good deal of the program's success may be attributable to algorithms that handle language ambiguity, for example by ranking potential answers after grouping their features to produce evidence profiles. Some of the twenty or more features analyzed include data type (e.g., place, name, etc.), support in text passages, popularity, and source reliability.

Figure 1: IBM's supercomputer Watson wins Jeopardy!

Image credit: New Scientist
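
As a toy illustration of evidence-profile ranking (the candidate answers, feature scores, and weights below are invented; DeepQA's real scoring uses many more features with learned weights):

```python
# Toy ranking of candidate answers by a weighted combination of evidence
# features, loosely in the spirit of evidence profiles. All values invented.

CANDIDATES = {
    # feature scores in [0, 1]: type match, passage support, popularity, source reliability
    "Toronto": {"type_match": 0.2, "passage_support": 0.5, "popularity": 0.7, "reliability": 0.6},
    "Chicago": {"type_match": 0.9, "passage_support": 0.8, "popularity": 0.6, "reliability": 0.7},
}
WEIGHTS = {"type_match": 0.4, "passage_support": 0.3, "popularity": 0.1, "reliability": 0.2}

def score(profile):
    """Combine an evidence profile into a single confidence score."""
    return sum(WEIGHTS[f] * v for f, v in profile.items())

ranked = sorted(CANDIDATES.items(), key=lambda kv: score(kv[1]), reverse=True)
for answer, profile in ranked:
    print(f"{answer}: {score(profile):.2f}")   # Chicago: 0.80, Toronto: 0.42
```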

Since managing ambiguity is critical to successful natural language processing, it might be easier to develop AI in some human languages than in others. Some languages are more precise; languages without verb conjugation and temporal indication are more ambiguous and depend more on inferring meaning from context. While it might be easier to develop a Turing-test-passing AI in these languages, such an AI might not be as useful for general-purpose problem solving, since context inference would be challenging to incorporate. Perhaps it would be most expedient to develop AI in some of the most precise languages first - German or French, for example - instead of English.