
Sunday, March 27, 2011

Human language ambiguity and AI development

Since Jeopardy questions are not fashioned in SQL, and are in fact designed to obfuscate rather than elucidate, IBM's Watson program used a number of indirect DeepQA processes to beat human competitors in January 2011 (Figure 1). A good deal of the program's success may be attributable to algorithms that handle language ambiguity, for example by ranking potential answers according to evidence profiles built from groups of features. Some of the twenty or more features analyzed include data type (e.g., place or name), support in text passages, popularity, and source reliability.

Figure 1: IBM's supercomputer Watson wins Jeopardy!

Image credit: New Scientist
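
As a rough illustration of the ranking idea described above (not IBM's actual DeepQA implementation), the sketch below scores hypothetical candidate answers by combining weighted feature evidence into a single confidence value; the candidates, feature names, and weights are all illustrative assumptions.

```python
# Toy sketch of evidence-profile ranking, not IBM's DeepQA code.
# Each candidate answer carries per-feature evidence scores in [0, 1];
# a weighted sum yields a confidence value used to rank the candidates.
# Feature names, weights, and candidates are illustrative assumptions.

WEIGHTS = {
    "type_match": 0.35,        # does the answer fit the expected data type (place, name, ...)?
    "passage_support": 0.35,   # how well do retrieved text passages support it?
    "popularity": 0.10,        # how prominent is it across sources?
    "source_reliability": 0.20,
}

CANDIDATES = {
    "Chicago": {"type_match": 0.9, "passage_support": 0.8, "popularity": 0.7, "source_reliability": 0.8},
    "Toronto": {"type_match": 0.3, "passage_support": 0.4, "popularity": 0.9, "source_reliability": 0.6},
}

def confidence(profile):
    """Weighted sum of feature evidence for one candidate answer."""
    return sum(weight * profile.get(feature, 0.0) for feature, weight in WEIGHTS.items())

for answer, profile in sorted(CANDIDATES.items(), key=lambda kv: confidence(kv[1]), reverse=True):
    print(f"{answer}: {confidence(profile):.2f}")
```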

Since managing ambiguity is critical to successful natural language processing, it might be easier to develop AI in some human languages than in others. Some languages are more precise. Languages without verb conjugation and temporal indication are more ambiguous and depend more on inferring meaning from context. While it might be easier to develop a Turing-test-passing AI in these languages, it might not be as useful for general-purpose problem solving, since context inference would be challenging to incorporate. Perhaps it would be most expedient to develop AI in some of the most precise languages first, German or French, for example, instead of English.

Wednesday, January 24, 2007

Stocks 2.0: metrics, blogs and reviews

The Web 2.0 movement should be providing a whole new level of accessible, validated, aggregated information about publicly traded companies.

1) Standardized non-financial metrics
There should be standardized reporting (as there is for financial statements), along with template and widget code, for socially responsible investment screening data in the usual areas: Business Practices and Governance; Environment; Human Rights; Diversity; Labor; Community; Animal Testing; and Tobacco, Alcohol, Weapons and Gambling. A sketch of what such a standardized record might look like follows below.

Currently, this research is developed and implemented in proprietary ways by Citizens Funds, Calvert, the Social Venture Technology Group and others, who could instead work through standards bodies and in open-source ways.
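
Here is a hypothetical sketch of a standardized, machine-readable screening record; the field names simply mirror the categories listed above and are not an existing standard from Citizens Funds, Calvert, or anyone else.

```python
# Hypothetical standardized SRI screening record; field names mirror the
# categories listed above and are illustrative, not an existing standard.
from dataclasses import dataclass, asdict
import json

@dataclass
class SRIScreen:
    ticker: str
    business_practices_and_governance: float  # each category scored 0-100
    environment: float
    human_rights: float
    diversity: float
    labor: float
    community: float
    animal_testing: float
    tobacco_alcohol_weapons_gambling: float

record = SRIScreen("XYZ", 72, 65, 80, 58, 61, 70, 90, 100)
print(json.dumps(asdict(record), indent=2))  # what a reporting widget might publish or consume
```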

2) Ecosystem information
Corporate employee blogs should be on Yahoo Finance, Google Finance, etc. Customer, vendor and supplier reviews, along with D&B, Better Business Bureau and Chamber of Commerce data, should be aggregated. More usable data regarding insider trades, litigation and negative PR would also be helpful. Companies can make themselves more attractive by providing this information, as well as some permissioned level of access to internal prediction markets and other innovative, investment-worthy information.
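
A minimal sketch of the aggregation idea, assuming hypothetical per-source fetchers (none of these functions correspond to real Yahoo Finance, Google Finance, D&B, or Better Business Bureau APIs):

```python
# Minimal sketch of pulling one company's "ecosystem" data into a single
# record; the fetchers are hypothetical placeholders, not real APIs.

def fetch_employee_blog_posts(ticker):
    return ["placeholder post"]                   # would aggregate corporate employee blogs

def fetch_supplier_reviews(ticker):
    return [{"source": "BBB", "rating": 3.5}]     # would merge D&B, BBB, Chamber of Commerce data

def fetch_insider_trades(ticker):
    return [{"insider": "CFO", "shares": -10000}] # would pull filings on insider trades

def ecosystem_snapshot(ticker):
    """Aggregate the available ecosystem signals for one ticker."""
    return {
        "ticker": ticker,
        "employee_blog_posts": fetch_employee_blog_posts(ticker),
        "supplier_reviews": fetch_supplier_reviews(ticker),
        "insider_trades": fetch_insider_trades(ticker),
    }

print(ecosystem_snapshot("XYZ"))
```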

3) Publicly available supervisor reviews
As the world is Yelp-ified with ubiquitous reviews, businesses can stay competitive by meeting the expectation of being reviewed. Public CEO, executive and general management reviews may become commonplace and expected. Academic review sites such as RateMyProfessors and CourseReviews have surmounted the attendant legal issues and become de rigueur for professor reviews; why not for corporate bosses too?

4) NLP and other AI tech tools
Mutual funds and hedge funds are paying up for a new set of AI tools, provided by companies such as CollectiveIntellect and Novamente, that use natural language processing (NLP) to rapidly assess early signs of potential stock movement from rumblings in the blogosphere and news articles. At some point, all information posted to the Internet, including podcasts and videos, will need to be searched and aggregated. Making some versions of these bubble-up, NLP-based summarization tools publicly available could improve their value and acceptance.
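
As a toy illustration of the idea (not CollectiveIntellect's or Novamente's actual technology), the sketch below scans blog and news text for a ticker and computes a crude lexicon-based sentiment score as an early "rumbling" signal; the word lists, ticker, and scaling are illustrative assumptions.

```python
# Toy lexicon-based "rumbling" signal over blog/news text; word lists,
# sample posts, and the ACME ticker are illustrative assumptions.
import re

POSITIVE = {"beat", "upgrade", "growth", "record", "strong"}
NEGATIVE = {"miss", "downgrade", "lawsuit", "recall", "weak"}

def rumbling_score(posts, ticker):
    """Net sentiment of posts mentioning the ticker, clamped to [-1, 1]."""
    hits, score = 0, 0
    for post in posts:
        if not re.search(rf"\b{re.escape(ticker)}\b", post, re.IGNORECASE):
            continue                                  # ignore posts that never mention the ticker
        words = set(re.findall(r"[a-z']+", post.lower()))
        score += len(words & POSITIVE) - len(words & NEGATIVE)
        hits += 1
    return 0.0 if hits == 0 else max(-1.0, min(1.0, score / hits))

posts = [
    "ACME posts record quarter, analysts expect an upgrade",
    "Supplier lawsuit could weigh on ACME next year",
]
print(rumbling_score(posts, "ACME"))  # positive value suggests favorable rumblings
```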