Sunday, November 27, 2011

Big data quantitative analysis toolkit

Just as individual data bytes may become much more richly modulated with attributes (an extension of data provenance: annotating bytes with additional inspectable elements such as create/review/launch time stamps, plus owner, quality, freshness, and controversy properties), so too may quantitative data sets.
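As a rough illustration, an attribute-modulated data record might look something like the following Python sketch; the field names and the 0-1 scoring scales are hypothetical, not part of any existing standard.

from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class AnnotatedData:
    """A data payload modulated with inspectable provenance attributes."""
    payload: bytes
    created: datetime                    # create time stamp
    reviewed: Optional[datetime] = None  # review time stamp
    launched: Optional[datetime] = None  # launch time stamp
    owner: str = ""                      # owning person or service
    quality: float = 0.0                 # hypothetical 0-1 quality score
    freshness: float = 0.0               # hypothetical 0-1 recency score
    controversy: float = 0.0             # hypothetical 0-1 disputedness score

# Example: a record whose attributes can be inspected before use.
record = AnnotatedData(payload=b"\x17\x2a",
                       created=datetime.now(timezone.utc),
                       owner="data-team",
                       quality=0.9,
                       freshness=0.8)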

There should be a standardized '2.0' toolkit for quantitative data analysis that bundles the ten techniques most often used to analyze data sets. These tools should be user-friendly, ideally delivered as a widget overlay on websites, or otherwise easily accessible and usable by laypersons without quantitative training.

Suggested techniques for inclusion in the top ten most-useful data analysis tools:
  1. Fourier transforms
  2. Markov state models
  3. Entropy analysis
  4. Distribution analysis (e.g., power law, Gaussian)
  5. Progression analysis (e.g., linear, geometric, exponential, discontinuous)
  6. Qualitative math
  7. Network analysis (node/group theory, graph theory)
  8. Complexity, chaos, turbulence, and perturbation modeling
It could become standard practice for these kinds of techniques to be run automatically on large data sets and their results displayed alongside them.
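To make the idea concrete, here is a minimal sketch, assuming NumPy and SciPy are available, of what an automatically-run battery might compute for a one-dimensional data set. The techniques shown (a Fourier transform, Shannon entropy, a Gaussian normality test, and a linear trend fit) are drawn from the list above; the function name and output keys are illustrative only.

import numpy as np
from scipy import stats

def auto_analyze(data: np.ndarray) -> dict:
    """Run a small standard battery of analyses over a 1-D data set."""
    results = {}

    # Fourier transform: index of the dominant frequency component.
    spectrum = np.abs(np.fft.rfft(data - data.mean()))
    results["dominant_frequency_bin"] = int(np.argmax(spectrum))

    # Entropy analysis: Shannon entropy of a 32-bin histogram of the values.
    counts, _ = np.histogram(data, bins=32)
    results["shannon_entropy_bits"] = float(stats.entropy(counts + 1, base=2))

    # Distribution analysis: p-value of a test against a Gaussian.
    results["normality_pvalue"] = float(stats.normaltest(data).pvalue)

    # Progression analysis: slope of a least-squares linear trend.
    x = np.arange(len(data))
    slope, _intercept = np.polyfit(x, data, 1)
    results["linear_trend_slope"] = float(slope)

    return results

# Example: a noisy sine wave; the battery runs with no user configuration.
rng = np.random.default_rng(0)
sample = np.sin(np.linspace(0, 20 * np.pi, 1000)) + rng.normal(0, 0.1, 1000)
print(auto_analyze(sample))

A widget overlay could render these same outputs next to any published data set, flagging, say, a low normality p-value as a cue for further inspection.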