A key contemporary trend is big data: the creation and manipulation of large, complex data sets that must be stored and managed in the cloud because they are too unwieldy for local computers. Big data creation is currently on the order of zettabytes (1000^7, or 10^21, bytes) per year, in roughly equal amounts across four segments: individuals (photos, video), companies (transaction monitoring), governments (surveillance, e.g., the new Utah Data Center), and scientific research (astronomical observations).
Big data fanfare abounds: we continually hear announcements that more data was created last year than in the entire history of humanity, and that data creation is on a two-year doubling cycle. Better, cheaper, faster storage has been the historical answer to the ever-growing capacity to generate data; however, this is not necessarily the best solution. Already, much collected data (e.g., CCTV footage, real-time surgery video, and genome sequencing data) is thrown away without anything being saved. Much stored data remains unused and has not been cleaned up into a human-usable form, since doing so is costly and challenging (de-duplication is a primary example).
Turning big data into smart data means moving away from data fundamentalism: the idea that data must be collected, and that data collection is an end in itself rather than a means. Advancement comes from smart data, not more data: being able to cleanly extract and use the salient aspects of data (the ‘diffs’; for example, identifying the relevant genomic polymorphisms in a whole genome sequence) rather than just generating and discarding it or mindlessly storing it.
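To make the ‘diffs’ idea concrete, here is a minimal Python sketch that keeps only the positions where a sample sequence differs from a reference, instead of keeping the whole sequence. It is a toy illustration under strong assumptions: the two sequences are already aligned and of equal length, and only single-base substitutions are considered. Real variant calling also has to deal with alignment, insertions, deletions, and read quality. The function names and the example sequences are hypothetical.

```python
def snv_diffs(reference: str, sample: str) -> list[tuple[int, str, str]]:
    """Return (position, reference_base, sample_base) for every mismatch.

    Assumes both sequences are pre-aligned and the same length.
    """
    if len(reference) != len(sample):
        raise ValueError("this sketch assumes aligned sequences of equal length")
    return [
        (i, ref_base, sample_base)
        for i, (ref_base, sample_base) in enumerate(zip(reference, sample))
        if ref_base != sample_base
    ]


def apply_diffs(reference: str, diffs: list[tuple[int, str, str]]) -> str:
    """Rebuild the sample sequence from the reference plus the stored diffs."""
    bases = list(reference)
    for position, _ref_base, sample_base in diffs:
        bases[position] = sample_base
    return "".join(bases)


# Made-up example sequences, for illustration only.
reference = "ACGTACGTAC"
sample = "ACGTTCGTAA"

diffs = snv_diffs(reference, sample)
rebuilt = apply_diffs(reference, diffs)
print(diffs)
print(rebuilt == sample)
```

The point of the sketch is that the reference plus a short list of differences is enough to reconstruct the sample, so the salient information is a small fraction of the raw data.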