When you consider big data, or more properly the ingestion and analysis of data that is then delivered to a mobile device or to an application on a PC or laptop, there are a number of considerations to take into account when optimizing your overall cloud solution to fit the data you need to consume.
Personally, I like the definition of big data/analytics that builds on the concept of overwhelming your existing hardware. You fix that by deploying specialized solutions (such as Hadoop or MapR) that reduce the surface of the data, shrinking bulky raw records into aggregates you can actually serve. These solutions allow you to expand to the full potential of cloud computing in building an analytics solution.
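To make the "reduce the surface" idea concrete, here is a toy sketch of the map/reduce pattern in plain Python. The log lines and status-code counting are my own illustrative example, not from any real deployment: the map step turns bulky raw records into tiny key/value pairs, and the reduce step collapses them into aggregates, so only a fraction of the original data ever moves downstream.

```python
from collections import Counter

# Illustrative raw records (assumption: web server access log lines).
raw_log_lines = [
    "GET /index.html 200",
    "GET /cart 500",
    "POST /cart 200",
]

def map_phase(line):
    # Emit only (status_code, 1); the rest of the record is dropped here.
    return (line.split()[-1], 1)

def reduce_phase(pairs):
    # Collapse the emitted pairs into a handful of counters.
    counts = Counter()
    for status, n in pairs:
        counts[status] += n
    return counts

print(reduce_phase(map_phase(l) for l in raw_log_lines))
# Counter({'200': 2, '500': 1}) -- gigabytes of logs become a few counters
```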
The question I have now, however, is broader: in designing a solution that will in effect survive only if it is able to consume a large amount of data, would you design your system differently?
Google solved a big data problem in its search engine in part by creating MapReduce (the model that Hadoop later implemented in open source), but also by separating the two conceptual areas of the solution: indexing and search. If we consider most big data solutions, the same split makes logical sense in the end: separate the two components (ingest and analysis), as both are I/O- and disk-bound.
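As a rough sketch of that separation (an in-process stand-in; a real system would put a durable log such as Kafka between the tiers), the ingest side only lands data into a buffer, while the analysis side drains it on its own schedule:

```python
import queue
import threading

# Hypothetical sketch: a bounded buffer decouples the write-bound ingest
# tier from the compute-bound analysis tier.
buffer = queue.Queue(maxsize=10_000)

def ingest(records):
    # Ingest's only job is to land data quickly.
    for r in records:
        buffer.put(r)
    buffer.put(None)  # sentinel: no more data

def analyze():
    # Analysis reads at its own pace and never blocks the ingest path.
    total = 0
    while (r := buffer.get()) is not None:
        total += r
    print(f"sum of ingested values: {total}")

t = threading.Thread(target=analyze)
t.start()
ingest(range(100))
t.join()
```

The point of the split is that either tier can then be scaled or tuned for its own bottleneck without redesigning the other.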
Of course, if you are building a disk-bound analysis system you have to worry about two additional things: disk failure, and in the end how much data you are going to store. It's easy to retrieve data from a spinning disk, but if your system requires high availability you can't have disks fail.
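Here is a quick back-of-envelope illustration of why disk failure drives the number of copies you keep. The 4% annual failure rate is my assumption, and treating failures as independent ignores rebuild windows and correlated failures:

```python
# If each disk fails independently with annual probability p, the chance
# that all r replicas of a block are lost in the same year is about p ** r.
p = 0.04  # assumed ~4% annualized disk failure rate
for r in (1, 2, 3):
    print(f"{r} copies: loss probability ~ {p ** r:.6f}")
# 1 copies: ~0.040000
# 2 copies: ~0.001600
# 3 copies: ~0.000064 -- one reason HDFS defaults to a replication factor of 3
```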
So now, in addition to separating the ingest from the analysis, you have to consider a couple of other issues:
- Amount of data you are going to store
- Amount of data you are ingesting
The first matters because it tells you how many copies of the data you will actually need (let's call that the backup model). The second may turn the first into your biggest issue once you start talking about a large amount of inbound data to be ingested.
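Putting those two items together gives a simple capacity-planning sketch. All the numbers below are assumptions for illustration, not figures from this post:

```python
def raw_storage_tb(ingest_tb_per_day, retention_days, copies):
    """Rough raw-capacity estimate: ingest rate x retention x backup copies."""
    return ingest_tb_per_day * retention_days * copies

# Assumed example: 2 TB/day ingested, kept for 90 days, 3 copies for availability.
print(raw_storage_tb(2, 90, 3))  # 540 TB of raw disk, before any growth
```

Even modest inbound rates multiply quickly once the backup model is applied, which is exactly why the ingest volume can end up dominating the storage question.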