Monday, June 22, 2015

Data - Bespoke&Clean or Big&Messy?

I'm reading a book Robin got me about Big Data (in Dutch!). It's very interesting and has plenty of examples from industry and government of both ethical and unethical uses of all that data. The first thing that stuck out for me was the following premise:

The old way is to get high-precision, clean data; the new way is to get LOTS of data with much greater uncertainties. To me that spells out an information-theory issue (I don't know a lot about this, so it may be a completely solved problem): is there as much information in a small data set with small errors as there is in a big data set with huge errors? I am guessing that the first question back will be "how big is big and how small is small?" Okay, your mileage may vary.
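A back-of-the-envelope sketch of that question, under strong assumptions that are mine and not the book's: every measurement is independent, the errors are Gaussian, and there are no systematics. In that idealized case the uncertainty on a mean shrinks as sigma / sqrt(N), so you can ask how many messy points it takes to match a clean sample. The function names and the numbers below are hypothetical, picked only for illustration.

```python
import math


def standard_error(sigma, n):
    """Uncertainty on the mean of n independent measurements,
    each with the same Gaussian error sigma (idealized assumption)."""
    return sigma / math.sqrt(n)


def equivalent_sample_size(n_clean, sigma_clean, sigma_messy):
    """Number of messy points needed to match the precision on the mean
    of n_clean clean points, under the same idealized assumptions."""
    return math.ceil(n_clean * (sigma_messy / sigma_clean) ** 2)


# Hypothetical numbers, arbitrary units: the messy errors are 10x larger.
clean = standard_error(sigma=1.0, n=100)
messy = standard_error(sigma=10.0, n=10_000)
print("clean sample:", clean)
print("messy sample:", messy)
print("messy points needed:", equivalent_sample_size(100, 1.0, 10.0))
```

With these made-up numbers the two samples constrain the mean equally well: ten times worse errors cost a hundred times more points. Real messy data usually also brings correlated errors and systematics, which this sketch ignores, so treat it as the optimistic limit.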

But this brings me to the project my Bachelor students have been doing. Theirs has been a very clean-sample, small-error-bars, nice-fit project (hey, Bachelor students... can't give them horrible data, that is just mean...)

But I think their project can have a follow-up as a Big Astro Data pathfinder project. Go into the archive, get ALL TEH DATAS, and then model it with enough slop.

This is taking up mental CPU now. I need to email people and make it their problem...
