Just want to say. I am soooo not happy. Bad grammar lol.
Where did I go wrong?
Schema: x-y-z-data, where data holds the time series as a list and x, y, z are all integers indicating a position in physical space. There is only ~10 GB of data in each chunk.
I used Cassandra as the database, with int-int-int-list<int> as the schema.
I just wanted to run the correlation task (Pearson's correlation), and the task never finished. It's been running for 10 hours now.
WTF. I was like. WTF.
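For the record, the math itself is tiny. Here is a rough sketch of Pearson's correlation between two time series in plain Python. The `rows` dict and its values are made up for illustration, just mimicking the x-y-z-data layout, and this is not the actual job I ran against Cassandra.

```python
import math


def pearson(a, b):
    """Pearson's correlation coefficient between two equal-length series."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    # Sum of products of deviations (covariance times n)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    # Sums of squared deviations (variances times n)
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_a * var_b)


# Hypothetical rows in the x-y-z-data layout: (x, y, z) -> time series
rows = {
    (0, 0, 0): [1, 2, 3, 4],
    (0, 0, 1): [2, 4, 6, 8],
    (0, 1, 0): [4, 3, 2, 1],
}

r_same = pearson(rows[(0, 0, 0)], rows[(0, 0, 1)])  # perfectly correlated
r_opp = pearson(rows[(0, 0, 0)], rows[(0, 1, 0)])   # perfectly anti-correlated
```

Each pair is cheap to compute, but correlating every series against every other one means the number of pairs grows with the square of the number of series.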
I tried a new schema as well, something like id-x-y-z-data, where the id is a text value of the form x|y|z and is the primary key. It did not finish in 10 hours either. WTF. I have around 250 GB of data in total to process.
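Roughly what the second schema looks like, as a sketch. The table name `voxels` and the helper names `make_id` / `parse_id` are made up for illustration, and the CQL string is only my approximation of the layout described above.

```python
# Approximate CQL for the id-x-y-z-data layout.
# The table name "voxels" is hypothetical.
CREATE_TABLE_CQL = """
CREATE TABLE voxels (
    id   text PRIMARY KEY,  -- "x|y|z"
    x    int,
    y    int,
    z    int,
    data list<int>
)
"""


def make_id(x, y, z):
    """Build the text primary key of the form x|y|z."""
    return f"{x}|{y}|{z}"


def parse_id(row_id):
    """Split an x|y|z key back into three integers."""
    x, y, z = (int(part) for part in row_id.split("|"))
    return x, y, z


key = make_id(12, 7, 3)
coords = parse_id(key)
```

With a single text key like this, each (x, y, z) position ends up as its own row.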
Will update this once I find a cure.