What did I do wrong? Spark-cassandra-connector

Just want to say: I am soooo not happy. Bad grammar, lol.

Where did I go wrong?

Schema: x-y-z-data, where the data column is a time series. x, y, z are all integers indicating a position in physical space. There is only ~10 GB of data in each chunk.

I used Cassandra as the database, with int-int-int-list&lt;int&gt; as the schema.
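For concreteness, here is a minimal sketch of what that schema might look like in CQL. The keyspace, table, and column names are my guesses, not the post's actual DDL:

```sql
-- Hypothetical reconstruction of the x-y-z-data schema.
-- The composite partition key (x, y, z) identifies one point in space;
-- data holds that point's whole time series as a list of ints.
CREATE TABLE timeseries.points (
    x    int,
    y    int,
    z    int,
    data list<int>,
    PRIMARY KEY ((x, y, z))
);
```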

I just wanted to run a correlation task (Pearson's correlation), and it never finished. It's been running for 10 hours now.
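For reference, the computation itself is cheap: Pearson's correlation between two of these time series is just the covariance divided by the product of the standard deviations. A self-contained sketch in plain Python (no Spark, toy data), which is why the 10-hour runtime points at the data plumbing rather than the math:

```python
import math

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length series."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
    var_x = sum((a - mean_x) ** 2 for a in xs)
    var_y = sum((b - mean_y) ** 2 for b in ys)
    return cov / math.sqrt(var_x * var_y)

# Two toy "point" time series: perfectly linearly related, so r == 1.0.
a = [1, 2, 3, 4, 5]
b = [2, 4, 6, 8, 10]
print(pearson(a, b))  # 1.0
```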

WTF. I was like. WTF.

I tried a new schema as well, something like id-x-y-z-data, where id is a text column of the form x|y|z and is the primary key. That did not finish in 10 hours either. WTF. I have around 250 GB of data in total to process.

Will update this once I find a cure.


One thought on “What did I do wrong? Spark-cassandra-connector”

  1. leolincoln Post author

    OKAY. I found the cause.
    I was using version 1.1.1 of the spark-cassandra-connector, and the newest is 1.2.0-rc3.
    I changed all the versions (e.g. spark-cassandra-connector and spark-cassandra-connector-java) to match my database version and driver version, also set batch.size.bytes to 2048 and changed concurrent writes to 1, and everything seems to be working.
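    If I recall the property names correctly, that write tuning corresponds to connector settings along these lines (a sketch only; property names can differ between connector versions, so check the docs for yours):

    ```properties
    # Hypothetical spark-defaults.conf fragment for the write-side tuning.
    spark.cassandra.output.batch.size.bytes   2048
    spark.cassandra.output.concurrent.writes  1
    ```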

