What did I do wrong.. Spark-cassandra-connector

Just want to say. I am soooo not happy. Bad grammar lol.

Where did I go wrong..

Schema: x-y-z-data, where the data is a time series and x, y, z are all integers indicating a position in physical space. There is only ~10G of data in each chunk.

I used Cassandra as the database, with int-int-int-list<int> as the schema.
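For concreteness, here is a minimal CQL sketch of what that layout could look like (keyspace, table, and column names are my guesses, not the post's actual DDL; the composite partition key makes each spatial point its own partition):

```sql
-- Hypothetical table matching the int-int-int-list<int> layout.
CREATE TABLE ts.points (
    x int,
    y int,
    z int,
    data list<int>,
    PRIMARY KEY ((x, y, z))
);
```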

I just wanted to run the correlation task (Pearson's correlation), but it never finished. It has been running for 10 hours now.
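For reference, Pearson's correlation itself is cheap for a single pair of series; the expensive part is doing it across all pairs at scale. A plain-Python sketch of the per-pair computation (no Spark, made-up data):

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(xs, ys))
    sx = math.sqrt(sum((a - mx) ** 2 for a in xs))
    sy = math.sqrt(sum((b - my) ** 2 for b in ys))
    return cov / (sx * sy)

# Perfectly correlated series give 1.0.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # -> 1.0
```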

WTF. I was like. WTF.

I tried a new schema as well, something like id-x-y-z-data, where id is a text column of the form x|y|z and is the primary key. That did not finish within 10 hours either. WTF. I have around 250G of data in total to process.

Will update this once I find a cure.


1 thought on “What did I do wrong.. Spark-cassandra-connector”

  1. leolincoln Post author

    OKAY. I found the cause.
    I was using version 1.1.1 of my cassandra-spark-connector, and the newest is 1.2.0-rc3.
    I changed all the versions (e.g. cassandra-spark-connector and cassandra-spark-connector-java) to match my database and driver versions, set batch.size.bytes to 2048, changed concurrent writes to 1, and everything seems to be working.
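    If you set the same knobs from code, a PySpark sketch might look like this (the property names `spark.cassandra.output.batch.size.bytes` and `spark.cassandra.output.concurrent.writes` are from the spark-cassandra-connector docs; the host and app name are assumptions):

    ```python
    from pyspark import SparkConf, SparkContext

    # Hypothetical config mirroring the fix described above:
    # small write batches and a single concurrent writer per task.
    conf = (SparkConf()
            .setAppName("correlation")                              # assumed name
            .set("spark.cassandra.connection.host", "127.0.0.1")    # assumed host
            .set("spark.cassandra.output.batch.size.bytes", "2048")
            .set("spark.cassandra.output.concurrent.writes", "1"))
    sc = SparkContext(conf=conf)
    ```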


