Author Archives: niuoniu

About niuoniu

Hi! My name is Liu Liu. I am a Ph.D. student in Computer Science at Northwestern University (NU). I received a Bachelor of Science degree from the University of Arizona and a Master’s from Northwestern University, both in computer science. My research focuses on streaming and analysis of big temporal data with NoSQL databases.

Heat map subplots sharing the same color bar with pandas and seaborn

Make sure you have pandas, seaborn, and matplotlib installed. In the code below, pd1 and pd2 are two pandas DataFrames.

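import matplotlib.pyplot as plt
import seaborn as sns

# pd1 and pd2 are pandas DataFrames loaded elsewhere
fig, (ax0, ax1) = plt.subplots(1, 2, sharex=True, sharey=True)
# one extra axes for the shared color bar: [left, bottom, width, height] in figure coordinates
cbar_ax = fig.add_axes([.91, .3, .03, .4])
# the same vmin/vmax on both heat maps keeps the colors comparable
sns.heatmap(pd1.corr(), ax=ax0, cbar=True, vmin=-1, vmax=1, cbar_ax=cbar_ax)
sns.heatmap(pd2.corr(), ax=ax1, cbar=True, vmin=-1, vmax=1, cbar_ax=cbar_ax)
ax1.set_title('title 2')
fig.suptitle('big title', fontsize=20)
# saving figure for publication if needed
plt.savefig('save.tif', dpi=300)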

That's it! The most important parts are the dedicated color bar axes, cbar_ax = fig.add_axes([.91,.3,.03,.4]), and a fixed vmin and vmax, so that both heat maps map values to the same colors.
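For a version that runs end to end, here is a minimal sketch with made-up data. The random DataFrames df1 and df2, the figure size, and the output file name are assumptions for illustration, not part of the original figure. It draws the color bar only once (cbar=False on the second heat map), which should look the same because both maps use the same vmin and vmax.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical data: two random DataFrames standing in for pd1 and pd2
rng = np.random.default_rng(0)
df1 = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))
df2 = pd.DataFrame(rng.normal(size=(50, 4)), columns=list("abcd"))

fig, (ax0, ax1) = plt.subplots(1, 2, sharex=True, sharey=True, figsize=(9, 4))
# Dedicated axes for the shared color bar: [left, bottom, width, height]
cbar_ax = fig.add_axes([0.91, 0.3, 0.03, 0.4])

# Fixed vmin/vmax: both maps share one color scale
sns.heatmap(df1.corr(), ax=ax0, vmin=-1, vmax=1, cbar=True, cbar_ax=cbar_ax)
sns.heatmap(df2.corr(), ax=ax1, vmin=-1, vmax=1, cbar=False)

ax0.set_title("title 1")
ax1.set_title("title 2")
fig.subplots_adjust(right=0.89)  # leave room for the color bar axes
fig.savefig("shared_colorbar.png", dpi=300)
```

Without the fixed vmin and vmax, each heat map would scale its colors to its own data range, and a single color bar could no longer describe both panels.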


Signing an unsignable PDF

Normally, signing a PDF should be easy. But if yours won't let you sign it, that's why you are here.

The trick in this article is to help you get an unsignable PDF signed electronically and relatively painlessly. Follow the steps below and you will get your PDF signed and sent back. I am assuming you have Ghostscript and GIMP installed (both are free, by the way) and that you are using a Mac.

If you don't know how to install Ghostscript and GIMP, refer to my other post:
install ghostscript and gimp

  1. Fill in as much of the information as possible, up to the point where you cannot fill in any more.
  2. At this point, save your PDF and then run this command (assuming your PDF file is called convert.pdf):

    gs -q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=tiff32nc -r300 -sOutputFile=convert.tif convert.pdf -c quit

    Your converted image file will be called convert.tif, rendered at 300 dpi.
  3. Now open convert.tif in GIMP and sign using the draw tool. When you are finished, go to File -> Export and export as result.pdf (just type the .pdf extension in the file name).
  4. Now that you are finished, close all windows and send the file to the email address that requested the signed PDF. Done. You have saved a tree branch for the earth. LOL
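To get a feel for how large the image from step 2 will be: PDF page sizes are measured in points (72 per inch), so at -r300 each point becomes 300/72 pixels. Here is a small sketch of that arithmetic. The helper name page_pixels and the US Letter page size (612 x 792 points) are assumptions for illustration, and Ghostscript's own rounding may differ by a pixel.

```python
def page_pixels(width_pt, height_pt, dpi=300):
    """Approximate raster size in pixels for a PDF page given in points."""
    points_per_inch = 72
    width_px = round(width_pt / points_per_inch * dpi)
    height_px = round(height_pt / points_per_inch * dpi)
    return width_px, height_px

# US Letter is 8.5 x 11 inches, i.e. 612 x 792 points
print(page_pixels(612, 792))       # (2550, 3300)
print(page_pixels(612, 792, 150))  # (1275, 1650)
```

If the 300 dpi image is larger than you need for a signature, lowering the -r value shrinks it proportionally.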

How to make a custom estimator class and a custom scorer to do cross-validation with the sklearn API on your custom model

I made a combined weak-classifier model and needed a custom estimator and a custom scorer. I went through a few Stack Overflow posts; however, none of them specifically targeted cross-validation in sklearn.

Then I figured I would try to subclass BaseEstimator and make my own scorer. It WORKED. :>

Therefore, I am posting instructions here on how to do it; hopefully it will be useful to you.


  1. Write your own estimator class and make it inherit from BaseEstimator. (In Python you subclass it rather than implement it; BaseEstimator works like an abstract base class that provides basic estimator functionality, such as get_params and set_params.)
  2. Write your loss function or gain function, and then make your own scorer from it.
  3. Use the sklearn API to do cross-validation, using whatever you created in 1 and 2.

Code: Please read the comments. They are important.

#create a custom estimator class

#Keep in mind this is just a simplified version. You can treat it as any other class; just make sure the fit/predict signatures stay the same, or give any extra parameters default values.

import numpy as np
from sklearn import tree
from sklearn.base import BaseEstimator
from sklearn.cluster import KMeans

class custom_classifier(BaseEstimator):
  #Kmeans clustering model
  __clusters = None
  #decision tree model
  __tree = None
  #x library.
  __X = None
  #y library.
  __y = None
  #columns selected from pandas dataframe.
  __columns = None

  def fit(self, X, y, **kwargs):
    #the original body was not shown; as an assumption, fit both sub-models and return self
    self.fit_kmeans(X, y)
    self.fit_decisiontree(X, y)
    return self

  def predict(self,X):
    result_kmeans = self.__clusters.predict(X)
    result_tree = self.__tree.predict(X)
    #simplified: only the tree result is returned (combining logic not shown here)
    result = result_tree
    return np.array(result)

  def fit_kmeans(self,X,y):
    clusters = KMeans(n_clusters=4, random_state=0).fit(X)
    #the error center should have the lowest number of labels.(implementation not shown here)
    self.__clusters = clusters

  def fit_decisiontree(self,X,y):
    temp_tree = tree.DecisionTreeClassifier(criterion='entropy',max_depth=3).fit(X,y)
    self.__tree = temp_tree

Now we have our class. Next we need to build the hit/loss function:

#again, feel free to change anything in the hit function, as long as the function signature remains the same.

def seg_tree_hit_func(ground_truth, predictions):
  total_hit = 0
  total_number = 0
  for i in range(len(predictions)):
    #only instances predicted as class 2 are scored; the rest are skipped
    if predictions[i]==2:
      total_number += 1
      total_hit += (1-abs(ground_truth[i]-predictions[i]))
  print('skipped: ',len(predictions)- total_number,'/',len(predictions),'instances')
  return total_hit/total_number if total_number!=0 else 0
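To make the scoring rule concrete, here is a hedged NumPy sketch of the same idea: only instances predicted as class 2 are scored, and each one contributes 1 - |truth - prediction|. The function name hit_rate_for_class and the target argument are hypothetical, and the sketch assumes the intent is to divide by the number of scored instances.

```python
import numpy as np

def hit_rate_for_class(ground_truth, predictions, target=2):
    """Average of 1 - |truth - prediction| over instances predicted as `target`."""
    ground_truth = np.asarray(ground_truth)
    predictions = np.asarray(predictions)
    scored = predictions == target   # boolean mask of the scored instances
    if not scored.any():
        return 0.0                   # nothing was predicted as `target`
    hits = 1 - np.abs(ground_truth[scored] - predictions[scored])
    return float(hits.mean())

# Three instances are predicted as 2; only the first one matches the truth,
# so the result is one hit out of three scored instances.
print(hit_rate_for_class([2, 2, 1, 3], [2, 1, 2, 2]))
```

Note that when a label is more than one class away from the prediction, its contribution is negative, so this score is not bounded below by 0.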

Now we still need to build the scorer.

from sklearn.metrics import make_scorer

#make our own scorer
score = make_scorer(seg_tree_hit_func, greater_is_better=True)

We have our scorer and our estimator, so we can start the cross-validation task:

from sklearn.model_selection import cross_val_score

#change the 7 to the number of folds you are running.

scores = cross_val_score(custom_classifier(), X, Y, cv=7, scoring=score)

There it is! You have your own scorer and estimator, and you can easily plug them into anything in the sklearn API.
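As a self-contained check of the whole recipe, here is a minimal sketch under stated assumptions: the MajorityClassifier estimator, the plain-accuracy gain function, and the synthetic data from make_classification are all stand-ins chosen for brevity, not the combined model described above.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.datasets import make_classification
from sklearn.metrics import make_scorer
from sklearn.model_selection import cross_val_score

class MajorityClassifier(ClassifierMixin, BaseEstimator):
    """Toy estimator: always predicts the most frequent training label."""

    def fit(self, X, y):
        labels, counts = np.unique(y, return_counts=True)
        self.classes_ = labels
        self.majority_ = labels[np.argmax(counts)]
        return self  # sklearn expects fit to return the estimator

    def predict(self, X):
        return np.full(len(X), self.majority_)

def accuracy_func(ground_truth, predictions):
    """Custom gain function: fraction of exact matches."""
    return float(np.mean(np.asarray(ground_truth) == np.asarray(predictions)))

# Wrap the gain function so cross_val_score can call it
scorer = make_scorer(accuracy_func, greater_is_better=True)

# Hypothetical data: 140 samples, so each of the 7 folds holds 20
X, y = make_classification(n_samples=140, n_features=5, random_state=0)
scores = cross_val_score(MajorityClassifier(), X, y, cv=7, scoring=scorer)
print(scores.shape, scores.mean())
```

The same pattern should carry over to a real model: keep fit(X, y) returning self, keep predict(X) returning one label per row, and pass the wrapped function through the scoring argument.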


Hope this helps.

Converting an Excel workbook to a publishable 300 dpi high-definition TIFF (after converting to PDF)

I got really frustrated when trying to convert my charts and Excel worksheets into publishable content for my paper. I went through several Stack Overflow questions and ended up using Ghostscript to make it possible.

All software used in this post is free ($0). It is ideal for Ph.D. students or home use for publishing.

  1. Install Ghostscript. Using Homebrew, just run brew install ghostscript and you are good.
  2. Install GIMP. From Homebrew all you have to do is: brew install Caskroom/cask/gimp (on current Homebrew versions the equivalent is brew install --cask gimp)
  3. Open your Excel worksheet. Set the page orientation to portrait or landscape in Page Setup, select your area and set it as the print area, and then save the workbook as PDF. E.g. I have now saved the workbook as figure1.pdf
  4. In your command line, type  gs -q -dSAFER -dBATCH -dNOPAUSE -sDEVICE=tiff32nc -r300 -sOutputFile=figure1.tif figure1.pdf -c quit
  5. Open figure1.tif in GIMP, select the area you want, and then choose Image -> Fit Canvas to Selection
  6. In File -> Export As, simply export the file as figure1.tiff and you are all set. If you look at the properties, it has 300 dpi.

You can easily use bash to do batch processing of PDF files, but for more accurate image selection you still need GIMP.
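The note above mentions bash; to stay consistent with the Python used elsewhere on this blog, here is a rough equivalent using subprocess. It is a sketch under assumptions: the helper names build_gs_command and convert_all are hypothetical, the Ghostscript executable is assumed to be on the PATH as gs, and the flags follow step 4, written in Ghostscript's documented upper-case form (-dSAFER, -dBATCH).

```python
import shutil
import subprocess
from pathlib import Path

def build_gs_command(pdf_path, dpi=300):
    """Return the Ghostscript argument list that renders pdf_path to a TIFF."""
    pdf_path = Path(pdf_path)
    tif_path = pdf_path.with_suffix(".tif")
    # -dBATCH makes gs exit after the last file, so no trailing "-c quit"
    return [
        "gs", "-q", "-dSAFER", "-dBATCH", "-dNOPAUSE",
        "-sDEVICE=tiff32nc", f"-r{dpi}",
        f"-sOutputFile={tif_path}", str(pdf_path),
    ]

def convert_all(folder="."):
    """Convert every PDF in folder; returns the list of TIFF paths written."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) was not found on the PATH")
    written = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        subprocess.run(build_gs_command(pdf), check=True)
        written.append(pdf.with_suffix(".tif"))
    return written

# Show the command for one file without running it
print(" ".join(build_gs_command("figure1.pdf")))
```

Calling convert_all("figures") would then render every PDF in a figures folder; cropping each image is still a manual step in GIMP.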

Hope this will help other people!