CatBoost

jeudi 10 janvier 2019

Usually when you think of a gradient boosted decision tree you think of XGBoost or LightGBM. I'd heard of CatBoost but I'd never tried it and it didn't seem too popular. I was looking at a Kaggle competition which had a lot of categorical data and I had squeezed just about every drop of performance I could out of LGBM so I decided to give CatBoost a try. I was extremely impressed.

Out of the box, with all default parameters, CatBoost scored better than the LGBM I had spent about a week tuning. CatBoost trained significantly slower than LGBM, but it will run on a GPU and doing so makes it train just slightly slower than the LGBM. Unlike XGBoost it can handle categorical data, which is nice because in this case we have far too many categories to do one-hot encoding. I've read the documentation several times but I am still unclear as to how exactly it encodes the categorical data, but whatever it does works very well.

I am just beginning to try to tune the hyperparameters so it is unclear how much (if any) extra performance I'll be able to squeeze out of it, but I am very, very impressed with CatBoost and I highly recommend it for any datasets which contain categorical data. Thank you Yandex! 

Libellés: coding, data_science, machine_learning, kaggle, catboost
2 commentaires

Exercise Log

mardi 27 novembre 2018

I exercise quite a lot and I have not been able to find an app to keep track of it which satisfies all of my criteria. Most fitness trackers are geared towards cardio and I also do a lot of strength training. After spending a year trying to make due with combinations of various fitness trackers and other apps I decided to just write my own, which could do everything I wanted and could show all of the reports I wanted.

I did that and after using it for a few weeks put it online at workout-log.com. It's not fancy and it is quite likely very buggy at this point, but it is open to anyone who wants to use it. 

It's written with Django and jQuery and uses ChartJS for the charts. 

Libellés: python, django, data_science, machine_learning
1 commentaires

CoLab TPUs One Month Later

mercredi 31 octobre 2018

After having used both CoLab GPUs and TPUs for almost a month I must significantly revise my previous opinion. Even for a Keras model not written or optimized for TPUs, with some minimal configuration changes TPUs perform much faster - minimum of twice the speed. In addition to making sure that all operations are TPU compatible, the only major configuration change required is increasing the batch size by 8. At first I was playing around with the batch size, but I realized that this was unnecessary. TPUs have 8 shards, so you simply multiple the GPU batch size by 8 and that should be a good baseline. 

The model I am currently training on a TPU and a GPU simultaneously is training 3-4x faster on the TPU than on the GPU and the code is exactly the same. I have this block of code:

use_tpu = True
# if we are using the tpu copy the keras model to a new var and assign the tpu model to model
if use_tpu:
    TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']
    
    # create network and compiler
    tpu_model = tf.contrib.tpu.keras_to_tpu_model(
        model, strategy = tf.contrib.tpu.TPUDistributionStrategy(
            tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))
    
    BATCH_SIZE = BATCH_SIZE * 8

The model is created with Keras and the only change I make is setting use_tpu to True on the TPU instance. 

One other thing I thought I would mention is that CoLab creates separate instances for GPU, TPU and CPU, so you can run multiple notebooks without sharing RAM or processor if you give each one a different type.

Libellés: machine_learning, tensorflow, google, google_cloud
2 commentaires

CoLab TPUs

mardi 09 octobre 2018

The other day I was having problems with a CoLab notebook and I was trying to debug it when I noticed that TPU is now an option for runtime type. I found no references to this in the CoLab documentation, but apparently it was quietly introduced only recently. If anyone doesn't know, TPUs are chips designed by Google specifically for matrix multiplications and are supposedly incredibly fast. Last I checked the cost to rent one through GCP was about $6 per hour, so the ability to have access to one for free could be a huge benefit.

As TPUs are specialized chips you can't just run the same code as on a CPU or a GPU. TPUs do not support all TensorFlow operations and you need to create a special optimizer to be able to take advantage of the TPU at all. The model I was working with at the time was created using TensorFlow's Keras API so I decided to try to convert that to be TPU compatible in order to test it.

Normally you would have to use a cross shard optimizer, but there is a shortcut for Keras models:

TPU_WORKER = 'grpc://' + os.environ['COLAB_TPU_ADDR']

# create network and compiler
tpu_model = tf.contrib.tpu.keras_to_tpu_model(
keras_model, strategy = tf.contrib.tpu.TPUDistributionStrategy(
    tf.contrib.cluster_resolver.TPUClusterResolver(TPU_WORKER)))

The first line finds an available TPU and gets it's address. The second line takes your keras model as input and converts it to a TPU compatible model. Then you would train the model using tpu_model.fit() instead of keras_model. This was the easy part.

For this particular model I am using a lot of custom functions for loss and metrics. Many of the functions turned out to not be compatible with TPUs so had to be rewritten. While at the time this was annoying, it turned out to be worth it regardless of the TPU because I had to optimize the functions in order to make them compatible with TPUs. The specific operations which were not compatible were non-matrix ops - logical operations and boolean masks specifically. Some of the code was downright hideous and this forced me to sit down and think through it and re-write it in a much cleaner manner, vectorizing as much as possible.

After all that effort, so far my experience with the TPUs hasn't been all that great. I can train my model with a significantly larger batch size - whereas  on an Nvidia K80 16 was the maximum batch size, I am currently training with batches of 64 on the TPU and may be able to push that even higher. However the time per epoch hasn't really improved all that much - it is about 1750 seconds on the TPU versus 1850 seconds on the K80. I have read code may need to be altered more to take full advantage of TPUs and I have not really tried playing with the batch size to see how that changes the performance yet.

I suspect that if I did some more research about TPUs and coded the model to be optimized for a TPU from scratch there might be a more noticeable performance gain, but this is based solely on having heard other people talk about how fast they are and not from my experience. 

Update - I have realized that the data augmentation is the bottleneck which is limiting the speed of training. I am training with a Keras generator which performs the augmentation on the CPU and if this is removed or reduced the TPUs do, in fact, train significantly faster than a GPU and also yield better results.

Libellés: coding, machine_learning, google_cloud
Aucun commentaire

Archives du Blogue