Training TensorFlow models with big tabular datasets (ii)
April 25, 2022 · 1 min · 217 words
In my last post I talked about how I used TensorFlow datasets to speed up the training phase. Today I've discovered another game changer: the `prefetch` method.
With this method, your dataset will prefetch (i.e., prepare before they are needed) some batches while the current one is being processed. This improves latency and throughput at the cost of consuming more memory. Also, according to the TensorFlow documentation:
> Most dataset input pipelines should end with a call to `prefetch`
To call the `prefetch` method you need to specify `buffer_size`, the maximum number of elements that will be buffered while prefetching. If you want to set this value dynamically to the "optimal" one[^1], you can just use `tf.data.experimental.AUTOTUNE`.
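To give an idea, here's a minimal sketch of such a pipeline, using random placeholder data instead of my real tabular dataset:

```python
import tensorflow as tf

# Placeholder in-memory features/labels standing in for the real tabular data.
features = tf.random.uniform((10_000, 32))
labels = tf.random.uniform((10_000,), maxval=2, dtype=tf.int32)

dataset = (
    tf.data.Dataset.from_tensor_slices((features, labels))
    .shuffle(buffer_size=1_000)
    .batch(256)
    # Let tf.data pick the prefetch buffer size at runtime.
    .prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
)
```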
Finally, just by adding the line

```python
dataset = dataset.prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
```
I've halved the training time of my model.
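If you want to check the effect on your own pipeline, here's a rough sketch of how you could time one epoch with and without prefetching. Note that with the toy in-memory data below there's barely any preprocessing to overlap, so don't expect the same speedup:

```python
import time
import tensorflow as tf

# Toy stand-ins for the real data and model; only the timing pattern matters.
features = tf.random.uniform((10_000, 32))
labels = tf.random.uniform((10_000,), maxval=2, dtype=tf.int32)
base = tf.data.Dataset.from_tensor_slices((features, labels)).batch(256)

def time_one_epoch(ds):
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(optimizer="adam", loss="binary_crossentropy")
    start = time.perf_counter()
    model.fit(ds, epochs=1, verbose=0)
    return time.perf_counter() - start

print("without prefetch:", time_one_epoch(base))
print("with prefetch:   ", time_one_epoch(base.prefetch(tf.data.experimental.AUTOTUNE)))
```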
[^1]: I didn't find any TensorFlow documentation regarding how the optimal value is computed. I'll research this and write a post about it in the future.