What a surprise - it turns out the labels for the test set are not provided! There was an error in the Fuel converter that caused all images from the test set to be tagged as dog; see the issue and the PRs that fix it: one and two.

Hopefully I can still evaluate my test set predictions using Kaggle. For now, I will just use the last 5000 training examples as a validation set.
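
A minimal sketch of that hold-out split, written as plain index arithmetic so it does not depend on any particular dataset API. The helper name `split_last_n` and the total of 25000 labeled examples are assumptions made for illustration; substitute the real size of the labeled set.

```python
def split_last_n(num_examples, num_valid=5000):
    """Split indices 0..num_examples-1 into a training range and a
    validation range, holding out the last `num_valid` examples."""
    if not 0 < num_valid < num_examples:
        raise ValueError("num_valid must be between 1 and num_examples - 1")
    split = num_examples - num_valid
    # Everything before `split` is used for training, the rest for validation.
    return range(0, split), range(split, num_examples)


# Assumed size of the labeled set (hypothetical; adjust to the actual data).
NUM_LABELED = 25000

train_idx, valid_idx = split_last_n(NUM_LABELED, num_valid=5000)
print(len(train_idx), len(valid_idx))
```

The two ranges can then be used to select the corresponding examples from whatever dataset object is in use, for instance as slice bounds.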